spreadsheetbench-verified

v1.0

A benchmark that evaluates AI agents on real-world spreadsheet manipulation tasks (400 tasks from verified_400). Tasks involve Excel file manipulation, including formula writing, data transformation, formatting, and conditional logic.

uvx harbor run -d spreadsheetbench-verified@1.0
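
To run a handful of tasks rather than the whole dataset, the per-task form of the command (with `-t`) can be generated in a loop. The following is a minimal sketch that assumes only the `-d` and `-t` options shown on this page. The helper name `harbor_cmd` is hypothetical, and the three task IDs are simply the first entries of the task list. The script prints the commands instead of executing them, so nothing runs until the output is reviewed.

```shell
#!/usr/bin/env bash
# Dataset reference as shown on this page.
DATASET="spreadsheetbench-verified@1.0"

# Hypothetical helper: print the single-task command for one task ID.
harbor_cmd() {
  printf 'uvx harbor run -d %s -t %s\n' "$DATASET" "$1"
}

# Dry run: print (do not execute) the commands for a few task IDs.
for task in 10452 105-24 10747; do
  harbor_cmd "$task"
done
```

Piping the output to `bash` would execute the printed commands one after another. Keeping generation and execution separate makes it easy to check the task IDs first.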

Tasks (400)

10452
uvx harbor run -d spreadsheetbench-verified@1.0 -t 10452
0ad7209
105-24
uvx harbor run -d spreadsheetbench-verified@1.0 -t 105-24
0ad7209
10747
uvx harbor run -d spreadsheetbench-verified@1.0 -t 10747
0ad7209
108-24
uvx harbor run -d spreadsheetbench-verified@1.0 -t 108-24
0ad7209
109-21
uvx harbor run -d spreadsheetbench-verified@1.0 -t 109-21
0ad7209
11276
uvx harbor run -d spreadsheetbench-verified@1.0 -t 11276
0ad7209
118-50
uvx harbor run -d spreadsheetbench-verified@1.0 -t 118-50
0ad7209
11842
uvx harbor run -d spreadsheetbench-verified@1.0 -t 11842
0ad7209
120-24
uvx harbor run -d spreadsheetbench-verified@1.0 -t 120-24
0ad7209
12307
uvx harbor run -d spreadsheetbench-verified@1.0 -t 12307
0ad7209
12864
uvx harbor run -d spreadsheetbench-verified@1.0 -t 12864
0ad7209
13-1
uvx harbor run -d spreadsheetbench-verified@1.0 -t 13-1
0ad7209
130-9
uvx harbor run -d spreadsheetbench-verified@1.0 -t 130-9
0ad7209
13284
uvx harbor run -d spreadsheetbench-verified@1.0 -t 13284
0ad7209
141-20
uvx harbor run -d spreadsheetbench-verified@1.0 -t 141-20
0ad7209
142-12
uvx harbor run -d spreadsheetbench-verified@1.0 -t 142-12
0ad7209
142-19
uvx harbor run -d spreadsheetbench-verified@1.0 -t 142-19
0ad7209
14240
uvx harbor run -d spreadsheetbench-verified@1.0 -t 14240
0ad7209
146-49
uvx harbor run -d spreadsheetbench-verified@1.0 -t 146-49
0ad7209
147-48
uvx harbor run -d spreadsheetbench-verified@1.0 -t 147-48
0ad7209
15380
uvx harbor run -d spreadsheetbench-verified@1.0 -t 15380
0ad7209
15387
uvx harbor run -d spreadsheetbench-verified@1.0 -t 15387
0ad7209
156-14
uvx harbor run -d spreadsheetbench-verified@1.0 -t 156-14
0ad7209
1563
uvx harbor run -d spreadsheetbench-verified@1.0 -t 1563
0ad7209
15671
uvx harbor run -d spreadsheetbench-verified@1.0 -t 15671
0ad7209
157-4
uvx harbor run -d spreadsheetbench-verified@1.0 -t 157-4
0ad7209
160-6
uvx harbor run -d spreadsheetbench-verified@1.0 -t 160-6
0ad7209
165-23
uvx harbor run -d spreadsheetbench-verified@1.0 -t 165-23
0ad7209
16511
uvx harbor run -d spreadsheetbench-verified@1.0 -t 16511
0ad7209
168-17
uvx harbor run -d spreadsheetbench-verified@1.0 -t 168-17
0ad7209
17-35
uvx harbor run -d spreadsheetbench-verified@1.0 -t 17-35
0ad7209
170-13
uvx harbor run -d spreadsheetbench-verified@1.0 -t 170-13
0ad7209
17111
uvx harbor run -d spreadsheetbench-verified@1.0 -t 17111
0ad7209
177-6
uvx harbor run -d spreadsheetbench-verified@1.0 -t 177-6
0ad7209
178-22
uvx harbor run -d spreadsheetbench-verified@1.0 -t 178-22
0ad7209
1818
uvx harbor run -d spreadsheetbench-verified@1.0 -t 1818
0ad7209
183-8
uvx harbor run -d spreadsheetbench-verified@1.0 -t 183-8
0ad7209
18645
uvx harbor run -d spreadsheetbench-verified@1.0 -t 18645
0ad7209
188-39
uvx harbor run -d spreadsheetbench-verified@1.0 -t 188-39
0ad7209
18935
uvx harbor run -d spreadsheetbench-verified@1.0 -t 18935
0ad7209
191-40
uvx harbor run -d spreadsheetbench-verified@1.0 -t 191-40
0ad7209
192-22
uvx harbor run -d spreadsheetbench-verified@1.0 -t 192-22
0ad7209
1925
uvx harbor run -d spreadsheetbench-verified@1.0 -t 1925
0ad7209
194-19
uvx harbor run -d spreadsheetbench-verified@1.0 -t 194-19
0ad7209
203-15
uvx harbor run -d spreadsheetbench-verified@1.0 -t 203-15
0ad7209
208-20
uvx harbor run -d spreadsheetbench-verified@1.0 -t 208-20
0ad7209
209-30
uvx harbor run -d spreadsheetbench-verified@1.0 -t 209-30
0ad7209
22-47
uvx harbor run -d spreadsheetbench-verified@1.0 -t 22-47
0ad7209
227-40
uvx harbor run -d spreadsheetbench-verified@1.0 -t 227-40
0ad7209
23-24
uvx harbor run -d spreadsheetbench-verified@1.0 -t 23-24
0ad7209
230-16
uvx harbor run -d spreadsheetbench-verified@1.0 -t 230-16
0ad7209
236-22
uvx harbor run -d spreadsheetbench-verified@1.0 -t 236-22
0ad7209
24-23
uvx harbor run -d spreadsheetbench-verified@1.0 -t 24-23
0ad7209
247-24
uvx harbor run -d spreadsheetbench-verified@1.0 -t 247-24
0ad7209
250-20
uvx harbor run -d spreadsheetbench-verified@1.0 -t 250-20
0ad7209
254-34
uvx harbor run -d spreadsheetbench-verified@1.0 -t 254-34
0ad7209
262-17
uvx harbor run -d spreadsheetbench-verified@1.0 -t 262-17
0ad7209
263-1
uvx harbor run -d spreadsheetbench-verified@1.0 -t 263-1
0ad7209
267-18
uvx harbor run -d spreadsheetbench-verified@1.0 -t 267-18
0ad7209
267-21
uvx harbor run -d spreadsheetbench-verified@1.0 -t 267-21
0ad7209
269-43
uvx harbor run -d spreadsheetbench-verified@1.0 -t 269-43
0ad7209
269-44
uvx harbor run -d spreadsheetbench-verified@1.0 -t 269-44
0ad7209
2768
uvx harbor run -d spreadsheetbench-verified@1.0 -t 2768
0ad7209
279-23
uvx harbor run -d spreadsheetbench-verified@1.0 -t 279-23
0ad7209
28-7
uvx harbor run -d spreadsheetbench-verified@1.0 -t 28-7
0ad7209
280-17
uvx harbor run -d spreadsheetbench-verified@1.0 -t 280-17
0ad7209
283-32
uvx harbor run -d spreadsheetbench-verified@1.0 -t 283-32
0ad7209
290-1
uvx harbor run -d spreadsheetbench-verified@1.0 -t 290-1
0ad7209
290-27
uvx harbor run -d spreadsheetbench-verified@1.0 -t 290-27
0ad7209
297-42
uvx harbor run -d spreadsheetbench-verified@1.0 -t 297-42
0ad7209
3002
uvx harbor run -d spreadsheetbench-verified@1.0 -t 3002
0ad7209
302-1
uvx harbor run -d spreadsheetbench-verified@1.0 -t 302-1
0ad7209
304-35
uvx harbor run -d spreadsheetbench-verified@1.0 -t 304-35
0ad7209
30930
uvx harbor run -d spreadsheetbench-verified@1.0 -t 30930
0ad7209
31011
uvx harbor run -d spreadsheetbench-verified@1.0 -t 31011
0ad7209
31202
uvx harbor run -d spreadsheetbench-verified@1.0 -t 31202
0ad7209
31628
uvx harbor run -d spreadsheetbench-verified@1.0 -t 31628
0ad7209
31746
uvx harbor run -d spreadsheetbench-verified@1.0 -t 31746
0ad7209
31915
uvx harbor run -d spreadsheetbench-verified@1.0 -t 31915
0ad7209
32023
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32023
0ad7209
32093
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32093
0ad7209
32255
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32255
0ad7209
32293
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32293
0ad7209
32337
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32337
0ad7209
32438
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32438
0ad7209
325-44
uvx harbor run -d spreadsheetbench-verified@1.0 -t 325-44
0ad7209
32562
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32562
0ad7209
32612
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32612
0ad7209
32789
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32789
0ad7209
32902
uvx harbor run -d spreadsheetbench-verified@1.0 -t 32902
0ad7209
33157
uvx harbor run -d spreadsheetbench-verified@1.0 -t 33157
0ad7209
333-29
uvx harbor run -d spreadsheetbench-verified@1.0 -t 333-29
0ad7209
334-11
uvx harbor run -d spreadsheetbench-verified@1.0 -t 334-11
0ad7209
33722
uvx harbor run -d spreadsheetbench-verified@1.0 -t 33722
0ad7209
34033
uvx harbor run -d spreadsheetbench-verified@1.0 -t 34033
0ad7209
341-14
uvx harbor run -d spreadsheetbench-verified@1.0 -t 341-14
0ad7209
341-40
uvx harbor run -d spreadsheetbench-verified@1.0 -t 341-40
0ad7209
3413
uvx harbor run -d spreadsheetbench-verified@1.0 -t 3413
0ad7209
34210
uvx harbor run -d spreadsheetbench-verified@1.0 -t 34210
0ad7209
343-20
uvx harbor run -d spreadsheetbench-verified@1.0 -t 343-20
0ad7209