rexbench

v1.0

A benchmark that evaluates how well AI agents can extend existing AI research by implementing research experiments. Original benchmark: https://github.com/tinlaboratory/rexbench. Website: https://rexbench.com/.

uvx harbor run -d rexbench@1.0

Tasks (2)

cogs

uvx harbor run -d rexbench@1.0 -t cogs

eac9201

othello

uvx harbor run -d rexbench@1.0 -t othello

eac9201
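
To run the listed tasks one after another, the per-task commands can be assembled in a small script. The following is a minimal sketch, not part of Harbor or rexbench: the helper names harbor_command and run_all are hypothetical, and the only command-line arguments used are the ones shown in the listing above. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Dataset and task names as they appear in the listing above.
DATASET = "rexbench@1.0"
TASKS = ["cogs", "othello"]


def harbor_command(task, dataset=DATASET):
    """Build the argv for one task (hypothetical helper, not a Harbor API)."""
    return ["uvx", "harbor", "run", "-d", dataset, "-t", task]


def run_all(tasks=TASKS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for task in tasks:
        cmd = harbor_command(task)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a run exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()  # dry run: prints the two commands without executing them
```

Calling run_all(dry_run=False) would execute each command in turn and stop at the first failing run; this assumes uvx is installed and on the PATH.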