harbor
BixBench - A benchmark for evaluating AI agents on bioinformatics and computational biology tasks.
uvx harbor run -d bixbench@1.5
uvx harbor run -d bixbench@1.5 -t bix-8-q6
uvx harbor run -d bixbench@1.5 -t bix-8-q7
uvx harbor run -d bixbench@1.5 -t bix-9-q3
uvx harbor run -d bixbench@1.5 -t bix-9-q4
uvx harbor run -d bixbench@1.5 -t bix-9-q5
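
The per-task commands above differ only in the value passed to -t, so they can be driven from a loop. The following is a minimal sketch, not part of harbor itself: the dataset name, task IDs, and the -d and -t flags are copied from the commands listed above, while the run_task helper and the RUN switch are hypothetical names introduced here for illustration. By default it only prints each command, so nothing is executed until RUN=1 is set.

```shell
#!/bin/sh
# Dataset and task IDs copied from the commands listed above.
DATASET="bixbench@1.5"
TASKS="bix-8-q6 bix-8-q7 bix-9-q3 bix-9-q4 bix-9-q5"

# Hypothetical helper: prints the command for one task ID,
# or executes it when RUN=1 (RUN is an assumed switch, not a harbor option).
run_task() {
  if [ "${RUN:-0}" = "1" ]; then
    uvx harbor run -d "$DATASET" -t "$1"
  else
    echo "uvx harbor run -d $DATASET -t $1"
  fi
}

# Visit each task ID in turn.
for task in $TASKS; do
  run_task "$task"
done
```

Saved under a hypothetical name such as run_tasks.sh, running `sh run_tasks.sh` lists the five commands, and `RUN=1 sh run_tasks.sh` executes them one after another.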