researchcodebench

v1.0

ResearchCodeBench evaluates AI agents' ability to implement algorithms from academic papers. It contains 212 code implementation tasks across 20 ML/AI research problems from top-tier venues (ICLR, NeurIPS, CVPR, COLM). The tasks test paper comprehension, algorithm understanding, and precise code implementation, and are backed by 1,449 lines of reference code.

uvx harbor run -d researchcodebench@1.0
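
To run a handful of tasks rather than the whole dataset, the per-task commands can be generated in a loop. The following is a minimal sketch, assuming a bash shell. `harbor_cmd` is a hypothetical helper name, and only the `-d` and `-t` flags shown on this page are used. It prints the commands instead of executing them, so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints per-task commands, does not execute them.
# DATASET and the task names are copied from this page;
# harbor_cmd is a hypothetical helper, not part of the harbor CLI.
DATASET="researchcodebench@1.0"

harbor_cmd() {
  # $1 is a task name exactly as it appears in the task list.
  printf 'uvx harbor run -d %s -t %s\n' "$DATASET" "$1"
}

for task in tanh-init_identity_matrix tanh-init_update; do
  harbor_cmd "$task"
done
```

Piping the printed lines to `sh` would run the selected tasks one after another.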

Tasks (212)

siss_importance_sampling_weights
uvx harbor run -d researchcodebench@1.0 -t siss_importance_sampling_weights
69581ca
siss_subtracted_importance_sampled_scores_importance_sampling_with_mixture
uvx harbor run -d researchcodebench@1.0 -t siss_subtracted_importance_sampled_scores_importance_sampling_with_mixture
69581ca
tabdiff_compute_total_noise_for_categorical_features_with_learnable_k
uvx harbor run -d researchcodebench@1.0 -t tabdiff_compute_total_noise_for_categorical_features_with_learnable_k
69581ca
tabdiff_compute_total_noise_for_numerical_features_with_learnable_rho
uvx harbor run -d researchcodebench@1.0 -t tabdiff_compute_total_noise_for_numerical_features_with_learnable_rho
69581ca
tabdiff_initialize_the_learnable_feature-wise_parameter_k_for_categorical_features
uvx harbor run -d researchcodebench@1.0 -t tabdiff_initialize_the_learnable_feature-wise_parameter_k_for_categorical_features
69581ca
tabdiff_initialize_the_learnable_feature-wise_parameter_rho_for_numerical_features
uvx harbor run -d researchcodebench@1.0 -t tabdiff_initialize_the_learnable_feature-wise_parameter_rho_for_numerical_features
69581ca
tabdiff_make_sure_learnable_parameter_ks_for_categorical_features_are_positive
uvx harbor run -d researchcodebench@1.0 -t tabdiff_make_sure_learnable_parameter_ks_for_categorical_features_are_positive
69581ca
tabdiff_make_sure_learnable_parameter_rhos_are_greater_than_rho_offset
uvx harbor run -d researchcodebench@1.0 -t tabdiff_make_sure_learnable_parameter_rhos_are_greater_than_rho_offset
69581ca
tanh-init_identity_matrix
uvx harbor run -d researchcodebench@1.0 -t tanh-init_identity_matrix
69581ca
tanh-init_identity_matrix_else
uvx harbor run -d researchcodebench@1.0 -t tanh-init_identity_matrix_else
69581ca
tanh-init_proposed_weight_initialization
uvx harbor run -d researchcodebench@1.0 -t tanh-init_proposed_weight_initialization
69581ca
tanh-init_update
uvx harbor run -d researchcodebench@1.0 -t tanh-init_update
69581ca