Datasets

A Harbor task consists of an instruction, a sandbox environment, and a test script. A dataset is a collection of tasks used for evals and training. A dataset can also define custom metrics that aggregate rewards across its tasks.
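
To make "aggregate rewards across tasks" concrete, here is a minimal sketch of one possible metric, the mean reward, computed from one per-task reward per line. The input format and the `mean_reward` helper are hypothetical illustrations, not how Harbor stores rewards or defines metrics.

```shell
# Hypothetical metric: mean reward across tasks.
# Assumes one numeric reward per line on stdin (illustrative format, not Harbor's).
mean_reward() {
  # Sum the rewards and count the lines; print "nan" when there is no input.
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n; else print "nan" }'
}

# Four tasks, three of which passed.
printf '1\n0\n1\n1\n' | mean_reward   # prints 0.75
```

A mean is only one choice; a dataset could just as well report a pass rate over a threshold or a weighted score.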

Tasks can belong to multiple datasets, so you can create datasets that serve as targeted eval or training groups. For example, you might pick 10 tasks from a few different benchmarks to build a composite benchmark.
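
The composite-benchmark idea can be sketched as a local dataset assembled by copying task directories. This assumes, hypothetically, that each benchmark is a local directory whose immediate subdirectories are individual tasks; the `build_composite` function and its "first N tasks in name order" selection rule are illustrative, not part of Harbor.

```shell
# Hypothetical helper: copy up to N task directories from each benchmark
# directory into one output directory that can be run as a local dataset.
# Usage: build_composite <out-dir> <tasks-per-benchmark> <benchmark-dir>...
build_composite() {
  out=$1
  per_benchmark=$2
  shift 2
  mkdir -p "$out"
  for bench in "$@"; do
    count=0
    # The glob expands in sorted order; the trailing slash matches directories only.
    for task in "$bench"/*/; do
      [ -d "$task" ] || continue              # nothing matched: skip the literal pattern
      [ "$count" -lt "$per_benchmark" ] || break
      task=${task%/}                          # drop the trailing slash before copying
      name=$(basename "$task")
      # Keep original task names; skip (without counting) a name already copied.
      if [ -e "$out/$name" ]; then
        echo "skipping duplicate task name: $name" >&2
        continue
      fi
      cp -R "$task" "$out/$name"
      count=$((count + 1))
    done
  done
}
```

For example, `build_composite composite-bench 10 bench-a bench-b bench-c` would copy up to 10 tasks from each of three benchmark directories, after which the result could be run like any local dataset with `harbor run -p composite-bench -a "<agent>" -m "<model>"`.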

There are three ways to use datasets:

Local datasets: run a local directory of tasks.
Published datasets: run a dataset from the Harbor registry.
Git repository datasets: run a dataset from any Git repository.

Local datasets

Run a local dataset with --path or -p:

harbor run -p "<path/to/dataset>" -a "<agent>" -m "<model>"
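
If you want to compare several models on the same local dataset, a small shell loop over the -p, -a, and -m flags shown above is one option. The `run_models` function and the `HARBOR` override (set `HARBOR=echo` to print the commands instead of running them) are hypothetical conveniences, not Harbor features; agent and model names are placeholders.

```shell
# Hypothetical helper: run one local dataset with one agent against several models.
# Usage: run_models <path/to/dataset> <agent> <model>...
# Set HARBOR=echo for a dry run that only prints each command.
run_models() {
  dataset=$1
  agent=$2
  shift 2
  for model in "$@"; do
    # Uses only the -p, -a and -m flags documented above.
    "${HARBOR:-harbor}" run -p "$dataset" -a "$agent" -m "$model"
  done
}
```

The runs execute one after another; for example, `HARBOR=echo run_models ./my-dataset my-agent model-a model-b` prints the two commands without running them.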

Published datasets

Datasets can be published to the Harbor registry and shared either privately or publicly. A privately published dataset can be run by all members of your organization; a publicly published dataset can be run by anyone.

Run a published dataset with --dataset or -d:

harbor run -d "my-org/my-dataset@1.0" -a "<agent>" -m "<model>"

To learn how to create and publish a dataset, see Publishing a dataset.

Git repository datasets

Run a dataset directly from any Git repository with --repo:

harbor run --repo org/repo-name -d "my-dataset" -a "<agent>" -m "<model>"

This resolves registry.json from the repository and runs the named dataset. The --repo option supports GitHub, GitLab, and Hugging Face URLs, with optional ref pinning (for example, @v1.0 or @main).

See Git Repository Datasets for the full guide.
