harbor

Datasets

Running a dataset

A Harbor task consists of an instruction, a sandbox environment, and a test script. A dataset is a collection of tasks used for evals and training. A dataset can also define custom metrics that aggregate rewards across its tasks.

A task can belong to multiple datasets, so you can create a dataset as a targeted group of tasks for a particular eval or training run. For example, you might pull 10 tasks from a few different benchmarks to build a composite benchmark.

There are two ways to use datasets:

  1. Local datasets: run a local directory of tasks.
  2. Published datasets: run a dataset from the Harbor registry.

Local datasets

Run a local dataset with --path or -p:

harbor run -p "<path/to/dataset>" -a "<agent>" -m "<model>"
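
The composite-benchmark idea mentioned earlier can be sketched for local datasets as a small shell helper. This is only an illustration under stated assumptions: it assumes each task lives in its own subdirectory of a benchmark directory, the function name make_composite and all paths are hypothetical, and whether Harbor accepts task directories renamed with a benchmark prefix has not been confirmed here.

```shell
#!/bin/sh
# Sketch: build a composite local dataset by copying task directories.
# Assumption: each task is its own subdirectory under a benchmark directory.
set -eu

# make_composite DEST N SRC...
# Copies up to N task directories (first N in glob order) from each SRC into DEST.
make_composite() {
    dest=$1
    count=$2
    shift 2
    mkdir -p "$dest"
    for src in "$@"; do
        src=${src%/}
        n=0
        for task in "$src"/*/; do
            [ -d "$task" ] || continue          # skip when the glob matches nothing
            [ "$n" -lt "$count" ] || break      # stop after N tasks per benchmark
            task=${task%/}
            # Prefix with the benchmark name to avoid task-name collisions.
            cp -R "$task" "$dest/$(basename "$src")__$(basename "$task")"
            n=$((n + 1))
        done
    done
}

# Hypothetical usage (paths are placeholders):
#   make_composite ./composite 10 ./benchmarks/bench-a ./benchmarks/bench-b
#   harbor run -p ./composite -a "<agent>" -m "<model>"
```

Copying rather than symlinking keeps the composite directory self-contained, at the cost of duplicating task files.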

Published datasets

You can publish a dataset to the Harbor registry either privately, for your organization, or publicly. Every member of your organization can run a privately published dataset; anyone can run a publicly published one.

Run a published dataset with --dataset or -d:

harbor run -d "my-org/my-dataset@1.0" -a "<agent>" -m "<model>"
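
The two modes above differ only in the flag that identifies the dataset. As a rough sketch, a wrapper could pick -p when the target is an existing local directory and -d otherwise. The helper name harbor_cmd is hypothetical and not part of Harbor, and it prints the command instead of running it, so it can be tried without Harbor installed.

```shell
#!/bin/sh
# Sketch: choose -p or -d depending on whether the target is a local directory.
# Dry run only: the command is printed, not executed.
harbor_cmd() {
    target=$1
    agent=$2
    model=$3
    if [ -d "$target" ]; then
        flag=-p   # local dataset: a directory of tasks
    else
        flag=-d   # published dataset: a registry reference
    fi
    printf 'harbor run %s "%s" -a "%s" -m "%s"\n' "$flag" "$target" "$agent" "$model"
}

# Example (assuming no local directory with this name exists):
#   harbor_cmd "my-org/my-dataset@1.0" "<agent>" "<model>"
#   prints: harbor run -d "my-org/my-dataset@1.0" -a "<agent>" -m "<model>"
```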

To learn how to create and publish a dataset, see Publishing a dataset.
