Git Repository Datasets

Run datasets from any Git repository

Harbor can resolve datasets directly from Git repositories using the --repo flag. This lets you run benchmarks hosted in any GitHub, GitLab, or Hugging Face repo without publishing to the Harbor registry first.

Quick start

harbor run --repo org/repo-name -d my-dataset -a claude-code -m anthropic/claude-sonnet-4

This clones the repository, finds registry.json (the dataset manifest), and resolves the named dataset.

Specifying the repository

The --repo flag accepts several formats:

# GitHub shorthand (defaults to github.com)
--repo org/repo-name

# Pinned to a tag, branch, or commit
--repo org/repo-name@v1.0
--repo org/repo-name@main
--repo org/repo-name@abc1234

# Full URL
--repo https://github.com/org/repo-name

# Subdirectory via /tree/ path
--repo https://github.com/org/repo-name/tree/main/benchmarks

# Hugging Face
--repo https://huggingface.co/datasets/org/repo-name

# GitLab
--repo https://gitlab.com/org/repo-name

When no @ref is specified, the repository's default branch is used.
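
To make the accepted formats concrete, here is a minimal Python sketch of how such values could be split into host, org, name, ref, and subdirectory. It is not Harbor's parser: RepoSpec and parse_repo_spec are hypothetical names, @ref is handled only on the shorthand form, and a /tree/ ref is assumed to be a single path segment.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class RepoSpec:
    host: str
    org: str
    name: str
    ref: Optional[str] = None      # None means "use the default branch"
    subdir: Optional[str] = None   # None means "repository root"


def parse_repo_spec(value: str) -> RepoSpec:
    """Illustrative parser for the --repo formats listed above (assumed logic)."""
    if "://" not in value:
        # Shorthand: org/repo-name[@ref], assumed to live on github.com.
        path, _, ref = value.partition("@")
        org, _, name = path.partition("/")
        if not org or not name or "/" in name:
            raise ValueError(f"expected org/repo-name, got {value!r}")
        return RepoSpec("github.com", org, name, ref or None)

    url = urlparse(value)
    parts = [p for p in url.path.split("/") if p]
    # Hugging Face dataset URLs carry a leading "datasets" segment.
    if url.hostname == "huggingface.co" and parts[:1] == ["datasets"]:
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"expected a URL ending in org/repo-name, got {value!r}")
    org, name, rest = parts[0], parts[1], parts[2:]
    ref = subdir = None
    if rest[:1] == ["tree"] and len(rest) >= 2:
        # Simplification: the ref is taken to be one path segment.
        ref = rest[1]
        subdir = "/".join(rest[2:]) or None
    return RepoSpec(url.hostname, org, name, ref, subdir)
```

For example, parse_repo_spec("https://github.com/org/repo-name/tree/main/benchmarks") yields ref "main" and subdir "benchmarks", while parse_repo_spec("org/repo-name") leaves both unset.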

How it works

  1. Harbor parses the --repo value to extract the host, org, name, ref, and optional subdirectory.
  2. It resolves the ref to an immutable Git SHA via git ls-remote.
  3. Harbor performs a sparse checkout of the repository into a local cache at ~/.harbor/cache/.
  4. Harbor reads the registry.json in the repo (or subdirectory) to discover available datasets.
  5. The --dataset name is matched against the registry and tasks are resolved from the repo.

Cached checkouts are keyed by SHA, so pinned refs (@v1.0, @abc1234) are fetched once and reused. Branch refs (@main) are re-resolved to a SHA on every run, so a branch that has moved produces a new checkout.
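
The ref resolution and SHA-keyed cache can be sketched in Python as follows. This is an illustration under stated assumptions, not Harbor's implementation: pick_sha, resolve_sha, and cache_dir are hypothetical names, the one-directory-per-SHA layout under ~/.harbor/cache/ is a guess, and a ref that looks like a hex commit ID is passed through unchanged.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional


def pick_sha(ls_remote_output: str, ref: Optional[str]) -> str:
    """Pick the SHA for `ref` from `git ls-remote` output (HEAD if ref is None)."""
    refs = {}
    for line in ls_remote_output.splitlines():
        sha, _, name = line.partition("\t")
        if name:
            refs[name] = sha
    if ref is None:
        candidates = ["HEAD"]  # the remote's default branch
    else:
        # A peeled entry (^{}) is the commit behind an annotated tag.
        candidates = [f"refs/tags/{ref}^{{}}", f"refs/tags/{ref}", f"refs/heads/{ref}"]
    for name in candidates:
        if name in refs:
            return refs[name]
    if ref is not None and re.fullmatch(r"[0-9a-f]{7,40}", ref):
        return ref  # Assumption: a commit ID is already immutable.
    raise LookupError(f"ref {ref!r} not found on remote")


def resolve_sha(url: str, ref: Optional[str] = None) -> str:
    """Ask the remote for its refs and resolve `ref` to a SHA."""
    out = subprocess.run(
        ["git", "ls-remote", url],
        check=True, capture_output=True, text=True,
    ).stdout
    return pick_sha(out, ref)


def cache_dir(sha: str, root: Path = Path.home() / ".harbor" / "cache") -> Path:
    # Hypothetical layout: one directory per resolved SHA.
    return root / sha
```

Because the cache path depends only on the SHA, a tag or commit always maps to the same directory, while a branch maps to a new directory only after it has advanced.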

Selecting a dataset

With --repo, the --dataset flag takes a bare name (no org/ prefix):

# Run the "lite" dataset from the repo's registry.json
harbor run --repo org/my-benchmarks -d lite -a claude-code -m anthropic/claude-sonnet-4

# Run a specific version of the dataset
harbor run --repo org/my-benchmarks -d lite@1.2 -a claude-code -m anthropic/claude-sonnet-4

If the repo's registry.json contains only one dataset, --dataset can be omitted.
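
The selection rules above can be expressed as a short Python sketch. The registry is modeled here as a list of dicts with hypothetical name and version keys, and select_dataset is an illustrative name; the real registry.json schema and Harbor's behavior when several versions share a name may differ.

```python
from typing import Optional


def select_dataset(entries: list[dict], spec: Optional[str] = None) -> dict:
    """Resolve a --dataset value ("name" or "name@version") against registry entries."""
    if spec is None:
        # --dataset may be omitted only when the registry lists one dataset.
        if len(entries) == 1:
            return entries[0]
        raise ValueError("--dataset is required when the registry lists several datasets")
    if "/" in spec:
        raise ValueError("use a bare dataset name with --repo (no org/ prefix)")
    name, _, version = spec.partition("@")
    matches = [
        e for e in entries
        if e["name"] == name and (not version or e.get("version") == version)
    ]
    if not matches:
        raise LookupError(f"dataset {spec!r} not found in registry")
    # Assumption: without a version, the first listed entry wins.
    return matches[0]
```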

Combining with other flags

Standard dataset flags work with --repo:

# Include specific tasks
harbor run --repo org/benchmarks -d suite \
  -i "task-a" -i "task-b" \
  -a claude-code -m anthropic/claude-sonnet-4

# Limit task count
harbor run --repo org/benchmarks -d suite \
  -l 10 \
  -a claude-code -m anthropic/claude-sonnet-4

# Custom registry.json path within the repo
harbor run --repo org/benchmarks \
  --registry-path benchmarks/registry.json \
  -d suite -a claude-code -m anthropic/claude-sonnet-4

Mutual exclusivity

--repo cannot be combined with --registry-url, --task, or --task-git-url.

Repository layout

A Git repository used with --repo should contain a registry.json at its root (or in the targeted subdirectory). The file uses the same format as one passed to a local --registry-path:

my-benchmarks/
├── registry.json
├── task-a/
│   ├── task.toml
│   ├── instruction.md
│   ├── environment/
│   └── tests/
├── task-b/
│   └── ...
└── ...

The registry.json maps dataset names to task lists. See the Harbor task format for task directory structure.

Authentication

Public repositories work without any configuration. For private repositories, Harbor uses the Git credentials available in your environment (SSH keys, credential helpers, GIT_ASKPASS, etc.). If git ls-remote can reach the repo, --repo will work.
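
Since access comes down to whether git ls-remote succeeds, a quick preflight check can be scripted. The Python sketch below is an illustration, not part of Harbor: can_reach is a hypothetical helper, and it sets GIT_TERMINAL_PROMPT=0 so that Git fails instead of waiting for interactive credentials.

```python
import os
import subprocess


def can_reach(repo_url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote` succeeds for repo_url with current credentials."""
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")  # fail instead of prompting
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", repo_url],
            env=env, capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # git is not installed, or the remote did not answer in time
    return result.returncode == 0
```

Calling can_reach with the repository's SSH or HTTPS URL before a long run gives an early signal that SSH keys or a credential helper still need to be configured.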
