Metrics

Built-in and custom metrics for Harbor dataset evaluation

Metrics define how to aggregate rewards across tasks in a job or dataset. By default, Harbor averages rewards across tasks and treats missing rewards as 0.
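As an illustrative sketch of this default (not Harbor's actual implementation), missing rewards become 0 before averaging:

```python
# Sketch of Harbor's default aggregation: mean over tasks,
# with a missing reward (None) counted as 0.
rewards = [1.0, None, 0.5]  # one entry per task; None marks a missing reward
values = [r if r is not None else 0 for r in rewards]
mean = sum(values) / len(values) if values else 0
print(mean)  # 0.5
```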

However, you may want to define custom aggregation logic or handle missing rewards differently. You can do this by creating a metric.py file. If you are working with a dataset.toml, the following command creates a metric.py file and automatically adds it to the [[files]] section of your dataset.toml:

harbor init --dataset "<org>/<name>" --with-metric

See Publishing a dataset for more details.

If you are running a local dataset, Harbor uses its metric.py automatically.

Custom metrics with metric.py

A metric.py script must accept the following arguments:

  • -i, --input-path: rewards JSONL file
  • -o, --output-path: output JSON file with computed metrics

Example:

metric.py
# /// script
# dependencies = []
# ///
# To add a dependency: uv add --script metric.py <dependency>

import argparse
import json
from pathlib import Path


def main(input_path: Path, output_path: Path):
    rewards = []

    for line in input_path.read_text().splitlines():
        reward = json.loads(line)
        if reward is None:
            # A missing reward is serialized as JSON null; count it as 0.
            rewards.append(0)
        elif len(reward) != 1:
            raise ValueError(
                f"Expected exactly one key in reward dictionary, got {len(reward)}"
            )
        else:
            rewards.extend(reward.values())

    mean = sum(rewards) / len(rewards) if rewards else 0

    output_path.write_text(json.dumps({"mean": mean}))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-i",
        "--input-path",
        type=Path,
        required=True,
        help="Path to a jsonl file containing rewards, one json object per line.",
    )
    parser.add_argument(
        "-o",
        "--output-path",
        type=Path,
        required=True,
        help="Path to a json file where the metric will be written as a json object.",
    )
    args = parser.parse_args()
    main(args.input_path, args.output_path)
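To make the input format concrete: each line of the rewards file is either a single-key JSON object or null for a missing reward. The key name below is illustrative; the snippet mirrors the parsing loop above:

```python
import json

# Four tasks: three scored, one missing (serialized as JSON null).
sample_jsonl = '{"reward": 1.0}\n{"reward": 0.25}\n{"reward": 0.25}\nnull'

rewards = []
for line in sample_jsonl.splitlines():
    reward = json.loads(line)
    if reward is None:
        rewards.append(0)  # missing reward counts as 0
    else:
        rewards.extend(reward.values())

print(json.dumps({"mean": sum(rewards) / len(rewards)}))  # {"mean": 0.375}
```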

When you publish your dataset, the metric.py file is published along with it:

harbor publish "<path/to/dataset>" --public

The metric.py output must be a single JSON object and may contain multiple metric values:

{"mean": 0.81, "pass_rate": 0.5}
