Search Results: harbor-framework

Found 4 Skills

rewardkit

Write Harbor task verifiers using Reward Kit. Use when creating or editing a task's tests/ directory, adding grading criteria, setting up LLM/agent judges, or designing verifiers that produce a reward score.

AI & Machine Learning | harbor-framework/harbor

publish

Publish a Harbor task or dataset to the registry. Use when the user wants to upload, publish, or share tasks or datasets/benchmarks on the Harbor registry.

AI & Machine Learning | harbor-framework/harbor

create-task

Create a new Harbor task for evaluating agents. Use when the user wants to scaffold, build, or design a new task, benchmark problem, or eval. Guides through instruction writing, environment setup, verifier design (pytest vs Reward Kit vs custom), and solution scripting.

AI & Machine Learning | coder/mux

tbench

Terminal-Bench integration for Mux agent benchmarking and failure analysis.