Loading...
Loading...
Walk through a complete GAIA benchmark→submit flow — from key resolution through HAL-compatible package generation
npx skill4agent add ruvnet/ruflo gaia-submission| Requirement | Check |
|---|---|
| |
| |
| Node.js 20+ | |
| CLI built | |
# Run all pre-flight checks
/gaia validateclaude-sonnet-4-6/gaia cost --level=$LEVEL --limit=$LIMIT --models=$MODELS --voting=$VOTING/gaia run --level=$LEVEL --limit=$LIMIT --models=$MODELS --voting=$VOTING[12/53] 22.7% (5 passed of 22 scored) — est. remaining: $0.18npx @claude-flow/cli@latest memory store \
--namespace gaia-runs \
--key "run-$(date +%Y%m%d-%H%M)" \
--value '{"level":$LEVEL,"model":"$MODEL","total":$TOTAL,"passed":$PASSED,"pass_rate":$RATE,"est_cost_usd":$COST}'/gaia submit --results=~/.cache/ruflo/gaia/results-latest.jsonsubmission-<date>-<sha>/
├── results.jsonl ← HAL-compatible, one JSON per line
├── trajectories.jsonl ← full agent traces
├── metadata.json ← harness info, model, tool catalogue
├── manifest.md.json ← Ed25519-signed witness
└── README.md ← human summary + leaderboard comparison/gaia leaderboard --level=$LEVEL
/gaia history/gaia-debuggingnpx @claude-flow/cli@latest hooks post-task \
--task-id "gaia-submission-$(date +%Y%m%d)" \
--success true \
--train-neural truenpx @claude-flow/cli@latest memory store \
--namespace gaia-patterns \
--key "submission-notes-$(date +%Y%m%d)" \
--value "Level $LEVEL, $MODEL: $NOTES"