Run the corpus benchmark (booster locally, with optional Gemini/Sonnet/Opus baselines) and persist a verifiable measured-vs-claimed table.
npx skill4agent add ruvnet/ruflo cost-benchmark

`scripts/bench.mjs`, `docs/benchmarks/runs/`, `cost-booster-edit`, `cost-booster-route`, `bench/booster-corpus.json`, `BENCH_ANTHROPIC=1`, `v3/agent-booster`

( cd v3 && node ../plugins/ruflo-cost-tracker/scripts/bench.mjs ) # booster only: free, ~85 ms
( cd v3 && BENCH_LLM_BASELINE=1 node ../plugins/ruflo-cost-tracker/scripts/bench.mjs ) # + Gemini 2.0 Flash (cheap)
( cd v3 && BENCH_LLM_BASELINE=1 BENCH_ANTHROPIC=1 \
  node ../plugins/ruflo-cost-tracker/scripts/bench.mjs ) # + Sonnet 4.6 + Opus 4.7

`winRate`, `escalationRate`, `docs/benchmarks/runs/latest.json`, `docs/benchmarks/runs/<ISO-timestamp>.json`, `cost-report`, `latest.json`, `winRate ≥ 0.80`, `scripts/smoke.sh`, `escalationRate`

| Env var | Default | Purpose |
|---|---|---|
| | unset | |
| | | Override the OpenAI-compat model |
| | Gemini OpenAI shim | Override endpoint |
| | unset | |
| | | Comma-separated Claude IDs |
| | timestamped file | Override output path |
| | unset | Suppress markdown summary |
`gcloud secrets`, `GOOGLE_AI_API_KEY`, `ANTHROPIC_API_KEY`, `BENCH_LLM_API_KEY`, `BENCH_ANTHROPIC_API_KEY`, `cost-booster-edit/SKILL.md`, `cost-report/SKILL.md`, `runs/latest.json`
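
The page mentions `winRate`, `escalationRate`, and a `winRate ≥ 0.80` threshold alongside `latest.json`. A check of a persisted run against that threshold might look roughly like the sketch below. This is a sketch under assumptions, not the plugin's actual code: the function name `checkRun` is hypothetical, and it assumes `winRate` and `escalationRate` are top-level numeric fields in the run JSON, which this page does not confirm. The sample values are placeholders, not measured results.

```javascript
// Hypothetical gate check. ASSUMPTION: the run JSON exposes `winRate` and
// `escalationRate` as top-level numbers; adjust to the real schema.
function checkRun(run, minWinRate = 0.80) {
  const problems = [];
  if (typeof run.winRate !== "number") {
    problems.push("winRate missing or not a number");
  } else if (run.winRate < minWinRate) {
    problems.push(`winRate ${run.winRate} is below ${minWinRate}`);
  }
  if (typeof run.escalationRate !== "number") {
    problems.push("escalationRate missing or not a number");
  }
  return { ok: problems.length === 0, problems };
}

// Placeholder values for illustration only, not measured results.
const sample = JSON.parse('{"winRate": 0.75, "escalationRate": 0.1}');
const result = checkRun(sample);
console.log(result.ok ? "PASS" : `FAIL: ${result.problems.join("; ")}`);
```

In practice the object passed to `checkRun` would come from parsing the contents of `docs/benchmarks/runs/latest.json`, once the real field names are confirmed.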