Search Results: benchmark-automation

Found 2 Skills

AI & Machine Learning | getcompanion-ai/feynman

autoresearch

Autonomous experiment loop that tries ideas, measures results, keeps what works, and discards what doesn't. Use when the user asks to optimize a metric, run an experiment loop, improve performance iteratively, or automate benchmarking.

Testing & QA | vercel/vercel-plugin

benchmark-sandbox

Run vercel-plugin eval scenarios in Vercel Sandboxes instead of local WezTerm panels. Provisions ephemeral microVMs with Claude Code + plugin pre-installed, runs benchmark prompts, extracts hook artifacts, and produces coverage reports.
