Found 78 Skills
Terminal-Bench integration for Mux agent benchmarking and failure analysis
Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit, or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.
Create new skills, modify and improve existing skills, and measure skill performance. Enhanced version with quick commands. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy. Triggers on phrases like "make a skill", "create a new skill", "build a skill for", "improve this skill", "optimize my skill", "test my skill", "turn this into a skill", "skill description optimization", or "help me create a skill".
CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache, SIMD, make it faster, 性能优化, 基准测试
Expert skill for AI model quantization and optimization. Covers 4-bit/8-bit quantization, GGUF conversion, memory optimization, and quality-performance tradeoffs for deploying LLMs in resource-constrained JARVIS environments.
Create an AI Evals Pack (eval PRD, test set, rubric, judge plan, results + iteration loop). Use for LLM evaluation, benchmarks, rubrics, error analysis/open coding, and ship/no-ship quality gates for AI features.
Write Foundry-based tests and scripts. Trigger phrases - foundry testing, write test, fuzz test, fork test, invariant test, deploy script, gas benchmark, coverage, or when working in tests/ or scripts/ directories.
Defines .NET test strategy, xUnit v3, integration/E2E, snapshots (Verify), Playwright, benchmarks, and quality gates.
Artillery Config Generator - Auto-activating skill for Performance Testing. Triggers on: artillery config generator. Part of the Performance Testing skill category.
Optimizes algorithms via an autoresearch loop: benchmark, research, hypothesize, keep/discard.
Optimize code performance through iterative improvements (max 2 rounds). Benchmark execution time and memory usage, compare against baseline implementations, and generate detailed optimization reports. Supports C++, Python, Java, Rust, and other languages.