Found 46 Skills
Diagnosis loop for hard bugs and performance regressions. Use when the user says "diagnose"/"debug this", or reports something broken/throwing/failing/slow.
Generate comprehensive test plans, manual test cases, regression test suites, and bug reports for QA engineers. Includes Figma MCP integration for design validation.
A disciplined diagnostic loop for tricky bugs and performance regressions. Reproduce → Minimize → Hypothesize → Instrument → Fix → Regression-test. Use this when the user says "diagnose this" / "debug this", reports a bug, states that something is broken/throwing errors/failing, or describes a performance regression.
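The Minimize step of the loop above can be sketched as a greedy reduction: repeatedly drop one element of a failing input and keep the smaller input whenever the failure still reproduces. This is an illustrative assumption, not the skill's actual implementation; `minimize` and `still_fails` are hypothetical names, and the one-element-at-a-time strategy is a simpler stand-in for full delta debugging.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def minimize(items: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Hypothetical helper: shrink a failing input while the bug still reproduces."""
    # Minimizing only makes sense once the failure has been reproduced.
    if not still_fails(items):
        raise ValueError("input does not reproduce the failure")
    current = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            # Try the input with element i removed.
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                # Smaller input still fails: keep it and restart the scan.
                current = candidate
                changed = True
                break
    # No single element can be removed without losing the failure.
    return current


# Usage: a bug that only appears when both 3 and 7 are present.
print(minimize([1, 3, 5, 7, 9], lambda xs: 3 in xs and 7 in xs))  # [3, 7]
```

The result is 1-minimal: removing any single remaining element makes the failure disappear, which gives the Hypothesize step a small, focused case to reason about.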
Formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
Performs manual testing of Story acceptance criteria (AC) via executable bash scripts saved to tests/manual/. Creates reusable test suites per Story. Worker for ln-510.
Safe deployment of a Polymarket trading bot, with regression tests and active-trade protection.
Generate comprehensive test plans, test cases, regression test suites, automation annotations, and bug reports for QA engineers. Includes Figma MCP integration for design validation. Use when planning QA before execution, documenting test strategies, marking which flows require E2E follow-up, or creating structured bug reports. Do not use for executing tests against a live repository or running verification gates; use qa-execution for that.
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Use when validating automation builds before launch or after significant changes.
Reproduce a bug from a Linear ticket with a failing test. Expects the full ticket context (title, description, comments) to be provided as input.
Automate QA regression testing with reusable test skills. Create login flows, dashboard checks, user creation, and other common test scenarios that run consistently.
Use when writing or reviewing tests for Python behavior, contracts, async lifecycles, or reliability paths. Also use when tests are flaky, coupled to implementation details, missing regression coverage, slow to run, or when unclear what tests a change needs.