Loading...
Loading...
Found 63 Skills
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Automate QA regression testing with reusable test skills. Create login flows, dashboard checks, user creation, and other common test scenarios that run consistently.
Use when writing or reviewing tests for Python behavior, contracts, async lifecycles, or reliability paths. Also use when tests are flaky, coupled to implementation details, missing regression coverage, slow to run, or when unclear what tests a change needs.
Automatically generates comprehensive test suites (unit, integration, E2E) based on code and past testing patterns. Use when user says "write tests", "test this", "add coverage", or after fixing bugs to create regression tests. Eliminates testing friction for ADHD users.
Standardize requirement/feature changes in an existing codebase (especially Chrome extensions) by turning "改需求/需求变更/调整交互/改功能/重构流程" into a repeatable loop: clarify acceptance criteria, confirm current behavior from code, assess impact/risk, design the new logic, implement with small diffs, run a fixed regression checklist, and update docs/decision log. Use when the user feels the change process is chaotic, when edits tend to sprawl across files, or when changes touch manifest/service worker/OAuth/storage/UI and need reliable verification + rollback planning.
Debug and harden production LLM prompts — handle prompt injection, output format drift, instruction forgetting in long contexts, and cross-model portability issues. Use this skill when the user ships an LLM-powered feature to production and needs to diagnose why outputs are inconsistent, unsafe, or regressed after model updates — NOT for basic 'write a better prompt' questions.
Debug and fix bugs, errors, or unexpected behavior
Storybook UI component development and testing. Use for component testing.
Bug diagnosis and fixing specialist - analyzes errors, identifies root causes, provides fixes, and writes regression tests
Applies tests-first discipline (red/green/refactor) and adds regression tests for bugs. Use when implementing features, fixing bugs, or refactoring.
Design and implement visual regression testing for UI changes. Defines screenshot coverage, rendering stabilization, baseline management, and CI integration (e.g., Playwright screenshots, Percy/Chromatic). Use when UI/styling/layout changes need protection against regressions, or when adding screenshot-based tests to a web/WASM/desktop UI.
Percy integration. Manage data, records, and automate workflows. Use when the user wants to interact with Percy data.