Found 8 Skills
Design and implement comprehensive evaluation systems for AI agents. Use when building evals for coding agents, conversational agents, research agents, or computer-use agents. Covers grader types, benchmarks, an 8-step roadmap, and production integration.
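As a minimal sketch of the "grader types" idea (the case data, function names, and scoring scheme below are illustrative, not taken from the skill itself), a code-based grader can be a plain function that scores an agent's output against an expected answer; LLM-judge and rubric graders layer on top of checks like these:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str

def exact_match_grader(case: EvalCase, agent_output: str) -> float:
    """Return 1.0 on an exact (whitespace-insensitive) match, else 0.0."""
    return 1.0 if agent_output.strip() == case.expected.strip() else 0.0

def contains_grader(case: EvalCase, agent_output: str) -> float:
    """Softer grader: pass if the expected answer appears anywhere in the output."""
    return 1.0 if case.expected.lower() in agent_output.lower() else 0.0

if __name__ == "__main__":
    case = EvalCase(prompt="What is 2 + 2?", expected="4")
    print(exact_match_grader(case, "4"))               # 1.0
    print(contains_grader(case, "The answer is 4."))   # 1.0
```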
Complete reference for writing, running, and iterating on evals (automated conversation tests) for ADK agents. Covers eval file format, all assertion types, CLI usage, and per-primitive testing patterns.
Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.
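A minimal sketch of the TestModel pattern, assuming the current pydantic-ai API (the agent's model string and prompt are placeholders; on older versions the result attribute is `result.data` rather than `result.output`):

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

# Agent under test; the model name here is a placeholder.
agent = Agent("openai:gpt-4o", system_prompt="Reply concisely.")

def test_agent_replies_without_calling_the_api():
    # TestModel generates a canned response locally, so the test needs
    # no API key and makes no network calls.
    with agent.override(model=TestModel()):
        result = agent.run_sync("Hello")
    assert result.output  # result.data on older pydantic-ai releases
```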
End-to-end testing patterns with Playwright — page objects, AI agent testing, visual regression, accessibility testing with axe-core, and CI integration. Use when writing E2E tests, setting up Playwright, implementing visual regression, or testing accessibility.
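To keep a single language across these sketches, here is the page-object pattern using Playwright's Python bindings with the pytest-playwright `page` fixture; the skill's own examples may be TypeScript-oriented, and the selectors and URL below are hypothetical:

```python
from playwright.sync_api import Page, expect

# Page object for a hypothetical login page; selectors and URL are
# illustrative, not taken from the skill itself.
class LoginPage:
    URL = "https://example.com/login"

    def __init__(self, page: Page):
        self.page = page
        self.username = page.locator("#username")
        self.password = page.locator("#password")
        self.submit = page.get_by_role("button", name="Log in")

    def open(self) -> None:
        self.page.goto(self.URL)

    def login(self, user: str, password: str) -> None:
        self.username.fill(user)
        self.password.fill(password)
        self.submit.click()

# pytest-playwright injects the `page` fixture automatically.
def test_login_shows_welcome_banner(page: Page):
    login = LoginPage(page)
    login.open()
    login.login("demo-user", "demo-password")
    expect(page.get_by_text("Welcome")).to_be_visible()
```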
TDD-style testing methodology for skills that uses fresh subagent instances to prevent priming bias and validate skill effectiveness. Use when validating skill improvements, testing skill effectiveness, preventing priming bias, or measuring a skill's impact on behavior. Do not use when implementing skills (use skill-authoring instead) or creating hooks (use hook-authoring instead).
Deprecated alias for the tdd skill.
Use when facing two or more independent tasks that can be completed without shared state or sequential dependencies.
Use when an AI agent should run protocols or workflow tests against kairos-dev (the KAIROS MCP in this repo's dev environment). Covers AI–MCP integration and workflow-test flows; MCP-only, with output written to reports/.