Search Results: ai-agent-testing

Found 11 Skills

AI & Machine Learning | supercent-io/skills-templ...

agent-evaluation

Design and implement comprehensive evaluation systems for AI agents. Use when building evals for coding agents, conversational agents, research agents, or computer-use agents. Covers grader types, benchmarks, an 8-step roadmap, and production integration.

🇺🇸 English | Translated

10.1k
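The agent-evaluation entry mentions grader types. As a rough illustration of one such type, a deterministic code-based grader, here is a minimal Python sketch. It is not taken from the skill itself: `grade_exact`, `grade_regex`, `run_suite`, and `toy_agent` are hypothetical names, and a real agent call would replace `toy_agent`.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GradeResult:
    passed: bool
    reason: str


def grade_exact(output: str, expected: str) -> GradeResult:
    """Code-based grader: pass only on an exact match after trimming whitespace."""
    ok = output.strip() == expected.strip()
    return GradeResult(ok, "exact match" if ok else f"expected {expected!r}, got {output!r}")


def grade_regex(output: str, pattern: str) -> GradeResult:
    """Code-based grader: pass if the regex matches anywhere in the output."""
    ok = re.search(pattern, output) is not None
    return GradeResult(ok, "pattern found" if ok else f"no match for {pattern!r}")


# A test case pairs a prompt with the grader that scores the agent's reply.
Case = Tuple[str, Callable[[str], GradeResult]]


def run_suite(cases: List[Case], agent: Callable[[str], str]) -> Tuple[float, List[GradeResult]]:
    """Send each prompt to the agent, grade the reply, and return (pass rate, results)."""
    results = [grader(agent(prompt)) for prompt, grader in cases]
    return sum(r.passed for r in results) / len(results), results


def toy_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real agent call.
    return "4" if "2 + 2" in prompt else "I don't know"


cases: List[Case] = [
    ("What is 2 + 2?", lambda out: grade_exact(out, "4")),
    ("Name a prime below 10.", lambda out: grade_regex(out, r"\b[2357]\b")),
]
rate, results = run_suite(cases, toy_agent)
print(rate)  # 0.5
```

With this toy agent the suite passes the arithmetic case and fails the prime case, so `rate` is 0.5. A model-based grader could plug into the same slot as any callable that takes the output and returns a `GradeResult`.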

AI & Machine Learning | botpress/skills

adk-evals

Complete reference for writing, running, and iterating on evals (automated conversation tests) for ADK agents. Covers eval file format, all assertion types, CLI usage, and per-primitive testing patterns.

🇺🇸 English | Translated

AI & Machine Learning | athola/claude-night-marke...

subagent-testing

TDD-style testing methodology for skills that uses fresh subagent instances to prevent priming bias and validate skill effectiveness. Use when validating skill improvements, testing skill effectiveness, preventing priming bias, or measuring skill impact on behavior. Do not use when implementing skills (use skill-authoring instead) or creating hooks (use hook-authoring instead).

🇺🇸 English | Translated
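The subagent-testing entry relies on fresh subagent instances to avoid priming bias. A minimal sketch of that idea, assuming a hypothetical `FakeAgent` that accumulates conversation history: each trial builds a new agent from a factory, and a baseline run without the skill is compared against a run with it, in the spirit of a TDD red/green check. The skill's actual procedure may differ.

```python
from typing import Callable, List, Optional


class FakeAgent:
    """Hypothetical agent that keeps conversation history, the source of priming bias."""

    def __init__(self, skill: Optional[str] = None):
        self.skill = skill
        self.history: List[str] = []

    def run(self, prompt: str) -> str:
        self.history.append(prompt)
        # Toy behavior: only the "tdd" skill makes the agent suggest a test first.
        if self.skill == "tdd":
            return "write a failing test first"
        return "just write the code"


def pass_rate(make_agent: Callable[[], FakeAgent],
              prompts: List[str],
              check: Callable[[str], bool]) -> float:
    """Score each prompt with a brand-new agent so no history leaks between trials."""
    passed = 0
    for prompt in prompts:
        agent = make_agent()  # fresh instance per trial
        assert not agent.history  # sanity check: nothing carried over
        if check(agent.run(prompt)):
            passed += 1
    return passed / len(prompts)


def mentions_test(output: str) -> bool:
    return "test" in output


prompts = ["add a login endpoint", "fix the date parser"]
baseline = pass_rate(lambda: FakeAgent(), prompts, mentions_test)              # red: skill absent
with_skill = pass_rate(lambda: FakeAgent(skill="tdd"), prompts, mentions_test)  # green: skill present
print(baseline, with_skill)  # 0.0 1.0
```

Reusing one agent across trials would let earlier prompts shape later answers; the factory keeps every measurement independent.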

Testing & QA | existential-birds/beagle

pydantic-ai-testing

Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.

🇺🇸 English | Translated

AI & Machine Learning | zhucl1006/ailesuperpowers

dispatching-parallel-agents

Use when facing two or more independent tasks that can be completed without shared state or sequential dependencies.

🇨🇳 Chinese | Translated
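The dispatching-parallel-agents entry targets tasks with no shared state or sequential dependencies. Under that assumption they can simply run concurrently. The sketch below uses Python's `asyncio.gather`; `run_agent` is a hypothetical placeholder for whatever actually dispatches a subagent, and here it only sleeps.

```python
import asyncio
import time
from typing import List


async def run_agent(task: str) -> str:
    """Hypothetical stand-in for dispatching one subagent; here it only sleeps."""
    await asyncio.sleep(0.1)
    return f"done: {task}"


async def dispatch_parallel(tasks: List[str]) -> List[str]:
    """Start every independent task at once and wait for all of them to finish."""
    return list(await asyncio.gather(*(run_agent(t) for t in tasks)))


start = time.perf_counter()
results = asyncio.run(dispatch_parallel(["fix test A", "fix test B", "update docs"]))
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.2f}s")  # roughly 0.1s rather than 0.3s, because the sleeps overlap
```

`asyncio.gather` returns results in the order the awaitables were passed, even though they run concurrently, so each result lines up with its task. If one task needs another's output, this pattern no longer applies.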

AI & Machine Learning | debian777/kairos-mcp

kairos-development

Use when an AI agent should run protocols or workflow tests against kairos-dev (KAIROS MCP in this repo's dev environment). Covers AI–MCP integration and workflow-test flows. MCP-only; output goes to reports/.

🇺🇸 English | Translated

AI & Machine Learning | coval-ai/coval-external-s...

design-persona

Design and create a simulation persona for testing an AI agent. Guides through use case selection, voice and language configuration, behavior prompt crafting, and interruption calibration. Use when user says "create a persona", "design a persona", "set up a test persona", "configure simulation persona", or "build a caller profile".

🇺🇸 English | Translated

AI & Machine Learning | cekura-ai/cekura-skills

cekura-eval-design

Use when the user asks to "create an evaluator", "create evals", "create a scenario", "write a test scenario", "design a test case", "test my agent", "build eval coverage", "plan a test suite", "create red team tests", "set up test profiles", "configure conditional actions", "write a conditional action evaluator", "build a deterministic test", "design an IVR test", "IVR navigation test", "write a unit test for a voice agent", "build a regression test", "scripted scenario", "scripted voice test", "structured evaluator", "exact flow test", "sequential conditions", "fixed sequence test", or "run evals". Covers individual evaluator design, suite coverage strategy, test profiles, mock-tool data design, conditional actions (deterministic / unit test / regression / IVR navigation flows), and best practices for workflow / red-team / edge-case / deterministic test types.

🇺🇸 English | Translated

Testing & QA | oimiragieo/agent-studio

testing-expert

Deprecated alias for the tdd skill.

🇺🇸 English | Translated

Testing & QA | yonatangross/orchestkit

testing-e2e

End-to-end testing patterns with Playwright — page objects, AI agent testing, visual regression, accessibility testing with axe-core, and CI integration. Use when writing E2E tests, setting up Playwright, implementing visual regression, or testing accessibility.

🇺🇸 English | Translated

AI & Machine Learning | cekura-ai/cekura-skills

cekura-onboarding

Use when the user says "get started with Cekura", "set up Cekura", "onboard to Cekura", "I'm new to Cekura", "help me set up my agent", "how do I use Cekura", "walk me through Cekura", "configure my project", "first time using Cekura", or needs guidance on initial platform setup. Covers two onboarding paths: **testing** (default — build evaluators and run simulated calls) and **observability** (ingest production call logs and evaluate them).

🇺🇸 English | Translated