Found 29 Skills
TDD-style testing methodology for skills using fresh subagent instances to prevent priming bias and validate skill effectiveness. Use when validating skill improvements, testing skill effectiveness, preventing priming bias, or measuring skill impact on behavior. Do not use when implementing skills (use skill-authoring instead) or creating hooks (use hook-authoring instead).
Design and implement comprehensive evaluation systems for AI agents. Use when building evals for coding agents, conversational agents, research agents, or computer-use agents. Covers grader types, benchmarks, an 8-step roadmap, and production integration.
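As a rough illustration of the grader types such a skill covers, here is a minimal Python sketch of two deterministic graders: an exact-match check for text output and a code-based grader for a coding agent that simply runs the repo's test suite. The function names and the idea of returning a 0.0-1.0 score are illustrative conventions, not any particular framework's API.

```python
import subprocess

def exact_match_grader(output: str, expected: str) -> float:
    """Deterministic grader: normalize whitespace, then compare strings."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def test_pass_grader(repo_dir: str, test_cmd: list[str]) -> float:
    """Code-based grader for a coding agent: 1.0 if the repo's tests pass.

    `test_cmd` is whatever runs your suite, e.g. ["pytest", "-q"].
    """
    result = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0
```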
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
Use when creating or configuring Claude Code agents and their frontmatter.
Build, modify, debug, and deploy agents with Agentforce Agent Script. TRIGGER when: user creates, modifies, or asks about .agent files or aiAuthoringBundle metadata; changes agent behavior, responses, or conversation logic; designs agent topics, actions, tools, sub-agents, or flow control; writes or reviews an Agent Spec; previews, debugs, deploys, publishes, or tests agents; uses Agent Script CLI commands (sf agent generate/preview/publish/test). DO NOT TRIGGER when: Apex development, Flow building, Prompt Template authoring, Experience Cloud configuration, or general Salesforce CLI tasks unrelated to Agent Script.
Use when creating or editing any prompt (commands, hooks, skills, subagent instructions) to verify it produces the desired behavior. Applies the RED-GREEN-REFACTOR cycle to prompt engineering, using subagents for isolated testing.
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
RED-GREEN-REFACTOR testing for agents: dispatch subagents with known inputs, capture verbatim outputs, verify against expectations. Use when creating, modifying, or validating agents and skills. Use for "test agent", "validate agent", "verify agent works", or pre-deployment checks. Do NOT use for feature requests, simple prompt edits without behavioral impact, or agents with no structured output to verify.
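As a sketch of that RED-GREEN loop: dispatch each case to a fresh agent instance, capture the verbatim output, and verify it against an expectation. In the Python below, `spawn_subagent` is a hypothetical stand-in for however your environment launches an isolated agent and returns its response text.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    prompt: str                    # known input dispatched to the subagent
    check: Callable[[str], bool]   # expectation over the verbatim output

def run_red_green(cases: list[Case], spawn_subagent: Callable[[str], str]) -> bool:
    """Dispatch each case to a FRESH subagent (no shared context, so no
    priming bias) and verify the captured output against its expectation.

    `spawn_subagent` is a hypothetical stand-in: start a new agent instance,
    send the prompt, return the agent's full response text.
    """
    failures = []
    for case in cases:
        output = spawn_subagent(case.prompt)  # fresh instance per case
        if not case.check(output):
            failures.append((case.prompt, output))
    for prompt, output in failures:
        print(f"RED: {prompt!r} -> {output[:120]!r}")
    return not failures  # GREEN when every expectation holds
```

Run it before the agent or skill change to confirm the expected failure (RED), then again after the change to confirm the fix (GREEN).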
Complete reference for writing, running, and iterating on evals (automated conversation tests) for ADK agents. Covers eval file format, all assertion types, CLI usage, and per-primitive testing patterns.
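A minimal pytest wrapper around such an eval file might look like the sketch below. It assumes the AgentEvaluator entry point shown in the ADK docs (plus pytest-asyncio for the marker); the module name and file path are placeholders, and the import path should be verified against your installed ADK version.

```python
import pytest
from google.adk.evaluation.agent_evaluator import AgentEvaluator

@pytest.mark.asyncio
async def test_agent_conversation_evals():
    # Placeholders: point these at your own agent package
    # and your .test.json eval file.
    await AgentEvaluator.evaluate(
        agent_module="my_agent",
        eval_dataset_file_path_or_dir="tests/basic.test.json",
    )
```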
Deploy prompt-based Azure AI agents from YAML definitions to Azure AI Foundry projects. Use when users want to (1) create and deploy Azure AI agents, (2) set up Azure AI infrastructure, (3) deploy AI models to Azure, or (4) test deployed agents interactively. Handles authentication, RBAC, quotas, and deployment complexities automatically.
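Under the hood, a deployment like this typically reduces to a few SDK calls. The sketch below assumes the azure-ai-projects package and the create_agent call from Microsoft's samples, with a placeholder endpoint; agent APIs have moved between beta and GA releases, so adjust to your SDK version.

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder endpoint: substitute your Azure AI Foundry project endpoint.
client = AIProjectClient(
    endpoint="https://<your-project>.services.ai.azure.com/api/projects/<name>",
    credential=DefaultAzureCredential(),  # uses az login / managed identity
)

# Create a prompt-based agent from the fields a YAML definition would carry.
agent = client.agents.create_agent(
    model="gpt-4o",  # assumes this model is already deployed with quota
    name="support-agent",
    instructions="You answer questions about our product documentation.",
)
print(agent.id)
```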
Use when evaluating agent performance, building test frameworks, measuring quality, or asking about "agent evaluation", "LLM-as-judge", "agent testing", "quality metrics", "evaluation rubrics", "agent benchmarks"
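The LLM-as-judge pattern this entry names is easy to sketch: score an answer against a rubric with a second model call and parse out a numeric grade. `call_model` below is a hypothetical stand-in for your provider's completion function; the prompt template and SCORE convention are illustrative.

```python
import re
from typing import Callable

JUDGE_TEMPLATE = """You are grading an AI agent's answer.
Rubric: {rubric}
Question: {question}
Answer: {answer}
Reply with a line "SCORE: <0-10>" and one sentence of justification."""

def llm_as_judge(question: str, answer: str, rubric: str,
                 call_model: Callable[[str], str]) -> float:
    """Ask a judge model to grade `answer` against `rubric`; return a score in [0, 1].

    `call_model` is a hypothetical stand-in: prompt in, completion text out.
    """
    reply = call_model(JUDGE_TEMPLATE.format(
        rubric=rubric, question=question, answer=answer))
    match = re.search(r"SCORE:\s*(\d+)", reply)
    if match is None:
        raise ValueError(f"Judge reply had no score: {reply!r}")
    return min(int(match.group(1)), 10) / 10.0
```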
Build AI agents with Pydantic AI — tools, capabilities, structured output, streaming, testing, and multi-agent patterns. Use when the user mentions Pydantic AI, imports pydantic_ai, or asks to build an AI agent, add tools/capabilities, stream output, define agents from YAML, or test agent behavior.
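For reference, a minimal Pydantic AI agent with one tool looks roughly like this. The names follow the pydantic_ai docs (Agent, tool_plain, run_sync, TestModel, override), but details such as the result attribute (`result.output` vs. the older `result.data`) have shifted across versions, so check against yours.

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent(
    "openai:gpt-4o",  # any supported model identifier string
    system_prompt="You are a concise unit-conversion assistant.",
)

@agent.tool_plain
def miles_to_km(miles: float) -> float:
    """Convert miles to kilometres."""
    return miles * 1.60934

def test_agent_calls_tool():
    # TestModel stubs the LLM so the test runs offline and still exercises
    # the tool-calling machinery.
    with agent.override(model=TestModel()):
        result = agent.run_sync("How many km is 3 miles?")
        assert result.output  # synthetic but non-empty output from TestModel

if __name__ == "__main__":
    print(agent.run_sync("How many km is 3 miles?").output)
```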