Search Results: ai-testing

Found 15 Skills

AI & Machine Learningmastepanoski/claude-skill...

owasp-ai-testing

AI trustworthiness testing using OWASP AI Testing Guide v1. Execute 44 test cases across 4 layers (Application, Model, Infrastructure, Data) with practical payloads and remediation.

🇺🇸|EnglishTranslated

Testing & QAexistential-birds/beagle

pydantic-ai-testing

Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.

🇺🇸|EnglishTranslated

AI & Machine Learninglebsral/dspy-programming-...

ai-testing-safety

Find every way users can break your AI before they do. Use when you need to red-team your AI, test for jailbreaks, find prompt injection vulnerabilities, run adversarial testing, do a safety audit before launch, prove your AI is safe for compliance, stress-test guardrails, or verify your AI holds up against adversarial users. Covers automated attack generation, iterative red-teaming with DSPy, and MIPROv2-optimized adversarial testing.

🇺🇸|EnglishTranslated

AI & Machine Learningmicrosoft/agent-skills

azure-ai-evaluation-py

Azure AI Evaluation SDK for Python. Use for evaluating generative AI applications with quality, safety, agent, and custom evaluators. Triggers: "azure-ai-evaluation", "evaluators", "GroundednessEvaluator", "evaluate", "AI quality metrics", "RedTeam", "agent evaluation".

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningmicrosoft/eval-guide

eval-suite-planner

Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance — scenario types, evaluation methods, quality signals, thresholds, and priority order — before any test cases are generated or evals are run.

🇺🇸|EnglishTranslated

Testing & QAmicrosoft/eval-guide

eval-generator

Generates eval test cases from an eval suite plan (output of /eval-suite-planner) or a plain-English agent description. Supports both single-response and conversation (multi-turn) evaluation modes. Outputs a Copilot Studio test set table, a CSV file for import (single-response only), and a docx report for human review.

🇺🇸|EnglishTranslated

AI & Machine Learninggithub/awesome-copilot

eval-driven-dev

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.

🇺🇸|EnglishTranslated

AI & Machine Learninglivekit/agent-skills

livekit-agents

Build voice AI agents with LiveKit Cloud and the Agents SDK. Use when the user asks to "build a voice agent", "create a LiveKit agent", "add voice AI", "implement handoffs", "structure agent workflows", or is working with LiveKit Agents SDK. Provides opinionated guidance for the recommended path: LiveKit Cloud + LiveKit Inference. REQUIRES writing tests for all implementations.

🇺🇸|EnglishTranslated

AI & Machine Learningsammcj/agentic-coding

deepeval

Use when discussing or working with DeepEval (the python AI evaluation framework)

🇺🇸|EnglishTranslated

Testing & QAstablyai/agent-skills

stably-sdk-rules

AI rules for writing tests with Stably Playwright SDK. Use this skill when writing or modifying Playwright tests with Stably AI features. Covers when to use Playwright vs Stably methods, plus minimal patterns for aiAssert, extract, getLocatorsByAI, agent.act, Inbox, and Google auth.

🇺🇸|EnglishTranslated

AI & Machine Learningmarcohefti/zero-context-l...

zcl

Orchestrator workflow for running ZeroContext Lab (ZCL) attempts/suites with deterministic artifacts, trace-backed evidence, and fast post-mortems (shim support for "agent only types tool name").

🇺🇸|EnglishTranslated

AI & Machine Learningaxiomhq/skills

writing-evals

Scaffolds evaluation suites for the Axiom AI SDK. Generates eval files, scorers, flag schemas, and config from natural-language descriptions. Use when creating evals, writing scorers, setting up flag schemas, or configuring axiom.config.ts.

🇺🇸|EnglishTranslated

16 scripts/Attention