Found 143 Skills
DeepEval evaluation workflow for AI agents and LLM applications. TRIGGER when the user wants to evaluate or improve an AI agent, tool-using workflow, multi-turn chatbot, RAG pipeline, or LLM app; add evals; generate datasets or goldens; use deepeval generate; use deepeval test run; add tracing or @observe; send results to Confident AI; monitor production; run online evals; inspect traces; or iterate on prompts, tools, retrieval, or agent behavior from eval failures. AI agents are the primary use case. Covers Python SDK, pytest eval suites, CLI generation, tracing, Confident AI reporting, and agent-driven improvement loops. DO NOT TRIGGER for unrelated generic pytest, non-AI test setup, or non-DeepEval observability work unless the user asks to compare or migrate to DeepEval.
Expert in Python testing with pytest and test-driven development
Modern Python development with uv, ruff, mypy, and pytest. Use when: writing or reviewing Python code; setting up Python projects or pyproject.toml; choosing dependency management (uv, poetry, pip); configuring linting, formatting, or type checking; organizing Python packages. Keywords: Python, pyproject.toml, uv, ruff, mypy, pytest, type hints, virtual environment, lockfile, package structure
Creates test infrastructure with Vitest, xUnit, and pytest
Review generated or changed test code against universal testing rules before it ships. Best used reactively after an agent writes, edits, generates, or refactors tests, before presenting, committing, or merging them. Use for pytest (test_*.py, *_test.py), PHPUnit/Pest (*Test.php), Jest/Vitest (*.test.ts, *.spec.js), Go (*_test.go), files under tests/, __tests__/, or spec/, and review requests like 'write tests for X', 'add tests', 'test this', 'review these tests', or PR diffs containing tests. Can also guide test writing when explicitly invoked before the work. This skill is the quality gate that prevents AI-generated test bloat.
Test Temporal workflows with pytest, time-skipping, and mocking strategies. Covers unit testing, integration testing, replay testing, and local development setup. Use when implementing Temporal workflow tests or debugging test failures.
Guides the agent through running and writing Python tests with pytest. Triggered when users say "run tests", "write a test", "test this function", "add unit tests", "run pytest", "check test coverage", "debug failing test", "create test fixtures", "mock a dependency", or mention pytest, pytest-asyncio, pytest-cov, testing, unit tests, integration tests, test coverage, or test-driven development.
Test-driven development workflow enforcement for Python and React projects. Use when the user requests TDD, test-first development, or red-green-refactor methodology. Enforces strict cycle: write ONE failing test -> implement minimum code to pass -> refactor while green -> repeat. Applies to both backend (pytest) and frontend (Testing Library). Changes agent behavior to write tests before code. Does NOT provide testing patterns (use pytest-patterns or react-testing-patterns for how to write tests).
Optimizes Python library performance through profiling (cProfile, PyInstrument), memory analysis (memray, tracemalloc), benchmarking (pytest-benchmark), and optimization strategies. Use when analyzing performance bottlenecks, finding memory leaks, or setting up performance regression testing.
Use when running tests to validate implementations, collecting test evidence, or debugging failures. Load in TEST state. Covers unit tests (pytest/jest), API tests (curl), browser tests (Claude-in-Chrome), database verification. All results are code-verified, not LLM-judged.
Generate pytest test cases for Python functions and classes
Iterative code refinement through plan → code → evaluate → refine cycles. Runs lint checks (ruff), tests (pytest), and structured self-evaluation each cycle, then diagnoses failures and refines. Decomposes complex tasks into sequential phases, iterates up to 3 times per phase (10 total). Use when: the main agent delegates a code task with 'MODE: MORE_EFFORT', the user selects 'More Effort' code generation mode, or the task explicitly requests iterative refinement for higher code quality. Do NOT use for single-pass code generation (Lite mode), experiment pipeline orchestration (use experiment-pipeline), or diagnosing a specific experiment failure (use experiment-craft).