Determine how a target repository expects automated tests to be executed (commands, frameworks, prerequisites, and scope), then run the best matching test suite(s) with a safety-first interaction policy.
## Core Objective

**Primary Goal:** Produce test execution results with evidence-based command selection and safety guardrails.

**Success Criteria (ALL must be met):**

- ✅ **Test plan discovered:** Evidence sources identified (docs, CI configs, or build manifests)
- ✅ **Commands selected:** Appropriate test commands chosen based on mode (fast/ci/full) and constraints
- ✅ **User confirmation obtained:** Approval received before installing dependencies, using network, or starting services
- ✅ **Tests executed:** Commands run with captured output and exit codes
- ✅ **Results summarized:** Test Plan Summary produced with evidence, commands, execution status, and failures (if any)

**Acceptance Test:** Can a developer reproduce the test execution by following the Test Plan Summary without additional context?
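The "captured output and exit codes" criterion can be sketched as a small wrapper; `run_and_capture` is a hypothetical helper name for illustration, not part of the skill:

```shell
# Minimal sketch: run a command, capture its combined output to a log file,
# and report the exit code without aborting the surrounding script.
# (run_and_capture is a hypothetical name, not part of the skill.)
run_and_capture() {
  log="$1"; shift
  "$@" >"$log" 2>&1
  echo "exit_code=$?"
  return 0
}
```

For example, `run_and_capture /tmp/test.log npm test` would write the full transcript to `/tmp/test.log` and print the command's exit code.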
## Scope Boundaries

**This skill handles:**

- Discovering test commands from repository evidence (docs, CI, build manifests)
- Selecting appropriate test commands based on mode and constraints
- Executing tests with safety guardrails and user confirmation
- Summarizing test results with evidence and failure diagnostics

**This skill does NOT handle:**

- Test quality assessment or coverage analysis (use `review-testing`)
- Fixing failing tests or debugging test failures (use `run-repair-loop`)
- Writing new tests or test infrastructure (use development skills)
- Reviewing test code for best practices (use `review-testing`)

**Handoff point:** When tests complete (pass or fail), hand off to `run-repair-loop` for fixing failures or `review-testing` for quality assessment.
## Use Cases

- You cloned a repo and want the correct test command without guessing.
- A repo has multiple test layers (unit/integration/e2e) and you need a safe default run plan.
- CI is failing and you want to reproduce it locally by running the same commands used in workflows.
## Behavior

**Establish scope and constraints (ask if ambiguous)**

- If the user did not specify, default to a fast, local, non-destructive run: unit tests only, no external services, no Docker, no network-dependent setup.
- Ask the user to choose a mode if needed:
  - `fast`: unit tests only, minimal setup.
  - `ci`: mirror CI workflow commands as closely as possible.
  - `full`: include integration/e2e tests and service dependencies.
- Ask whether Docker is allowed, whether network access is allowed, and whether installing dependencies is allowed.
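The mode-to-default-constraints mapping above can be sketched as follows; `mode_defaults` and the constraint tokens are illustrative names, not part of the skill's interface:

```shell
# Illustrative sketch of the default constraints per mode described above.
# (mode_defaults and the constraint tokens are hypothetical names.)
mode_defaults() {
  case "$1" in
    fast) echo "unit-only no-docker no-network minimal-setup" ;;
    ci)   echo "mirror-ci-jobs confirm-docker confirm-network" ;;
    full) echo "all-layers confirm-docker confirm-network confirm-services" ;;
    *)    echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```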
**Discover the test plan (evidence-based)**

- Read these sources in order; stop early if a clear, explicit test command is found:
  - Docs: `README.md`, `CONTRIBUTING.md`, `TESTING.md`, `docs/testing*`, `Makefile`
  - CI configs: `.github/workflows/*.yml`, `.gitlab-ci.yml`, `azure-pipelines.yml`, `Jenkinsfile`
  - Build manifests: `package.json`, `pyproject.toml`, `setup.cfg`, `tox.ini`, `go.mod`, `pom.xml`, `build.gradle*`, `*.csproj`, `Cargo.toml`
- Identify:
  - Primary test entrypoints (`npm test`, `pnpm test`, `yarn test`, `pytest`, `tox`, `go test`, `dotnet test`, `mvn test`, `gradle test`, `cargo test`, etc.)
  - Test layers and markers (unit vs integration vs e2e)
  - Command transcript snippets sufficient to debug failures (do not dump extremely long logs unless asked).
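The discovery order above can be sketched as a small helper that reports which conventional evidence sources exist in a repo; `discover_evidence` is a hypothetical name, and the file names are the common conventions listed above:

```shell
# Sketch: print which conventional evidence sources exist under a repo root,
# in the priority order above (docs, then CI configs, then build manifests).
# (discover_evidence is a hypothetical helper name.)
discover_evidence() {
  repo="$1"
  for f in README.md CONTRIBUTING.md TESTING.md Makefile \
           .github/workflows .gitlab-ci.yml azure-pipelines.yml Jenkinsfile \
           package.json pyproject.toml setup.cfg tox.ini go.mod \
           pom.xml Cargo.toml; do
    [ -e "$repo/$f" ] && echo "$f"
  done
  return 0
}
```

Running `discover_evidence .` in a typical Node repo would print `README.md` and `package.json`, pointing the selection step at those files first.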
## Restrictions

### Hard Boundaries

- Do not invent test commands when evidence exists (prefer docs/CI).
- Do not install dependencies, run Docker, or start external services without confirmation.
- Do not modify repository files unless the user explicitly requests it (exception: generating a report file if the user asked for artifacts).
- Do not exfiltrate secrets; do not request sensitive credentials in chat.
### Skill Boundaries (Avoid Overlap)

**Do NOT do these (other skills handle them):**

- **Test quality assessment:** Evaluating test coverage, test design, or testing best practices → Use `review-testing`
- **Fixing test failures:** Debugging failing tests, repairing broken test code, or investigating root causes → Use `run-repair-loop`
- **Writing tests:** Creating new test cases, test infrastructure, or test frameworks → Use development/implementation skills
- **Code review:** Reviewing test code for quality, maintainability, or best practices → Use `review-testing`
- **Repository analysis:** Comprehensive codebase structure analysis or architecture review → Use `review-codebase`

**When to stop and hand off:**

- Tests fail and the user asks "why?" or "how to fix?" → Hand off to `run-repair-loop` for debugging and repair
- User asks "are these tests good?" or "what's our coverage?" → Hand off to `review-testing` for quality assessment
- User asks "can you write tests for X?" → Hand off to the development workflow for test implementation
- Tests pass and the user asks "what should we test next?" → Hand off to `review-testing` for test strategy recommendations
## Self-Check

### Core Success Criteria (ALL must be met)

- **Test plan discovered:** Evidence sources identified (docs, CI configs, or build manifests)
- **Commands selected:** Appropriate test commands chosen based on mode (fast/ci/full) and constraints
- **User confirmation obtained:** Approval received before installing dependencies, using network, or starting services
- **Tests executed:** Commands run with captured output and exit codes
- **Results summarized:** Test Plan Summary produced with evidence, commands, execution status, and failures (if any)

### Process Quality Checks

- **Evidence-based selection:** Did I identify at least one authoritative test instruction source (doc file, CI workflow, or build manifest)?
- **Safety guardrails applied:** Did I ask for confirmation before any action that installs dependencies, uses the network, starts Docker/services, or changes state?
- **Commands printed:** Did I print the exact commands before running them?
- **Failures diagnosed:** If tests failed, did I provide the first failing command, its exit code, and a likely root-cause category?
- **No destructive operations:** Did I avoid running destructive commands (`rm -rf`, `git clean`, `docker system prune`, database drops) without explicit approval?
- **No secret exfiltration:** Did I avoid requesting sensitive credentials in chat, preferring `.env` files or documented local dev flows?

### Acceptance Test

Can a developer reproduce the test execution by following the Test Plan Summary without additional context?

- If NO: the Test Plan Summary is incomplete. Add missing evidence, commands, or prerequisites.
- If YES: skill execution is complete. Proceed to handoff if needed.
## Examples

### Example 1: JavaScript repo with package.json

**User:** "Run tests for this repo."

**Agent:**

1. Inspect `package.json` scripts and `.github/workflows/*`.
2. Decide on mode `fast` and propose `npm test` (or `pnpm test`/`yarn test` if the repo standardizes on it).
3. Ask: "Install dependencies (`npm ci`) and allow network?"
4. Run `npm ci`, then `npm test`.
5. Summarize results and point to the failing test output, if any.
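The "print the exact command before running it" behavior in this flow can be sketched as a tiny wrapper; `run_confirmed` is a hypothetical helper name, not part of any repo tooling:

```shell
# Sketch: echo the exact command line before executing it, so the transcript
# shows precisely what ran. (run_confirmed is a hypothetical name.)
run_confirmed() {
  printf '+ %s\n' "$*"
  "$@"
}
```

For example, `run_confirmed npm ci` prints `+ npm ci` and then executes it, which satisfies the "Commands printed" check above.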
### Example 2 (edge case): Monorepo with integration tests requiring Docker

**User:** "Mirror CI locally."

**Agent:**

1. Parse `.github/workflows/ci.yml` and identify separate jobs:
   - backend unit tests
   - frontend tests
   - integration tests with `docker compose`
2. Ask for confirmation:
   - allow Docker?
   - allow network?
   - which jobs to run (all vs. only the failing job)?
3. Execute in a controlled order:
   - install deps per job
   - run unit tests first
   - bring up services for integration tests
4. If integration tests fail, summarize:
   - service health / port conflicts
   - missing env vars
   - how the CI config differs from local
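Mirroring CI starts with pulling the `run:` steps out of the workflow file. A rough grep-based sketch is below; `extract_run_steps` is a hypothetical name, and a real YAML parser is more robust, especially for multi-line `run: |` blocks:

```shell
# Rough sketch: list the run: commands from a GitHub Actions workflow file.
# Grep approximation only; multi-line "run: |" blocks need a real YAML parser.
# (extract_run_steps is a hypothetical helper name.)
extract_run_steps() {
  grep -E '^[[:space:]]*(- )?run:' "$1" \
    | sed -E 's/^[[:space:]]*(- )?run:[[:space:]]*//'
}
```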
## Appendix: Output contract

Each skill execution MUST produce a Test Plan Summary in this exact JSON format: