run-automated-tests


Skill: Run Automated Tests


Purpose


Determine how a target repository expects automated tests to be executed (commands, frameworks, prerequisites, and scope), then run the best matching test suite(s) with a safety-first interaction policy.


Core Objective


Primary Goal: Produce test execution results with evidence-based command selection and safety guardrails.
Success Criteria (ALL must be met):
  1. Test plan discovered: Evidence sources identified (docs, CI configs, or build manifests)
  2. Commands selected: Appropriate test commands chosen based on mode (fast/ci/full) and constraints
  3. User confirmation obtained: Approval received before installing dependencies, using network, or starting services
  4. Tests executed: Commands run with captured output and exit codes
  5. Results summarized: Test Plan Summary produced with evidence, commands, execution status, and failures (if any)
Acceptance Test: Can a developer reproduce the test execution by following the Test Plan Summary without additional context?


Scope Boundaries


This skill handles:
  • Discovering test commands from repository evidence (docs, CI, build manifests)
  • Selecting appropriate test commands based on mode and constraints
  • Executing tests with safety guardrails and user confirmation
  • Summarizing test results with evidence and failure diagnostics
This skill does NOT handle:
  • Test quality assessment or coverage analysis (use review-testing)
  • Fixing failing tests or debugging test failures (use run-repair-loop)
  • Writing new tests or test infrastructure (use development skills)
  • Reviewing test code for best practices (use review-testing)
Handoff point: When tests complete (pass or fail), hand off to run-repair-loop for fixing failures or review-testing for quality assessment.

Use Cases


  • You cloned a repo and want the correct test command without guessing.
  • A repo has multiple test layers (unit/integration/e2e) and you need a safe default run plan.
  • CI is failing and you want to reproduce locally by running the same commands used in workflows.

Behavior


  1. Establish scope and constraints (ask if ambiguous)
    • If the user did not specify, default to a fast, local, non-destructive run:
      • Unit tests only, no external services, no Docker, no network-dependent setup.
    • Ask the user to choose a mode if needed:
      • fast: unit tests only, minimal setup.
      • ci: mirror CI workflow commands as closely as possible.
      • full: include integration/e2e tests and service dependencies.
    • Ask whether Docker is allowed, whether network access is allowed, and whether installing dependencies is allowed.
  2. Discover the test plan (evidence-based)
    • Read these sources in order; stop early if a clear, explicit test command is found:
      • README.md, CONTRIBUTING.md, TESTING.md, docs/testing*, Makefile
      • CI configs: .github/workflows/*.yml, .gitlab-ci.yml, azure-pipelines.yml, Jenkinsfile
      • Build manifests: package.json, pyproject.toml, setup.cfg, tox.ini, go.mod, pom.xml, build.gradle*, *.csproj, Cargo.toml
    • Identify:
      • Primary test entrypoints (npm test, pnpm test, yarn test, pytest, tox, go test, dotnet test, mvn test, gradle test, cargo test, etc.)
      • Test layers and markers (unit vs integration vs e2e)
      • Environment prerequisites (DB, Redis, Docker Compose, required env vars, secrets)
      • How CI sets up dependencies (services, caches, artifacts)
    • Prefer explicit instructions found in docs or CI over heuristics.
  3. Select an execution plan
    • If ci mode: derive the run sequence from the repo's CI workflow steps (closest match).
    • If fast mode: pick the most direct unit-test command with the fewest prerequisites.
    • If multiple stacks exist (e.g., backend + frontend), propose running each stack separately in a deterministic order.
    • If the plan requires dependency installation or service startup, request confirmation before proceeding.
  4. Execute with guardrails
    • Always print the exact commands you will run before running them.
    • Use a working directory rooted at the target repo (default .).
    • Capture and summarize failures:
      • First failing command and exit code
      • The most relevant error excerpt
      • Next actions (missing toolchain, missing env var, service not running, etc.)
    • Avoid destructive operations:
      • Do not run rm -rf, git clean -fdx, docker system prune, or database drop/migrate commands without explicit user approval.
    • If the repo requires secrets, do not ask the user to paste secrets into chat. Prefer .env files, secret managers, or documented local dev flows.
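The discovery, selection, and guarded-execution steps above can be sketched end to end. This is a minimal illustration, not the skill's fixed logic: the evidence file list is a subset, and the mode-to-command mapping assumes a hypothetical npm-based repo rather than being derived from real evidence.

```python
import subprocess
from pathlib import Path

# Priority-ordered evidence sources (step 2); illustrative subset.
EVIDENCE_SOURCES = [
    "README.md", "CONTRIBUTING.md", "TESTING.md", "Makefile",
    ".github/workflows", ".gitlab-ci.yml",
    "package.json", "pyproject.toml", "tox.ini", "go.mod", "Cargo.toml",
]

# Hypothetical mode-to-command mapping (step 3); a real run must derive
# commands from the evidence found, not from a static table.
MODE_COMMANDS = {
    "fast": ["npm test"],
    "ci":   ["npm ci", "npm test"],
    "full": ["npm ci", "npm test", "npm run test:e2e"],
}

def discover_evidence(repo: Path) -> list[str]:
    """Return the evidence sources present in the repo, in priority order."""
    return [src for src in EVIDENCE_SOURCES if (repo / src).exists()]

def run_with_guardrails(commands: list[str], repo: Path) -> dict:
    """Run commands in order; print each one first, stop at the first failure."""
    executed = []
    for cmd in commands:
        print(f"$ {cmd}")  # always print the exact command before running it
        proc = subprocess.run(cmd, shell=True, cwd=repo,
                              capture_output=True, text=True)
        executed.append(cmd)
        if proc.returncode != 0:
            return {"status": "failed", "executed": executed,
                    "first_failure": {"command": cmd,
                                      "exit_code": proc.returncode,
                                      "error_excerpt": proc.stderr[-500:]}}
    return {"status": "passed", "executed": executed, "first_failure": None}
```

The return shape mirrors the result object in the output contract: first failing command, its exit code, and a bounded error excerpt rather than a full log dump.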

Input & Output


Input
  • Target repository path (default .).
  • Mode: fast (default), ci, or full.
  • Constraints: allow dependency install (yes/no), allow network (yes/no), allow Docker (yes/no).
Output
  • A short "Test Plan Summary" containing:
    • Evidence: which files/paths informed the plan
    • Chosen commands (in order)
    • Assumptions and prerequisites
    • What was executed and what was skipped (and why)
  • Command transcript snippets sufficient to debug failures (do not dump extremely long logs unless asked).
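The inputs above and their safe defaults can be captured in a small config object; the type and field names here are illustrative assumptions, not part of the contract:

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    """Skill inputs with the fast, local, non-destructive defaults."""
    repo_path: str = "."
    mode: str = "fast"            # "fast" | "ci" | "full"
    allow_install: bool = False   # installing dependencies needs confirmation
    allow_network: bool = False
    allow_docker: bool = False
```

Constructing `RunConfig()` with no arguments yields the default run described in Behavior step 1: unit tests only, no installs, no network, no Docker.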

Restrictions


Hard Boundaries


  • Do not invent test commands when evidence exists (prefer docs/CI).
  • Do not install dependencies, run Docker, or start external services without confirmation.
  • Do not modify repository files unless the user explicitly requests it (exception: generating a report file if the user asked for artifacts).
  • Do not exfiltrate secrets; do not request sensitive credentials in chat.
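One way to enforce the destructive-command boundary is a simple pattern check before any command runs; the pattern list below is a minimal illustrative sample, not an exhaustive blocklist:

```python
import re

# Commands that must never run without explicit user approval.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+clean\s+-[a-z]*f",
    r"\bdocker\s+system\s+prune\b",
    r"\bdrop\s+database\b",
]

def requires_approval(command: str) -> bool:
    """Return True if the command matches a known destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE)
               for p in DESTRUCTIVE_PATTERNS)
```

A matching command is held for confirmation rather than rejected outright, since the user may legitimately want it (e.g., a clean rebuild).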

Skill Boundaries (Avoid Overlap)


Do NOT do these (other skills handle them):
  • Test quality assessment: Evaluating test coverage, test design, or testing best practices → Use review-testing
  • Fixing test failures: Debugging failing tests, repairing broken test code, or investigating root causes → Use run-repair-loop
  • Writing tests: Creating new test cases, test infrastructure, or test frameworks → Use development/implementation skills
  • Code review: Reviewing test code for quality, maintainability, or best practices → Use review-testing
  • Repository analysis: Comprehensive codebase structure analysis or architecture review → Use review-codebase
When to stop and hand off:
  • Tests fail and user asks "why?" or "how to fix?" → Hand off to run-repair-loop for debugging and repair
  • User asks "are these tests good?" or "what's our coverage?" → Hand off to review-testing for quality assessment
  • User asks "can you write tests for X?" → Hand off to development workflow for test implementation
  • Tests pass and user asks "what should we test next?" → Hand off to review-testing for test strategy recommendations
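The handoff triggers above amount to a routing table. The keyword heuristics below are a hypothetical sketch to make the routing concrete (including the "development-workflow" name, which is not a skill defined here); real trigger detection would be richer than substring matching:

```python
from typing import Optional

def route_handoff(user_message: str) -> Optional[str]:
    """Map a follow-up question to the skill that should handle it."""
    msg = user_message.lower()
    if any(k in msg for k in ("why", "how to fix", "fix")):
        return "run-repair-loop"       # debugging and repair
    if any(k in msg for k in ("coverage", "good", "test next")):
        return "review-testing"        # quality assessment / strategy
    if "write tests" in msg or "write a test" in msg:
        return "development-workflow"  # hypothetical name for dev skills
    return None                        # stay in run-automated-tests
```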

Self-Check


Core Success Criteria (ALL must be met)


  • Test plan discovered: Evidence sources identified (docs, CI configs, or build manifests)
  • Commands selected: Appropriate test commands chosen based on mode (fast/ci/full) and constraints
  • User confirmation obtained: Approval received before installing dependencies, using network, or starting services
  • Tests executed: Commands run with captured output and exit codes
  • Results summarized: Test Plan Summary produced with evidence, commands, execution status, and failures (if any)

Process Quality Checks


  • Evidence-based selection: Did I identify at least one authoritative test instruction source (doc file, CI workflow, or build manifest)?
  • Safety guardrails applied: Did I ask for confirmation before any action that installs dependencies, uses network, starts Docker/services, or changes state?
  • Commands printed: Did I print the exact commands before running them?
  • Failures diagnosed: If tests failed, did I provide the first failing command, exit code, and likely root cause category?
  • No destructive operations: Did I avoid running destructive commands (rm -rf, git clean, docker system prune, database drops) without explicit approval?
  • No secret exfiltration: Did I avoid requesting sensitive credentials in chat and prefer .env files or documented local dev flows?

Acceptance Test


Can a developer reproduce the test execution by following the Test Plan Summary without additional context?
If NO: Test Plan Summary is incomplete. Add missing evidence, commands, or prerequisites.
If YES: Skill execution is complete. Proceed to handoff if needed.

Examples


Example 1: JavaScript repo with package.json
User: "Run tests for this repo."
Agent:
  1. Inspect package.json scripts and .github/workflows/*.
  2. Decide mode fast and propose:
    • npm test (or pnpm test / yarn test if the repo standardizes on it)
  3. Ask: "Install dependencies (npm ci) and allow network?"
  4. Run:
    • npm ci
    • npm test
  5. Summarize results and point to failing test output if any.
Example 2 (Edge case): Monorepo with integration tests requiring Docker
User: "Mirror CI locally."
Agent:
  1. Parse
    .github/workflows/ci.yml
    and identify separate jobs:
    • backend unit tests
    • frontend tests
    • integration tests with docker compose
  2. Ask for confirmation:
    • allow Docker
    • allow network
    • which jobs to run (all vs only failing job)
  3. Execute in a controlled order:
    • install deps per job
    • run unit tests first
    • bring up services for integration tests
  4. If integration tests fail, summarize:
    • service health / port conflicts
    • missing env vars
    • how CI config differs from local


Appendix: Output contract


Each skill execution MUST produce a Test Plan Summary in this exact JSON format:
```json
{
  "test_plan_summary": {
    "mode": "fast | ci | full",
    "evidence": ["path/to/source1", "path/to/source2"],
    "commands": [
      {"command": "npm test", "purpose": "run unit tests", "order": 1}
    ],
    "prerequisites": ["npm ci", "Docker running"],
    "executed": ["npm ci", "npm test"],
    "skipped": ["integration tests - require Docker"],
    "result": {
      "status": "passed | failed | blocked",
      "exit_code": 0,
      "first_failure": {
        "command": "npm test",
        "exit_code": 1,
        "error_excerpt": "FAIL src/utils.test.js"
      }
    }
  }
}
```
Element              | Type   | Description
mode                 | string | Selected mode: fast, ci, or full
evidence             | array  | Source files that informed the test plan
commands             | array  | Selected test commands with purpose and order
prerequisites        | array  | Required setup steps
executed             | array  | Commands actually run
skipped              | array  | Commands skipped and reason
result.status        | string | passed, failed, or blocked
result.exit_code     | number | Exit code of test command
result.first_failure | object | First failure details (if any)
This schema enables Agent consumption without prose parsing.
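A consumer can verify the contract with a few structural checks before reading fields; this is a minimal validation sketch, not a full JSON Schema definition:

```python
def validate_summary(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload conforms."""
    s = payload.get("test_plan_summary")
    if not isinstance(s, dict):
        return ["missing test_plan_summary object"]
    problems = []
    if s.get("mode") not in ("fast", "ci", "full"):
        problems.append("mode must be fast, ci, or full")
    for key in ("evidence", "commands", "prerequisites", "executed", "skipped"):
        if not isinstance(s.get(key), list):
            problems.append(f"{key} must be an array")
    result = s.get("result") or {}
    if result.get("status") not in ("passed", "failed", "blocked"):
        problems.append("result.status must be passed, failed, or blocked")
    return problems
```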