run-test-plan

Run Test Plan

Execute a YAML test plan: run the setup commands and health checks, then run each test sequentially. Stop on the first failure and produce rich debug output.

Prerequisites

  • agent-browser skill: Browser tests require the `agent-browser:agent-browser` skill to be available

Arguments

  • `--plan <path>`: Path to the test plan (default: `docs/testing/test-plan.yaml`)
  • `--skip-setup`: Skip setup commands and health checks (for re-running after a failure)
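
The two flags above could be parsed along these lines. This is a minimal sketch, not the skill's actual implementation: the function name `parse_args` and the variables `PLAN` and `SKIP_SETUP` are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: parse the two documented flags into shell variables.
parse_args() {
  PLAN="docs/testing/test-plan.yaml"   # documented default
  SKIP_SETUP=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --plan)
        # --plan needs a following path argument
        [ $# -ge 2 ] || { echo "Error: --plan requires a path" >&2; return 1; }
        PLAN="$2"
        shift 2
        ;;
      --skip-setup)
        SKIP_SETUP=1
        shift
        ;;
      *)
        echo "Error: unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
}
```

Calling `parse_args "$@"` at the top of a runner script would leave the plan path in `PLAN` and the skip decision in `SKIP_SETUP` for the later steps to consult.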

Step 1: Parse Test Plan

Read and validate the test plan:

```bash
# Check file exists (default path shown; substitute the --plan value if one was given)
ls docs/testing/test-plan.yaml || { echo "Error: Test plan not found"; exit 1; }

# Validate YAML
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" || { echo "Error: Invalid YAML"; exit 1; }
```

Extract from the YAML:
- `setup.commands`: List of setup commands
- `setup.health_checks`: List of URLs to poll
- `tests`: Array of test cases

Step 2: Run Setup (unless --skip-setup)

2a. Check Prerequisites

If `setup.prerequisites` exists, verify each one:

```bash
# For each prerequisite in setup.prerequisites
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
```

2b. Set Environment Variables

If `setup.env` exists, export each variable. Variables using `${VAR}` syntax should be resolved from the current environment:

```bash
# For each key/value in setup.env
export <key>="<value>"
```
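
One way to resolve `${VAR}` references without handing plan values to `eval` is sketched below. The helper name `resolve_env_refs` is hypothetical, and the sketch assumes that `printenv` is available and that unset variables should resolve to an empty string; the plan format itself does not specify either point.

```bash
# Hypothetical helper: replace each ${NAME} in a value with the value of the
# exported environment variable NAME (empty if unset). No eval is used, so a
# setup.env value cannot inject shell commands.
resolve_env_refs() {
  open='${'
  close='}'
  rest=$1
  out=""
  while :; do
    case "$rest" in
      *"$open"*"$close"*)
        prefix=${rest%%"$open"*}     # text before the first ${
        rest=${rest#*"$open"}        # drop everything through the first ${
        name=${rest%%"$close"*}      # variable name up to the closing }
        rest=${rest#*"$close"}       # remainder after the closing }
        out="$out$prefix$(printenv "$name" || true)"
        ;;
      *)
        break
        ;;
    esac
  done
  printf '%s\n' "$out$rest"
}
```

A value such as `postgres://${DB_USER}@localhost/app` would then be exported with `export DATABASE_URL="$(resolve_env_refs '<value>')"`, where `DATABASE_URL` stands in for the plan's `<key>`.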

2c. Build

If `setup.build` exists, execute build commands sequentially:

```bash
# For each command in setup.build
<command> || { echo "Build failed: <command>"; exit 1; }
```
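
The run-in-order, stop-on-first-failure pattern used for prerequisites and build commands can be sketched as a small loop. The function name `run_build_commands` is hypothetical, and the sketch assumes each command arrives as a single string and is run with `sh -c`; swap in `bash -c` if the commands rely on Bash features.

```bash
# Hypothetical helper: run each argument as a shell command, in order,
# stopping at the first one that fails.
run_build_commands() {
  for cmd in "$@"; do
    echo "Running: $cmd"
    sh -c "$cmd" || { echo "Build failed: $cmd"; return 1; }
  done
}
```

Because the function returns non-zero as soon as one command fails, later commands never run against a broken build.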

2d. Start Services

If `setup.services` exists, start long-running processes and wait for health checks:

```bash
# For each service in setup.services
mkdir -p .beagle   # make sure the log/pid directory exists
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
```

For each service with a `health_check`, poll until ready:

```bash
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0

# Poll every 2 seconds until the URL answers 200/301/302 or the timeout is reached
while [ "$elapsed" -lt "$timeout" ]; do
  if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then
    echo "✓ Health check passed: $url"
    break
  fi
  sleep 2
  elapsed=$((elapsed + 2))
done

if [ "$elapsed" -ge "$timeout" ]; then
  echo "✗ Health check timeout: $url"
  exit 1
fi
```
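
When several services each need a health check, the polling loop can be factored into a reusable function. The sketch below is one possible shape, not part of the plan format: `poll_until` and `http_ok` are hypothetical names, and the probe is passed in as a command so the loop can be exercised without a running server.

```bash
# Hypothetical helper: run a probe command every <interval> seconds until it
# succeeds or <timeout> seconds have elapsed. Returns 0 on success, 1 on
# timeout. <interval> must be at least 1, otherwise elapsed never advances.
poll_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Hypothetical probe: succeed when the URL answers HTTP 200, 301, or 302.
http_ok() {
  code=$(curl -s -o /dev/null -w "%{http_code}" "$1") || return 1
  case "$code" in
    200|301|302) return 0 ;;
    *) return 1 ;;
  esac
}
```

A service check then reads `poll_until 30 2 http_ok "$url" || { echo "✗ Health check timeout: $url"; exit 1; }`.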

2e. Legacy Setup Format

If the plan uses the older flat format (`setup.commands` + `setup.health_checks` instead of `prerequisites` / `build` / `services`), fall back to executing `setup.commands` sequentially and polling `setup.health_checks` as before.

Step 4: Execute Tests Sequentially

For each test in the plan:

4a. Log Test Start

```markdown
## Running: TC-XX - <test.name>

Context: <test.context>
```

4b. Execute Steps

For each step in `test.steps`, determine the step type and execute accordingly:

**Shell commands (`run:` steps):**

The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code:

```bash
# Execute the command, capture output and exit code
<command> 2>&1
echo "EXIT_CODE: $?"
```

Capture all output for evaluation in step 4c. Shell steps cover:
- CLI binary invocations (e.g., `./target/debug/myapp status --all`)
- Database queries (e.g., `psql "${DATABASE_URL}" -c "SELECT ..."`)
- File inspection (e.g., `ls -la /path/to/expected/output`)
- Process lifecycle checks (e.g., `timeout 5 ./myapp 2>&1 || true`)
- Any other command a human would type in a terminal
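
The snippet above merges stderr into stdout. If keeping the two streams apart helps the evaluation in step 4c, a wrapper along these lines is one option. It is a sketch under assumptions: `run_step` is a hypothetical name, the step is passed as a single command string, and it runs under `sh -c` (use `bash -c` if the step needs Bash features).

```bash
# Hypothetical helper: run one shell step, writing stdout and stderr to
# separate files and printing the exit code in the documented format.
run_step() {
  out_file=$1
  err_file=$2
  step_cmd=$3
  status=0
  sh -c "$step_cmd" >"$out_file" 2>"$err_file" || status=$?
  echo "EXIT_CODE: $status"
}
```

For example, `run_step step.out step.err './target/debug/myapp status --all'` leaves the output in the two files for later comparison against `test.expected`.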

**curl actions (`action: curl` steps):**

```bash
curl -X <method> \
  -H "Content-Type: application/json" \
  <additional headers> \
  -d '<body>' \
  "<url>" \
  -o response.json \
  -w "%{http_code}" > status_code.txt

# Capture response for evaluation
cat response.json
cat status_code.txt
```

**agent-browser CLI actions:**

Steps starting with `agent-browser` are browser automation commands:

```bash
# Navigate
agent-browser open <url>

# Snapshot interactive elements (always do before interacting)
agent-browser snapshot -i

# Interact using refs from snapshot output (@e1, @e2, etc.)
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>

# Wait for conditions
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle

# Capture evidence
agent-browser screenshot docs/testing/evidence/<test.id>.png
```

**Important:** Always run `agent-browser snapshot -i` before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes.

Save screenshots to `docs/testing/evidence/<test.id>.png`.

4c. Evaluate Result

Using agent reasoning, compare the actual outcome against `test.expected`:
  • Read the expected behavior description
  • Compare it with the actual response or screenshot
  • Determine PASS or FAIL

4d. On PASS

```markdown
✓ TC-XX PASSED: <test.name>
```

Continue to the next test.

4e. On FAIL

Stop immediately. Go to Step 6.

Step 5: On All Tests Pass

```markdown
## Test Results: ALL PASSED

| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✓ PASS |
| ... | ... | ... |

Total: N/N tests passed

### Evidence

Screenshots saved to `docs/testing/evidence/`

### Cleanup

Stopping background services...
```

Clean up:

```bash
# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm "$pidfile"
  fi
done
```
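
Since the same cleanup loop runs on both the success and failure paths, it may be convenient to wrap it in a function. The sketch below is an illustration only: `stop_services` is a hypothetical name, and the optional directory argument (defaulting to `.beagle`) is an assumption added so the helper can be tried against a scratch directory.

```bash
# Hypothetical helper: stop every process recorded in <dir>/service-*.pid and
# <dir>/dev-server.pid, then remove the pidfiles. Safe to call when no
# services were started.
stop_services() {
  pid_dir=${1:-.beagle}
  for pidfile in "$pid_dir"/service-*.pid "$pid_dir"/dev-server.pid; do
    [ -f "$pidfile" ] || continue
    pid=$(cat "$pidfile")
    # Signal only if the pidfile holds a numeric PID; ignore errors from
    # processes that have already exited.
    case "$pid" in
      ''|*[!0-9]*) ;;
      *) kill "$pid" 2>/dev/null || true ;;
    esac
    rm -f "$pidfile"
  done
}
```

Registering it with `trap stop_services EXIT` near the top of a runner script is one way to honor the rule that background processes are cleaned up even on failure.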

Step 6: On Failure - Generate Debug Prompt

When a test fails, generate rich debug output:

6a. Gather Context

```bash
# Get changed files relevant to the failure
git diff --name-only "$(git merge-base HEAD origin/main)"..HEAD

# Get recent changes in files mentioned in test.context
git diff "$(git merge-base HEAD origin/main)"..HEAD -- <relevant_files>
```

6b. Output Debug Report

````markdown
## Test Failure: TC-XX - <test.name>

### What Failed

Test: <test.name>
Expected: <test.expected>
Actual: <Describe what actually happened: response code, error message, screenshot description>

### Relevant Changes in This PR

<For each file mentioned in test.context or related to the failure:>
- `<file>` (lines X-Y) - <brief description of changes>

### Evidence

<If screenshot exists:>
- Screenshot: `docs/testing/evidence/<test.id>.png`

<If API response:>
- Status code: <code>
- Response body:
```json
<response>
```

### Error Details

<If error message in response or logs:>
```
<error message>
```

### Suggested Investigation

<Based on the error, suggest 2-3 specific things to check:>
1. <First thing to check based on error type>
2. <Second thing related to changed files>
3. <Third thing about environment/setup>

### Debug Session Prompt

Copy this to start a new Claude session:

```
I'm debugging a test failure in branch `<branch>`.

Test: <test.name>
Error: <brief error description>

<Summarize what the test was checking and what went wrong>

Relevant files:
<List changed files related to this test>

Help me investigate why <specific failure reason>.
```
````

6c. Preserve Evidence

```bash
# Ensure evidence directory exists
mkdir -p docs/testing/evidence

# Save failure context
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
# Failure Report: <test.id>

<Full debug report content>
EOF
```

6d. Cleanup and Exit

```bash
# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm "$pidfile"
  fi
done
```

Test Results Summary Table

Always output a summary table showing progress:

```markdown
## Test Results

| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✗ FAIL |
| TC-03 | <name> | - SKIP |

Passed: 1/3 Failed: TC-02
```

Tests after a failure are marked as SKIP (not executed).
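
The SKIP rule can be made concrete with a small table printer. This is a sketch only: `print_summary` is a hypothetical name, and the `ID|Name|Result` record format is an assumption made for the example rather than something the plan defines.

```bash
# Hypothetical helper: print the summary table from "ID|Name|Result" records,
# one per argument. Result is PASS or FAIL; every test after the first FAIL
# is reported as SKIP regardless of the result passed in.
print_summary() {
  total=$#
  passed=0
  failed_id=""
  printf '| %s | %s | %s |\n' "ID" "Name" "Result"
  printf '|----|------|--------|\n'
  for record in "$@"; do
    id=${record%%|*}        # text before the first |
    rest=${record#*|}
    name=${rest%%|*}        # text between the first and second |
    result=${rest#*|}       # text after the second |
    if [ -n "$failed_id" ]; then
      label="- SKIP"
    elif [ "$result" = "PASS" ]; then
      label="✓ PASS"
      passed=$((passed + 1))
    else
      label="✗ FAIL"
      failed_id=$id
    fi
    printf '| %s | %s | %s |\n' "$id" "$name" "$label"
  done
  if [ -n "$failed_id" ]; then
    printf 'Passed: %s/%s Failed: %s\n' "$passed" "$total" "$failed_id"
  else
    printf 'Total: %s/%s tests passed\n' "$passed" "$total"
  fi
}
```

Calling `print_summary "TC-01|Login|PASS" "TC-02|Checkout|FAIL" "TC-03|Logout|PASS"` reports TC-03 as SKIP, because it follows the first failure.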

Verification

Before completing:

```bash
# Verify evidence directory exists
ls -la docs/testing/evidence/

# List captured evidence
ls docs/testing/evidence/*.png docs/testing/evidence/*.md 2>/dev/null
```

**Verification Checklist:**
- [ ] Setup commands executed successfully
- [ ] Health checks passed before test execution
- [ ] Each executed test has a recorded result
- [ ] Evidence captured in `docs/testing/evidence/`
- [ ] On failure: debug prompt includes expected vs actual
- [ ] On failure: relevant PR changes listed
- [ ] Background processes cleaned up
- [ ] Failure evidence preserved for debugging
- [ ] Debug prompt is copy-paste ready for a new session

Rules

  • Stop on first test failure (do not continue to other tests)
  • Always capture evidence (screenshots, responses)
  • Include file:line references in debug prompts when possible
  • Use the `--skip-setup` flag to re-run after fixing issues
  • Never hardcode secrets; use environment variables
  • Clean up background processes even on failure
  • Preserve failure evidence for debugging
  • Make debug prompts copy-paste ready for new sessions