run-test-plan
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseRun Test Plan
运行测试计划
Execute a YAML test plan, run setup commands, health checks, and each test sequentially. Stop on first failure with rich debug output.
执行YAML测试计划,依次运行设置命令、健康检查和各项测试。首次失败时停止并输出丰富的调试信息。
Prerequisites
前置条件
- agent-browser skill: Browser tests require the skill to be available
agent-browser:agent-browser
- agent-browser skill:浏览器测试需要技能可用
agent-browser:agent-browser
Arguments
参数
- : Path to test plan (default:
--plan <path>)docs/testing/test-plan.yaml - : Skip setup commands and health checks (for re-running after failure)
--skip-setup
- :测试计划路径(默认值:
--plan <path>)docs/testing/test-plan.yaml - :跳过设置命令和健康检查(用于失败后重新运行)
--skip-setup
Step 1: Parse Test Plan
步骤1:解析测试计划
Read and validate the test plan:
bash
undefined读取并验证测试计划:
bash
undefinedCheck file exists
检查文件是否存在
ls docs/testing/test-plan.yaml || { echo "Error: Test plan not found"; exit 1; }
ls docs/testing/test-plan.yaml || { echo "Error: Test plan not found"; exit 1; }
Validate YAML
验证YAML格式
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" || { echo "Error: Invalid YAML"; exit 1; }
Extract from the YAML:
- `setup.commands`: List of setup commands
- `setup.health_checks`: List of URLs to poll
- `tests`: Array of test casespython3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" || { echo "Error: Invalid YAML"; exit 1; }
从YAML中提取内容:
- `setup.commands`:设置命令列表
- `setup.health_checks`:轮询的URL列表
- `tests`:测试用例数组Step 2: Run Setup (unless --skip-setup)
步骤2:运行设置(除非使用--skip-setup)
2a. Check Prerequisites
2a. 检查前置条件
If exists, verify each one:
setup.prerequisitesbash
undefined如果存在,验证每一项:
setup.prerequisitesbash
undefinedFor each prerequisite in setup.prerequisites
针对setup.prerequisites中的每一项前置条件
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
undefined<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
undefined2b. Set Environment Variables
2b. 设置环境变量
If exists, export each variable. Variables using syntax should be resolved from the current environment:
setup.env${VAR}bash
undefined如果存在,导出每个变量。使用语法的变量应从当前环境解析:
setup.env${VAR}bash
undefinedFor each key/value in setup.env
针对setup.env中的每一组键值对
export <key>="<value>"
undefinedexport <key>="<value>"
undefined2c. Build
2c. 构建
If exists, execute build commands sequentially:
setup.buildbash
undefined如果存在,依次执行构建命令:
setup.buildbash
undefinedFor each command in setup.build
针对setup.build中的每一条命令
<command> || { echo "Build failed: <command>"; exit 1; }
undefined<command> || { echo "Build failed: <command>"; exit 1; }
undefined2d. Start Services
2d. 启动服务
If exists, start long-running processes and wait for health checks:
setup.servicesbash
undefined如果存在,启动长期运行的进程并等待健康检查:
setup.servicesbash
undefinedFor each service in setup.services
针对setup.services中的每一项服务
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
For each service with a `health_check`, poll until ready:
```bash
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0
while [ $elapsed -lt $timeout ]; do
if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then
echo "✓ Health check passed: $url"
break
fi
sleep 2
elapsed=$((elapsed + 2))
done
if [ $elapsed -ge $timeout ]; then
echo "✗ Health check timeout: $url"
exit 1
finohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
对于带有`health_check`的服务,轮询直到就绪:
```bash
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0
while [ $elapsed -lt $timeout ]; do
if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then
echo "✓ Health check passed: $url"
break
fi
sleep 2
elapsed=$((elapsed + 2))
done
if [ $elapsed -ge $timeout ]; then
echo "✗ Health check timeout: $url"
exit 1
fi2e. Legacy Setup Format
2e. 旧版设置格式
If the plan uses the older flat format ( + instead of //), fall back to executing sequentially and polling as before.
setup.commandssetup.health_checksprerequisitesbuildservicessetup.commandssetup.health_checks如果计划使用旧版扁平格式( + 而非//),则回退到依次执行并按之前的方式轮询。
setup.commandssetup.health_checksprerequisitesbuildservicessetup.commandssetup.health_checksStep 4: Execute Tests Sequentially
步骤4:依次执行测试
For each test in the plan:
针对计划中的每个测试:
4a. Log Test Start
4a. 记录测试开始
markdown
undefinedmarkdown
undefinedRunning: TC-XX - <test.name>
运行中:TC-XX - <test.name>
Context: <test.context>
undefined上下文:<test.context>
undefined4b. Execute Steps
4b. 执行步骤
For each step in , determine the step type and execute accordingly:
test.stepsShell commands ( steps):
run:The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code:
bash
undefined针对中的每个步骤,确定步骤类型并执行:
test.stepsShell命令(步骤):
run:最常见的步骤类型。通过Bash执行命令并捕获标准输出、标准错误和退出码:
bash
undefinedExecute the command, capture output and exit code
执行命令,捕获输出和退出码
<command> 2>&1
echo "EXIT_CODE: $?"
Capture all output for evaluation in step 4c. Shell steps cover:
- CLI binary invocations (e.g., `./target/debug/myapp status --all`)
- Database queries (e.g., `psql "${DATABASE_URL}" -c "SELECT ..."`)
- File inspection (e.g., `ls -la /path/to/expected/output`)
- Process lifecycle checks (e.g., `timeout 5 ./myapp 2>&1 || true`)
- Any other command a human would type in a terminal
**curl actions (`action: curl` steps):**
```bash
curl -X <method> \
-H "Content-Type: application/json" \
<additional headers> \
-d '<body>' \
"<url>" \
-o response.json \
-w "%{http_code}" > status_code.txt<command> 2>&1
echo "EXIT_CODE: $?"
捕获所有输出以便在步骤4c中评估。Shell步骤涵盖:
- CLI二进制调用(例如:`./target/debug/myapp status --all`)
- 数据库查询(例如:`psql "${DATABASE_URL}" -c "SELECT ..."`)
- 文件检查(例如:`ls -la /path/to/expected/output`)
- 进程生命周期检查(例如:`timeout 5 ./myapp 2>&1 || true`)
- 人类在终端中输入的任何其他命令
**curl操作(`action: curl`步骤):**
```bash
curl -X <method> \
-H "Content-Type: application/json" \
<additional headers> \
-d '<body>' \
"<url>" \
-o response.json \
-w "%{http_code}" > status_code.txtCapture response for evaluation
捕获响应以便评估
cat response.json
cat status_code.txt
**agent-browser CLI actions:**
Steps starting with `agent-browser` are browser automation commands:
```bashcat response.json
cat status_code.txt
**agent-browser CLI操作:**
以`agent-browser`开头的步骤是浏览器自动化命令:
```bashNavigate
导航
agent-browser open <url>
agent-browser open <url>
Snapshot interactive elements (always do before interacting)
快照交互元素(交互前必须执行)
agent-browser snapshot -i
agent-browser snapshot -i
Interact using refs from snapshot output (@e1, @e2, etc.)
使用快照输出中的引用(@e1、@e2等)进行交互
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>
Wait for conditions
等待条件
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle
Capture evidence
捕获证据
agent-browser screenshot docs/testing/evidence/<test.id>.png
**Important:** Always run `agent-browser snapshot -i` before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes.
Save screenshots to `docs/testing/evidence/<test.id>.png`agent-browser screenshot docs/testing/evidence/<test.id>.png
**重要提示:** 在与元素交互前务必运行`agent-browser snapshot -i`以获取有效引用,在导航或DOM发生重大变化后重新快照。
将截图保存到`docs/testing/evidence/<test.id>.png`4c. Evaluate Result
4c. 评估结果
Using agent reasoning, compare actual outcome against :
test.expected- Read the expected behavior description
- Compare with actual response/screenshot
- Determine PASS or FAIL
使用agent推理能力,将实际结果与进行比较:
test.expected- 读取预期行为描述
- 与实际响应/截图对比
- 判断通过或失败
4d. On PASS
4d. 测试通过时
markdown
✓ TC-XX PASSED: <test.name>Continue to next test.
markdown
✓ TC-XX 通过:<test.name>继续执行下一个测试。
4e. On FAIL
4e. 测试失败时
Stop immediately. Go to Step 6.
立即停止。进入步骤6。
Step 5: On All Tests Pass
步骤5:所有测试通过时
markdown
undefinedmarkdown
undefinedTest Results: ALL PASSED
测试结果:全部通过
| ID | Name | Result |
|---|---|---|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✓ PASS |
| ... | ... | ... |
Total: N/N tests passed
| ID | 名称 | 结果 |
|---|---|---|
| TC-01 | <name> | ✓ 通过 |
| TC-02 | <name> | ✓ 通过 |
| ... | ... | ... |
总计: N/N 测试通过
Evidence
证据
Screenshots saved to
docs/testing/evidence/截图已保存至
docs/testing/evidence/Cleanup
清理
Stopping background services...
Clean up:
```bash正在停止后台服务...
执行清理:
```bashKill background services
终止后台服务
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
undefinedfor pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
undefinedStep 6: On Failure - Generate Debug Prompt
步骤6:测试失败时 - 生成调试提示
When a test fails, generate rich debug output:
当测试失败时,生成丰富的调试输出:
6a. Gather Context
6a. 收集上下文
bash
undefinedbash
undefinedGet changed files relevant to the failure
获取与失败相关的变更文件
git diff --name-only $(git merge-base HEAD origin/main)..HEAD
git diff --name-only $(git merge-base HEAD origin/main)..HEAD
Get recent changes in files mentioned in test.context
获取test.context中提及文件的近期变更
git diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files>
undefinedgit diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files>
undefined6b. Output Debug Report
6b. 输出调试报告
markdown
undefinedmarkdown
undefinedTest Failure: TC-XX - <test.name>
测试失败:TC-XX - <test.name>
What Failed
失败内容
Test: <test.name>
Expected:
<test.expected>
Actual:
<Describe what actually happened - response code, error message, screenshot description>
测试: <test.name>
预期:
<test.expected>
实际:
<描述实际发生的情况 - 响应码、错误信息、截图说明>
Relevant Changes in This PR
本次PR中的相关变更
<For each file mentioned in test.context or related to the failure:>
- `<file>` (lines X-Y) - <brief description of changes>
<针对test.context中提及或与失败相关的每个文件:>
- (第X-Y行)- <变更简要说明>
<file>
Evidence
证据
<If screenshot exists:>
- Screenshot: `docs/testing/evidence/<test.id>.png`
<If API response:>
- Status code: <code>
- Response body:
```json
<response>
```
<如果存在截图:>
- 截图:
docs/testing/evidence/<test.id>.png
<如果存在API响应:>
- 状态码:<code>
- 响应体:
json
<response>Error Details
错误详情
<If error message in response or logs:>
```
<error message>
```
<如果响应或日志中有错误信息:>
<error message>Suggested Investigation
建议的调查方向
<Based on the error, suggest 2-3 specific things to check:>
- <First thing to check based on error type>
- <Second thing related to changed files>
- <Third thing about environment/setup>
<基于错误,建议2-3项具体检查内容:>
- <基于错误类型的首要检查内容>
- <与变更文件相关的次要检查内容>
- <关于环境/设置的检查内容>
Debug Session Prompt
调试会话提示
Copy this to start a new Claude session:
I'm debugging a test failure in branch .
<branch>Test: <test.name>
Error: <brief error description>
<Summarize what the test was checking and what went wrong>
Relevant files:
<List changed files related to this test>
Help me investigate why <specific failure reason>.
undefined复制以下内容启动新的Claude会话:
我正在分支中调试测试失败问题。
<branch>测试: <test.name>
错误: <错误简要描述>
<总结测试的检查内容及失败原因>
相关文件:
<列出与该测试相关的变更文件>
请帮助我调查为什么会出现<具体失败原因>。
undefined6c. Preserve Evidence
6c. 保存证据
bash
undefinedbash
undefinedEnsure evidence directory exists
确保证据目录存在
mkdir -p docs/testing/evidence
mkdir -p docs/testing/evidence
Save failure context
保存失败上下文
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
Failure Report: <test.id>
失败报告:<test.id>
<Full debug report content>
EOF
```
<完整调试报告内容>
EOF
undefined6d. Cleanup and Exit
6d. 清理并退出
bash
undefinedbash
undefinedKill background services
终止后台服务
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
undefinedfor pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
undefinedTest Results Summary Table
测试结果汇总表
Always output a summary table showing progress:
markdown
undefined始终输出显示进度的汇总表:
markdown
undefinedTest Results
测试结果
| ID | Name | Result |
|---|---|---|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✗ FAIL |
| TC-03 | <name> | - SKIP |
Passed: 1/3
Failed: TC-02
Tests after a failure are marked as SKIP (not executed).| ID | 名称 | 结果 |
|---|---|---|
| TC-01 | <name> | ✓ 通过 |
| TC-02 | <name> | ✗ 失败 |
| TC-03 | <name> | - 跳过 |
通过: 1/3
失败: TC-02
失败后的测试标记为SKIP(未执行)。Verification
验证
Before completing:
bash
undefined完成前执行:
bash
undefinedVerify evidence directory exists
验证证据目录存在
ls -la docs/testing/evidence/
ls -la docs/testing/evidence/
List captured evidence
列出捕获的证据
ls docs/testing/evidence/.png docs/testing/evidence/.md 2>/dev/null
**Verification Checklist:**
- [ ] Setup commands executed successfully
- [ ] Health checks passed before test execution
- [ ] Each executed test has recorded result
- [ ] Evidence captured in `docs/testing/evidence/`
- [ ] On failure: debug prompt includes expected vs actual
- [ ] On failure: relevant PR changes listed
- [ ] Background processes cleaned upls docs/testing/evidence/.png docs/testing/evidence/.md 2>/dev/null
**验证清单:**
- [ ] 设置命令执行成功
- [ ] 测试执行前健康检查通过
- [ ] 每个已执行测试都记录了结果
- [ ] 证据已捕获至`docs/testing/evidence/`
- [ ] 失败时:调试提示包含预期与实际对比
- [ ] 失败时:列出了相关PR变更
- [ ] 后台进程已清理
- [ ] 失败证据已保存用于调试
- [ ] 调试提示可直接复制粘贴到新会话Rules
规则
- Stop on first test failure (do not continue to other tests)
- Always capture evidence (screenshots, responses)
- Include file:line references in debug prompts when possible
- Use flag to re-run after fixing issues
--skip-setup - Never hardcode secrets - use environment variables
- Clean up background processes even on failure
- Preserve failure evidence for debugging
- Make debug prompts copy-paste ready for new sessions
- 首次测试失败即停止(不继续执行其他测试)
- 始终捕获证据(截图、响应)
- 调试提示中尽可能包含文件:行号引用
- 使用标志在修复问题后重新运行
--skip-setup - 切勿硬编码密钥 - 使用环境变量
- 即使失败也要清理后台进程
- 保存失败证据用于调试
- 使调试提示可直接复制粘贴到新会话