compare-test-runs
Compare Test Runs
Quick Start
You'll typically receive two test run identifiers. Follow these steps:
- Run `tuist test show <id> --json` for both base and head test runs.
- Run `tuist test module list <test-run-id> --json` and `tuist test suite list <test-run-id> --json` to get module and suite breakdowns.
- Run `tuist test case run list <identifier> --json` to get individual test case results.
- Compare failures, flaky tests, durations, and overall status.
- Inspect failing test cases with `tuist test case run show <id> --json`.
- Summarize findings with actionable recommendations.
Step 1: Resolve Test Runs
If base/head are test run IDs or dashboard URLs
Fetch each directly:

```bash
tuist test show <base-id> --json
tuist test show <head-id> --json
```

If base/head are branch names
List recent test runs on each branch to identify test run IDs:

```bash
tuist test list --git-branch <base-branch> --json --page-size 5
tuist test list --git-branch <head-branch> --json --page-size 5
```

Pick the latest test run ID from each branch's results.
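Once the listing is back, picking the newest run is a one-liner. A minimal Python sketch, assuming the JSON is an array of runs with `id` and `ran_at` fields (these names are assumptions — check the actual payload):

```python
import json

# Hypothetical shape of the `tuist test list ... --json` output; the real
# field names may differ.
listing = json.loads("""
[
  {"id": "run-123", "ran_at": "2024-05-01T10:00:00Z", "status": "success"},
  {"id": "run-456", "ran_at": "2024-05-02T09:30:00Z", "status": "failure"}
]
""")

# RFC 3339 timestamps sort lexicographically, so max() picks the latest run.
latest = max(listing, key=lambda run: run["ran_at"])
print(latest["id"])  # run-456
```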
Defaults
- If no base is provided, use the project's default branch (usually `main`).
- If no head is provided, detect the current git branch.
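The defaulting rules can be sketched as a small helper; the `resolve_refs` name and `default_branch` parameter are illustrative, not part of the tuist CLI:

```python
import subprocess

def resolve_refs(base=None, head=None, default_branch="main"):
    """Apply the defaulting rules: fall back to the default branch for
    base, and to the currently checked-out git branch for head."""
    if base is None:
        base = default_branch
    if head is None:
        # Ask git for the current branch name.
        head = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    return base, head

print(resolve_refs(head="feature-x"))  # ('main', 'feature-x')
```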
Step 2: Compare Top-Level Metrics
After fetching both test runs, compare:

| Metric | What to check |
|---|---|
| Status | Flag if base passed but head failed |
| Duration | Flag if head is >10% slower |
| Test count | Note if test count changed (new or removed tests) |
| Failures | Compare failure counts |
| Flaky tests | Compare flaky counts |
| Pass rate | Flag significant changes |
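A minimal sketch of this comparison, assuming hypothetical `status`, `duration`, and `test_count` fields in the `tuist test show --json` payload (not its documented schema):

```python
def compare_metrics(base, head):
    """Flag top-level regressions between two test run summaries."""
    flags = []
    if base["status"] == "success" and head["status"] != "success":
        flags.append("REGRESSION: base passed but head failed")
    if head["duration"] > base["duration"] * 1.10:  # >10% slower
        pct = (head["duration"] / base["duration"] - 1) * 100
        flags.append(f"Duration regression: +{pct:.0f}%")
    if head["test_count"] != base["test_count"]:
        delta = head["test_count"] - base["test_count"]
        flags.append(f"Test count changed by {delta:+d}")
    return flags

base = {"status": "success", "duration": 120.5, "test_count": 342}
head = {"status": "failure", "duration": 145.2, "test_count": 345}
for flag in compare_metrics(base, head):
    print(flag)
```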
Step 3: Get Module and Suite Breakdowns
Fetch module and suite-level results for both test runs to understand which areas regressed:

```bash
tuist test module list <base-test-run-id> --json
tuist test module list <head-test-run-id> --json
tuist test suite list <base-test-run-id> --json
tuist test suite list <head-test-run-id> --json
```

Match modules and suites by name across both runs to identify areas with new failures or duration regressions.
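Matching by name can be sketched like this; the `name` and `duration` fields are assumptions about the module-list payload:

```python
def match_by_name(base_items, head_items):
    """Pair module (or suite) entries that appear in both runs."""
    base_by_name = {item["name"]: item for item in base_items}
    head_by_name = {item["name"]: item for item in head_items}
    common = base_by_name.keys() & head_by_name.keys()
    return {name: (base_by_name[name], head_by_name[name]) for name in common}

base_modules = [{"name": "AuthModule", "duration": 30.0}]
head_modules = [{"name": "AuthModule", "duration": 48.0},
                {"name": "NewModule", "duration": 5.0}]

# Report duration regressions for modules present in both runs.
for name, (b, h) in match_by_name(base_modules, head_modules).items():
    if h["duration"] > b["duration"] * 1.10:
        print(f"{name}: duration regressed {b['duration']}s -> {h['duration']}s")
```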
Step 4: Get Individual Test Case Results
Fetch test case runs for both test runs:
```bash
tuist test case run list <identifier> --json --page-size 100
```

Match test cases by their `name` + `module_name` + `suite_name` across both runs.
Step 5: Classify Changes
Group test cases into categories:
- New failures: Tests that passed in base but failed in head.
- Fixed tests: Tests that failed in base but passed in head.
- Newly flaky: Tests not flaky in base but flaky in head.
- No longer flaky: Tests that were flaky in base but stable in head.
- New tests: Tests present in head but not in base.
- Removed tests: Tests present in base but not in head.
- Duration regressions: Tests with >50% duration increase.
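The matching and classification logic from Steps 4 and 5 can be sketched as follows (flaky detection omitted for brevity; the `status` and `duration` field names are assumptions about the test-case-run payload):

```python
def key(case):
    # Composite identity for a test case across runs, per Step 4.
    return (case["module_name"], case["suite_name"], case["name"])

def classify(base_cases, head_cases):
    """Bucket head-vs-base differences into the Step 5 categories."""
    base = {key(c): c for c in base_cases}
    head = {key(c): c for c in head_cases}
    changes = {"new_failures": [], "fixed": [], "new": [],
               "removed": [], "duration_regressions": []}
    for k, h in head.items():
        b = base.get(k)
        if b is None:
            changes["new"].append(k)
            continue
        if b["status"] == "success" and h["status"] == "failure":
            changes["new_failures"].append(k)
        elif b["status"] == "failure" and h["status"] == "success":
            changes["fixed"].append(k)
        if h["duration"] > b["duration"] * 1.5:  # >50% duration increase
            changes["duration_regressions"].append(k)
    changes["removed"] = [k for k in base if k not in head]
    return changes

base_cases = [
    {"module_name": "Auth", "suite_name": "Login", "name": "test_ok",
     "status": "success", "duration": 1.0},
]
head_cases = [
    {"module_name": "Auth", "suite_name": "Login", "name": "test_ok",
     "status": "failure", "duration": 1.2},
    {"module_name": "Auth", "suite_name": "Login", "name": "test_new",
     "status": "success", "duration": 0.1},
]
result = classify(base_cases, head_cases)
print(result["new_failures"])  # [('Auth', 'Login', 'test_ok')]
print(result["new"])           # [('Auth', 'Login', 'test_new')]
```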
Step 6: Inspect Failures
For each new failure, get detailed information:
```bash
tuist test case run show <test-case-run-id> --json
```

Key fields to examine:
- `failures[].message` -- the assertion or error message
- `failures[].path` -- source file path
- `failures[].line_number` -- exact line of failure
- `failures[].issue_type` -- type of issue
- `repetitions` -- if present, shows retry behavior (flaky detection)
- `crash_report` -- crash data if test runner crashed
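These fields can be folded into per-failure report lines, for example (the payload below is made up for illustration; only the field names come from the list above):

```python
# Illustrative detail payload mirroring the fields listed above.
detail = {
    "failures": [
        {"message": "Expected status 401, got 500",
         "path": "Tests/AuthModuleTests/LoginTests.swift",
         "line_number": 42,
         "issue_type": "assertion_failure"},
    ],
}

# Format each failure as a Message / File:line pair.
report = []
for failure in detail["failures"]:
    report.append(f'Message: "{failure["message"]}"')
    report.append(f'File: {failure["path"]}:{failure["line_number"]}')
print("\n".join(report))
```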
Step 7: Inspect Attachments
The `tuist test case run show` output includes attachment and crash report information. Review:
- Screenshots or UI test artifacts
- Log files or crash reports
- Any diagnostic data attached to failing runs
Summary Format
Produce a summary with:
- Overall verdict: Better, worse, or neutral compared to base.
- New failures: List each with failure message, file path, and line number.
- New flaky tests: List with flakiness context.
- Fixed tests: List tests that are now passing.
- Duration: Overall and notable per-test regressions.
- Recommendations: Actionable next steps for each issue.
Example:
Test Run Comparison: base (run-123 on main) vs head (run-456 on feature-x)
Status: success -> failure -- REGRESSION
Duration: 120.5s -> 145.2s (+21%)
Tests: 342 -> 345 (3 new tests)
Failures: 0 -> 2 (2 new failures)
Flaky: 1 -> 3 (2 newly flaky)
New Failures:
1. AuthModuleTests/LoginTests/test_login_with_expired_token
Message: "Expected status 401, got 500"
File: Tests/AuthModuleTests/LoginTests.swift:42
Likely cause: Server error handling changed for expired tokens
2. NetworkTests/RetryTests/test_retry_on_timeout
Message: "Timed out waiting for retry"
File: Tests/NetworkTests/RetryTests.swift:87
Likely cause: Timeout threshold too low after network layer refactor
Newly Flaky:
1. CacheTests/WriteCacheTests/test_concurrent_writes (flaky in 3/5 runs)
Recommendations:
- Fix expired token handling in AuthModule
- Increase timeout in RetryTests or mock the network layer
- Investigate concurrent write synchronization in CacheTests
Done Checklist
- Resolved both base and head test runs
- Compared top-level metrics
- Fetched module and suite breakdowns for both runs
- Identified new failures, fixed tests, and flaky changes
- Inspected failure details for new failures
- Provided actionable recommendations with file paths