compare-test-runs

Compare Test Runs

Quick Start

You'll typically receive two test run identifiers. Follow these steps:
  1. Run `tuist test show <id> --json` for both base and head test runs.
  2. Run `tuist test module list <test-run-id> --json` and `tuist test suite list <test-run-id> --json` to get module and suite breakdowns.
  3. Run `tuist test case run list <identifier> --json` to get individual test case results.
  4. Compare failures, flaky tests, durations, and overall status.
  5. Inspect failing test cases with `tuist test case run show <id> --json`.
  6. Summarize findings with actionable recommendations.
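Taken together, the steps can be scripted end to end. A minimal sketch, assuming the two identifiers are already resolved (the `run-123`/`run-456` values are placeholders):

```bash
# Capture every level of detail for both runs up front, so the comparison
# steps below can work from local JSON files.
BASE_ID=run-123   # placeholder: base test run identifier
HEAD_ID=run-456   # placeholder: head test run identifier

for run in "$BASE_ID" "$HEAD_ID"; do
  tuist test show "$run" --json > "show-${run}.json"
  tuist test module list "$run" --json > "modules-${run}.json"
  tuist test suite list "$run" --json > "suites-${run}.json"
  tuist test case run list "$run" --json --page-size 100 > "cases-${run}.json"
done
```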

Step 1: Resolve Test Runs

If base/head are test run IDs or dashboard URLs

Fetch each directly:

```bash
tuist test show <base-id> --json
tuist test show <head-id> --json
```

If base/head are branch names

List recent test runs on each branch to identify test run IDs:

```bash
tuist test list --git-branch <base-branch> --json --page-size 5
tuist test list --git-branch <head-branch> --json --page-size 5
```

Pick the latest test run ID from each branch's results.
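To script the selection, a minimal jq sketch; it assumes the list output is a JSON array ordered newest-first whose entries expose an `id` field (both are assumptions about the output shape):

```bash
# Take the most recent run ID from each branch's listing.
BASE_ID=$(tuist test list --git-branch main --json --page-size 5 | jq -r '.[0].id')
HEAD_ID=$(tuist test list --git-branch feature-x --json --page-size 5 | jq -r '.[0].id')
```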

Defaults

  • If no base is provided, use the project's default branch (usually `main`).
  • If no head is provided, detect the current git branch.
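A minimal shell sketch of these defaults; the remote-HEAD lookup is a common git idiom for finding the default branch, not a Tuist-specific mechanism:

```bash
# Head defaults to the current branch.
HEAD_BRANCH=$(git rev-parse --abbrev-ref HEAD)

# Base falls back to the remote's default branch, or main if unknown.
BASE_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
BASE_BRANCH=${BASE_BRANCH:-main}
```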

Step 2: Compare Top-Level Metrics

After fetching both test runs, compare:

| Metric | What to check |
| --- | --- |
| `status` | Flag if base passed but head failed |
| `duration` | Flag if head is >10% slower |
| `total_test_count` | Note if the test count changed (new or removed tests) |
| `failed_test_count` | Compare failure counts |
| `flaky_test_count` | Compare flaky counts |
| `avg_test_duration` | Flag significant changes |
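A minimal comparison sketch over those metrics; the field names come from the table above, but treating them as top-level JSON keys is an assumption about the `show` output:

```bash
base=$(tuist test show "$BASE_ID" --json)
head=$(tuist test show "$HEAD_ID" --json)

# Print any metric whose value changed between base and head.
for metric in status duration total_test_count failed_test_count \
              flaky_test_count avg_test_duration; do
  b=$(jq -r ".${metric}" <<<"$base")
  h=$(jq -r ".${metric}" <<<"$head")
  [ "$b" != "$h" ] && echo "${metric}: ${b} -> ${h}"
done
```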

Step 3: Get Module and Suite Breakdowns

Fetch module and suite-level results for both test runs to understand which areas regressed:

```bash
tuist test module list <base-test-run-id> --json
tuist test module list <head-test-run-id> --json

tuist test suite list <base-test-run-id> --json
tuist test suite list <head-test-run-id> --json
```

Match modules and suites by name across both runs to identify areas with new failures or duration regressions.
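A minimal sketch of the name matching for suites, assuming each list entry carries `name` and `failed_test_count` fields (an assumption about the output shape); the same pattern works for modules:

```bash
tuist test suite list "$BASE_ID" --json > base-suites.json
tuist test suite list "$HEAD_ID" --json > head-suites.json

# Print suites whose failure count grew between base and head.
jq -rn --slurpfile b base-suites.json --slurpfile h head-suites.json '
  ($b[0] | map({key: .name, value: .failed_test_count}) | from_entries) as $base
  | $h[0][]
  | select(.failed_test_count > ($base[.name] // 0))
  | "\(.name): failures \($base[.name] // 0) -> \(.failed_test_count)"'
```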

Step 4: Get Individual Test Case Results

Fetch test case runs for both test runs:

```bash
tuist test case run list <identifier> --json --page-size 100
```

Match test cases by their `name` + `module_name` + `suite_name` across both runs.
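A minimal keying sketch; `name`, `module_name`, and `suite_name` come from the text above, while the array shape and a `status` field are assumptions about the output:

```bash
tuist test case run list "$BASE_ID" --json --page-size 100 > base-cases.json
tuist test case run list "$HEAD_ID" --json --page-size 100 > head-cases.json

# One line per test case: composite key, then status.
jq -r '.[] | "\(.module_name)/\(.suite_name)/\(.name) \(.status)"' base-cases.json | sort > base.txt
jq -r '.[] | "\(.module_name)/\(.suite_name)/\(.name) \(.status)"' head-cases.json | sort > head.txt
```

Step 5 reuses these two files to bucket the differences.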

Step 5: Classify Changes

Group test cases into categories:
  1. New failures: Tests that passed in base but failed in head.
  2. Fixed tests: Tests that failed in base but passed in head.
  3. Newly flaky: Tests not flaky in base but flaky in head.
  4. No longer flaky: Tests that were flaky in base but stable in head.
  5. New tests: Tests present in head but not in base.
  6. Removed tests: Tests present in base but not in head.
  7. Duration regressions: Tests with >50% duration increase.
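A minimal bucketing sketch built on the `base.txt`/`head.txt` files from Step 4; it assumes status values of `success` and `failure`, and composite keys without embedded spaces:

```bash
# New failures: failing in head, but passing or absent in base.
grep ' failure$' head.txt | cut -d' ' -f1 | while read -r key; do
  grep -qxF "$key failure" base.txt || echo "new failure: $key"
done

# Fixed tests: failing in base, passing in head.
grep ' failure$' base.txt | cut -d' ' -f1 | while read -r key; do
  grep -qxF "$key success" head.txt && echo "fixed: $key"
done

# New and removed tests, by key set difference.
comm -13 <(cut -d' ' -f1 base.txt | sort) <(cut -d' ' -f1 head.txt | sort) | sed 's/^/new test: /'
comm -23 <(cut -d' ' -f1 base.txt | sort) <(cut -d' ' -f1 head.txt | sort) | sed 's/^/removed test: /'
```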

Step 6: Inspect Failures

For each new failure, get detailed information:

```bash
tuist test case run show <test-case-run-id> --json
```

Key fields to examine:
  • `failures[].message` -- the assertion or error message
  • `failures[].path` -- source file path
  • `failures[].line_number` -- exact line of failure
  • `failures[].issue_type` -- type of issue
  • `repetitions` -- if present, shows retry behavior (flaky detection)
  • `crash_report` -- crash data if the test runner crashed
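A minimal extraction sketch for these fields, assuming `failures` is a top-level array in the `show` output:

```bash
tuist test case run show <test-case-run-id> --json \
  | jq -r '.failures[] | "\(.path):\(.line_number) [\(.issue_type)] \(.message)"'
```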

Step 7: Inspect Attachments

The `tuist test case run show` output includes attachment and crash report information. Review:
  • Screenshots or UI test artifacts
  • Log files or crash reports
  • Any diagnostic data attached to failing runs
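No dedicated attachment command appears above, so one option is to filter the same `show` output; `crash_report` comes from Step 6, while `attachments` is an assumed field name:

```bash
# Surface any attachment or crash data carried by the test case run.
tuist test case run show <test-case-run-id> --json \
  | jq '{attachments: .attachments, crash_report: .crash_report}'
```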

Summary Format

Produce a summary with:
  1. Overall verdict: Better, worse, or neutral compared to base.
  2. New failures: List each with failure message, file path, and line number.
  3. New flaky tests: List with flakiness context.
  4. Fixed tests: List tests that are now passing.
  5. Duration: Overall and notable per-test regressions.
  6. Recommendations: Actionable next steps for each issue.
Example:

```
Test Run Comparison: base (run-123 on main) vs head (run-456 on feature-x)

Status: success -> failure -- REGRESSION
Duration: 120.5s -> 145.2s (+21%)
Tests: 342 -> 345 (3 new tests)
Failures: 0 -> 2 (2 new failures)
Flaky: 1 -> 3 (2 newly flaky)

New Failures:
1. AuthModuleTests/LoginTests/test_login_with_expired_token
   Message: "Expected status 401, got 500"
   File: Tests/AuthModuleTests/LoginTests.swift:42
   Likely cause: Server error handling changed for expired tokens

2. NetworkTests/RetryTests/test_retry_on_timeout
   Message: "Timed out waiting for retry"
   File: Tests/NetworkTests/RetryTests.swift:87
   Likely cause: Timeout threshold too low after network layer refactor

Newly Flaky:
1. CacheTests/WriteCacheTests/test_concurrent_writes (flaky in 3/5 runs)

Recommendations:
- Fix expired token handling in AuthModule
- Increase timeout in RetryTests or mock the network layer
- Investigate concurrent write synchronization in CacheTests
```

Done Checklist

  • Resolved both base and head test runs
  • Compared top-level metrics
  • Fetched module and suite breakdowns for both runs
  • Identified new failures, fixed tests, and flaky changes
  • Inspected failure details for new failures
  • Provided actionable recommendations with file paths