compare-test-runs

Compare Test Runs

Quick Start

You'll typically receive two test run identifiers. Follow these steps:
  1. Run `tuist test show <id> --json` for both base and head test runs.
  2. Run `tuist test module list <test-run-id> --json` and `tuist test suite list <test-run-id> --json` to get module and suite breakdowns.
  3. Run `tuist test case run list <identifier> --json` to get individual test case results.
  4. Compare failures, flaky tests, durations, and overall status.
  5. Inspect failing test cases with `tuist test case run show <id> --json`.
  6. Summarize findings with actionable recommendations.
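Taken together, the steps can be scripted end to end. A minimal sketch, assuming the two identifiers are already resolved (the `run-123`/`run-456` values are placeholders):

```bash
# Capture every level of detail for both runs up front, so the comparison
# steps below can work from local JSON files.
BASE_ID=run-123   # placeholder: base test run identifier
HEAD_ID=run-456   # placeholder: head test run identifier

for run in "$BASE_ID" "$HEAD_ID"; do
  tuist test show "$run" --json > "show-${run}.json"
  tuist test module list "$run" --json > "modules-${run}.json"
  tuist test suite list "$run" --json > "suites-${run}.json"
  tuist test case run list "$run" --json --page-size 100 > "cases-${run}.json"
done
```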

Step 1: Resolve Test Runs

If base/head are test run IDs or dashboard URLs

Fetch each directly:

```bash
tuist test show <base-id> --json
tuist test show <head-id> --json
```

If base/head are branch names

List recent test runs on each branch to identify test run IDs:

```bash
tuist test list --git-branch <base-branch> --json --page-size 5
tuist test list --git-branch <head-branch> --json --page-size 5
```

Pick the latest test run ID from each branch's results.
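To script the selection, a minimal jq sketch; it assumes the list output is a JSON array ordered newest-first whose entries expose an `id` field (both are assumptions about the output shape):

```bash
# Take the most recent run ID from each branch's listing.
BASE_ID=$(tuist test list --git-branch main --json --page-size 5 | jq -r '.[0].id')
HEAD_ID=$(tuist test list --git-branch feature-x --json --page-size 5 | jq -r '.[0].id')
```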

Defaults

  • If no base is provided, use the project's default branch (usually `main`).
  • If no head is provided, detect the current git branch.
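A minimal shell sketch of these defaults; the remote-HEAD lookup is a common git idiom for finding the default branch, not a Tuist-specific mechanism:

```bash
# Head defaults to the current branch.
HEAD_BRANCH=$(git rev-parse --abbrev-ref HEAD)

# Base falls back to the remote's default branch, or main if unknown.
BASE_BRANCH=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
BASE_BRANCH=${BASE_BRANCH:-main}
```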

Step 2: Compare Top-Level Metrics

After fetching both test runs, compare:

| Metric | What to check |
| --- | --- |
| `status` | Flag if base passed but head failed |
| `duration` | Flag if head is >10% slower |
| `total_test_count` | Note if the test count changed (new or removed tests) |
| `failed_test_count` | Compare failure counts |
| `flaky_test_count` | Compare flaky counts |
| `avg_test_duration` | Flag significant changes |
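A minimal comparison sketch over those metrics; the field names come from the table above, but treating them as top-level JSON keys is an assumption about the `show` output:

```bash
base=$(tuist test show "$BASE_ID" --json)
head=$(tuist test show "$HEAD_ID" --json)

# Print any metric whose value changed between base and head.
for metric in status duration total_test_count failed_test_count \
              flaky_test_count avg_test_duration; do
  b=$(jq -r ".${metric}" <<<"$base")
  h=$(jq -r ".${metric}" <<<"$head")
  [ "$b" != "$h" ] && echo "${metric}: ${b} -> ${h}"
done
```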

Step 3: Get Module and Suite Breakdowns

Fetch module and suite-level results for both test runs to understand which areas regressed:

```bash
tuist test module list <base-test-run-id> --json
tuist test module list <head-test-run-id> --json

tuist test suite list <base-test-run-id> --json
tuist test suite list <head-test-run-id> --json
```

Match modules and suites by name across both runs to identify areas with new failures or duration regressions.
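A minimal sketch of the name matching for suites, assuming each list entry carries `name` and `failed_test_count` fields (an assumption about the output shape); the same pattern works for modules:

```bash
tuist test suite list "$BASE_ID" --json > base-suites.json
tuist test suite list "$HEAD_ID" --json > head-suites.json

# Print suites whose failure count grew between base and head.
jq -rn --slurpfile b base-suites.json --slurpfile h head-suites.json '
  ($b[0] | map({key: .name, value: .failed_test_count}) | from_entries) as $base
  | $h[0][]
  | select(.failed_test_count > ($base[.name] // 0))
  | "\(.name): failures \($base[.name] // 0) -> \(.failed_test_count)"'
```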

Step 4: Get Individual Test Case Results

Fetch test case runs for both test runs:

```bash
tuist test case run list <identifier> --json --page-size 100
```

Match test cases by their `name` + `module_name` + `suite_name` across both runs.
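A minimal keying sketch; `name`, `module_name`, and `suite_name` come from the text above, while the array shape and a `status` field are assumptions about the output:

```bash
tuist test case run list "$BASE_ID" --json --page-size 100 > base-cases.json
tuist test case run list "$HEAD_ID" --json --page-size 100 > head-cases.json

# One line per test case: composite key, then status.
jq -r '.[] | "\(.module_name)/\(.suite_name)/\(.name) \(.status)"' base-cases.json | sort > base.txt
jq -r '.[] | "\(.module_name)/\(.suite_name)/\(.name) \(.status)"' head-cases.json | sort > head.txt
```

Step 5 reuses these two files to bucket the differences.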

Step 5: Classify Changes

Group test cases into categories:
  1. New failures: Tests that passed in base but failed in head.
  2. Fixed tests: Tests that failed in base but passed in head.
  3. Newly flaky: Tests not flaky in base but flaky in head.
  4. No longer flaky: Tests that were flaky in base but stable in head.
  5. New tests: Tests present in head but not in base.
  6. Removed tests: Tests present in base but not in head.
  7. Duration regressions: Tests with >50% duration increase.
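A minimal bucketing sketch built on the `base.txt`/`head.txt` files from Step 4; it assumes status values of `success` and `failure`, and composite keys without embedded spaces:

```bash
# New failures: failing in head, but passing or absent in base.
grep ' failure$' head.txt | cut -d' ' -f1 | while read -r key; do
  grep -qxF "$key failure" base.txt || echo "new failure: $key"
done

# Fixed tests: failing in base, passing in head.
grep ' failure$' base.txt | cut -d' ' -f1 | while read -r key; do
  grep -qxF "$key success" head.txt && echo "fixed: $key"
done

# New and removed tests, by key set difference.
comm -13 <(cut -d' ' -f1 base.txt | sort) <(cut -d' ' -f1 head.txt | sort) | sed 's/^/new test: /'
comm -23 <(cut -d' ' -f1 base.txt | sort) <(cut -d' ' -f1 head.txt | sort) | sed 's/^/removed test: /'
```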

Step 6: Inspect Failures

For each new failure, get detailed information:

```bash
tuist test case run show <test-case-run-id> --json
```

Key fields to examine:
  • `failures[].message` -- the assertion or error message
  • `failures[].path` -- source file path
  • `failures[].line_number` -- exact line of failure
  • `failures[].issue_type` -- type of issue
  • `repetitions` -- if present, shows retry behavior (flaky detection)
  • `crash_report` -- crash data if the test runner crashed
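A minimal extraction sketch for these fields, assuming `failures` is a top-level array in the `show` output:

```bash
tuist test case run show <test-case-run-id> --json \
  | jq -r '.failures[] | "\(.path):\(.line_number) [\(.issue_type)] \(.message)"'
```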

Step 7: Inspect Attachments

The `tuist test case run show` output includes attachment and crash report information. Review:
  • Screenshots or UI test artifacts
  • Log files or crash reports
  • Any diagnostic data attached to failing runs
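No dedicated attachment command appears above, so one option is to filter the same `show` output; `crash_report` comes from Step 6, while `attachments` is an assumed field name:

```bash
# Surface any attachment or crash data carried by the test case run.
tuist test case run show <test-case-run-id> --json \
  | jq '{attachments: .attachments, crash_report: .crash_report}'
```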

Summary Format

Produce a summary with:
  1. Overall verdict: Better, worse, or neutral compared to base.
  2. New failures: List each with failure message, file path, and line number.
  3. New flaky tests: List with flakiness context.
  4. Fixed tests: List tests that are now passing.
  5. Duration: Overall and notable per-test regressions.
  6. Recommendations: Actionable next steps for each issue.
Example:

```
Test Run Comparison: base (run-123 on main) vs head (run-456 on feature-x)

Status: success -> failure -- REGRESSION
Duration: 120.5s -> 145.2s (+21%)
Tests: 342 -> 345 (3 new tests)
Failures: 0 -> 2 (2 new failures)
Flaky: 1 -> 3 (2 newly flaky)

New Failures:
1. AuthModuleTests/LoginTests/test_login_with_expired_token
   Message: "Expected status 401, got 500"
   File: Tests/AuthModuleTests/LoginTests.swift:42
   Likely cause: Server error handling changed for expired tokens

2. NetworkTests/RetryTests/test_retry_on_timeout
   Message: "Timed out waiting for retry"
   File: Tests/NetworkTests/RetryTests.swift:87
   Likely cause: Timeout threshold too low after network layer refactor

Newly Flaky:
1. CacheTests/WriteCacheTests/test_concurrent_writes (flaky in 3/5 runs)

Recommendations:
- Fix expired token handling in AuthModule
- Increase timeout in RetryTests or mock the network layer
- Investigate concurrent write synchronization in CacheTests
```

Done Checklist

  • Resolved both base and head test runs
  • Compared top-level metrics
  • Fetched module and suite breakdowns for both runs
  • Identified new failures, fixed tests, and flaky changes
  • Inspected failure details for new failures
  • Provided actionable recommendations with file paths