# Compare Test Runs

## Quick Start
You'll typically receive two test run identifiers. Follow these steps:

- Run `tuist test show <id> --json` for both base and head test runs.
- Run `tuist test module list <test-run-id> --json` and `tuist test suite list <test-run-id> --json` to get module and suite breakdowns.
- Run `tuist test case run list <identifier> --json` to get individual test case results.
- Compare failures, flaky tests, durations, and overall status.
- Inspect failing test cases with `tuist test case run show <id> --json`.
- Summarize findings with actionable recommendations.
## Step 1: Resolve Test Runs

### If base/head are test run IDs or dashboard URLs

Fetch each directly:

```bash
tuist test show <base-id> --json
tuist test show <head-id> --json
```
### If base/head are branch names

List recent test runs on each branch to identify test run IDs:

```bash
tuist test list --git-branch <base-branch> --json --page-size 5
tuist test list --git-branch <head-branch> --json --page-size 5
```

Pick the latest test run ID from each branch's results.
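Picking the latest run can be sketched in Python. The listing shape below (a JSON array of runs with `id` and ISO-8601 `ran_at` fields) is an assumption for illustration, not the actual Tuist JSON schema — check the fields your Tuist version emits:

```python
import json

# Assumed shape for `tuist test list --json` output: a list of runs, each
# with an "id" and an ISO-8601 "ran_at" timestamp (field names hypothetical).
listing = json.loads("""
[
  {"id": "run-121", "ran_at": "2024-05-01T10:00:00Z"},
  {"id": "run-123", "ran_at": "2024-05-02T09:30:00Z"}
]
""")

# ISO-8601 timestamps sort lexicographically, so max() finds the latest run.
latest = max(listing, key=lambda run: run["ran_at"])
print(latest["id"])  # feed this ID into `tuist test show`
```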
### Defaults

- If no base is provided, use the project's default branch (usually `main`).
- If no head is provided, detect the current git branch.
## Step 2: Compare Top-Level Metrics

After fetching both test runs, compare:

| Metric | What to check |
|---|---|
| Status | Flag if base passed but head failed |
| Duration | Flag if head is >10% slower |
| Test count | Note if test count changed (new or removed tests) |
| Failures | Compare failure counts |
| Flaky tests | Compare flaky counts |
| Pass rate | Flag significant changes |
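These checks can be sketched as a Python comparison over the two `tuist test show --json` payloads. The field names (`status`, `duration`, and so on) are assumptions for illustration, not the actual Tuist JSON schema:

```python
# Hypothetical top-level fields from `tuist test show --json` for each run;
# field names are assumed, not the real schema.
base = {"status": "success", "duration": 120.5, "test_count": 342,
        "failures": 0, "flaky": 1}
head = {"status": "failure", "duration": 145.2, "test_count": 345,
        "failures": 2, "flaky": 3}

flags = []
if base["status"] == "success" and head["status"] != "success":
    flags.append("status regression")
if head["duration"] > base["duration"] * 1.10:  # flag if >10% slower
    flags.append(f"duration +{(head['duration'] / base['duration'] - 1) * 100:.0f}%")
if head["test_count"] != base["test_count"]:
    flags.append(f"test count {base['test_count']} -> {head['test_count']}")
if head["failures"] > base["failures"]:
    flags.append(f"{head['failures'] - base['failures']} new failure(s)")
if head["flaky"] > base["flaky"]:
    flags.append(f"{head['flaky'] - base['flaky']} newly flaky")
print("; ".join(flags))
```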
## Step 3: Get Module and Suite Breakdowns

Fetch module- and suite-level results for both test runs to understand which areas regressed:

```bash
tuist test module list <base-test-run-id> --json
tuist test module list <head-test-run-id> --json
tuist test suite list <base-test-run-id> --json
tuist test suite list <head-test-run-id> --json
```

Match modules and suites by name across both runs to identify areas with new failures or duration regressions.
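The name-based join can be sketched as follows; the per-suite fields (`status`, `duration`) are assumed names for illustration:

```python
# Suite-level results keyed by name for each run; field names hypothetical.
base_suites = {"LoginTests": {"status": "passed", "duration": 10.0},
               "RetryTests": {"status": "passed", "duration": 8.0}}
head_suites = {"LoginTests": {"status": "failed", "duration": 10.4},
               "RetryTests": {"status": "passed", "duration": 13.0}}

findings = []
for name, head_suite in head_suites.items():
    base_suite = base_suites.get(name)
    if base_suite is None:
        findings.append((name, "new suite"))
        continue
    if base_suite["status"] == "passed" and head_suite["status"] == "failed":
        findings.append((name, "newly failing"))
    if head_suite["duration"] > base_suite["duration"] * 1.5:
        findings.append((name, "duration regression"))
print(findings)
```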
## Step 4: Get Individual Test Case Results

Fetch test case runs for both test runs:

```bash
tuist test case run list <identifier> --json --page-size 100
```

Match test cases by their full identifier (module + suite + name) across both runs.
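Keying each run's test cases by their full identifier makes the join trivial. The identifier shape (`module/suite/name`) mirrors the example paths later in this guide; the JSON field names here are assumptions:

```python
# Key each test case run by module/suite/name so the two runs can be joined.
# The "module"/"suite"/"name" field names are hypothetical.
def by_identifier(case_runs):
    return {f"{c['module']}/{c['suite']}/{c['name']}": c for c in case_runs}

base_cases = by_identifier([
    {"module": "AuthModuleTests", "suite": "LoginTests",
     "name": "test_login_with_expired_token", "status": "passed"},
])
head_cases = by_identifier([
    {"module": "AuthModuleTests", "suite": "LoginTests",
     "name": "test_login_with_expired_token", "status": "failed"},
])

# Identifiers present in both runs can be compared case by case.
shared = base_cases.keys() & head_cases.keys()
print(sorted(shared))
```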
## Step 5: Classify Changes
Group test cases into categories:
- New failures: Tests that passed in base but failed in head.
- Fixed tests: Tests that failed in base but passed in head.
- Newly flaky: Tests not flaky in base but flaky in head.
- No longer flaky: Tests that were flaky in base but stable in head.
- New tests: Tests present in head but not in base.
- Removed tests: Tests present in base but not in head.
- Duration regressions: Tests with >50% duration increase.
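The classification above can be sketched over two identifier-keyed maps. The `status`, `flaky`, and `duration` fields are assumed names, and the identifiers are made up for the example:

```python
# Base and head test cases keyed by module/suite/name; fields hypothetical.
base = {
    "A/T/test_a": {"status": "passed", "flaky": False, "duration": 1.0},
    "A/T/test_b": {"status": "failed", "flaky": False, "duration": 2.0},
    "A/T/test_c": {"status": "passed", "flaky": False, "duration": 1.0},
}
head = {
    "A/T/test_a": {"status": "failed", "flaky": False, "duration": 1.1},
    "A/T/test_b": {"status": "passed", "flaky": False, "duration": 2.0},
    "A/T/test_c": {"status": "passed", "flaky": True, "duration": 1.8},
    "A/T/test_d": {"status": "passed", "flaky": True, "duration": 0.5},
}

shared = base.keys() & head.keys()
report = {
    "new_failures": [t for t in shared
                     if base[t]["status"] == "passed" and head[t]["status"] == "failed"],
    "fixed": [t for t in shared
              if base[t]["status"] == "failed" and head[t]["status"] == "passed"],
    "newly_flaky": [t for t in shared if not base[t]["flaky"] and head[t]["flaky"]],
    "new_tests": sorted(head.keys() - base.keys()),
    "removed_tests": sorted(base.keys() - head.keys()),
    # >50% duration increase counts as a regression.
    "duration_regressions": [t for t in shared
                             if head[t]["duration"] > base[t]["duration"] * 1.5],
}
print(report)
```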
## Step 6: Inspect Failures

For each new failure, get detailed information:

```bash
tuist test case run show <test-case-run-id> --json
```
Key fields to examine:

- Failure message -- the assertion or error message
- File -- the source file path
- Line -- the exact line of the failure
- Issue type -- the type of issue
- Retry information -- if present, shows retry behavior (flaky detection)
- Crash report -- crash data if the test runner crashed
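These fields combine naturally into a one-line summary per failure, as in the summary format below. The payload's field names here are assumptions for illustration, not the actual Tuist JSON schema:

```python
# Hypothetical payload extracted from `tuist test case run show --json`;
# field names are assumed for illustration.
case_run = {
    "failure_message": "Expected status 401, got 500",
    "file": "Tests/AuthModuleTests/LoginTests.swift",
    "line": 42,
}

summary_line = f"{case_run['file']}:{case_run['line']} -- {case_run['failure_message']}"
print(summary_line)
```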
## Step 7: Inspect Attachments

The `tuist test case run show` output includes attachment and crash report information. Review:
- Screenshots or UI test artifacts
- Log files or crash reports
- Any diagnostic data attached to failing runs
## Summary Format
Produce a summary with:
- Overall verdict: Better, worse, or neutral compared to base.
- New failures: List each with failure message, file path, and line number.
- New flaky tests: List with flakiness context.
- Fixed tests: List tests that are now passing.
- Duration: Overall and notable per-test regressions.
- Recommendations: Actionable next steps for each issue.
Example:

```text
Test Run Comparison: base (run-123 on main) vs head (run-456 on feature-x)

Status: success -> failure -- REGRESSION
Duration: 120.5s -> 145.2s (+21%)
Tests: 342 -> 345 (3 new tests)
Failures: 0 -> 2 (2 new failures)
Flaky: 1 -> 3 (2 newly flaky)

New Failures:
1. AuthModuleTests/LoginTests/test_login_with_expired_token
   Message: "Expected status 401, got 500"
   File: Tests/AuthModuleTests/LoginTests.swift:42
   Likely cause: Server error handling changed for expired tokens
2. NetworkTests/RetryTests/test_retry_on_timeout
   Message: "Timed out waiting for retry"
   File: Tests/NetworkTests/RetryTests.swift:87
   Likely cause: Timeout threshold too low after network layer refactor

Newly Flaky:
1. CacheTests/WriteCacheTests/test_concurrent_writes (flaky in 3/5 runs)

Recommendations:
- Fix expired token handling in AuthModule
- Increase timeout in RetryTests or mock the network layer
- Investigate concurrent write synchronization in CacheTests
```
## Done Checklist
- Resolved both base and head test runs
- Compared top-level metrics
- Fetched module and suite breakdowns for both runs
- Identified new failures, fixed tests, and flaky changes
- Inspected failure details for new failures
- Provided actionable recommendations with file paths