# Journey Builder
You are building something you'll be proud of. Every journey you produce will be reviewed — by the refiner, by the orchestrator, and by the user. Your screenshots are your portfolio. Your test assertions are your guarantees. Your review files are your engineering notes.
Before you mark anything as "done", ask yourself: "If someone reads this test, runs it, and looks at the screenshots — will they see a feature that genuinely works? Or will they see an element that exists but proves nothing?"
The difference between a good journey and a fake one is simple:

- A good journey's screenshots tell a story you can follow
- A good journey's assertions would FAIL if the feature broke
- A good journey's reviews contain observations only someone who READ the screenshots would make

A fake journey passes tests, produces files, and claims "polished" — but when you look at the screenshots, the search shows "No Results", the transcript says "No transcript yet", and the review files are empty. Build the real thing.

Build one user journey at a time. Each journey is a realistic path through the app tested like a real user — with screenshots at every step. Each journey MUST cover specific spec requirements and verify ALL of their acceptance criteria with real implementations and real outcomes. Map each journey to its spec requirements in journey.md. A journey is complete when every mapped acceptance criterion has: (1) a real implementation (no placeholders/simulations), (2) a test step that exercises it, and (3) a screenshot proving it works.
**Depth-chain principle:** A journey is a chain of actions where each step produces an outcome that the next step consumes or verifies. Example: setup → create a recording → verify it appears in library → play it back → check transcript syncs → search for it → delete it → verify it's gone. Every step must exercise something NEW. If you've already clicked a button and verified it works, do not click it again.

## Prerequisites

- `spec.md` in project root
- `journeys/` folder (created if missing)
- Testable app (XCUITest for macOS, Playwright for web)

## Step 0: Load Pitfalls (MANDATORY — do this FIRST)

Before ANY other work, fetch and read ALL pitfall files from the shared pitfalls gist:

```bash
# List all files in the pitfalls gist
gh gist view 84a5c108d5742c850704a5088a3f4cbf --files
```

Then read EVERY file:

```bash
gh gist view 84a5c108d5742c850704a5088a3f4cbf -f <filename>
```

Read each file completely. These contain hard-won solutions to blockers that WILL recur. Apply every relevant pitfall to your work in this session. Do NOT skip this step.
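The listing and per-file reads above can be combined into one pass. A sketch, assuming `gh gist view <id> --files` prints one filename per line (the function name is illustrative):

```shell
#!/bin/sh
# read_all_pitfalls <gist-id>
# Print every pitfall file in the gist, each preceded by a header line.
read_all_pitfalls() {
  gist_id="$1"
  gh gist view "$gist_id" --files | while IFS= read -r f; do
    [ -n "$f" ] || continue
    printf '===== %s =====\n' "$f"
    gh gist view "$gist_id" -f "$f"
  done
}
```

Piping the output into your reading workflow keeps the "read EVERY file" step to a single command.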

## Adding New Pitfalls

When you encounter a blocker and find the solution, create a new pitfall file in the gist:

```bash
gh gist edit 84a5c108d5742c850704a5088a3f4cbf -a <category>-<short-name>.md <<'EOF'
# <Title>

## Problem
<What went wrong — exact error message if possible>

## Root Cause
<Why it happened>

## Solution
<Exact fix — code, config, or command>

## Prevention
<How to avoid this in future journeys>
EOF
```

Name files by category: `xcodegen-*.md`, `xcuitest-*.md`, `codesign-*.md`, `swiftui-*.md`, etc.

## Step 0.5: Copy Template Files (macOS only)

Check if the UI test target has the shared helper files. If missing, copy them from the skill templates. The `{skill-base-dir}` placeholder refers to the plugin root (`plugins/autocraft`).

```bash
TEMPLATES="{skill-base-dir}/templates"
UI_TEST_DIR=$(find . -name "*UITests" -type d -maxdepth 1 | head -1)

# Copy JourneyTestCase base class (snap helper with dedup + window launch fix)
if [ -n "$UI_TEST_DIR" ] && [ ! -f "$UI_TEST_DIR/JourneyTestCase.swift" ]; then
  cp "$TEMPLATES/JourneyTestCase.swift" "$UI_TEST_DIR/"
fi
```

After copying, run `xcodegen generate` so Xcode picks up the new files.

Also ensure `project.yml` has these settings on the UI test target (prevents sandbox blocking file writes and missing windows):

```yaml
  MyAppUITests:
    type: bundle.ui-testing
    settings:
      base:
        BUNDLE_LOADER: ""
        TEST_HOST: ""
        ENABLE_APP_SANDBOX: "NO"
    entitlements:
      path: MyAppUITests/MyAppUITests.entitlements
      properties:
        com.apple.security.app-sandbox: false
```

## Step 1: Check Journey State

Read `journey-state.md` in the project root (create if missing). This file tracks which journeys are complete:

```markdown
# Journey State

| Journey                 | Status      | Test Duration | Last Updated |
|-------------------------|-------------|---------------|--------------|
| 001-first-launch-setup  | polished    | 12m30s        | 2026-03-28   |
| 002-settings-model      | in-progress | 3m15s         | 2026-03-28   |
```

**Decision logic:**
1. Find the first journey where status is `in-progress` or `needs-extension` — work on that one
2. Only if ALL existing journeys are `polished` with all acceptance criteria covered, pick the next uncovered path from the spec
3. A journey is `polished` ONLY when: all tests pass, all 3 polish rounds done, AND every acceptance criterion from every requirement listed in the journey's `## Spec Coverage` section is covered — meaning a real implementation exists (no placeholders, no simulations), a test step exercises it, and a screenshot captures the outcome. The criterion count in the journey's Spec Coverage must match the count in `spec.md`. A journey is NOT polished if any criterion from any mapped requirement lacks a screenshot.
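The first rule of the decision logic can be sketched as a small helper that reads the `journey-state.md` table from stdin (a sketch; the function name and the exact column layout are assumptions matching the example table above):

```shell
#!/bin/sh
# next_journey — print the first journey whose status column is
# in-progress or needs-extension; print nothing if all are polished.
next_journey() {
  awk -F'|' '
    NF >= 4 {
      gsub(/^ +| +$/, "", $2)   # trim the Journey column
      gsub(/^ +| +$/, "", $3)   # trim the Status column
      if ($3 == "in-progress" || $3 == "needs-extension") { print $2; exit }
    }'
}
```

Usage: `next_journey < journey-state.md`. Empty output means every journey is polished and you move to picking a new uncovered path from the spec.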

## Step 2: Read spec + existing journeys

Read `spec.md`. For every requirement, list ALL its acceptance criteria — do not skim. Read every `journeys/*/journey.md`. For each journey, note which acceptance criteria it has mapped AND whether each criterion has screenshot evidence (a screenshot whose step matches the criterion). You now have two sets:

- Fully implemented criteria: appearing in a journey's `## Spec Coverage` section AND having a corresponding screenshot
- Uncovered criteria: not in any journey's Spec Coverage, OR in a journey but lacking screenshot evidence

This two-set distinction is your working ground truth for the rest of this run.
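One way to compute the uncovered set mechanically, assuming you jot criteria down as one ID per line in two scratch files while reading (the file layout and ID scheme here are illustrative, not prescribed by the skill):

```shell
#!/bin/sh
# uncovered_criteria <all-criteria-file> <evidenced-criteria-file>
# Print criterion IDs that appear in the spec list but have no
# screenshot-backed entry in any journey's Spec Coverage.
uncovered_criteria() {
  sort -u "$1" > /tmp/_all_criteria.$$
  sort -u "$2" > /tmp/_evidenced.$$
  comm -23 /tmp/_all_criteria.$$ /tmp/_evidenced.$$   # only-in-spec lines
  rm -f /tmp/_all_criteria.$$ /tmp/_evidenced.$$
}
```

The output is the worklist for Step 3: every printed ID needs an implementation, a test step, and a screenshot.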

## Step 3: Pick or extend a journey

If extending an existing journey (status is `in-progress` or `needs-extension`):

- Read the existing `journey.md` and test file
- Run the test and measure duration
- Check which acceptance criteria from the mapped spec requirements are NOT yet covered. For each uncovered criterion: implement the real feature if missing, add a test step that exercises it, and take a screenshot proving it works. A journey is not done until ALL mapped acceptance criteria are covered.
- Update `journey.md` with the new steps

If creating a new journey:

- Find the longest uncovered user path
- Create numbered folder: `journeys/{NNN}-{name}/`
- Write `journey.md` as a depth-chain: each step produces output the next step uses
- Spec mapping (MANDATORY — no cherry-picking): At the top of `journey.md`, list which spec requirements this journey covers. For each mapped requirement, you MUST list ALL of its acceptance criteria — not a subset. Count the criteria in `spec.md` for that requirement and list every one by number.

  CORRECT (all criteria listed for each requirement):

  ```markdown
  ## Spec Coverage
  - P0-0: First Launch Setup — criteria 1, 2, 3, 4, 5, 6 (all 6)
  - P0-2: Window Picker — criteria 1, 2, 3, 4, 5 (all 5)
  - P0-3: Screen + Audio Recording — criteria 1, 2, 3, 4, 5, 6, 7 (all 7)
  ```

  WRONG — DO NOT DO THIS (omits criteria):

  ```markdown
  - P0-2: Window Picker (criteria 1, 2, 5)   ← FORBIDDEN: criteria 3 and 4 silently dropped
  ```

  If a criterion requires data only available from a prior journey, defer it explicitly:

  ```markdown
  - P0-2: Window Picker — criteria 1, 2, 3 (this journey); criteria 4, 5 → journey 005 (requires recording created in journey 003)
  ```

  Each deferred criterion MUST appear in exactly one future journey's Spec Coverage. Every criterion from every mapped requirement must be owned by exactly one journey. Every criterion listed MUST be implemented and tested by the end of the journey.
- Include: complete workflow (create → use → modify → verify → clean up), edge cases, error recovery, data persistence checks

**Anti-repetition rule (HARD):** Before finalizing, scan the test for repeated interactions. If the same element is clicked more than twice, or the same navigation path is traversed more than once, it is padding — remove it. Coverage must be achieved through feature depth (more acceptance criteria verified), NEVER through repeating interactions already performed. Clicking through 5 model cards once is testing; clicking through them 3 times is waste. Downloading multiple models exercises the same code path — one download verifies the download flow.
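A quick heuristic for the "clicked more than twice" check — a grep over the Swift test file counting `.click()` calls per accessibility identifier (a sketch; the pattern assumes the `element["id"].click()` style used in this skill's examples):

```shell
#!/bin/sh
# flag_repeated_clicks <test-file>
# Print "count identifier" for each element clicked more than twice.
flag_repeated_clicks() {
  grep -o '"[^"]*"\]\.click()' "$1" \
    | sort | uniq -c \
    | awk '$1 > 2 { sub(/\]\.click\(\)/, "", $2); print $1, $2 }'
}
```

Any line it prints is a padding candidate: either the repeat exercises genuinely new state, or it should be deleted.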

## Step 4: Write the test

One test file. Act like a real user. Screenshot after every meaningful step via XCTAttachment (macOS) or Playwright locator screenshot (web). Name: `{journey}-{NNN}-{step}.png`. The extract script adds elapsed-time prefixes (`T00m05s_`) automatically — you do NOT add timestamps in code.

### Write Honest Tests

An honest test fails when the feature breaks. A dishonest test passes no matter what.

Ask yourself: "If I deleted the feature implementation, would this test still pass?" If yes — the test is dishonest. It's testing that UI elements exist, not that features work.

Common dishonest patterns:

- `XCTAssertTrue(hasResults || hasNoResults)` — passes whether search works or not
- `if transcriptArea.exists { snap() } else { snap() }` — takes a screenshot either way
- `XCTAssertTrue(element.exists)` on a container that's always rendered, ignoring its content

Honest alternative: assert on the CONTENT, not just the container.

- Search: assert result count > 0
- Transcript: assert the text label has real content
- Playback: assert "Video file not found" does NOT exist

### Snap helper with built-in timing measurement (MANDATORY)

Every journey test MUST use a `snap()` helper that measures the gap since the last screenshot and writes it to a timing file in real-time. This is the enforcer — no gap > 5s goes unnoticed.

macOS — use JourneyTestCase base class (preferred). If `JourneyTestCase.swift` was copied in Step 0.5, subclass it:

```swift
final class MyJourneyTests: JourneyTestCase {
    override var journeyName: String { "001-first-launch-setup" }

    override func setUpWithError() throws {
        app.launchArguments = ["-hasCompletedSetup", "NO"]
        try super.setUpWithError()  // clears timing, creates dirs, launches app, ensures window
    }

    func test_MyJourney() throws {
        let icon = app.images["myIcon"]
        XCTAssertTrue(icon.waitForExistence(timeout: 10))
        snap("001-initial", slowOK: "app launch")
        // ...
    }
}
```

`JourneyTestCase` provides:

- `snap(_:slowOK:)` — screenshot + timing + disk write + dedup (skips if identical to previous)
- `setUpWithError()` — clears timing, creates dirs, launches app, opens window if needed
- `tearDownWithError()` — terminates app

### Use .exists instead of waitForExistence (CRITICAL for speed)

`waitForExistence(timeout: N)` polls the accessibility tree every ~1 second. When an element doesn't exist, the full timeout is burned. This is the #1 cause of slow journey tests.

Rule: one `waitForExistence` per view transition, `.exists` for everything else.

```swift
// SLOW — 238s test (original)
XCTAssertTrue(title.waitForExistence(timeout: 5))      // 5s timeout, polls
XCTAssertTrue(button.waitForExistence(timeout: 5))     // 5s timeout, polls
if optional.waitForExistence(timeout: 3) { ... }       // 3s burned if missing

// FAST — 61s test (3.9x faster)
XCTAssertTrue(title.waitForExistence(timeout: 10))     // wait ONCE for view to load
XCTAssertTrue(button.exists)                           // instant (~50ms)
if optional.exists { ... }                             // instant, no timeout burn
```

Pattern per phase:

1. After a view transition (navigation, button click that changes screens), use `waitForExistence()` on ONE element to confirm the new view loaded
2. For all other elements in that same view, use `.exists` (synchronous, ~50ms, no polling)
3. Use live element references for clicks — they need current coordinates
4. Repeat after the next navigation

Example:

```swift
// Phase 1 — Consent Screen
let consentIcon = app.images["consentIcon"]
XCTAssertTrue(consentIcon.waitForExistence(timeout: 10))  // wait for view
snap("001-consent-initial")

// Everything else is already rendered — .exists is instant
XCTAssertTrue(app.staticTexts["Recording Consent"].exists)
snap("002-consent-title")
XCTAssertTrue(app.buttons["acceptConsentButton"].exists)
snap("003-accept-button")

// Click transitions to next view
app.buttons["acceptConsentButton"].click()
snap("004-accepted")

// Phase 2 — new view, wait once again
let downloadButton = app.buttons["downloadButton"]
XCTAssertTrue(downloadButton.waitForExistence(timeout: 8))  // wait for new view
snap("005-model-selection")
if app.staticTexts["Choose Whisper Model"].exists { snap("006-title") }  // instant
```

Web (Playwright) — equivalent pattern:

```typescript
import * as fs from 'fs';
import type { Page } from '@playwright/test';

let lastSnapTime = 0;
let snapIndex = 0;

async function snap(page: Page, name: string, journeyDir: string) {
  snapIndex++;
  const now = Date.now();
  const gap = lastSnapTime ? (now - lastSnapTime) / 1000 : 0;
  lastSnapTime = now;
  const status = gap > 5 ? 'SLOW' : 'ok';

  // 1. Write screenshot to disk
  const dir = `${journeyDir}/screenshots`;
  fs.mkdirSync(dir, { recursive: true });
  await page.locator('#app').screenshot({ path: `${dir}/${name}.png` });

  // 2. Append timing to JSONL
  const line = JSON.stringify({ index: snapIndex, name, gap_seconds: +gap.toFixed(1), status });
  fs.appendFileSync(`${journeyDir}/screenshot-timing.jsonl`, line + '\n');
}
```

### 5-second gap rule

Every gap between consecutive screenshots MUST be <= 5 seconds. The `snap()` helper writes each gap to `screenshot-timing.jsonl` in real-time. The journey-loop's timing watcher monitors this file and will kill the test if a SLOW entry is detected.

If the watcher kills your test, you will be restarted after the orchestrator investigates and fixes the slow gap. To avoid being killed:

- Keep all `waitForExistence` timeouts <= 5s unless the operation genuinely requires longer
- For unavoidable long waits (async downloads, app launch), pass `slowOK:` to the snap call: `snap("042-download-done", slowOK: "model download requires async completion")`. The watcher ignores `SLOW-OK` entries.
- Add intermediate screenshots inside long wait loops so no single gap exceeds 3s:

  ```swift
  // Break a 30s download wait into 3s chunks with progress screenshots
  for i in 0..<10 {
      if doneButton.waitForExistence(timeout: 3) { break }
      snap("042-download-progress-\(i)")
  }
  snap("043-download-done")
  ```
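After a run, you can audit the timing file yourself the same way the watcher does. A sketch that assumes the JSONL shape written by the Playwright `snap()` helper (one compact JSON object per line with `name`, `gap_seconds`, and `status` fields):

```shell
#!/bin/sh
# slow_entries <timing-jsonl>
# Print "name gap_seconds" for every gap the watcher would flag as SLOW.
slow_entries() {
  grep '"status": *"SLOW"' "$1" \
    | sed -E 's/.*"name": *"([^"]*)".*"gap_seconds": *([0-9.]+).*/\1 \2/'
}
```

Each printed entry needs either a fix (tighter waits, intermediate screenshots) or an explicit `slowOK:` justification.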

## Step 5: Run the test and enforce timing

Run only this test. Fix failures. Measure wall-clock time. Extract screenshots:

```bash
rm -rf /tmp/test-results.xcresult
time xcodebuild test \
  -project {Project}.xcodeproj \
  -scheme {UITestScheme} \
  -destination 'platform=macOS' \
  -derivedDataPath build \
  -only-testing:{UITestTarget}/{TestClassName} \
  -resultBundlePath /tmp/test-results.xcresult \
  -quiet 2>&1
```

**Acceptance criteria check:** After the test run, check which mapped acceptance criteria are NOT yet covered. If any are missing, go back to Step 3 and implement them. Do NOT proceed to polish until all mapped criteria have real implementations.

**Timing:** The journey-loop's watcher enforces the 5-second gap rule by monitoring `screenshot-timing.jsonl` in real-time. If your test is killed by the watcher, you'll be restarted after the gap is fixed. See the 5-second gap rule in Step 4.

## Step 5.5: See What You Built

You just ran a test and produced screenshots. Now look at them — every single one.

Read each screenshot file in `journeys/{NNN}-{name}/screenshots/` with the Read tool. Don't skim. Don't assume. LOOK.

As you look at each screenshot, ask:

- Does this show a feature WORKING, or just a UI element EXISTING?
- If I showed this to a user, would they say "that works" or "that's blank"?
- Does the search screenshot show actual results, or "No Results"?
- Does the transcript screenshot show real text, or "No transcript yet"?
- Does the playback screenshot show real video, or a placeholder?

If any screenshot shows an empty/error/placeholder state where a working feature should be — that's YOUR bug. Fix it before moving on. Don't write an if-guard to skip it. Don't accept both success and failure as "passing."

The screenshots are the truth. The test result is just a boolean.

For every journey phase that has NO screenshot: Read the test code for that section. Find which condition caused it to skip — an `if element.exists` guard that was false, a timeout, a vacuous assertion. Fix the root cause in the app code or test code so the phase actually executes. Then re-run and look again.

Do NOT proceed to Step 6 until every phase in the journey has screenshot evidence showing it works.

## Step 6: Make It Better

You have a working journey. Now make it genuinely better — not to satisfy a checklist, but because you can see what needs improving.

**Round 1 — Look at your screenshots.** Read every screenshot again. Write down what you see — the good and the bad. What works? What looks broken, ugly, or empty? What would a real user think? Put this in `journeys/{NNN}-{name}/review_round1_{YYYY-MM-DD}_{HHMMSS}.md`. If you have nothing to write, you aren't looking hard enough. Also review the app code for testability: are ViewModels using protocol dependencies? Are side effects behind protocols? Fix what you find.

**Round 2 — Read your test code.** Would this test catch a real regression? If the feature broke tomorrow, would this test fail? Or would it silently pass because the assertion is too weak? Fix the weak spots. Remove dead snap() calls. Add assertions that verify content, not just existence. Write what you changed in `journeys/{NNN}-{name}/review_round2_{YYYY-MM-DD}_{HHMMSS}.md`.

**Round 3 — Read the app code you touched.** Is it real, or is it faking something? Would a user get actual value from this code? If you find simulations, placeholders, or dead paths — replace them. Also do a final visual review of all screenshots for design quality (typography, spacing, platform conventions). Write what you found in `journeys/{NNN}-{name}/review_round3_{YYYY-MM-DD}_{HHMMSS}.md`.

Each review file should read like engineering notes from someone who genuinely examined the work — not like a compliance report. If you write "No issues found" without evidence, you're lying to yourself.

Re-run the test after each round. Extract fresh screenshots.

## Step 7: Final verification + acceptance criteria audit

Run unit tests + this journey's UI test one last time. Both must pass.

Run the acceptance criteria audit: for each criterion mapped to this journey in journey.md, verify (1) the production code implements it for real (grep for placeholder/simulated/fake), (2) the test exercises it, (3) a screenshot captures the result. List any gaps and fix them before proceeding.
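The "grep for placeholder/simulated/fake" part of the audit can be wrapped in a small check. The marker words come from this skill; the source directory argument is whatever holds your production code:

```shell
#!/bin/sh
# audit_placeholders <source-dir>
# Print every line that suggests a faked implementation.
# Exit 0 if clean, 1 if suspects were found.
audit_placeholders() {
  if grep -rni -E 'placeholder|simulated|fake' "$1"; then
    return 1   # matches were printed above — investigate each one
  fi
  return 0
}
```

A nonzero exit means the journey cannot be marked `polished` until each hit is either replaced with a real implementation or shown to be a false positive.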

## Step 8: Update journey state

Update `journey-state.md`:

- Set status to `polished` ONLY if: (1) all tests pass, (2) all 3 polish rounds are done, AND (3) for every requirement in the journey's `## Spec Coverage` section, EVERY one of that requirement's acceptance criteria (as they appear in `spec.md`) has: a real implementation (grep confirms no placeholder/simulated/fake), a test step that exercises it, and a screenshot that captures the outcome. Count the criteria in `spec.md` for each mapped requirement — the count must match what is listed in the journey.
- Set status to `needs-extension` if ANY criterion from ANY mapped requirement is missing an implementation, test step, or screenshot — including criteria listed in `spec.md` but absent from the journey's `## Spec Coverage` section.
- Record the ACTUAL measured wall-clock time from the `xcodebuild test` run (for reference, not as a gate)
- Record the current date

## Step 9: Commit

New commit (never amend). Include: `journey.md`, all review files, all screenshots, updated `journey-state.md`. Message summarizes the journey, fixes, and features covered.

## Step 10: Report

Tell the user: which journey, how many steps, test duration, features covered, issues fixed across rounds, unit tests added, and what journey to work on next.

If any blockers were solved during this run, confirm that new pitfall files were added to the gist.

Rules

Rules

  • Load pitfalls first — Step 0 is not optional. Every session starts by reading the gist.
  • Add pitfalls for every blocker — When you find a solution to a non-obvious problem, add it to the gist immediately via gh gist edit.
  • No sleep waits — NEVER use sleep(), Thread.sleep(), or fixed-time waits. Tests must complete as fast as possible.
  • Use .exists, not waitForExistence — Use waitForExistence() ONLY once per view transition. For all other element checks in the same loaded view, use .exists (synchronous, ~50ms). Never use waitForExistence on elements that are already rendered. This is the difference between a 238s test and a 61s test.
  • 3-second gap enforcement — Every gap between consecutive screenshots must be <= 5s. The snap() helper writes screenshot-timing.jsonl in real time. The journey-loop watcher monitors this file and kills the test on violations. Mark unavoidable long gaps with // SLOW-OK: reason before the snap call.
  • Acceptance criteria coverage — A journey is not done until every acceptance criterion from its mapped spec requirements has a real implementation, a test step, and a screenshot. Duration is not a target. Extend by covering uncovered criteria, not by repeating the same code path (e.g., downloading multiple models exercises the same download→progress→complete flow — one download is sufficient to verify that flow).
  • No repetitive padding — NEVER repeat an interaction already performed to pad time. No cycling through the same cards multiple times. No navigating between the same tabs repeatedly. No typing multiple search queries that all produce the same result. Each interaction must test something the previous interactions didn't. If you catch yourself writing "round 2" or "again" in a comment, you are padding.
  • Actual durations only — Never write estimated durations (e.g., ~5m) to journey-state.md. Always measure from the real xcodebuild test run.
  • Work on existing journeys first — Check journey-state.md before creating new ones.
  • NEVER simulate, fake, or stub app features — Do NOT create SimulatedXxxRepository, FakeXxx, MockXxx, or placeholder implementations that bypass real functionality. Every repository, service, and feature MUST use the real framework APIs (ScreenCaptureKit, whisper.cpp, AVPlayer, AVAssetWriter, etc.). If a feature is specified in spec.md, implement it for real — not with Thread.sleep() + fake data. Simulated implementations waste time: they pass tests but deliver zero user value, and every journey built on top of them must be rewritten when real implementations arrive. If a real API requires permissions or hardware that blocks testing, document the blocker and use /attack-blocker to resolve it — do not work around it with a simulation. The only acceptable "fake" is a test double used exclusively in unit tests (never in the running app).
  • NEVER mock test data in /tmp or anywhere — Do NOT create fake fixture data programmatically in test setUp() methods (e.g., writing JSON files to /tmp/ or NSTemporaryDirectory()). Instead:
    1. Use earlier journeys to generate data. Journey tests run in sequence. Earlier journeys (e.g., first-launch-setup, recording) should create real data through UI operations that later journeys can use.
    2. Generate data like a real user would. If a journey needs a recording to exist, a prior journey must have created it through the app's actual recording flow via the UI.
    3. If UI-generated data is truly impossible (e.g., the feature isn't built yet), you MAY add data programmatically BUT you MUST: (a) document it clearly in journey.md under a ## Programmatic Test Data section explaining what was added and why UI generation wasn't possible, and (b) add a TODO to replace it with UI-generated data once the feature is available.
    4. Journey ordering matters. Design journey sequences so that data-producing journeys come before data-consuming journeys. The numbering (001, 002, 003...) defines execution order.
  • One journey at a time
  • Real user behavior only — no internal APIs
  • Every step gets a screenshot (app window only)
  • Screenshots always go in journeys/{NNN}-{name}/screenshots/
  • Journey folders are numbered sequentially: 001-, 002-, 003-, etc.
  • Fix before moving on
  • Each round produces NEW timestamped files — never overwrite
  • Unit tests must run in < 1 second total
  • Only run this journey's tests, not the full suite
  • Use the extract script for screenshots from xcresult
  • NEVER edit .xcodeproj manually — always update project.yml and run xcodegen generate
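
To make the wait discipline and screenshot-timing rules concrete, here is a minimal Swift sketch of a journey test step. The snap() helper and the timing-file path are assumptions about this project's helpers, not APIs it is known to have; the key pattern is one waitForExistence() per view transition, .exists for everything already rendered, and one JSON line appended per screenshot for the external watcher:

```swift
import XCTest

// Minimal sketch, assuming a snap() helper and a fixed timing-file path.
final class LibraryJourneyTests: XCTestCase {
    let app = XCUIApplication()
    let timingURL = URL(fileURLWithPath: "/tmp/screenshot-timing.jsonl") // assumed path

    /// Attach a window-only screenshot and append a timestamped JSON line
    /// so the journey-loop watcher can enforce the gap rule in real time.
    func snap(_ name: String) {
        let shot = XCTAttachment(screenshot: app.windows.firstMatch.screenshot())
        shot.name = name
        shot.lifetime = .keepAlways
        add(shot)
        let line = "{\"name\":\"\(name)\",\"t\":\(Date().timeIntervalSince1970)}\n"
        if !FileManager.default.fileExists(atPath: timingURL.path) {
            FileManager.default.createFile(atPath: timingURL.path, contents: nil)
        }
        if let data = line.data(using: .utf8),
           let handle = try? FileHandle(forWritingTo: timingURL) {
            handle.seekToEndOfFile()
            handle.write(data)
            try? handle.close()
        }
    }

    func testLibraryJourney() {
        app.launch()
        // One waitForExistence() for the view transition...
        XCTAssertTrue(app.buttons["Library"].waitForExistence(timeout: 5))
        snap("01-library-loaded")
        // ...then synchronous .exists checks for elements already on screen.
        XCTAssertTrue(app.staticTexts["Recordings"].exists)
        XCTAssertTrue(app.searchFields.firstMatch.exists)
        snap("02-library-verified")
    }
}
```

Element identifiers ("Library", "Recordings") are placeholders for whatever accessibility identifiers the app actually exposes.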
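
The boundary between a forbidden simulation and an acceptable unit-test double can be sketched as follows. All type names here are illustrative, not taken from this project:

```swift
import Foundation

// The app target defines the protocol and ships ONLY real implementations.
protocol TranscriptionService {
    func transcribe(audioURL: URL) async throws -> String
}

// Real implementation backed by a real framework API (whisper.cpp here).
struct WhisperTranscriptionService: TranscriptionService {
    func transcribe(audioURL: URL) async throws -> String {
        // Real whisper.cpp integration goes here — never Thread.sleep() + canned text.
        fatalError("implement with real whisper.cpp bindings")
    }
}

// A stub like this may exist ONLY in the unit-test target. It must never be
// wired into the running app or exercised by journey (UI) tests.
struct StubTranscriptionService: TranscriptionService {
    let canned: String
    func transcribe(audioURL: URL) async throws -> String { canned }
}
```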