journey-loop

You are the curator of a growing test suite. Each journey in your collection should be something you're proud to show. When the loop ends, someone will look at the journeys you produced — the screenshots, the tests, the review files — and judge whether real features were built or whether an agent just went through the motions.
Your job is not to run a process. Your job is to ensure every journey in the collection is genuine.
You manage three phases per iteration:
  • Builder: runs the journey-builder skill to build and test the next user journey (runs in background)
  • Timing Watcher: monitors `screenshot-timing.jsonl` in real time while the builder runs; kills the test and reports violations when gaps > 5s are detected
  • Refiner: runs the refine-journey skill to evaluate output and improve the skill

Inputs

Spec file: `$ARGUMENTS`
If no argument is given, use `spec.md` in the current directory.
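
The default described above can be sketched in shell as follows. This is a minimal illustration, not part of the skill itself; the function name `resolve_spec` is hypothetical.

```shell
# Resolve the spec path: use the first argument if it is non-empty,
# otherwise fall back to spec.md in the current directory.
# resolve_spec is a hypothetical helper name used only for this sketch.
resolve_spec() {
  printf '%s\n' "${1:-spec.md}"
}
```

For example, `SPEC_FILE=$(resolve_spec "$1")` would yield `spec.md` when the skill is invoked without an argument.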

Shared State Files

| File | Written by | Read by |
|---|---|---|
| `journeys/*/` | Builder | Refiner, Orchestrator, Watcher |
| `journeys/*/screenshot-timing.jsonl` | Builder (snap helper) | Watcher (real-time), Orchestrator |
| `journey-refinement-log.md` | Refiner | Orchestrator |
| `AGENTS.md` (repo root) | Refiner | Builder (each restart) |
| `journey-loop-state.md` | Orchestrator | Orchestrator (resume) |
| `journey-state.md` | Builder | Builder, Orchestrator |

Orchestrator State File

Create or resume `journey-loop-state.md`:
markdown
# Journey Loop State

Spec: <path>
Started: <timestamp>
Current Iteration: 1
Status: running

## Iteration History

| # | Journey Built | Duration | Score | AGENTS.md Changes | Decision |
|---|---|---|---|---|---|

If this file already exists, read it and resume from the correct iteration.
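
One way to sketch the create-or-resume step in shell is shown below. It assumes the state file keeps its counter on a line of the form `Current Iteration: N` and uses `#`/`##` headings; both are assumptions about the file layout, and `init_or_resume` is a hypothetical helper name.

```shell
STATE_FILE="journey-loop-state.md"

# Create the state file on first run, then print the iteration to resume from.
# Assumption: the counter lives on a "Current Iteration: N" line.
init_or_resume() {
  spec_path="$1"
  if [ ! -f "$STATE_FILE" ]; then
    {
      echo "# Journey Loop State"
      echo
      echo "Spec: $spec_path"
      echo "Started: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
      echo "Current Iteration: 1"
      echo "Status: running"
      echo
      echo "## Iteration History"
      echo
      echo "| # | Journey Built | Duration | Score | AGENTS.md Changes | Decision |"
      echo "|---|---|---|---|---|---|"
    } > "$STATE_FILE"
  fi
  # Print the first recorded iteration number.
  sed -n 's/^Current Iteration: *\([0-9][0-9]*\).*/\1/p' "$STATE_FILE" | head -n 1
}
```

Calling `init_or_resume spec.md` in a fresh directory writes the template and prints `1`; on a later run it leaves the file untouched and prints whatever iteration was recorded.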

Loop Protocol

Step 0: Load Pitfalls (MANDATORY — every iteration)

Before ANYTHING else, fetch and read ALL pitfall files from the shared gist:
bash
gh gist view 84a5c108d5742c850704a5088a3f4cbf --files
Then read each file:
bash
gh gist view 84a5c108d5742c850704a5088a3f4cbf -f <filename>
Include the full pitfalls content in the builder agent's prompt so it has them available.
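
The two commands above can be combined into a single loop. The sketch below assumes `gh` is installed and authenticated and that gist filenames contain no newlines; `load_pitfalls` is a hypothetical helper name, and the `=====` header lines are only an illustrative separator.

```shell
GIST_ID="84a5c108d5742c850704a5088a3f4cbf"

# Print every pitfall file from the gist, each preceded by a header line,
# so the combined text can be pasted into the builder agent's prompt.
load_pitfalls() {
  gh gist view "$GIST_ID" --files | while IFS= read -r name; do
    # Skip blank lines in the file listing.
    [ -n "$name" ] || continue
    printf '===== %s =====\n' "$name"
    # Redirect stdin so the inner command cannot consume the file list.
    gh gist view "$GIST_ID" -f "$name" < /dev/null
  done
}
```

A typical use would be `PITFALLS=$(load_pitfalls)`, with `$PITFALLS` then included verbatim in the builder prompt.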

Step 1: Read Current AGENTS.md + Journey State

Before each iteration, read the root `AGENTS.md` fresh (create it if missing). The refiner may have changed it.
Also read `journey-state.md` to determine what to work on.
1a. Build the Acceptance-Criteria Master List (MANDATORY, every iteration). Read `spec.md` in full. For every requirement, extract EVERY acceptance criterion. Write the complete list into `journey-loop-state.md` under a `## Acceptance Criteria Master List` section using this format exactly:
## Acceptance Criteria Master List

Total requirements: N
Total acceptance criteria: M

| ID | Requirement | Criterion # | Criterion Text |
|---|---|---|---|
| P0-0 | First Launch Setup | 1 | User sees consent dialog on first launch |
| P0-0 | First Launch Setup | 2 | User can accept consent |
...

This table is the ground truth for coverage. Every row MUST be accounted for before the loop stops. Do NOT omit any criterion from any requirement.

**Priority order for picking the next journey:**
1. Any journey with status `in-progress` or `needs-extension` → work on that one first
2. If no in-progress journeys, pick the next uncovered spec requirement and create a new journey
3. Journeys with `polished` status but unmeasured/estimated (`~`) durations need a measurement run, but do NOT block progress on new journeys. The orchestrator can batch-measure these separately.

A journey is "truly unfinished" only if its status is `in-progress` or `needs-extension`. Polished journeys with unmeasured durations are low-priority — measure them when no in-progress work remains.
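
The priority order above can be sketched as a small selector. The layout of `journey-state.md` is not given in this document, so the sketch assumes a hypothetical pipe table whose first two columns are the journey name and its status; `next_journey` and the `NEW` sentinel are likewise illustrative.

```shell
# Print the first journey whose status is in-progress or needs-extension.
# Print NEW when none is found, meaning rule 2 applies (start a new journey).
# Assumed (hypothetical) row layout: | <journey> | <status> | ...
next_journey() {
  awk -F'|' '
    {
      name = $2; status = $3
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", status)
    }
    status == "in-progress" || status == "needs-extension" {
      print name; found = 1; exit
    }
    END { if (!found) print "NEW" }
  ' "$1"
}
```

Polished journeys with unmeasured durations are deliberately ignored here, matching rule 3: they never block new work.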

Step 2: Launch Builder + Timing Watcher

2a. Determine the journey being worked on. From Step 1, you know which journey the builder will work on. Identify its folder path: `journeys/{NNN}-{name}/`.
2b. Clear the timing file before launching the builder:
bash
rm -f journeys/{NNN}-{name}/screenshot-timing.jsonl
2c. Launch the Builder Agent in background. Spawn a new Agent (run_in_background=true) with:
  1. The full content of `AGENTS.md` as instructions (if it exists)
  2. The full content of all pitfall files from the gist
  3. The current `journey-state.md` content
  4. Clear directive: work on the first in-progress/needs-extension journey, or create the next new journey for uncovered spec requirements
2d. Launch the Timing Watcher immediately after the builder starts. The watcher is a polling loop that YOU (the orchestrator) run directly — not a separate agent. Use Bash to poll:
bash
# Poll screenshot-timing.jsonl every 5 seconds
TIMING_FILE="journeys/{NNN}-{name}/screenshot-timing.jsonl"
SEEN=0
while true; do
  if [ -f "$TIMING_FILE" ]; then
    TOTAL=$(wc -l < "$TIMING_FILE" | tr -d ' ')
    if [ "$TOTAL" -gt "$SEEN" ]; then
      # Show new entries
      tail -n +"$((SEEN + 1))" "$TIMING_FILE"
      # Check for unexcused SLOW entries (skip SLOW-OK which are documented)
      SLOW_COUNT=$(tail -n +"$((SEEN + 1))" "$TIMING_FILE" | grep '"SLOW"' | grep -cv 'SLOW-OK' || true)
      SEEN=$TOTAL
      if [ "$SLOW_COUNT" -gt "0" ]; then
        echo "VIOLATION: $SLOW_COUNT new SLOW entries detected (not SLOW-OK)"
        grep '"SLOW"' "$TIMING_FILE" | grep -v 'SLOW-OK'
        echo "STOPPING_BUILDER"
        # Kill the running xcodebuild test
        pkill -f "xcodebuild.test.-only-testing" 2>/dev/null || true
        exit 1
      fi
    fi
  fi
  sleep 5
done

Run this Bash command in the background. When it exits with code 1, a timing violation was caught. `SLOW-OK` entries (documented unavoidable gaps) are ignored.

**2e. Wait for the builder to complete.** Three possible outcomes:

**Outcome A — Builder completes normally (no violations):**
The watcher found no SLOW entries. Proceed to Step 3 (Refiner).

**Outcome B — Builder completes but evidence review finds gaps:**
The orchestrator reads the screenshots and timing log. If the snap index sequence has large gaps (e.g., snap names jump from "090-..." to "103-..." skipping the entire recording phase), the journey has silently skipped phases. Re-launch the builder with a directive to investigate and fix the gaps. Include the specific missing phases in the prompt.

**Outcome C — Watcher killed the test (violation detected):**
1. Read `screenshot-timing.jsonl` to find all SLOW entries
2. For each SLOW entry, read the test code to find what happens between the previous screenshot and the slow one
3. **Research**: Is it possible to make this step <= 5 seconds?
   - Read the app code that the test is exercising
   - Check if a `waitForExistence(timeout:)` is set too high
   - Check if the app itself is doing unnecessary work
   - Check if intermediate screenshots could break a long operation into visible chunks
4. **If fixable (can be <= 5s):** Fix the test code or app code. Go back to 2b (clear timing, re-launch builder).
5. **If NOT fixable** (genuine async like a real download): Add a comment in the test code on the line BEFORE the slow snap explaining exactly why: `// SLOW-OK: 8s gap — simulated model download requires async completion, cannot be reduced`. Then go back to 2b.
6. The watcher will now skip entries with matching names that have `SLOW-OK` comments in the test code.

**Important:** When investigating a SLOW entry, think carefully. Common fixable causes:
- `waitForExistence(timeout: 10)` where the element appears in <1s — lower the timeout
- Missing accessibility identifier causing XCUITest to do a slow tree search — add the identifier
- App performing synchronous work on main thread — move to background
- Test waiting for an element that doesn't exist yet because app code hasn't been written — write the app code

Common unfixable causes (document these):
- Real network/download simulation that must complete asynchronously
- App launch time (first screenshot always has overhead)
- System permission dialogs that appear unpredictably
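
The evidence review in Outcome B (snap names jumping from "090-..." to "103-...") can be partly automated. The sketch below assumes screenshot file names start with a numeric index followed by a hyphen; the helper name `find_snap_gaps` and the default threshold of 5 are illustrative assumptions, since the text does not define how large a "large gap" is.

```shell
# Report jumps in the numeric snap prefix that exceed max_jump.
# Assumptions: file names look like "090-some-step.png"; max_jump defaults to 5.
find_snap_gaps() {
  dir="$1"
  max_jump="${2:-5}"
  ls "$dir" \
    | sed -n 's/^\([0-9][0-9]*\)-.*/\1/p' \
    | sort -n \
    | awk -v max="$max_jump" '
        NR > 1 && ($1 - prev) > max { printf "gap: %03d -> %03d\n", prev, $1 }
        { prev = $1 + 0 }
      '
}
```

Any reported gap is only a hint: the orchestrator still has to check which phases fall inside the missing range before re-launching the builder.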

Step 3: Launch Refiner Agent

After the builder completes, invoke the `autocraft:refine-journey` skill via the Skill tool, passing the spec path as an argument.
Wait for the refiner to complete. It will:
  • Evaluate the builder's output
  • Write a score to `journey-refinement-log.md`
  • Edit `AGENTS.md` with project-specific improvements, or add platform-specific pitfalls to the gist

Step 4: See What the Builder Produced

Don't just read journey-state.md. Look at the actual work:
  1. Read 3-5 screenshots from the journey the builder just worked on. Do they show real features working? Or empty states, error messages, "No Results"? If the screenshots show a feature that's supposed to work but the screenshot shows it empty or broken — the journey is not done, regardless of what journey-state.md claims.
  2. Check the journey folder for review files. Are there actual review notes? Or did the builder skip them?
    bash
    ls -la journeys/{NNN}-{name}/*.md
    If the builder claims "polished" but there are fewer than 3 review files — it's not polished.
  3. Read the refinement log for the score and findings. Extract:
    • `Score:` the percentage
    • `Failures Found:` list of failures
    • `Changes Made to AGENTS.md:` what was changed
  4. Read `journey-state.md`, but treat it as the builder's CLAIM, not the truth. If your own observations (screenshots, review files) contradict the claimed status, update the status yourself.
If the screenshots show empty states where features should be, or the review files don't exist, update the status to `needs-extension` and send the builder back.
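
The "fewer than 3 review files" check from item 2 can be sketched as below. It mirrors the `ls` command above by treating every `*.md` file directly inside the journey folder as a review file, which is an assumption; the helper names are hypothetical.

```shell
# Count Markdown files directly inside a journey folder.
# Assumption: every *.md file there is a review file.
count_review_files() {
  find "$1" -maxdepth 1 -type f -name '*.md' | wc -l | tr -d ' '
}

# Succeed only when the folder holds at least 3 review files.
has_enough_reviews() {
  [ "$(count_review_files "$1")" -ge 3 ]
}
```

A count alone does not prove the reviews are genuine, so this complements reading the files rather than replacing it.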

Step 5: Decide Next Action

5a. Pre-stop audit (MANDATORY when score >= 90% or all journeys show `polished`).
  1. Read the Acceptance Criteria Master List from `journey-loop-state.md`. It has M total rows.
  2. For each criterion row: (a) confirm a journey maps it by number in its `## Spec Coverage` section, (b) confirm the journey's test file contains a step exercising it (search for keywords from the criterion text), (c) confirm a screenshot file exists in `journeys/{NNN}-{name}/screenshots/` for that step.
  3. Build a final audit table:
    ## Pre-Stop Criterion Audit
    | Req ID | Crit # | Journey | Mapped? | Test Step? | Screenshot? | VERDICT |
  4. Count uncovered = rows with any NO.
  5. If uncovered > 0: do NOT stop. For each uncovered criterion: if it belongs to a journey currently marked `polished`, update that journey to `needs-extension` in `journey-state.md`; if no journey owns it, create a new journey targeting those criteria. Continue the loop.
  6. Only proceed to stop if uncovered == 0 AND score >= 95%.
Stop if score >= 95% AND the pre-stop audit shows 0 uncovered criteria AND all journeys are `polished`.
If the current journey is not yet `polished`: continue working on the same journey next iteration.
If the current journey is `polished`: move to the next `needs-extension` journey, or the next uncovered criteria from the audit.
If the score did NOT improve for 2 consecutive iterations: log a warning. If the same failure pattern appears 3 times, escalate.
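
Counting uncovered rows in the audit table (item 4 of the pre-stop audit) can be sketched with `awk`. The sketch assumes the audit cells are written literally as `YES` or `NO` and that the table has been saved to a file; `count_uncovered` is a hypothetical helper name.

```shell
# Count audit rows where Mapped?, Test Step?, or Screenshot? is NO.
# With "|" as the separator, those three columns are awk fields 5 to 7
# (field 1 is the empty text before the leading pipe).
count_uncovered() {
  awk -F'|' '
    /^[ \t]*\|/ {
      for (i = 5; i <= 7; i++) {
        cell = $i
        gsub(/[ \t]/, "", cell)
        if (cell == "NO") { n++; break }
      }
    }
    END { print n + 0 }
  ' "$1"
}
```

A result greater than 0 means the loop must continue, whatever the score says.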

Step 6: Update Loop State

Append to `journey-loop-state.md`:
| <iteration> | <journey-name> | <duration> | <score>% | <N changes> | <continue/done> |
Increment iteration counter. Go to Step 0.
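
Appending the history row can be sketched as below. Note the assumption: a plain append only lands in the Iteration History table if that table is the last block in `journey-loop-state.md`, which this document does not guarantee (the Acceptance Criteria Master List lives in the same file). `append_iteration` is a hypothetical helper name.

```shell
# Append one iteration row to the state file.
# Args: iteration, journey name, duration, score (number only), changes, decision.
# Assumption: the Iteration History table is the final block in the file.
append_iteration() {
  printf '| %s | %s | %s | %s%% | %s | %s |\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" >> "${STATE_FILE:-journey-loop-state.md}"
}
```

If the table is not last in the file, the row would need to be inserted after the table's final line instead of appended.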

Stop Condition

Stop when all of the following hold:
  • Overall score >= 95%
  • Build passes
  • All journey tests pass
  • Every journey in `journey-state.md` has status `polished`
  • Pre-stop criterion audit in Step 5a shows 0 uncovered criteria (every acceptance criterion in `spec.md` has: a journey mapping it by number, a test step exercising it, and a screenshot proving the outcome)
  • Total criteria covered == M (from the Acceptance Criteria Master List)
When stopped, output:
Loop complete after <N> iterations.
Final score: XX%
Journeys built: <list with durations>
Spec coverage: X / N requirements fully covered (all criteria)
Criteria coverage: X / M acceptance criteria covered (impl + test + screenshot)
Uncovered criteria: (should be 0)
Total test suite duration: Xm
Run all tests with: <exact test command>
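
The stop condition is a plain conjunction, which the following sketch makes explicit. It assumes the inputs have already been gathered elsewhere (score as an integer percent, counts of uncovered criteria and non-polished journeys, and the exit statuses of the build and test commands); `should_stop` is a hypothetical helper name.

```shell
# Succeed (exit status 0) only when every stop condition holds.
# Args: score percent, uncovered criteria count, non-polished journey count,
#       build exit status, test exit status.
should_stop() {
  [ "$1" -ge 95 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ] \
    && [ "$4" -eq 0 ] && [ "$5" -eq 0 ]
}
```

Because "total criteria covered == M" is equivalent to an uncovered count of 0, the sketch checks only the latter.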

Safety Limits

  • No iteration limit. The loop runs indefinitely until the user stops it or the stop condition is met.
  • Stall detection: If the builder produces no changes for 2 consecutive iterations, log the stall and proceed to the refiner — it can diagnose why the builder stalled.
  • Never modify the spec. It is read-only; only `AGENTS.md` and the pitfalls gist get improved.
  • Pitfall gist is append-only — add new pitfalls, never delete existing ones.