f1-test-drive

F1 Test Drive

Run comprehensive F1 test drives that validate the full pipeline:
  • Issue-tracker behavior
  • EdgeWorker execution flow
  • Activity rendering/output quality

Mission

Execute test drives that verify:
  1. Issue-tracker correctness
  2. EdgeWorker worktree/session behavior
  3. Activity output visibility and formatting

Test Drive Protocol

Phase 1: Setup

  1. Create a fresh test repository (if needed):
     ```bash
     cd apps/f1
     ./f1 init-test-repo --path /tmp/f1-test-drive-<timestamp>
     ```
  2. Start F1 server:
     ```bash
     CYRUS_PORT=3600 CYRUS_REPO_PATH=/tmp/f1-test-drive-<timestamp> bun run apps/f1/server.ts &
     ```
  3. Verify server health:
     ```bash
     CYRUS_PORT=3600 ./f1 ping
     CYRUS_PORT=3600 ./f1 status
     ```
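
Because the server in step 2 starts in the background, the health check in step 3 can run before the server is ready. A minimal sketch of a retry helper follows. `wait_for` is a hypothetical helper, not part of the F1 CLI, and it assumes that `./f1 ping` exits non-zero while the server is unreachable, which the source does not confirm.

```bash
# wait_for ATTEMPTS CMD...: run CMD up to ATTEMPTS times, one second apart.
# Hypothetical helper (not part of the F1 CLI). Returns 0 on the first
# successful run and 1 if every attempt fails.
wait_for() {
  local attempts="$1"
  local i=0
  shift
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}
```

For example, `wait_for 10 env CYRUS_PORT=3600 ./f1 ping` would retry the health check for roughly ten seconds before giving up.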

Phase 2: Issue-Tracker Verification

  1. Create test issue:
     ```bash
     CYRUS_PORT=3600 ./f1 create-issue \
       --title "<issue title>" \
       --description "<issue description>"
     ```
  2. Verify the issue creation response and note the returned issue ID for Phase 3.

Phase 3: EdgeWorker Verification

  1. Start agent session:
     ```bash
     CYRUS_PORT=3600 ./f1 start-session --issue-id <issue-id>
     ```
  2. Monitor activities:
     ```bash
     CYRUS_PORT=3600 ./f1 view-session --session-id <session-id>
     ```
  3. Verify:
     • session started
     • activities appear
     • agent is processing issue

Phase 4: Renderer Verification

  1. Validate activity payload quality:
     • expected types (for example, `thought`, `action`, `response`)
     • timestamps present
     • content well-formed and readable
  2. Validate pagination behavior:
     ```bash
     CYRUS_PORT=3600 ./f1 view-session --session-id <session-id> --limit 10 --offset 0
     ```
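
To exercise pagination beyond the first page, the same command can be repeated with increasing offsets. The sketch below only computes those offsets. `page_offsets` is a hypothetical helper, and the total activity count is assumed to be known in advance, since the source does not say how `view-session` reports it.

```bash
# page_offsets TOTAL LIMIT: print the --offset value for each page, one per line.
# Hypothetical helper (not part of the F1 CLI).
page_offsets() {
  local total="$1"
  local limit="$2"
  local offset=0
  # A non-positive limit would never advance, so reject it up front.
  [ "$limit" -gt 0 ] || return 1
  while [ "$offset" -lt "$total" ]; do
    echo "$offset"
    offset=$((offset + limit))
  done
}
```

Each printed value can be passed as `--offset` alongside `--limit 10`. For 25 activities and a limit of 10, the helper prints 0, 10, and 20.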

Phase 5: Cleanup

  1. Stop active session:
     ```bash
     CYRUS_PORT=3600 ./f1 stop-session --session-id <session-id>
     ```
  2. Stop background server process.
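
Step 2 does not say how to stop the server. One common approach, assuming the server was started with `&` from the same shell, is to record its PID from `$!` right after launch and signal it during cleanup. `stop_server` below is a hypothetical helper, not an F1 command.

```bash
# stop_server PID: send SIGTERM to a background process and wait for it to exit.
# Hypothetical helper (not part of the F1 CLI).
stop_server() {
  local pid="$1"
  # kill -0 sends no signal; it only checks that the process exists.
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    # wait reaps the process only if it is a child of this shell; its
    # non-zero status after SIGTERM is expected, so it is ignored.
    wait "$pid" 2>/dev/null || true
  fi
  return 0
}
```

In practice the PID would be captured with `SERVER_PID=$!` immediately after the server command from Phase 1, then passed as `stop_server "$SERVER_PID"`.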

Reporting Format

Write the report under `apps/f1/test-drives/`:

```markdown
# Test Drive #NNN: [Goal Description]

Date: YYYY-MM-DD
Goal: [One sentence]
Test Repo: [Path]

## Verification Results

### Issue-Tracker

- Issue created
- Issue ID returned
- Issue metadata accessible

### EdgeWorker

- Session started
- Worktree created (if applicable)
- Activities tracked
- Agent processed issue

### Renderer

- Activity format correct
- Pagination works
- Search works

## Session Log

[commands + key outputs + pass/fail]

## Final Retrospective

[what worked, issues, recommendations]
```

Pass/Fail Criteria

Pass when:
  1. Server starts
  2. Issue created successfully
  3. Session starts and activities appear
  4. Activity payloads are coherent
  5. Session stops cleanly
  6. No unhandled errors
Fail when:
  • server startup fails
  • issue creation fails
  • session does not start
  • no activities after reasonable wait
  • malformed activity data
  • unhandled exceptions

Important Notes

  • Prefer fixed port `3600` unless already in use.
  • Use fresh test repos per drive.
  • Preserve failed state when debugging.
  • For major runner/harness changes, run at least one F1 end-to-end validation before merge.
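
Whether port 3600 is already in use can be checked before starting the server. The sketch below relies on bash's `/dev/tcp` redirection, so it needs bash rather than a plain POSIX shell. `port_in_use` and `pick_port` are hypothetical helpers, and the fallback port in the usage note is an arbitrary illustration, not a value from the source.

```bash
# port_in_use PORT: succeed if something accepts TCP connections on localhost:PORT.
# Hypothetical helper; requires bash for the /dev/tcp pseudo-device.
port_in_use() {
  # The subshell opens and closes the connection; failure means nothing is listening.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# pick_port PREFERRED FALLBACK: print PREFERRED unless it is already taken.
pick_port() {
  if port_in_use "$1"; then
    echo "$2"
  else
    echo "$1"
  fi
}
```

A run could then set the port with `CYRUS_PORT=$(pick_port 3600 3601)` and reuse that value for every subsequent `./f1` command.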

Multi-Harness Note

This skill is intentionally harness-agnostic:
  • Claude subagents can call this skill.
  • Codex/OpenCode workflows can reference the same skill content.
  • Harness-specific adapters should be thin wrappers around this canonical skill.