f1-test-drive
F1 Test Drive
Run comprehensive F1 test drives that validate the full pipeline:
- Issue-tracker behavior
- EdgeWorker execution flow
- Activity rendering/output quality
Mission
Execute test drives that verify:
- Issue-tracker correctness
- EdgeWorker worktree/session behavior
- Activity output visibility and formatting
Test Drive Protocol
Phase 1: Setup
- Create a fresh test repository (if needed):
  ```bash
  cd apps/f1
  ./f1 init-test-repo --path /tmp/f1-test-drive-<timestamp>
  ```
- Start F1 server:
  ```bash
  CYRUS_PORT=3600 CYRUS_REPO_PATH=/tmp/f1-test-drive-<timestamp> bun run apps/f1/server.ts &
  ```
- Verify server health:
  ```bash
  CYRUS_PORT=3600 ./f1 ping
  CYRUS_PORT=3600 ./f1 status
  ```
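Because the server is started in the background, the first health check can run before the server is ready. The following is a minimal sketch of a retry helper for that gap. `wait_for` is a hypothetical name, not part of the f1 CLI, and the sketch assumes `./f1 ping` exits non-zero while the server is unreachable.

```bash
# Sketch only: retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper, not part of the f1 CLI.
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Run the remaining arguments as the health-check command
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt remains
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Intended use after starting the server (assumption: ping fails until the server is up):
#   wait_for 10 1 env CYRUS_PORT=3600 ./f1 ping
```

Passing the check as arguments keeps the helper independent of any particular f1 subcommand.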
Phase 2: Issue-Tracker Verification
- Create test issue:
  ```bash
  CYRUS_PORT=3600 ./f1 create-issue \
    --title "<issue title>" \
    --description "<issue description>"
  ```
- Verify issue ID and issue creation response.
Phase 3: EdgeWorker Verification
- Start agent session:
  ```bash
  CYRUS_PORT=3600 ./f1 start-session --issue-id <issue-id>
  ```
- Monitor activities:
  ```bash
  CYRUS_PORT=3600 ./f1 view-session --session-id <session-id>
  ```
- Verify:
  - session started
  - activities appear
  - agent is processing issue
Phase 4: Renderer Verification
- Validate activity payload quality:
  - expected types (for example `thought`, `action`, `response`)
  - timestamps present
  - content well-formed and readable
- Validate pagination behavior:
  ```bash
  CYRUS_PORT=3600 ./f1 view-session --session-id <session-id> --limit 10 --offset 0
  ```
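To exercise pagination beyond the first page, the `--offset` value has to advance by `--limit` on each call. The sketch below computes those offsets. `page_offsets` is a hypothetical helper, and the total activity count is assumed to be known to the caller, since the source does not say how f1 reports it.

```bash
# Sketch only: print the --offset value for each page of activities.
# page_offsets is a hypothetical helper, not part of the f1 CLI.
page_offsets() {
  local total=$1 limit=$2 offset
  # A non-positive limit would never advance, so reject it
  if ((limit <= 0)); then
    return 1
  fi
  for ((offset = 0; offset < total; offset += limit)); do
    echo "$offset"
  done
}

# Intended use, walking 25 activities ten at a time (the count 25 is illustrative):
#   for offset in $(page_offsets 25 10); do
#     CYRUS_PORT=3600 ./f1 view-session --session-id <session-id> --limit 10 --offset "$offset"
#   done
```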
Phase 5: Cleanup
- Stop active session:
  ```bash
  CYRUS_PORT=3600 ./f1 stop-session --session-id <session-id>
  ```
- Stop background server process.
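One way to stop the background server reliably is to record its PID at launch and kill it from an exit trap. This is a sketch under that assumption. `stop_server` and `SERVER_PID` are hypothetical names, not part of the f1 tooling.

```bash
# Sketch only: stop a background process by PID and reap it.
# stop_server is a hypothetical helper, not part of the f1 CLI.
stop_server() {
  local pid=$1
  # kill -0 sends no signal; it only tests whether the process exists
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid" 2>/dev/null || true
    # wait reaps the child; its exit status is not of interest here
    wait "$pid" 2>/dev/null || true
  fi
}

# Intended use in the shell that launched the server:
#   CYRUS_PORT=3600 CYRUS_REPO_PATH=/tmp/f1-test-drive-<timestamp> bun run apps/f1/server.ts &
#   SERVER_PID=$!
#   trap 'stop_server "$SERVER_PID"' EXIT
```

When a failed state should be preserved for debugging, the trap can simply be left out so the server keeps running.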
Reporting Format
Write report under `apps/f1/test-drives/`:
```markdown
Test Drive #NNN: [Goal Description]

Date: YYYY-MM-DD
Goal: [One sentence]
Test Repo: [Path]

Verification Results

Issue-Tracker
- Issue created
- Issue ID returned
- Issue metadata accessible

EdgeWorker
- Session started
- Worktree created (if applicable)
- Activities tracked
- Agent processed issue

Renderer
- Activity format correct
- Pagination works
- Search works

Session Log
[commands + key outputs + pass/fail]

Final Retrospective
[what worked, issues, recommendations]
```
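The header fields of the report can be filled in mechanically. The sketch below prints them from four arguments. `report_header` is a hypothetical helper, and zero-padding the number to three digits is an assumption read off the `#NNN` placeholder.

```bash
# Sketch only: print the header lines of a test-drive report.
# report_header is a hypothetical helper; three-digit padding is assumed from "#NNN".
report_header() {
  local number=$1 title=$2 goal=$3 repo=$4
  # 10# forces base 10 so an input such as 008 is not parsed as octal
  printf 'Test Drive #%03d: %s\n' "$((10#$number))" "$title"
  printf 'Date: %s\n' "$(date +%Y-%m-%d)"
  printf 'Goal: %s\n' "$goal"
  printf 'Test Repo: %s\n' "$repo"
}

# Intended use (all argument values are illustrative):
#   report_header 1 "Smoke test" "Verify ping and status" "/tmp/f1-test-drive-<timestamp>"
```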
Pass/Fail Criteria
Pass when:
- Server starts
- Issue created successfully
- Session starts and activities appear
- Activity payloads are coherent
- Session stops cleanly
- No unhandled errors
Fail when:
- server startup fails
- issue creation fails
- session does not start
- no activities after reasonable wait
- malformed activity data
- unhandled exceptions
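The criteria above can be tracked as a running tally during a drive. The following sketch records each check and derives an overall verdict. `check`, `verdict`, and `failures` are hypothetical names, and the sketch assumes each f1 command exits non-zero on failure, which the source does not state.

```bash
# Sketch only: run named checks and derive an overall pass/fail verdict.
# check and verdict are hypothetical helpers, not part of the f1 CLI.
failures=0

check() {
  local name=$1
  shift
  # The command's own output is left visible so it can go into the session log
  if "$@"; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    failures=$((failures + 1))
  fi
}

verdict() {
  if ((failures == 0)); then
    echo "Test drive passed"
  else
    echo "Test drive failed ($failures failed checks)"
    return 1
  fi
}

# Intended use (assumption: f1 commands exit non-zero on failure):
#   check "server responds" env CYRUS_PORT=3600 ./f1 ping
#   check "server status" env CYRUS_PORT=3600 ./f1 status
#   verdict
```

Exit codes alone cannot confirm criteria such as coherent activity payloads, so those still need a manual read of the output.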
Important Notes
- Prefer fixed port `3600` unless already in use.
- Use fresh test repos per drive.
- Preserve failed state when debugging.
- For major runner/harness changes, run at least one F1 end-to-end validation before merge.
Multi-Harness Note
This skill is intentionally harness-agnostic:
- Claude subagents can call this skill.
- Codex/OpenCode workflows can reference the same skill content.
- Harness-specific adapters should be thin wrappers around this canonical skill.