harness-engineering-playbook
Harness Engineering Playbook
Use this skill to operationalize the practices from OpenAI's Harness Engineering guide in a repo that agents can run against repeatedly and safely.
What To Load
- Use `references/openai-harness-practices.md` for the full practice-to-artifact mapping.
- Use `references/rollout-checklist.md` for phased adoption in active repos.
- Use `references/wizard-cli.md` for Typer wizard command flows.
- Use `assets/templates/` when creating or updating harness files.
Inputs
- Target repository path.
- Existing command surface (`make`, `npm`, `cargo`, `pytest`, etc.).
- Existing CI workflows and branch protections.
Workflow
- Baseline the repo and detect existing workflows.
- Bootstrap harness artifacts and templates.
- Apply all nine Harness Engineering practices.
- Run harness audit checks and repair gaps.
- Iterate after real agent runs.
Step 1: Baseline The Repo
- Identify language/toolchain and canonical entrypoints.
- Inventory existing checks, scripts, and CI jobs.
- Record current pain points for agent runs: setup drift, unclear docs, flaky tests, missing trace IDs, slow loops.
Use a short baseline note inside `PLANS.md` so decisions remain durable.
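As one way to start the toolchain inventory, the sketch below guesses the existing command surface from marker files at the repo root. The marker table and both function names are illustrative assumptions, not part of the harness scripts; mapping `pyproject.toml` to `pytest` in particular is only a rough guess.

```python
from pathlib import Path

# Hypothetical marker-file table; adjust it to the toolchains you actually use.
MARKERS = {
    "Makefile": "make",
    "package.json": "npm",
    "Cargo.toml": "cargo",
    "pyproject.toml": "pytest",  # loose assumption: a Python project tested with pytest
}


def detect_tools(root_files: set[str]) -> list[str]:
    """Return the tools whose marker file is among the root file names, sorted."""
    return sorted(tool for marker, tool in MARKERS.items() if marker in root_files)


def baseline(repo: Path) -> list[str]:
    """Inspect the top level of a repository on disk."""
    return detect_tools({p.name for p in repo.iterdir() if p.is_file()})
```

The result is only a starting point for the baseline note; CI jobs and custom scripts still need a manual pass.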
Step 2: Bootstrap Harness Artifacts
Preferred entrypoint:
```bash
python3 scripts/harness_wizard.py init <repo-path> --profile control
```
Profiles:
- `baseline`: only core harness artifacts.
- `control`: baseline + control-system primitives.
- `full`: control + entropy controls (nightly audit + entropy checks).
Direct shell fallback:
Run:
```bash
./scripts/bootstrap_harness.sh <repo-path>
```
This script installs safe defaults from `assets/templates/`:
- `AGENTS.md`
- `PLANS.md`
- `docs/ARCHITECTURE.md`
- `docs/OBSERVABILITY.md`
- `Makefile.harness` (+ `-include Makefile.harness` in `Makefile`)
- `scripts/audit_harness.sh`
- `scripts/harness/{smoke,test,lint,typecheck}.sh`
- `.github/workflows/harness.yml`
By default, existing files are not overwritten. Pass `--force` to replace template-managed files.
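The no-overwrite default can be pictured as a planning step that runs before any file is touched. The sketch below is an assumption about how such a policy could look, not the actual logic of `bootstrap_harness.sh`; `plan_install` and its return values are hypothetical names.

```python
def plan_install(templates: list[str], existing: set[str], force: bool = False) -> dict[str, str]:
    """Decide what to do with each template path without touching the disk.

    Returns a mapping of relative path -> "create", "skip", or "replace".
    """
    plan: dict[str, str] = {}
    for rel_path in templates:
        if rel_path not in existing:
            plan[rel_path] = "create"
        elif force:
            plan[rel_path] = "replace"  # only when the caller opted in
        else:
            plan[rel_path] = "skip"  # safe default: leave the existing file alone
    return plan
```

Printing the plan before applying it gives the user a chance to approve any replacement explicitly.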
Step 3: Apply The Nine Practices
Implement each practice directly in repo artifacts.
1. Make Hard Things Easy To Do
- Ensure hard, high-value tasks are one command away (`make smoke`, `make check`, `make ci`).
- Keep setup and cleanup scripted.
- Make smoke checks cheap enough for frequent use.
2. Communicate Actionable Constraints With Compact Docs
- Keep `AGENTS.md` short, concrete, and command-first.
- Document non-obvious constraints and guardrails.
- Keep docs close to code and update them with behavior changes.
3. Structure Codebase With Strict Boundaries And Flow
- Define module boundaries in `docs/ARCHITECTURE.md`.
- Parse and validate data at boundaries; use typed contracts for internal flow.
- Prefer one abstraction per module and one clear ownership path.
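A minimal sketch of parsing at the boundary: untrusted input is validated once and converted into a frozen dataclass that internal code can rely on. `RunRequest`, its fields, and the default profile are hypothetical; only the profile names `baseline`, `control`, and `full` come from this playbook.

```python
from dataclasses import dataclass

ALLOWED_PROFILES = ("baseline", "control", "full")


@dataclass(frozen=True)
class RunRequest:
    """Typed contract for internal flow (hypothetical fields)."""

    repo_path: str
    profile: str


def parse_run_request(raw: dict) -> RunRequest:
    """Validate untrusted input once, at the module boundary."""
    repo_path = raw.get("repo_path")
    profile = raw.get("profile", "baseline")  # assumed default
    if not isinstance(repo_path, str) or not repo_path:
        raise ValueError("repo_path must be a non-empty string")
    if profile not in ALLOWED_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return RunRequest(repo_path=repo_path, profile=profile)
```

Code past this function receives a `RunRequest` and never re-checks the raw dictionary.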
4. Build Observability In From Day 1
- Emit structured logs/events with correlation IDs.
- Capture key transitions in long-running workflows.
- Define minimum observable fields in `docs/OBSERVABILITY.md`.
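One possible shape for a structured event is a single JSON object per line that always carries a correlation ID. The field names `ts`, `correlation_id`, and `event` are assumptions made for this sketch, not the minimum set that `docs/OBSERVABILITY.md` defines.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id() -> str:
    """Create one ID per agent run and pass it to every event in that run."""
    return uuid.uuid4().hex


def make_event(correlation_id: str, event: str, **fields) -> str:
    """Serialize one event as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "event": event,
        **fields,
    }
    return json.dumps(record, sort_keys=True)
```

For example, `print(make_event(run_id, "audit.start", repo="demo"))` at each key transition lets one `grep` on the ID reconstruct a whole run.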
5. Optimize For Agent Flow, Not Human Flow
- Treat context as a first-class system dependency.
- Use `PLANS.md` for multi-step/multi-hour tasks.
- Front-load durable context (scope, constraints, checkpoints) so restarts stay cheap.
6. Bring Your Own Harness
- Standardize repo-local wrappers (`Makefile.harness`, `scripts/harness/`).
- Wrap local infra actions in deterministic scripts.
- Make agent behavior reproducible across machines and runs.
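One source of run-to-run variation is the inherited environment. As a hedged sketch, a wrapper can pass a small allowlist of variables through and pin the rest; the specific variables and values below are assumptions for a POSIX-like machine, and `pinned_env` and `run_wrapped` are hypothetical names.

```python
import os
import subprocess

# Assumed pins: fixed locale, timezone, and Python hash seed.
PINNED = {"LC_ALL": "C", "TZ": "UTC", "PYTHONHASHSEED": "0"}
PASSTHROUGH = ("PATH", "HOME")


def pinned_env(base: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted variables from base, then apply the pins."""
    env = {key: base[key] for key in PASSTHROUGH if key in base}
    env.update(PINNED)
    return env


def run_wrapped(argv: list[str]) -> int:
    """Run a command under the pinned environment and return its exit code."""
    return subprocess.run(argv, env=pinned_env(dict(os.environ))).returncode
```

Variables such as `EDITOR` or a developer's personal tokens never reach the wrapped command, which keeps behavior closer between a laptop and CI.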
7. Prototype In Natural Language First
- Draft logic and tests in prose before coding.
- Review edge cases in prose and lock acceptance criteria.
- Translate approved prose into code and tests.
8. Invest In Static Analysis And Linting
- Pin formatter/linter/typechecker versions where practical.
- Enforce checks in both local workflow and CI.
- Run static checks before long tests to shorten failure loops.
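Running static checks first only shortens the loop if the runner stops at the first failure. The sketch below shows that ordering under the assumption that each check is a callable returning `True` on success; `command_check` and `run_in_order` are hypothetical helpers, not part of the harness scripts.

```python
import subprocess
from typing import Callable

Check = tuple[str, Callable[[], bool]]


def command_check(argv: list[str]) -> Callable[[], bool]:
    """Wrap a command so it reports success as a boolean."""
    return lambda: subprocess.run(argv).returncode == 0


def run_in_order(checks: list[Check]) -> tuple[bool, list[str]]:
    """Run checks in the given order and stop at the first failure.

    Returns (all_passed, names_of_checks_that_ran).
    """
    ran: list[str] = []
    for name, check in checks:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Example ordering, cheapest first (paths follow the harness layout):
# run_in_order([
#     ("lint", command_check(["scripts/harness/lint.sh"])),
#     ("typecheck", command_check(["scripts/harness/typecheck.sh"])),
#     ("test", command_check(["scripts/harness/test.sh"])),
# ])
```

A lint failure then returns in seconds instead of waiting behind the full test suite.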
9. Manage Entropy
- Add periodic audits for docs drift, flaky checks, and dead scripts.
- Keep templates synchronized with real workflows.
- Remove stale abstractions quickly to keep agent context clean.
For a detailed artifact matrix, load `references/openai-harness-practices.md`.
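A periodic audit for dead scripts can start from a simple heuristic: a script that no document or workflow mentions is a candidate for removal. The function below is an illustrative sketch with a hypothetical name; it uses plain substring matching, so brace-expanded mentions such as `scripts/harness/{smoke,test}.sh` will not match individual files.

```python
def find_unreferenced_scripts(scripts: list[str], documents: dict[str, str]) -> list[str]:
    """Return script paths that none of the given documents mention, sorted.

    `documents` maps a file name (for example "AGENTS.md") to its text.
    """
    corpus = "\n".join(documents.values())
    return sorted(path for path in scripts if path not in corpus)
```

Treat the output as a review list, not a delete list, since a script may be invoked from places the audit did not read.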
Step 4: Validate
Run:
```bash
python3 scripts/harness_wizard.py audit <repo-path>
```
Treat any `MISSING` or `FAIL` result as blocking before calling harness setup complete.
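To enforce the blocking rule in a script, the audit output can be scanned for the two statuses. The audit's real output format is not specified here, so this sketch assumes one result per line with the status as the first token (optionally followed by a colon); `blocking_results` is a hypothetical helper.

```python
BLOCKING = {"MISSING", "FAIL"}


def blocking_results(audit_output: str) -> list[str]:
    """Return the lines whose first token is a blocking status.

    Assumes one result per line, e.g. "MISSING docs/OBSERVABILITY.md".
    """
    blocked: list[str] = []
    for line in audit_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0].rstrip(":") in BLOCKING:
            blocked.append(line.strip())
    return blocked
```

A caller can exit non-zero whenever the returned list is non-empty, which keeps setup from being declared complete too early.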
Step 5: Iterate On Real Runs
- Observe one full agent run from clean checkout to merged change.
- Patch harness gaps immediately.
- Re-run audit.
- Keep `AGENTS.md`, `PLANS.md`, and architecture docs aligned with current behavior.
Adaptation Rules
- Preserve existing project conventions and replace templates incrementally.
- Do not overwrite user-authored files without explicit approval.
- Keep command names stable; change internals behind wrappers.
- Favor deterministic, scriptable workflows over ad-hoc interactive steps.