harness-engineering-playbook

Harness Engineering Playbook

Use this skill to operationalize the practices from OpenAI's Harness Engineering guide in a repo that agents can run against repeatedly and safely.

What To Load

  • Use references/openai-harness-practices.md for the full practice-to-artifact mapping.
  • Use references/rollout-checklist.md for phased adoption in active repos.
  • Use references/wizard-cli.md for Typer wizard command flows.
  • Use assets/templates/ when creating or updating harness files.

Inputs

  • Target repository path.
  • Existing command surface (make, npm, cargo, pytest, etc.).
  • Existing CI workflows and branch protections.

Workflow

  1. Baseline the repo and detect existing workflows.
  2. Bootstrap harness artifacts and templates.
  3. Apply all nine Harness Engineering practices.
  4. Run harness audit checks and repair gaps.
  5. Iterate after real agent runs.

Step 1: Baseline The Repo

  • Identify language/toolchain and canonical entrypoints.
  • Inventory existing checks, scripts, and CI jobs.
  • Record current pain points for agent runs: setup drift, unclear docs, flaky tests, missing trace IDs, slow loops.
Use a short baseline note inside PLANS.md so decisions remain durable.
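
The baseline step can be partly mechanized. The following is a minimal sketch, not part of the skill's own scripts: it guesses the command surface from marker files at the repo root and renders a short note for PLANS.md. The marker table, the function names, and the note layout are all assumptions chosen for illustration.

```python
from pathlib import Path

# Hypothetical marker-file table (an assumption, not from the skill).
# Extend it for the toolchains the repo actually uses.
MARKERS = {
    "Makefile": "make",
    "package.json": "npm",
    "Cargo.toml": "cargo",
    "pyproject.toml": "pytest",
}


def detect_command_surface(repo: Path) -> list[str]:
    """Return the tools whose marker file exists at the repo root, sorted."""
    return sorted(tool for marker, tool in MARKERS.items() if (repo / marker).is_file())


def baseline_note(repo: Path, pain_points: list[str]) -> str:
    """Render a short plain-text baseline note suitable for pasting into PLANS.md."""
    tools = detect_command_surface(repo) or ["(none detected)"]
    lines = ["Baseline", "Command surface: " + ", ".join(tools), "Pain points:"]
    lines += [f"  - {point}" for point in pain_points]
    return "\n".join(lines)
```

Calling baseline_note(Path("."), ["flaky tests", "slow loops"]) from the repo root would produce a note that a human or agent can review before it is committed. Marker files are only a hint, so the detected list should be checked against the real CI jobs.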

Step 2: Bootstrap Harness Artifacts

Preferred entrypoint:

    python3 scripts/harness_wizard.py init <repo-path> --profile control

Profiles:
  • baseline: only core harness artifacts.
  • control: baseline + control-system primitives.
  • full: control + entropy controls (nightly audit + entropy checks).
Direct shell fallback:

    ./scripts/bootstrap_harness.sh <repo-path>

This script installs safe defaults from assets/templates/:
  • AGENTS.md
  • PLANS.md
  • docs/ARCHITECTURE.md
  • docs/OBSERVABILITY.md
  • Makefile.harness (plus an -include Makefile.harness line in Makefile)
  • scripts/audit_harness.sh
  • scripts/harness/{smoke,test,lint,typecheck}.sh
  • .github/workflows/harness.yml
By default, existing files are not overwritten. Pass --force to replace template-managed files.
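
The no-overwrite default can be pictured as follows. This is a minimal sketch of the behavior described above, not the actual code in bootstrap_harness.sh or harness_wizard.py; install_template and its return values are hypothetical names, and the sketch does not distinguish template-managed files from user-authored ones.

```python
import shutil
from pathlib import Path


def install_template(src: Path, dest: Path, force: bool = False) -> str:
    """Copy src to dest and return "installed", "replaced", or "skipped".

    Hypothetical helper: an existing dest is left untouched unless force is set.
    """
    if dest.exists() and not force:
        return "skipped"  # default: never overwrite an existing file
    status = "replaced" if dest.exists() else "installed"
    dest.parent.mkdir(parents=True, exist_ok=True)  # e.g. docs/ or scripts/harness/
    shutil.copyfile(src, dest)
    return status
```

Returning a status instead of printing keeps the function easy to test and lets a caller summarize what changed at the end of a run.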

Step 3: Apply The Nine Practices

Implement each practice directly in repo artifacts.

1. Make Hard Things Easy To Do

  • Ensure hard, high-value tasks are one command away (make smoke, make check, make ci).
  • Keep setup and cleanup scripted.
  • Make smoke checks cheap enough for frequent use.

2. Communicate Actionable Constraints With Compact Docs

  • Keep AGENTS.md short, concrete, and command-first.
  • Document non-obvious constraints and guardrails.
  • Keep docs close to code and update with behavior changes.

3. Structure Codebase With Strict Boundaries And Flow

  • Define module boundaries in docs/ARCHITECTURE.md.
  • Parse and validate data at boundaries; use typed contracts for internal flow.
  • Prefer one abstraction per module and one clear ownership path.
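
"Parse at the boundary, typed contracts inside" could look like the sketch below. RunRequest and parse_run_request are hypothetical names introduced for illustration; only the three profile names come from this playbook.

```python
from dataclasses import dataclass

# Profile names as listed in Step 2 of this playbook.
ALLOWED_PROFILES = ("baseline", "control", "full")


@dataclass(frozen=True)
class RunRequest:
    """Typed contract passed around internally once input has been validated."""

    repo_path: str
    profile: str


def parse_run_request(raw: dict) -> RunRequest:
    """Validate untrusted input once, at the boundary, and return a typed value."""
    repo_path = raw.get("repo_path")
    profile = raw.get("profile")
    if not isinstance(repo_path, str) or not repo_path:
        raise ValueError("repo_path must be a non-empty string")
    if profile not in ALLOWED_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return RunRequest(repo_path=repo_path, profile=profile)
```

Code behind the boundary accepts only RunRequest, so it never has to re-check for missing keys or unknown profiles.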

4. Build Observability In From Day 1

  • Emit structured logs/events with correlation IDs.
  • Capture key transitions in long-running workflows.
  • Define minimum observable fields in docs/OBSERVABILITY.md.
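
A structured event with a correlation ID can be as small as one JSON line per transition. The sketch below uses only the Python standard library. The field names (ts, event, correlation_id) are illustrative assumptions; the real minimum set belongs in docs/OBSERVABILITY.md.

```python
import json
import time
import uuid


def new_correlation_id() -> str:
    """Return a random 32-character hex ID to tie together all events of one run."""
    return uuid.uuid4().hex


def format_event(event: str, correlation_id: str, **fields) -> str:
    """Return one JSON line; the field names are illustrative, not prescribed."""
    record = {"ts": time.time(), "event": event, "correlation_id": correlation_id, **fields}
    return json.dumps(record, sort_keys=True)


# Example: two transitions of the same run share one correlation ID.
run_id = new_correlation_id()
print(format_event("run.started", run_id, step="smoke"))
print(format_event("run.finished", run_id, step="smoke", status="ok"))
```

Because every line carries the same correlation_id, an agent or a human can filter a long log down to a single run with a plain text search.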

5. Optimize For Agent Flow, Not Human Flow

  • Treat context as a first-class system dependency.
  • Use PLANS.md for multi-step/multi-hour tasks.
  • Front-load durable context (scope, constraints, checkpoints) so restarts stay cheap.

6. Bring Your Own Harness

  • Standardize repo-local wrappers (Makefile.harness, scripts/harness/).
  • Wrap local infra actions in deterministic scripts.
  • Make agent behavior reproducible across machines and runs.

7. Prototype In Natural Language First

  • Draft logic and tests in prose before coding.
  • Review edge cases in prose and lock acceptance criteria.
  • Translate approved prose into code and tests.

8. Invest In Static Analysis And Linting

  • Pin formatter/linter/typechecker versions where practical.
  • Enforce checks in both local workflow and CI.
  • Run static checks before long tests to shorten failure loops.
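
Running cheap static checks before long tests amounts to an ordered, fail-fast pipeline. The sketch below assumes a hypothetical run_stages helper. The script paths in the comment follow the scripts/harness/ layout from Step 2, but whether the real wrappers are wired this way is an assumption.

```python
import subprocess


def run_stages(stages: list[tuple[str, list[str]]]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure.

    Returns (ok, names of the stages that were actually run).
    """
    ran = []
    for name, cmd in stages:
        ran.append(name)
        if subprocess.run(cmd).returncode != 0:
            return False, ran  # later, slower stages are skipped
    return True, ran


# Possible ordering, cheapest first (paths assumed from the Step 2 layout):
# run_stages([
#     ("lint", ["./scripts/harness/lint.sh"]),
#     ("typecheck", ["./scripts/harness/typecheck.sh"]),
#     ("test", ["./scripts/harness/test.sh"]),
# ])
```

If lint fails in a few seconds, the test stage never starts, which is what shortens the failure loop.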

9. Manage Entropy

  • Add periodic audits for docs drift, flaky checks, and dead scripts.
  • Keep templates synchronized with real workflows.
  • Remove stale abstractions quickly to keep agent context clean.
For a detailed artifact matrix, load references/openai-harness-practices.md.
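
One entropy check from the list above, dead scripts, can be approximated with a plain text search. This is a rough sketch under stated assumptions, not the skill's own audit: find_unreferenced_scripts is a hypothetical helper, and a script counts as "referenced" if its file name appears anywhere in the given files.

```python
from pathlib import Path


def find_unreferenced_scripts(script_dir: Path, referencing_files: list[Path]) -> list[str]:
    """Return names of *.sh files in script_dir that no referencing file mentions.

    Heuristic only: a substring match on the file name, so comments count as references.
    """
    corpus = "\n".join(
        path.read_text(encoding="utf-8") for path in referencing_files if path.is_file()
    )
    return sorted(script.name for script in script_dir.glob("*.sh") if script.name not in corpus)
```

A periodic job could call it with scripts/harness/ and the files that are expected to invoke those scripts, such as Makefile.harness and .github/workflows/harness.yml, then report any names it returns as candidates for removal.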

Step 4: Validate

Run:

    python3 scripts/harness_wizard.py audit <repo-path>

Treat any MISSING or FAIL result as blocking before calling harness setup complete.
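
The blocking rule can be enforced in a script rather than by eye. The audit's output format is not specified here, so the sketch below assumes one result per line with the status as the first token; that format, and the blocking_results helper itself, are hypothetical.

```python
BLOCKING = {"MISSING", "FAIL"}


def blocking_results(audit_output: str) -> list[str]:
    """Return audit lines whose first token is MISSING or FAIL.

    Assumption: one result per line, status first, optionally followed by a colon.
    """
    blocked = []
    for line in audit_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0].rstrip(":") in BLOCKING:
            blocked.append(line.strip())
    return blocked
```

A wrapper could capture the audit command's output, pass it to blocking_results, and exit non-zero when the returned list is non-empty, so setup is never declared complete with open gaps.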

Step 5: Iterate On Real Runs

  • Observe one full agent run from clean checkout to merged change.
  • Patch harness gaps immediately.
  • Re-run audit.
  • Keep AGENTS.md, PLANS.md, and architecture docs aligned with current behavior.

Adaptation Rules

  • Preserve existing project conventions and replace templates incrementally.
  • Do not overwrite user-authored files without explicit approval.
  • Keep command names stable; change internals behind wrappers.
  • Favor deterministic, scriptable workflows over ad-hoc interactive steps.