harness

Harness Architecture

An agent's context window is its working memory — finite and precious. The craft of harness programming is migrating the right information to the right context layer, so the agent always has enough awareness to make good decisions without drowning in details it doesn't yet need.
Two concerns, one discipline: context architecture (what the agent knows) and agent lifecycle (how the agent works across time). They meet at artifacts — an artifact is both information (context) and a mechanism for continuity (lifecycle).

Commands

When invoked with an argument, dispatch to the corresponding file:
  • /harness audit → Read and follow commands/audit.md in this skill directory. Evaluate an existing project's context architecture and suggest improvements.
  • /harness init → Read and follow commands/init.md in this skill directory. First-time project setup: bootstrap a project's harness from scratch.
  • No argument → Continue with the methodology below.
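The dispatch rule above can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the function name `resolve_command`, the `COMMAND_FILES` table, and the choice to raise on an unknown argument are all assumptions, since the source only specifies the two subcommands and the no-argument case.

```python
from pathlib import Path

# Hypothetical dispatch table mirroring the two documented subcommands.
COMMAND_FILES = {
    "audit": "commands/audit.md",
    "init": "commands/init.md",
}


def resolve_command(argument, skill_dir):
    """Return the instruction file for a /harness argument.

    Returns None when no argument is given, meaning the methodology
    in the skill body itself applies.
    """
    if argument is None or not argument.strip():
        return None
    name = argument.strip().lower()
    if name not in COMMAND_FILES:
        # Assumption: the source does not say what happens here.
        raise ValueError(f"unknown /harness argument: {argument!r}")
    return Path(skill_dir) / COMMAND_FILES[name]
```

Calling `resolve_command("audit", "skills/harness")` yields the path to `commands/audit.md` under that directory, and `resolve_command(None, "skills/harness")` yields `None`.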

Part I: Context Architecture

How to structure what the agent knows.

The Three Layers

Every piece of information an agent might need belongs at one of three abstraction levels:
┌─────────────────────────────────────────────────────┐
│  L1  Architecture                                   │
│  System shape, boundaries, invariants, principles   │
│  Always in context. Small, stable, high-leverage.   │
│  ≈ 100–500 tokens per artifact                      │
├─────────────────────────────────────────────────────┤
│  L2  Design                                         │
│  Patterns, mechanisms, approach, task plan           │
│  Loaded on activation. The working blueprint.       │
│  ≈ 1000–5000 tokens per artifact                    │
├─────────────────────────────────────────────────────┤
│  L3  Implementation                                 │
│  Concrete code, scripts, reference data, examples   │
│  Loaded on demand. The raw material.                │
│  Size varies — only what's needed right now          │
└─────────────────────────────────────────────────────┘
The higher the layer, the smaller and more stable it is. L1 gives the agent orientation. L2 gives it a plan. L3 gives it the details to execute.
The key insight: most harness problems come from layer violations — L3 details polluting L1 (bloated CLAUDE.md full of implementation notes), or L1 context missing entirely (agent has no architectural awareness and makes decisions that break system boundaries).
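One way to make the layer budgets checkable is to encode them as data. The sketch below is an illustration under stated assumptions: the `Layer` class, the `over_budget` helper, and the four-characters-per-token estimate are hypothetical, and the budgets use the upper ends of the approximate ranges in the diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    loaded: str
    max_tokens: Optional[int]  # None means no fixed budget (L3 size varies)


# Upper bounds taken from the approximate ranges in the diagram.
LAYERS = {
    "L1": Layer("Architecture", "always", 500),
    "L2": Layer("Design", "on activation", 5000),
    "L3": Layer("Implementation", "on demand", None),
}


def estimate_tokens(text):
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def over_budget(layer_key, text):
    """True when an artifact looks too large for the layer it claims."""
    budget = LAYERS[layer_key].max_tokens
    return budget is not None and estimate_tokens(text) > budget
```

An L1 artifact that trips `over_budget` is a candidate layer violation: some of its content probably belongs in L2 or L3.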

Mapping Artifacts to Layers

L1 (always present)          L2 (on activation)         L3 (on demand)
─────────────────────        ──────────────────────      ──────────────
CLAUDE.md                    Skill body (SKILL.md)       scripts/
Skill metadata               design/DESIGN.md            references/
  (name + description)       blueprints/                 assets/
Hook triggers                Task plans                  Code files
Project-level invariants     Decision records            Test fixtures
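The mapping above can be approximated by a path-based classifier, which is handy when auditing a repository. This is a rough sketch: `layer_for` and the two directory sets are hypothetical names, and artifacts that are not files (skill metadata, hook triggers) are out of its scope.

```python
from pathlib import PurePosixPath

# Directory names taken from the mapping table.
L2_DIRS = {"design", "blueprints"}
L3_DIRS = {"scripts", "references", "assets"}


def layer_for(path):
    """Guess the context layer of a file from its path."""
    p = PurePosixPath(path)
    if p.name == "CLAUDE.md":
        return "L1"
    if p.name == "SKILL.md":
        # The body is L2; its name + description metadata is L1.
        return "L2"
    parents = set(p.parts[:-1])
    if parents & L3_DIRS:
        return "L3"
    if parents & L2_DIRS:
        return "L2"
    # Code files, test fixtures, and anything unrecognized.
    return "L3"
```

For example, `layer_for("design/DESIGN.md")` gives `"L2"` and `layer_for("src/app.py")` falls through to `"L3"`.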

CLAUDE.md — the L1 anchor

CLAUDE.md is the most critical L1 artifact. It's always loaded, so every token must earn its place. A good CLAUDE.md contains:
  • What this system is: one sentence
  • How to build/test/run: the commands, nothing more
  • Architectural shape: module boundaries, data flow, key patterns (or a pointer to design/ if the project follows a design-driven workflow)
  • Non-obvious conventions: things the agent can't derive from code
A bad CLAUDE.md contains file-by-file breakdowns (the agent can read the tree), generic best practices (the agent already knows them), and implementation details that change frequently (these belong in L2/L3).
Litmus test: if removing a line from CLAUDE.md wouldn't cause the agent to make a worse architectural decision, the line doesn't belong.

Skills — L1 metadata, L2 body, L3 files

A skill naturally spans all three layers:
  • L1: name + description in frontmatter (~100 tokens). Loaded at startup for all installed skills. This is how the agent decides whether to activate a skill, so make it precise.
  • L2: The markdown body of SKILL.md (<5000 tokens). Loaded when activated. Contains the methodology, the loop, the principles.
  • L3: Supporting files (commands/, scripts/, references/). Loaded only when the skill dispatches to them.
Keep SKILL.md under 500 lines. If it's longer, something belongs in L3.
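The constraints on a skill file lend themselves to a mechanical check. The sketch below assumes the frontmatter is a block of flat `key: value` lines between leading `---` fences, which is a common convention but not something the source spells out; `split_skill` and `check_skill` are hypothetical helpers, and the parser is deliberately not a full YAML parser.

```python
def split_skill(text):
    """Split SKILL.md text into (frontmatter dict, body).

    Handles only flat `key: value` lines between leading `---` fences.
    """
    lines = text.splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                return meta, "\n".join(lines[i + 1:])
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    # No frontmatter, or the opening fence was never closed.
    return {}, text


def check_skill(text, max_lines=500):
    """Return a list of problems; an empty list means the file passes."""
    meta, _body = split_skill(text)
    problems = []
    for field in ("name", "description"):
        if not meta.get(field):
            problems.append(f"missing frontmatter field: {field}")
    line_count = len(text.splitlines())
    if line_count >= max_lines:
        problems.append(f"{line_count} lines; move material to L3 files")
    return problems
```

A check like this fits naturally in a script hook or an audit command, since it needs no judgment to run.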

Context Principles

Smallest effective context — Every token in L1 competes with the agent's working space for the current task. Write L1 artifacts ruthlessly — include only what changes the agent's decisions. Details that are nice-to-know but don't affect judgment belong in L2 or L3.
Stable layers, volatile details — L1 should change rarely (project architecture doesn't shift daily). L2 changes per-task (each blueprint is different). L3 changes constantly (code evolves). If you find yourself updating CLAUDE.md frequently, the information probably belongs at a lower layer.
Pointers over content — When L1 needs to reference complex information, point to it rather than inlining it. "See design/DESIGN.md for module boundaries" is better than copying the module list into CLAUDE.md. The agent loads L2/L3 when needed.

Diagnosing Layer Problems

Symptom                                       Likely cause                                   Fix
───────────────────────────────────────────   ────────────────────────────────────────────   ────────────────────────────────────────────────────
Agent forgets project architecture mid-task   L1 too thin or missing                         Add architectural context to CLAUDE.md
Agent drowns in context, slow responses       L1 too thick: L3 details leaking up            Audit CLAUDE.md, move details to L2/L3 files
Agent breaks module boundaries                No design docs or CLAUDE.md lacks boundaries   Add design/ or architectural section to CLAUDE.md
Agent loads unnecessary files                 Skill body has too many inline references      Split into supporting files, load on demand
Agent repeats same mistakes                   Missing hook or missing L1 principle           Add a hook (mechanical) or CLAUDE.md rule (judgment)

Part II: Agent Lifecycle

How the agent works across time.

Succession over persistence

Every agent instance is ephemeral — it lives for one session, then its context is gone. Don't fight this. Design for succession: knowledge survives through artifacts, not through any single agent's memory.
The unit of continuity is the artifact chain, not the agent instance. L1 and L2 artifacts (CLAUDE.md, design docs, blueprints) are the institutional memory that outlives every session. Commit messages are the archaeological record. Blueprint State sections are handoff documents from one generation to the next. Verification criteria are how the next generation trusts the previous one's work.
To give an "agent" a longer effective lifecycle, don't extend the session — raise the abstraction level. An agent operating at L1 (architecture) spans the lifetime of the project. An agent operating at L3 (implementation details) lives and dies within one task. The layers aren't just about context efficiency — they're about temporal scope.

One task, one context

A single task should fit within one context window. If it can't, it's two tasks. This is the fundamental unit of agent work — each task gets a focused context with only the information it needs, preventing earlier work from polluting later decisions. When scoping tasks, ask: can the agent complete this without its context degrading?
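The scoping question can be turned into a rough pre-flight check. The source states the rule qualitatively, so everything numeric here is an assumption: the function `fits_one_context` is hypothetical, and the default of reserving half the window for the work itself is an arbitrary illustrative choice, not a recommendation from the text.

```python
def fits_one_context(input_tokens, window_tokens, working_reserve=0.5):
    """Check whether a task's loaded inputs leave room to do the work.

    input_tokens: estimated token counts of everything the task loads
    window_tokens: size of the context window
    working_reserve: share of the window kept free for the work itself
    """
    if not 0 <= working_reserve < 1:
        raise ValueError("working_reserve must be in [0, 1)")
    return sum(input_tokens) <= window_tokens * (1 - working_reserve)
```

When the check fails, the rule above says to split the task rather than squeeze it in.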

Hooks — lifecycle guardrails

Hooks shape agent behavior from outside the context window — always active, zero-cost in tokens. Two flavors:
  • Prompt hooks — inject a reminder, let the agent apply judgment. Best for checks that need context awareness (layer integrity, consistency, architectural boundaries).
  • Script hooks — run a command, pass or block mechanically. Best for checks that don't need judgment (linting, format validation, forbidden patterns).
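The difference between the two flavors shows up in what each returns. In this sketch a script hook yields a pass/block exit code and a prompt hook yields text for the agent to weigh. The function names, the example patterns, and the exit-code convention (0 passes, 1 blocks) are assumptions for illustration; how a particular harness registers hooks and interprets their results is specific to that harness and not shown here.

```python
import re

# Hypothetical forbidden patterns; a real project would choose its own.
FORBIDDEN = {
    "merge conflict marker": re.compile(r"^(<<<<<<<|>>>>>>>) ", re.MULTILINE),
    "hard-coded secret": re.compile(r"API_KEY\s*=\s*['\"]\w+['\"]"),
}


def violations(text):
    """Return the labels of every forbidden pattern found in text."""
    return [label for label, pattern in FORBIDDEN.items() if pattern.search(text)]


def script_hook(text):
    """Mechanical check: return 0 to pass or 1 to block."""
    found = violations(text)
    for label in found:
        print(f"blocked: {label}")
    return 1 if found else 0


def prompt_hook(changed_path):
    """Judgment check: return a reminder for the agent; never blocks."""
    return (
        f"You changed {changed_path}. "
        "Does anything that references it need updating?"
    )
```

The script hook needs no context to decide, while the prompt hook hands the decision back to the agent, which matches the split between mechanical and judgment checks described above.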

Consistency after change

When you change something that other files reference — a path, a name, a term, a structure — check every file that depends on it. Stale references are a common failure mode: you rename a directory but leave old paths in SKILL.md, change a convention but leave the old wording in CLAUDE.md. A prompt hook that reminds "did you update everything that references what you just changed?" is one of the highest-value hooks you can add to a project.
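The mechanical half of this check is a search for the old name. The sketch below works on an in-memory mapping of path to text so it stays self-contained; `stale_references` is a hypothetical helper, and a plain substring match is an assumption that will produce false positives for short or common names.

```python
def stale_references(files, old_name):
    """Find lines that still mention a renamed path, name, or term.

    files: mapping of file path -> file text
    Returns a list of (path, line_number, stripped_line) tuples.
    """
    hits = []
    for path, text in sorted(files.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            if old_name in line:
                hits.append((path, number, line.strip()))
    return hits
```

After renaming a directory from `cmds/` to `commands/`, for instance, running `stale_references(files, "cmds/")` over the project's markdown files lists every line that still points at the old location.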

Meta-principle

Understand why, not just what

An agent that understands the reasoning behind a constraint exercises better judgment in novel situations than one following a rigid rule. When writing any harness artifact, explain the why — it costs a few extra tokens but compounds into better decisions across every task.
"If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize — to apply broad principles rather than mechanically following specific rules." — Anthropic's constitution