agent-readiness

Agent-Readiness

Make a repo ready for autonomous agent work.

Principles

  • Environment > instruction — infrastructure matters more than the prompt
  • Mechanical enforcement > prose — hooks, CI, health checks, and scripts beat wishes
  • Separate builder from judge: `agent-readiness` builds the rig, `verify` proves your own change, `review` critiques existing code
  • Real behavior > mocked confidence — smoke, integration, and e2e checks beat large suites that mostly mock the seam under test
  • Smallest useful layer first — add layers in order, stop when the repo becomes reliably verifiable
  • Progressive disclosure — keep the core workflow here, load patterns on demand

Handoffs

  • Need to review existing code, a diff, branch, or PR → use `review`
  • Need to prove your own completed change works on real surfaces → use `verify`
  • Need to update AGENTS.md, README.md, specs, or repo docs → use `docs`

The 7-Layer Stack

  1. Boot — single command starts the app
  2. Smoke — a fast proof the app is alive
  3. Interact — agent can exercise the real surface
  4. E2e — key user flows work end to end
  5. Enforce — hooks, CI gates, lint rules, or mechanical checks
  6. Observe — logs, health endpoints, traces, machine-readable signals
  7. Isolate — worktrees or containers do not collide
Concrete examples:
  • Boot: `pnpm dev`, `cargo run`, or `docker compose up`
  • Smoke: `curl http://127.0.0.1:3000/health`
  • Interact/E2e: `pnpm exec playwright test`
  • Observe: structured logs or a machine-readable health endpoint
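The examples above do not cover Isolate. One possible approach, sketched here under assumptions, is to derive a stable port from the checkout directory name so that parallel worktrees do not compete for the same port. The function name `worktree_port` and the 3000-3999 range are hypothetical and not part of the original guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a worktree directory to a stable port in 3000-3999.
# The name `worktree_port` and the port range are illustrative assumptions.
worktree_port() {
  local dir="${1:-$PWD}"
  local sum
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  sum=$(printf '%s' "$(basename "$dir")" | cksum | cut -d' ' -f1)
  echo $(( 3000 + sum % 1000 ))
}

# Usage (assumes the app reads its port from the PORT variable):
#   git worktree add -b feature-x ../repo-feature-x
#   cd ../repo-feature-x && PORT=$(worktree_port) <your-boot-command>
```

Two directory names can still hash to the same port, so this reduces collisions rather than ruling them out. Containers with their own network namespace are the stricter option.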

Workflow

1. Audit

Grade the repo across these dimensions:
  • bootable
  • testable
  • observable
  • verifiable
For each, report:
  • status: `pass` / `partial` / `fail`
  • evidence: file or command
  • gap: what is missing
Grade against references/grading.md. The lowest-scoring dimension sets the overall grade.
Example output:
```text
bootable: partial — `pnpm dev` starts the app after manual env setup
testable: fail — only mocked tests under test/
observable: partial — health endpoint exists, structured logs missing
verifiable: fail — no stable smoke or interaction script
overall grade: D
```
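The "lowest dimension wins" rule can be sketched as a small shell function. This is only an illustration: the name `lowest_status` is hypothetical, and the mapping from statuses to letter grades lives in references/grading.md, which is not reproduced here, so the sketch stops at the weakest status.

```shell
#!/usr/bin/env bash
# Print the weakest status among the arguments (pass > partial > fail).
# Unknown values are treated as fail so that typos cannot inflate a grade.
lowest_status() {
  local worst=pass s
  for s in "$@"; do
    case "$s" in
      pass) ;;
      partial)
        if [ "$worst" = pass ]; then worst=partial; fi
        ;;
      *) worst=fail ;;
    esac
  done
  echo "$worst"
}

# Statuses from the example output: bootable, testable, observable, verifiable.
lowest_status partial fail partial fail   # prints: fail
```

Called with no arguments the function prints `pass`, so a real audit script would probably want to reject an empty list first.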

2. Setup

Build missing layers in this order:
Boot → Smoke → Interact → E2e → Enforce → Observe → Isolate
Each step should be independently useful. Stop once the repo is reliably verifiable; do not build a cathedral because you got excited.
When readiness work includes agent entrypoints, keep `AGENTS.md` as the canonical authored guide and place `CLAUDE.md` beside it as a symlink to `AGENTS.md` rather than maintaining two separate guidance files.
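A minimal sketch of creating that symlink, assuming a POSIX shell with `ln` and `readlink` available. The function name `link_agent_guides` is hypothetical.

```shell
#!/usr/bin/env bash
# Link CLAUDE.md to AGENTS.md inside the given directory (default: current).
link_agent_guides() {
  local dir="${1:-.}"
  [ -f "$dir/AGENTS.md" ] || { echo "AGENTS.md not found in $dir" >&2; return 1; }
  # A relative target keeps the link valid when the repo is moved or cloned.
  # -f replaces an existing CLAUDE.md, so review its contents before running.
  ln -sfn AGENTS.md "$dir/CLAUDE.md"
}

# Usage, from the repo root:
#   link_agent_guides . && readlink CLAUDE.md   # prints: AGENTS.md
```

Git records the symlink itself rather than a copy of the file. On checkouts without symlink support (for example Windows with `core.symlinks` disabled) it appears as a plain text file containing the target path, so it may be worth checking how the link lands on every platform the team uses.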
Boot — create a single-command entry point:
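```bash
#!/usr/bin/env bash
set -euo pipefail
# Start the app in the background and remember its PID.
<your-boot-command> &
APP_PID=$!
HEALTH_URL="http://localhost:${PORT:-3000}/health"
# Poll the health endpoint for up to 30 seconds.
for i in $(seq 1 30); do
  curl -sf "$HEALTH_URL" > /dev/null 2>&1 && break
  sleep 1
done
# Final check: stop the app and fail if it never became healthy.
curl -sf "$HEALTH_URL" > /dev/null 2>&1 || {
  echo "ERROR: App failed to start" >&2; kill "$APP_PID" 2>/dev/null; exit 1
}
echo "App is ready"
```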
Smoke — fast proof the app is alive (< 5 seconds):
```bash
curl -sf http://localhost:3000/health | jq .   # HTTP service
./dist/my-cli --version                         # CLI tool
npx playwright test smoke.spec.ts               # UI app
```
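To hold a smoke check to the 5-second budget, one option is to wrap it in a time limit. The sketch below assumes GNU coreutils `timeout` is installed, and the names `smoke` and `SMOKE_BUDGET` are hypothetical.

```shell
#!/usr/bin/env bash
# Run a smoke command under a time budget (default: 5 seconds).
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
smoke() {
  local budget="${SMOKE_BUDGET:-5}"
  if timeout "$budget" "$@" > /dev/null 2>&1; then
    echo "smoke: ok"
  else
    local rc=$?
    echo "smoke: FAILED (exit $rc): $*" >&2
    return "$rc"
  fi
}

# Usage:
#   smoke curl -sf http://localhost:3000/health
#   SMOKE_BUDGET=2 smoke ./dist/my-cli --version
```

`timeout` runs external programs only, so shell functions and builtins cannot be passed to `smoke` directly. An exit status of 124 distinguishes "too slow" from "broken", which is useful when deciding whether a layer is flaky or failing.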
Enforce — pre-push hook to catch failures before CI:
```bash
#!/usr/bin/env bash
# .git-hooks/pre-push
set -euo pipefail
<your-lint-command>
<your-smoke-command>
```
See [references/setup-patterns.md](references/setup-patterns.md) for e2e, observability, isolation, and containerized stack patterns.
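Git does not pick up hooks from a versioned directory on its own. One way to activate `.git-hooks/pre-push`, assuming Git 2.9 or newer, is the `core.hooksPath` setting. The function name `install_hooks` is hypothetical.

```shell
#!/usr/bin/env bash
# Point Git at the versioned hooks directory and make the hook executable.
# Run from the repository root; `.git-hooks` is the directory the hook lives in.
install_hooks() {
  local hooks_dir="${1:-.git-hooks}"
  [ -f "$hooks_dir/pre-push" ] || { echo "missing $hooks_dir/pre-push" >&2; return 1; }
  chmod +x "$hooks_dir/pre-push"
  git config core.hooksPath "$hooks_dir"
}

# Usage, once per clone:
#   install_hooks
```

The setting is stored in the local repo config, so each clone or worktree owner has to run it once. Wiring it into the boot or setup script is one way to keep that mechanical instead of relying on a README note.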

3. Improve

Tighten weak or flaky layers:
  • remove mock-only confidence theater
  • prefer smoke, integration, and e2e checks over mock-heavy suites that self-verify implementation details
  • replace one-off checks with reusable scripts or hooks
  • add dead-code or unused-symbol enforcement where the stack supports it
  • add logs and health signals agents can query
  • make parallel work safe when agent collisions are real
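For the "logs agents can query" item, a minimal sketch is a helper that writes one JSON object per line. The function name `log_json` and the field names `ts`, `level`, and `msg` are assumptions, and the escaping is deliberately limited: it covers backslashes, double quotes, newlines, carriage returns, and tabs only, so a JSON tool such as `jq` is the safer choice for arbitrary input.

```shell
#!/usr/bin/env bash
# Emit one JSON object per line so agents can filter logs with grep or jq.
# Limited escaping: backslash, double quote, newline, carriage return, tab.
log_json() {
  local level="$1"; shift
  local msg="$*"
  msg=${msg//\\/\\\\}       # backslash first, so later escapes are not doubled
  msg=${msg//\"/\\\"}
  msg=${msg//$'\n'/\\n}
  msg=${msg//$'\r'/\\r}
  msg=${msg//$'\t'/\\t}
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$msg"
}

# Usage:
#   log_json info "server listening on port 3000" >> app.log
#   grep '"level":"error"' app.log
```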

4. Hand Off

When the repo reaches C+ and can be judged honestly, hand off to `verify` or `review`. If changes created doc drift, hand off to `docs`.
Anti-Patterns

  • Mock-only tests — pass by construction, verify nothing
  • Mock-heavy unit suites as the main proof — agents love them because they are easy to satisfy, not because they prove the system works
  • Self-evaluation — builder grading its own work
  • Docs-only fixes disguised as readiness work
  • Routine PR review here — that's `review`
  • Perfect infrastructure upfront — iterate from real failure modes
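A rough way to spot the first two anti-patterns during an audit is to count how many test files mention a mocking API. This is a heuristic sketch, not a grading rule from the guide: the function name `mock_report` is hypothetical, and the patterns (`jest.mock`, `vi.mock`, `unittest.mock`, `sinon.stub`) are examples that would need adjusting to the stack.

```shell
#!/usr/bin/env bash
# Heuristic: report how many files under a test directory mention a mocking API.
# The pattern list is illustrative and far from complete.
mock_report() {
  local dir="${1:-test}"
  local total mocked
  total=$(find "$dir" -type f | wc -l)
  mocked=$(grep -rlE 'jest\.mock|vi\.mock|unittest\.mock|sinon\.stub' "$dir" | wc -l)
  # Arithmetic expansion strips the padding some `wc` versions add.
  echo "files=$((total)) mocked=$((mocked))"
}

# Usage:
#   mock_report test/
```

A high ratio does not prove the suite is useless, and a low one does not prove it exercises real behavior. It is only a prompt to look at what the tests actually touch.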

Output

After readiness work, report:
  • grade before and after
  • dimensions with evidence
  • files changed
  • remaining gaps ranked by impact
  • verify readiness
  • recommended next handoff: `verify`, `review`, `docs`, or human review

References

  • references/grading.md — agent-readiness grading scale with mechanical criteria
  • references/setup-patterns.md — boot, smoke, e2e, observability, and isolation patterns
  • references/industry-examples.md — external patterns and justification for readiness investment