autonomous-agent-readiness

Autonomous Agent Readiness Assessment


Evaluate a codebase against proven patterns for autonomous agent execution and provide tailored recommendations.

Core Philosophy


Most agent failures are system design failures, not model failures. An agent that requires human approval at every step or depends on a developer's laptop being open is not autonomous. Autonomy is an infrastructure decision.

Assessment Workflow


Phase 1: Discovery


Gather information about the project's current state:
  1. Examine project structure
    • Look for CI/CD configuration (`.github/workflows/`, `Jenkinsfile`, `.gitlab-ci.yml`)
    • Check for containerization (`Dockerfile`, `docker-compose.yml`, `devcontainer.json`)
    • Identify test infrastructure (`tests/`, `__tests__/`, test config files)
    • Find environment management (`.env.example`, `requirements.txt`, `package.json`)
  2. Review development workflow
    • Read contributing guidelines, README, or developer docs
    • Check for sandbox/isolation patterns
    • Look for database setup scripts or fixtures
    • Identify how dependencies are managed
  3. Assess current automation
    • Review existing CI/CD pipelines
    • Check for automated testing patterns
    • Look for environment provisioning scripts
    • Identify cleanup/teardown procedures
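The structure check in step 1 can be partly automated. The following is a minimal sketch, assuming the marker files sit directly under the repository root; the category keys and the `discover` function name are illustrative and not part of the skill itself.

```python
from pathlib import Path

# Marker paths come from the checklist above; the category keys are illustrative.
MARKERS = {
    "ci_cd": [".github/workflows", "Jenkinsfile", ".gitlab-ci.yml"],
    "containerization": ["Dockerfile", "docker-compose.yml", "devcontainer.json"],
    "tests": ["tests", "__tests__"],
    "environment": [".env.example", "requirements.txt", "package.json"],
}


def discover(repo_root: str) -> dict[str, list[str]]:
    """Return, per category, the marker paths that exist directly under repo_root."""
    root = Path(repo_root)
    return {
        category: [name for name in names if (root / name).exists()]
        for category, names in MARKERS.items()
    }
```

Calling `discover(".")` from a checkout returns a dictionary whose empty lists point at areas to inspect by hand. The sketch only looks at the top level, so files kept in subdirectories would be missed.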

Phase 2: Evaluate Against Principles


Score the project (0-3) on each dimension. See `references/assessment-criteria.md` for detailed rubrics.

| Dimension | What to Look For |
| --- | --- |
| Sandbox Isolation | Ephemeral environments, container support, clean state per run |
| Database Independence | Local DB setup, migrations in code, no external DB dependencies |
| Environment Reproducibility | Explicit dependencies, no hidden state, deterministic setup |
| Session Independence | Remote execution capability, no user session dependencies |
| Outcome-Oriented Design | Clear acceptance criteria, minimal procedural coupling |
| Direct Interfaces | CLI-first tools, OS primitives, minimal abstraction layers |
| Minimal Framework Overhead | Simple interfaces, no heavy orchestration, composable CLI tools |
| Explicit State | Workspace directories, file-based artifacts, inspectable logs |
| Benchmarking | Measurable quality criteria, automated verification |
| Cost Awareness | Resource limits, usage tracking, explicit provisioning |
| Verifiable Output | Automated validation, deterministic results, clear exit codes |
| Infrastructure-Bounded Permissions | System-enforced constraints, least-privilege, no runtime prompts |
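As a rough illustration of how the twelve 0-3 scores roll up into the overall X/36 figure and the status labels used in the report, here is a small sketch. The function names `status` and `summarize` are hypothetical; the dimension names and thresholds are the ones stated in this document.

```python
DIMENSIONS = [
    "Sandbox Isolation",
    "Database Independence",
    "Environment Reproducibility",
    "Session Independence",
    "Outcome-Oriented Design",
    "Direct Interfaces",
    "Minimal Framework Overhead",
    "Explicit State",
    "Benchmarking",
    "Cost Awareness",
    "Verifiable Output",
    "Infrastructure-Bounded Permissions",
]
MAX_PER_DIMENSION = 3


def status(score: int) -> str:
    """Map a 0-3 score to the status label: 0-1 needs work, 2 adequate, 3 strong."""
    if score <= 1:
        return "needs work"
    if score == 2:
        return "adequate"
    return "strong"


def summarize(scores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum) after checking every dimension has a valid score."""
    for name in DIMENSIONS:
        if name not in scores:
            raise ValueError(f"missing score for {name!r}")
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"score for {name!r} must be between 0 and 3")
    total = sum(scores[name] for name in DIMENSIONS)
    return total, MAX_PER_DIMENSION * len(DIMENSIONS)
```

With twelve dimensions the maximum is 36, which matches the "X/36" line in the report template.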

Phase 3: Generate Recommendations


For each dimension scoring below 2, provide:
  1. Current state: What exists today
  2. Gap: What's missing for autonomous execution
  3. Recommendation: Specific, actionable improvement
  4. Priority: High/Medium/Low based on impact and effort
Tailor recommendations to the project's:
  • Technology stack
  • Team size and workflow
  • Existing infrastructure
  • Deployment targets
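One way to hold the four fields per finding and select the dimensions that need a recommendation is sketched below. The `Finding` class and the `actionable` function are hypothetical names, and ordering by lowest score within a priority is an assumption of this sketch rather than something the skill prescribes.

```python
from dataclasses import dataclass

# Sort order for the three priority labels used in this document.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Finding:
    dimension: str
    score: int            # 0-3, from Phase 2
    current_state: str    # what exists today
    gap: str              # what is missing for autonomous execution
    recommendation: str   # specific, actionable improvement
    priority: str         # "High", "Medium", or "Low"


def actionable(findings: list[Finding]) -> list[Finding]:
    """Keep findings scoring below 2, ordered by priority, then by lowest score."""
    below_threshold = [f for f in findings if f.score < 2]
    return sorted(below_threshold, key=lambda f: (PRIORITY_ORDER[f.priority], f.score))
```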

Output Format


Produce the report in markdown using the following template:

Autonomous Agent Readiness Assessment

Project: [name]

Assessment Date: [date]

Executive Summary

[1-2 paragraphs summarizing overall readiness and top priorities]
Overall Readiness Score: X/36 (sum of dimension scores)

Dimension Scores

| Dimension | Score | Status |
| --- | --- | --- |
| Sandbox Isolation | X/3 | [emoji] |
| Database Independence | X/3 | [emoji] |
| ... | ... | ... |

Status: 0-1 = needs work, 2 = adequate, 3 = strong

Detailed Findings

[Dimension Name] (X/3)

Current State: [What exists]
Gap: [What's missing]
Recommendation: [Specific action]
Priority: [High/Medium/Low]
[Repeat for each dimension]

Prioritized Action Plan

Immediate (This Week)

  1. [Highest impact, lowest effort items]

Short-term (This Month)

  1. [Important foundational changes]

Medium-term (This Quarter)

  1. [Larger infrastructure investments]

Quick Wins

[2-3 changes that can be made today with minimal effort]

Key Principles Reference


Sandbox Everything

Every agent run executes in its own ephemeral, isolated, disposable environment. Each environment starts clean and provides a writable filesystem, command execution, and scoped network access. It is destroyed once the output has been verified.
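The create, run, destroy lifecycle can be sketched with the Python standard library. This is only an illustration of the disposable-workspace idea: a temporary directory isolates files, not processes or the network, so a real setup would presumably use containers or VMs. The function name `run_in_fresh_workspace` is hypothetical.

```python
import subprocess
import tempfile


def run_in_fresh_workspace(command: list[str], timeout_s: float = 60.0) -> tuple[int, str]:
    """Run a command inside a brand-new directory that is deleted afterwards.

    Returns the exit code and captured stdout. Anything the command writes
    into its working directory disappears when the function returns.
    """
    with tempfile.TemporaryDirectory(prefix="agent-run-") as workspace:
        result = subprocess.run(
            command,
            cwd=workspace,          # the command starts in the clean workspace
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.returncode, result.stdout
```

In practice the caller would copy out and verify the artifacts it needs before the `with` block exits, since nothing in the workspace survives the run.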

No External Databases

Agents create their own databases inside the sandbox: install packages on demand, spin up the database locally, run migrations, seed data explicitly, and tear everything down at the end. The result is reproducible runs with no shared state.
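A minimal sketch of the create, migrate, seed, tear-down cycle, assuming SQLite stands in for whatever database the project actually uses. The schema, seed rows, and function name are made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical migration and seed data, kept in code so every run starts the same way.
MIGRATIONS = ["CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"]
SEED_ROWS = [("alice",), ("bob",)]


def count_seeded_users() -> int:
    """Create a throwaway database, migrate it, seed it, query it, and discard it."""
    with tempfile.TemporaryDirectory() as sandbox:
        conn = sqlite3.connect(Path(sandbox) / "app.db")
        try:
            for statement in MIGRATIONS:
                conn.execute(statement)
            conn.executemany("INSERT INTO users (name) VALUES (?)", SEED_ROWS)
            conn.commit()
            (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
            return count
        finally:
            # Close before the directory is removed so the file can be deleted.
            conn.close()
```

Because the database file lives inside the temporary directory, no state carries over between runs.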

Environment Garbage Is Real

Long-lived environments accumulate stray files, half-installed packages, cached state, and orphaned processes. Fresh environments surface correctness problems; persistent environments obscure them.

Run Independently of User Sessions

The agent loop is decoupled from browser tabs, terminal sessions, and developer machines: start a task, close the laptop, and return to completed artifacts. Runs are controlled through wall-clock limits, resource limits, and automatic cleanup.
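The wall-clock limit mentioned above can be sketched as follows. This covers only the time limit; detaching the run from a user session would be the job of whatever remote runner launches it. The name `run_with_deadline` is hypothetical, and the exit code 124 is an assumption borrowed from the GNU `timeout` convention.

```python
import subprocess

TIMEOUT_EXIT_CODE = 124  # assumption: same value GNU `timeout` uses for timed-out commands


def run_with_deadline(command: list[str], wall_clock_s: float) -> int:
    """Run a command and stop it if it exceeds the wall-clock limit.

    Returns the command's own exit code, or TIMEOUT_EXIT_CODE if it was stopped.
    """
    try:
        completed = subprocess.run(command, timeout=wall_clock_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return TIMEOUT_EXIT_CODE
    return completed.returncode
```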

Define Outcomes, Not Procedures

Avoid step-by-step plans and tool-level micromanagement. Define the desired outcome, the acceptance criteria, and the constraints; planning and execution belong to the agent.
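An outcome-oriented task might be written down as data rather than as a list of steps. The sketch below is one possible shape, with a hypothetical `TaskSpec` class and field names chosen to mirror the three elements named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    outcome: str                          # what should be true when the task is done
    acceptance_criteria: tuple[str, ...]  # how completion will be checked
    constraints: tuple[str, ...] = ()     # limits the agent must stay within

    def render(self) -> str:
        """Format the spec as plain text, with no procedural steps."""
        lines = [f"Outcome: {self.outcome}", "Acceptance criteria:"]
        lines += [f"- {item}" for item in self.acceptance_criteria]
        if self.constraints:
            lines.append("Constraints:")
            lines += [f"- {item}" for item in self.constraints]
        return "\n".join(lines)
```

The spec deliberately has no field for steps or tool choices, which keeps planning on the agent's side.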

Direct, Low-Level Interfaces

Give the agent direct access to command execution, persistent files, and network requests. Prefer OS primitives over abstraction layers. CLI-first systems are easier to debug and more capable than they look.

Persist State Explicitly

Provide a writable workspace directory for intermediate results, logs, partial outputs, and planning artifacts. Files are inspectable and deterministic, and they enable post-run analysis.
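File-based state can be as simple as an append-only log in the workspace. The sketch below assumes a JSON Lines file named `run.log.jsonl`; both the file name and the `record_step` helper are illustrative choices.

```python
import json
import time
from pathlib import Path


def record_step(workspace: Path, step: str, payload: dict) -> Path:
    """Append one JSON line describing a step to the workspace log and return the log path."""
    workspace.mkdir(parents=True, exist_ok=True)
    log_path = workspace / "run.log.jsonl"
    entry = {"ts": time.time(), "step": step, **payload}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
    return log_path
```

Each line is a self-contained JSON object, so the log can be inspected with ordinary command-line tools after the run.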

Benchmarks Early

Introduce benchmarks as early as possible, using representative and repeatable quality metrics. Even crude benchmarks beat none.
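A crude benchmark can be little more than a fixed set of cases and a pass rate. The sketch below assumes exact-match comparison, which is only one possible quality metric; `pass_rate` and its parameters are hypothetical names.

```python
from typing import Callable


def pass_rate(solve: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of (input, expected) cases where solve(input) equals expected."""
    if not cases:
        raise ValueError("benchmark needs at least one case")
    passed = sum(1 for given, expected in cases if solve(given) == expected)
    return passed / len(cases)
```

Keeping the case list fixed and checked into the repository is what makes the number repeatable from run to run.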

Minimal Framework Overhead

Most real-world agent workflows reduce to running commands, reading and writing files, and making network calls. CLI-first systems are easier to reason about and debug, and they are more capable than they look. When an abstraction layer is more complex than the task, it becomes the bottleneck.

Plan for Cost

Provision token usage, allocate compute explicitly, and have the system enforce limits. Autonomy shifts where costs appear; it does not remove them.
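System-enforced limits can be sketched as a budget object that refuses further spending once the provisioned amount is used up. `TokenBudget` and `BudgetExceeded` are hypothetical names, and how tokens are counted is left to the caller.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the provisioned limit."""


@dataclass
class TokenBudget:
    limit: int      # tokens provisioned for this run
    used: int = 0   # tokens consumed so far

    def charge(self, tokens: int) -> None:
        """Record usage, refusing any charge that would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(f"{self.used + tokens} tokens would exceed limit {self.limit}")
        self.used += tokens

    @property
    def remaining(self) -> int:
        return self.limit - self.used
```

The check happens before usage is recorded, so a rejected charge leaves the running total unchanged.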

Verifiable Output

Output must be verifiable without human review, through automated validation, deterministic results, and clear success/failure exit codes. If quality cannot be measured, it cannot be trusted in autonomous operation.
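Automated validation with a clear exit code might look like the sketch below. It assumes the agent writes a JSON result file with `status` and `artifacts` keys; that schema and the `verify` function are invented for illustration.

```python
import json
import sys
from pathlib import Path

REQUIRED_KEYS = {"status", "artifacts"}  # hypothetical result schema


def verify(result_path: Path) -> int:
    """Return 0 if the result file is valid JSON with the required keys, else 1."""
    try:
        data = json.loads(result_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        print(f"cannot read result: {exc}", file=sys.stderr)
        return 1
    if not isinstance(data, dict):
        print("result must be a JSON object", file=sys.stderr)
        return 1
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        print(f"missing keys: {sorted(missing)}", file=sys.stderr)
        return 1
    return 0
```

A wrapper script could pass the return value to `sys.exit`, so a pipeline can gate on the exit code without anyone reading the output.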

Infrastructure-Bounded Permissions

Permissions are constrained by the environment, not by prompts or runtime decisions: explicit capability grants, sandbox restrictions on dangerous operations, and least privilege by default. No runtime permission prompts are required.