autonomous-agent-readiness
Autonomous Agent Readiness Assessment
Evaluate a codebase against proven patterns for autonomous agent execution and provide tailored recommendations.
Core Philosophy
Most agent failures are system design failures, not model failures. An agent that requires human approval at every step or depends on a developer's laptop being open is not autonomous. Autonomy is an infrastructure decision.
Assessment Workflow
Phase 1: Discovery
Gather information about the project's current state:
- Examine project structure
  - Look for CI/CD configuration (`.github/workflows/`, `Jenkinsfile`, `.gitlab-ci.yml`)
  - Check for containerization (`Dockerfile`, `docker-compose.yml`, `devcontainer.json`)
  - Identify test infrastructure (`tests/`, `__tests__/`, test config files)
  - Find environment management (`.env.example`, `requirements.txt`, `package.json`)
- Review development workflow
  - Read contributing guidelines, README, or developer docs
  - Check for sandbox/isolation patterns
  - Look for database setup scripts or fixtures
  - Identify how dependencies are managed
- Assess current automation
  - Review existing CI/CD pipelines
  - Check for automated testing patterns
  - Look for environment provisioning scripts
  - Identify cleanup/teardown procedures
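
The structure checks in Phase 1 lend themselves to a quick script. The following is a minimal sketch, not part of the original skill: the `INDICATORS` grouping and the `discover` function are hypothetical names, and it only checks for the listed paths directly under the project root.

```python
from pathlib import Path

# Indicator paths come from the Phase 1 checklist. The category keys are
# illustrative names chosen for this sketch.
INDICATORS = {
    "ci_cd": [".github/workflows", "Jenkinsfile", ".gitlab-ci.yml"],
    "containerization": ["Dockerfile", "docker-compose.yml", "devcontainer.json"],
    "tests": ["tests", "__tests__"],
    "environment": [".env.example", "requirements.txt", "package.json"],
}


def discover(root: str) -> dict[str, list[str]]:
    """Return, per category, the indicator paths that exist under root."""
    base = Path(root)
    return {
        category: [name for name in names if (base / name).exists()]
        for category, names in INDICATORS.items()
    }
```

An empty list for a category is a prompt to look closer, not a verdict: a project may keep the same configuration in a location this sketch does not check.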
Phase 2: Evaluate Against Principles
Score the project (0-3) on each dimension. See `references/assessment-criteria.md` for detailed rubrics.

| Dimension | What to Look For |
|---|---|
| Sandbox Isolation | Ephemeral environments, container support, clean state per run |
| Database Independence | Local DB setup, migrations in code, no external DB dependencies |
| Environment Reproducibility | Explicit dependencies, no hidden state, deterministic setup |
| Session Independence | Remote execution capability, no user session dependencies |
| Outcome-Oriented Design | Clear acceptance criteria, minimal procedural coupling |
| Direct Interfaces | CLI-first tools, OS primitives, minimal abstraction layers |
| Minimal Framework Overhead | Simple interfaces, no heavy orchestration, composable CLI tools |
| Explicit State | Workspace directories, file-based artifacts, inspectable logs |
| Benchmarking | Measurable quality criteria, automated verification |
| Cost Awareness | Resource limits, usage tracking, explicit provisioning |
| Verifiable Output | Automated validation, deterministic results, clear exit codes |
| Infrastructure-Bounded Permissions | System-enforced constraints, least-privilege, no runtime prompts |
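
With twelve dimensions scored 0-3, the overall score is a sum out of 36. The sketch below shows one way to compute it and to map each score to the status labels used in the output format; the function names `status` and `overall` are assumptions made for illustration.

```python
# Dimension names are copied from the table above.
DIMENSIONS = [
    "Sandbox Isolation",
    "Database Independence",
    "Environment Reproducibility",
    "Session Independence",
    "Outcome-Oriented Design",
    "Direct Interfaces",
    "Minimal Framework Overhead",
    "Explicit State",
    "Benchmarking",
    "Cost Awareness",
    "Verifiable Output",
    "Infrastructure-Bounded Permissions",
]
MAX_PER_DIMENSION = 3


def status(score: int) -> str:
    """Map a 0-3 score to its status label (0-1 needs work, 2 adequate, 3 strong)."""
    if not 0 <= score <= MAX_PER_DIMENSION:
        raise ValueError(f"score must be between 0 and {MAX_PER_DIMENSION}")
    if score <= 1:
        return "needs work"
    return "adequate" if score == 2 else "strong"


def overall(scores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum); every dimension must have a score."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total = sum(scores[name] for name in DIMENSIONS)
    return total, MAX_PER_DIMENSION * len(DIMENSIONS)
```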
Phase 3: Generate Recommendations
For each dimension scoring below 2, provide:
- Current state: What exists today
- Gap: What's missing for autonomous execution
- Recommendation: Specific, actionable improvement
- Priority: High/Medium/Low based on impact and effort
Tailor recommendations to the project's:
- Technology stack
- Team size and workflow
- Existing infrastructure
- Deployment targets
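
The Phase 3 rule (report only dimensions scoring below 2, each with a current state, gap, recommendation, and priority) could be captured in a small record type. This is a sketch under assumptions: the `Recommendation` class and the tie-break on lowest score are illustrative choices, not something the skill prescribes.

```python
from dataclasses import dataclass

# Lower number sorts first, so High-priority items lead the list.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Recommendation:
    dimension: str
    score: int
    current_state: str
    gap: str
    recommendation: str
    priority: str  # "High", "Medium", or "Low"


def prioritize(items: list[Recommendation]) -> list[Recommendation]:
    """Keep dimensions scoring below 2, ordered by priority, then lowest score."""
    flagged = [item for item in items if item.score < 2]
    return sorted(flagged, key=lambda r: (PRIORITY_ORDER[r.priority], r.score))
```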
Output Format
Autonomous Agent Readiness Assessment
Project: [name]
Assessment Date: [date]
Executive Summary
[1-2 paragraphs summarizing overall readiness and top priorities]
Overall Readiness Score: X/36 (sum of dimension scores)
Dimension Scores
| Dimension | Score | Status |
|---|---|---|
| Sandbox Isolation | X/3 | [emoji] |
| Database Independence | X/3 | [emoji] |
| ... | ... | ... |
Status: 0-1 = needs work, 2 = adequate, 3 = strong
Detailed Findings
[Dimension Name] (X/3)
Current State:
[What exists]
Gap:
[What's missing]
Recommendation:
[Specific action]
Priority: [High/Medium/Low]
[Repeat for each dimension]
Prioritized Action Plan
Immediate (This Week)
- [Highest impact, lowest effort items]
Short-term (This Month)
- [Important foundational changes]
Medium-term (This Quarter)
- [Larger infrastructure investments]
Quick Wins
[2-3 changes that can be made today with minimal effort]
Key Principles Reference
Sandbox Everything
Every agent run executes in its own ephemeral, isolated, disposable environment. Clean environment, writable filesystem, command execution, scoped network access. Environment destroyed after verified output.
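
As a rough illustration of the "disposable environment" idea, the sketch below runs a command in a fresh temporary directory that is deleted afterwards. It is an assumption-laden simplification: a temporary directory gives clean filesystem state only, and real process or network isolation would need a container or VM. The function name `run_in_disposable_workspace` is hypothetical.

```python
import subprocess
import tempfile


def run_in_disposable_workspace(
    command: list[str], timeout_s: float = 60.0
) -> tuple[int, str]:
    """Run command with a fresh temp dir as its working directory.

    The directory and everything written into it are removed when the
    function returns, so no state carries over to the next run.
    """
    with tempfile.TemporaryDirectory(prefix="agent-run-") as workspace:
        result = subprocess.run(
            command,
            cwd=workspace,          # the run starts from a clean, writable directory
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.returncode, result.stdout
```

Any artifact worth keeping has to be copied out or verified before the function returns, which mirrors the "destroyed after verified output" ordering above.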
No External Databases
Agents create their own databases inside the sandbox. Install packages on demand, spin up DBs locally, run migrations, seed data explicitly, tear down at end. Reproducible runs without shared state.
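
The create, migrate, seed, and tear-down cycle can be sketched with SQLite from the Python standard library. SQLite, the `users` table, and the seed rows are stand-ins chosen for illustration; a project would substitute its own database engine and migration tooling.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema and seed data, used only to make the sketch runnable.
MIGRATIONS = ["CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"]
SEED_ROWS = [("ada",), ("grace",)]


def run_with_local_db() -> int:
    """Create, migrate, seed, query, and discard a database for one run."""
    with tempfile.TemporaryDirectory() as sandbox:
        conn = sqlite3.connect(Path(sandbox) / "run.db")
        try:
            for statement in MIGRATIONS:
                conn.execute(statement)
            conn.executemany("INSERT INTO users (name) VALUES (?)", SEED_ROWS)
            conn.commit()
            (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
        finally:
            conn.close()  # close before the directory is removed
        return count
```

Because the database file lives inside the temporary directory, two consecutive runs cannot see each other's rows.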
Environment Garbage Is Real
Long-lived environments accumulate stray files, half-installed packages, cached state, and orphaned processes. Fresh environments surface correctness problems; persistent environments obscure them.
Run Independently of User Sessions
Agent loop decoupled from browser tabs, terminal sessions, developer machines. Start task, close laptop, return to completed artifacts. Control via wall-clock limits, resource limits, automatic cleanup.
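
The wall-clock limit mentioned above can be enforced by whatever launches the agent process. This sketch covers only that limit; running the process on remote infrastructure so that it survives a closed laptop is assumed to be handled elsewhere. The function name and the three outcome labels are illustrative.

```python
import subprocess


def run_with_deadline(command: list[str], wall_clock_s: float) -> str:
    """Return 'completed', 'failed', or 'timed_out' for one bounded run."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run(command, timeout=wall_clock_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return "timed_out"
    return "completed" if result.returncode == 0 else "failed"
```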
Define Outcomes, Not Procedures
Avoid step-by-step plans and tool-level micromanagement. Define desired outcome, acceptance criteria, constraints. Planning and execution belong to the agent.
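
One way to express a task as outcome, acceptance criteria, and constraints, with no steps at all, is a small specification object whose criteria are executable checks. The `TaskSpec` shape below is a hypothetical sketch, not a format defined by this skill.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class TaskSpec:
    outcome: str
    # Each criterion is a named check that inspects the finished workspace.
    acceptance_criteria: dict[str, Callable[[Path], bool]]
    constraints: list[str] = field(default_factory=list)


def evaluate(spec: TaskSpec, workspace: Path) -> dict[str, bool]:
    """Run every acceptance check against the workspace."""
    return {name: check(workspace) for name, check in spec.acceptance_criteria.items()}


def accepted(spec: TaskSpec, workspace: Path) -> bool:
    """The task is done only when all criteria pass."""
    return all(evaluate(spec, workspace).values())
```

The specification says nothing about how the agent reaches the outcome; it only states what must be true at the end.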
Direct, Low-Level Interfaces
Direct access to command execution, persistent files, network requests. OS primitives over abstraction layers. CLI-first systems are easier to debug and more capable than they look.
Persist State Explicitly
Writable workspace directory for intermediate results, logs, partial outputs, planning artifacts. Files are inspectable, deterministic, and enable post-run analysis.
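
File-based state can be as simple as an append-only log in the workspace. The sketch below assumes a JSON Lines file named `run.log.jsonl`; the file name and the `record_step` / `read_steps` helpers are illustrative choices.

```python
import json
import time
from pathlib import Path

LOG_NAME = "run.log.jsonl"  # hypothetical file name for this sketch


def record_step(workspace: Path, step: str, detail: dict) -> Path:
    """Append one JSON line describing a step to the workspace log."""
    workspace.mkdir(parents=True, exist_ok=True)
    log_path = workspace / LOG_NAME
    entry = {"ts": time.time(), "step": step, "detail": detail}
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry, sort_keys=True) + "\n")
    return log_path


def read_steps(workspace: Path) -> list[dict]:
    """Load every recorded step, in order, for post-run analysis."""
    log_path = workspace / LOG_NAME
    with log_path.open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```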
Benchmarks Early
Introduce benchmarks as early as possible. Representative and repeatable metrics for quality. Even crude benchmarks beat none.
Minimal Framework Overhead
Most real-world agent workflows reduce to running commands, reading and writing files, and making network calls. CLI-first systems are easier to reason about and debug, and they are more capable than they look. When an abstraction layer is more complex than the task, it becomes the bottleneck.
Plan for Cost
Provision token usage, allocate compute explicitly, and enforce limits at the system level. Autonomy shifts where costs appear; it does not remove them.
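
A system-enforced limit means the run is stopped by code rather than by the agent's own judgment. The `TokenBudget` class below is a hypothetical sketch of that idea; how token counts are obtained depends on the model provider and is left out.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the provisioned limit."""


class TokenBudget:
    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Record usage and return the remaining budget.

        The charge is rejected, and usage left unchanged, if it would
        exceed the limit.
        """
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"charge of {tokens} exceeds remaining {self.limit - self.used}"
            )
        self.used += tokens
        return self.limit - self.used
```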
Verifiable Output
Output must be verifiable without human review. Automated validation, deterministic results, clear success/failure exit codes. If quality cannot be measured, it cannot be trusted in autonomous operation.
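
A verifier that needs no human can be a function that inspects an artifact and returns a process exit code. In this sketch the artifact is assumed to be a JSON object and the check is limited to required keys; both assumptions are illustrative, and real checks would be project-specific.

```python
import json
from pathlib import Path


def verify(artifact: Path, required_keys: set[str]) -> int:
    """Return 0 if the artifact is a JSON object with all required keys, else 1."""
    try:
        data = json.loads(artifact.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # A missing, unreadable, or malformed file is a failure, not an exception.
        return 1
    if not isinstance(data, dict):
        return 1
    return 0 if required_keys.issubset(data.keys()) else 1
```

Passing the return value to `sys.exit` lets a CI job or scheduler treat the check like any other command.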
Infrastructure-Bounded Permissions
Permissions are constrained by the environment, not by prompts or runtime decisions. Explicit capability grants, sandbox restrictions on dangerous operations, least-privilege by default. No runtime permission prompts required.