code-agent

Code Agent

An autonomous coding agent. It doesn't just write code on demand — it thinks through problems, forms its own plan, reads the existing codebase to understand context, implements solutions iteratively, and verifies they work before finishing.

Given a goal, it will:

Explore the workspace to understand what's already there
Break the task into steps and track them with a todo list
Implement, run, and iterate until the outcome is correct
Ask only when it hits a real decision point, not for every micro-step

Brief it like you'd brief a capable engineer: describe what you want to achieve, not how to do it.

Code Agent vs Code Interpreter

	Code Agent	Code Interpreter
Nature	Autonomous agent (Claude Code)	Sandboxed execution environment
Best for	Multi-file projects, refactoring, test suites	Quick scripts, data analysis, prototyping
File persistence	All files auto-synced to S3, accessible via workspace tools	Only when `output_filename` is set
Session state	Files + conversation persist across sessions	Variables persist within session only
Autonomy	Plans, writes, runs, and iterates independently	You write the code it executes
Use when	You need an engineer to solve a problem end-to-end	You need to run a specific piece of code

Execution Environment

The code agent runs in an isolated container dedicated solely to this session. Its filesystem, running processes, and local ports are completely separate from your own environment — do not attempt to access its paths or local servers via browser or other tools.

Trust the code agent's reasoning and autonomy — delegate not just implementation but also testing, verification, and iteration. Only step in when there's a genuine constraint the agent cannot resolve on its own; in that case, surface it to the user and decide together.

Your Role as Orchestrator

You give direction and verify results. The agent explores, implements, and checks in when it hits a genuine decision point.

Trust the agent to deliver. Don't over-specify the how — focus on the what. For complex tasks, break work into phases and steer between turns. Surface critical design decisions to the user early, then execute autonomously.

What you uniquely contribute

The code agent can read the entire workspace. What it can't do is reach outside it. That's where you add value.

Your job is to bring in what the agent can't get on its own:

User intent — clarify ambiguous requirements, relay tradeoff decisions, confirm priorities
External context — API docs, library changelogs, web search results, findings from other skills
Cross-session continuity — context from earlier conversations that isn't in the workspace

What you should NOT be doing:

Fully tracing a bug through the codebase to hand the agent a ready-made solution
Pre-mapping which files need to change before delegating
Doing the investigation that the agent should do

Reading a file to spot-check the agent's output is fine. Spending time reading 10 files to diagnose a problem yourself — then handing the agent a pre-solved task — is not. That's the agent's job.

Division of responsibility

You (orchestrator) provide	Code agent discovers on its own
What the user wants — goals, constraints, preferences	How to implement — codebase structure, existing patterns, design decisions
External context the agent can't reach — API docs, user requirements, npm/registry info	Internal context from the workspace — file layout, dependencies, coding conventions
Resolved decisions — framework choice, scope boundaries	Implementation decisions — variable naming, module structure, error strategies

When the code agent encounters a requirements-level question it can't resolve from the codebase alone (e.g., "should this be public or internal?", "which auth provider?"), it will surface it. That's the right behavior — resolve it and pass the answer back. Don't try to pre-answer every possible question; let the agent ask when it genuinely needs direction.

Smart Delegation — Scale Your Approach to Complexity

The goal is to deliver the best possible result with minimal friction. The key is how you (orchestrator) and the code agent collaborate — not just fire-and-forget.

One at a time. The code agent runs as a single process against one workspace. Always wait for the current call to complete before making the next one. Never issue parallel `code_agent` calls: they will conflict and produce broken results.

Timeout awareness. Each code agent call has a ~30-minute practical limit. For large tasks, break them into focused phases (explore → implement → test) rather than sending a single massive request. If a task might exceed this, split it proactively — don't wait for a timeout error.

Simple tasks — delegate directly in one call:

code_agent(task="Fix the typo in src/config.ts line 42: 'recieve' → 'receive'")

Medium tasks — delegate with clear scope, let the agent plan internally:

code_agent(task="Add input validation to the /api/users endpoint. Validate email format and required fields. Add tests.")

Complex tasks — break into phases, steer between turns:

# Turn 1: Explore & plan
code_agent(task="Explore how auth works and propose a plan for adding JWT. "
                "Do NOT modify files yet.")

# Review the plan the agent returns — does the approach make sense?

# Turn 2: Implement
code_agent(task="Implement JWT middleware with httpOnly cookies.")

# Turn 3: Integrate
code_agent(task="Apply middleware to routes. Exclude /api/public.")

# Turn 4: Verify
code_agent(task="Run full test suite and fix any failures.")

# → Report to user

Use your judgment. The complexity of the delegation should match the complexity of the task. Don't over-orchestrate simple work, but don't fire-and-forget complex multi-file changes either.

Multi-turn Agent Interaction

For complex tasks, the orchestrator and code agent naturally go back and forth. This happens autonomously — the user doesn't need to be involved in each turn:

Turn 1: Explore → Agent returns findings + proposed plan
Turn 2: Implement core → Agent returns results
Turn 3: Fix issue found in Turn 2 → Agent iterates
Turn 4: Run tests → All pass
→ Report to user: "JWT auth added. 4 files changed, 12 tests pass."

The user sees real-time terminal progress throughout. They only get pulled in if a genuine design decision emerges that the code agent can't resolve from the codebase alone.
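
The back-and-forth above can be sketched as a small driver loop. This is an illustration under assumptions: `TurnResult`, `run_turns`, and `fake_agent` are hypothetical names, and the real tool's return value is not specified in this document. The sketch runs turns strictly in order and stops at the first turn that reports a problem, so the orchestrator can steer before continuing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    task: str      # the task text sent for this turn
    ok: bool       # whether the turn finished without open issues
    summary: str   # short summary returned by the agent


def run_turns(tasks: list[str],
              agent: Callable[[str], TurnResult]) -> list[TurnResult]:
    """Run turns one after another; stop early when a turn needs steering."""
    results: list[TurnResult] = []
    for task in tasks:
        result = agent(task)   # wait for this turn before starting the next
        results.append(result)
        if not result.ok:
            break              # hand control back to the orchestrator
    return results


def fake_agent(task: str) -> TurnResult:
    """Hypothetical stub: every turn succeeds unless the task contains 'FAIL'."""
    ok = "FAIL" not in task
    return TurnResult(task, ok, "ok" if ok else "needs steering")
```

For example, `run_turns(["Explore", "Implement core", "Run tests"], fake_agent)` returns three successful results, which the orchestrator would then condense into a single report for the user.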

Surface Critical Decision Points (Only When Necessary)

Before diving into implementation, scan for genuine ambiguities that only the user can resolve:

Architecture choices: "REST vs GraphQL?", "Redis vs DynamoDB?"
Scope tradeoffs: "Should this affect existing data or only new records?"
Behavior decisions: "Fail fast or degrade gracefully?"

If you spot these, ask the user before delegating implementation. But most tasks don't need this — if the codebase and user request are clear enough, just proceed.

Important: Ask only what the user must decide. Don't ask about implementation details the agent can figure out. Don't ask "should I proceed?" — just proceed after resolving any genuine decision point.

Reporting Results to the User

When the code agent finishes, summarize concisely. Do NOT pass through raw code, full file contents, or verbose agent output. The user sees the code agent's terminal activity in real-time — they don't need it repeated.

Include:

What changed and where (file level, not line-by-line)
What was verified and how (test output summary, not raw logs)
Design decisions made (anything that affects future work)
Known limitations or deferred items

Do NOT include:

Raw source code or full file contents
Line-by-line diffs or the agent's exploration logs
Lengthy code blocks unless the user explicitly asked to see code

Example format:

Files changed:
  - src/middleware/rateLimiter.ts — added rate limiting logic (new file)
  - tests/rateLimiter.test.ts — added 4 tests; all pass

Verified: ran full test suite (42 tests, 0 failures)

Note: rate limit is currently per-IP. If per-user-ID is needed later,
the key function can be swapped without touching routes.
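
If the summary is assembled programmatically, a small formatter keeps it at file level. The sketch below is one possible shape, not a prescribed one: `Report` and `format_report` are hypothetical names, and the colon separator between path and description is a choice made here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Report:
    files_changed: dict[str, str]   # path -> one-line description
    verified: str                   # how the result was checked
    notes: list[str] = field(default_factory=list)  # decisions, limitations


def format_report(report: Report) -> str:
    """Render a concise, file-level summary with no raw code or logs."""
    lines = ["Files changed:"]
    for path, what in report.files_changed.items():
        lines.append(f"  - {path}: {what}")
    lines.append("")
    lines.append(f"Verified: {report.verified}")
    for note in report.notes:
        lines.append("")
        lines.append(f"Note: {note}")
    return "\n".join(lines)
```

Keeping the input structured (paths, a verification line, notes) makes it hard to paste raw diffs or logs into the report by accident.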

Orchestration Process

→ DESIGN.md: requirements capture, scope decisions, trade-off escalation
→ IMPLEMENT.md: stepwise delegation, steering, correctness verification
→ REVIEW.md: iterative review, complexity-based depth, known issue checklist

Session Management

`compact_session=True`: use before a new task in a long session. Summarizes history, saves tokens, preserves context.
`reset_session=True`: use only when switching to a completely unrelated project. Clears history, keeps workspace files.
Omit both for continuation of the same task.
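
The three cases above reduce to a small lookup. In this sketch, `session_kwargs` and its string labels are hypothetical; only `compact_session` and `reset_session` come from this document, and passing them as keyword arguments alongside `task` is an assumption about the call shape.

```python
from __future__ import annotations


def session_kwargs(next_task: str) -> dict[str, bool]:
    """Pick session flags for the next call.

    next_task is one of the illustrative labels:
    "same_task", "new_task", or "unrelated_project".
    """
    if next_task == "same_task":
        return {}                          # continuation: omit both flags
    if next_task == "new_task":
        return {"compact_session": True}   # summarize history, keep context
    if next_task == "unrelated_project":
        return {"reset_session": True}     # clear history, keep workspace files
    raise ValueError(f"unknown label: {next_task!r}")
```

Under that assumption, a call might look like `code_agent(task="...", **session_kwargs("new_task"))`.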

Context isolation between tasks

A long conversation that handles multiple unrelated tasks is a liability: earlier context bleeds into later tasks and causes subtly wrong assumptions. When switching to a significantly different task (e.g., bug fix → new feature, frontend → backend), use `compact_session=True` to condense the earlier history into a summary before continuing. This matters most when the nature of the work changes, not just the file being edited.

When to Delegate vs Handle Directly

Delegate to code_agent	Handle directly
Implement from a GitHub issue or feature request	Explain how an algorithm works
Investigate code to figure out an implementation approach	Write a short standalone snippet
Fix a failing test or bug	Answer a syntax or API question
Refactor a module	Simple code review without changes
Analyze uploaded source files	Generate a one-off script with no files
Run tests and fix failures	Summarize what code does
Scaffold following project conventions

Uploaded Files

Files uploaded by the user are automatically available in the workspace:

task = "Unzip the uploaded my-project.zip and summarize the architecture."

Advanced: Structured Task Template

Only use this when requirements are already fully resolved and you need explicit acceptance criteria. For most tasks, a plain description works better.


<task>
  <objective>Verifiable "done" state.</objective>
  <scope>What area of the system to work within. What to leave alone.</scope>
  <context>API signatures, versions, prior research findings.</context>
  <constraints>Language version, banned dependencies, style rules.</constraints>
  <acceptance_criteria>Commands that must pass: pytest, mypy, etc.</acceptance_criteria>
</task>
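
When the template is filled from variables, building it with an XML library avoids hand-escaping characters such as `&` or `<` in constraints. The sketch below uses Python's standard `xml.etree.ElementTree`; `build_task` is a hypothetical helper, not part of the tool, and it assumes Python 3.9 or later for `ET.indent`.

```python
import xml.etree.ElementTree as ET

# Section names taken from the template above, in template order.
FIELDS = ("objective", "scope", "context", "constraints", "acceptance_criteria")


def build_task(**sections: str) -> str:
    """Build the <task> template as a string, escaping special characters."""
    unknown = set(sections) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    root = ET.Element("task")
    for name in FIELDS:
        if name in sections:   # sections that are not supplied are left out
            ET.SubElement(root, name).text = sections[name]
    ET.indent(root, space="  ")   # pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")
```

The result can be passed as the task text, for example `build_task(objective="All tests pass", acceptance_criteria="pytest, mypy")`.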
