agent-development

Agent Development

Overview

Design and build AI agents that effectively use tools, manage memory, plan multi-step tasks, coordinate with other agents, and operate within safety guardrails. This skill covers the full agent development lifecycle from architecture through evaluation, with emphasis on observable, testable, and safe agent behavior.

Phase 1: Agent Design

Define the agent's purpose and scope
Identify required tools and capabilities
Design memory architecture (short-term, long-term)
Plan agent loop structure (observe, think, act)
Define safety boundaries and guardrails

STOP — Present agent design to user for approval before implementation.

Agent Architecture Decision Table

Agent Type	When to Use	Loop Pattern	Complexity
Single-turn tool user	Simple queries with tool calls	Request -> Tool -> Response	Low
ReAct agent	Multi-step reasoning tasks	Thought -> Action -> Observation -> loop	Medium
Plan-and-execute	Complex tasks with dependencies	Plan -> Execute steps -> Validate	Medium-High
Multi-agent orchestrator	Parallel/specialized sub-tasks	Dispatch -> Collect -> Synthesize	High
Autonomous loop (Ralph-style)	Long-running iterative development	Plan -> Build -> Verify -> Exit gate	High

Phase 2: Implementation

Build the agent loop with tool dispatch
Implement memory management (context window, persistence)
Add planning and decomposition logic
Integrate error recovery and retry patterns
Implement output validation

STOP — Run smoke tests on the agent loop before adding complexity.
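
A minimal sketch of an agent loop with tool dispatch, small enough to smoke-test before anything else is layered on. The `add` tool, the `decide` callback (a stand-in for the model call), and the step dictionary shape are assumptions made for illustration, not part of this skill.

```python
from typing import Any, Callable

Result = dict[str, Any]

def add(a: int, b: int) -> Result:
    """Hypothetical example tool."""
    return {"success": True, "data": a + b, "error": None}

# Hypothetical registry: tool name -> callable returning {success, data, error}.
TOOLS: dict[str, Callable[..., Result]] = {"add": add}

def dispatch(name: str, args: dict[str, Any]) -> Result:
    """Route a call by tool name; unknown tools and exceptions become error results."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"success": False, "data": None,
                "error": f"Unknown tool '{name}'. Available: {sorted(TOOLS)}"}
    try:
        return tool(**args)
    except Exception as exc:  # keep the loop alive and report the failure back
        return {"success": False, "data": None, "error": str(exc)}

def run_agent(decide: Callable[[list[dict]], dict], task: str,
              max_steps: int = 20) -> Result:
    """Observe -> think -> act until `decide` returns a final answer or steps run out."""
    history: list[dict] = [{"role": "task", "content": task}]
    for _ in range(max_steps):
        step = decide(history)                          # think
        if step["type"] == "final":
            return {"success": True, "data": step["content"], "error": None}
        result = dispatch(step["tool"], step["args"])   # act
        history.append({"role": "observation", "content": result})  # observe
    return {"success": False, "data": None, "error": "max_steps exceeded"}

def scripted_decide(history: list[dict]) -> dict:
    """Scripted stand-in for a model: one tool call, then a final answer."""
    if len(history) == 1:
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "content": history[-1]["content"]["data"]}
```

Running `run_agent(scripted_decide, "add 2 and 3")` exercises the whole loop without a model, which is the kind of smoke test the STOP gate asks for.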

Tool Use Patterns

Tool Definition Best Practices

Principle	Rule	Example
Clear naming	verb-noun format	`search_documents`, `create_file`
Detailed descriptions	Include when to use AND when NOT to use	"Use for keyword search. Do NOT use for semantic similarity."
Well-typed parameters	Descriptions and examples on every param	`query: string // "e.g., 'user authentication'"`
Predictable returns	Consistent format across tools	Always return `{ success, data, error }`
Self-correcting errors	Help agent recover	"Invalid date format. Expected ISO 8601: YYYY-MM-DD"
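
The table's principles can be seen together in one small tool. This is a sketch only: the parameters `user_id`, `since`, and `limit`, and the in-memory `_ORDERS` list, are hypothetical stand-ins for a real order store.

```python
from datetime import date
from typing import Any

# Hypothetical in-memory data standing in for a real order store.
_ORDERS = [
    {"order_id": 455, "user_id": "123", "date": date(2025, 1, 10)},
    {"order_id": 456, "user_id": "123", "date": date(2025, 1, 14)},
]

def search_orders(user_id: str, since: str, limit: int = 5) -> dict[str, Any]:
    """Search one user's orders placed on or after a date.

    Use for date-bounded lookups. Do NOT use for free-text search.
    user_id: customer identifier, e.g. "123"
    since:   ISO 8601 date, e.g. "2025-01-12"
    limit:   maximum number of orders to return, e.g. 5
    """
    try:
        start = date.fromisoformat(since)
    except ValueError:
        # Self-correcting error: tell the agent exactly what format to send next.
        return {"success": False, "data": None,
                "error": "Invalid date format. Expected ISO 8601: YYYY-MM-DD"}
    if limit < 1:
        return {"success": False, "data": None,
                "error": "limit must be at least 1, e.g. limit=5"}
    matches = [o for o in _ORDERS
               if o["user_id"] == user_id and o["date"] >= start]
    # Predictable return: same three keys on success and on failure.
    return {"success": True, "data": matches[:limit], "error": None}
```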

Tool Selection Strategy

Given a task:
1. Identify required information and actions
2. Map to available tools
3. Determine tool call order (dependencies)
4. Execute with result validation
5. Retry or try alternative tool on failure
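
Steps 4 and 5 can be sketched as a helper that validates each result and falls back to the next tool. The helper name `call_with_fallback` and the single-argument tool signature are assumptions for illustration.

```python
from typing import Any, Callable

Tool = Callable[[str], dict[str, Any]]

def call_with_fallback(query: str, tools: list[Tool],
                       is_valid: Callable[[Any], bool],
                       retries: int = 1) -> dict[str, Any]:
    """Try tools in order; retry each one `retries` times before moving on."""
    last_error = "no tools provided"
    for tool in tools:
        for _ in range(retries + 1):
            try:
                result = tool(query)
            except Exception as exc:
                last_error = f"{tool.__name__}: {exc}"
                continue
            # Accept only results that report success AND pass validation.
            if result.get("success") and is_valid(result.get("data")):
                return result
            reason = result.get("error") or "result failed validation"
            last_error = f"{tool.__name__}: {reason}"
    return {"success": False, "data": None, "error": last_error}
```

With a failing `primary` tool and a working `backup` tool, `call_with_fallback("x", [primary, backup], is_valid=bool)` returns the backup's result; if every tool fails, the last error is returned so the agent can report it.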

Tool Design Principles

Composable: small tools that combine for complex tasks
Idempotent: repeating a call has the same effect as making it once, so retries are safe (where possible)
Observable: return enough context for the agent to verify success
Bounded: timeouts and size limits on all operations
Documented: every parameter and return value described

Memory Management

Memory Type Decision Table

Type	Duration	Storage	Use Case
Working Memory	Current turn	Context window	Active reasoning
Short-term Memory	Current session	In-context or buffer	Recent conversation
Long-term Memory	Across sessions	Database/file	Learned patterns, user prefs
Episodic Memory	Specific events	Indexed store	Past task outcomes
Semantic Memory	Knowledge	Vector DB	Domain knowledge retrieval

Context Window Management

Strategy: Sliding window with importance-based retention

1. Always retain: system prompt, tool definitions, current task
2. Summarize: older conversation turns into compressed summaries
3. Evict: least relevant context when approaching limit
4. Retrieve: pull relevant long-term memory on demand

Budget allocation:
  System prompt + tools: ~20%
  Current task context:  ~40%
  Conversation history:  ~25%
  Retrieved memory:      ~15%
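
The budget split and the sliding window can be sketched as follows. The word-count tokenizer is a deliberately crude assumption (a real agent would use its model's tokenizer), and the summary line is a placeholder for an actual summarization step whose own tokens are not counted here.

```python
# Percentages from the budget allocation above.
BUDGET_PCT = {"system": 20, "task": 40, "history": 25, "memory": 15}

def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def allocate(limit: int) -> dict[str, int]:
    """Split a context limit (in tokens) across the four budget areas."""
    return {name: limit * pct // 100 for name, pct in BUDGET_PCT.items()}

def trim_history(turns: list[str], max_tokens: int) -> list[str]:
    """Keep the newest turns that fit; mark older ones as summarized."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):       # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()                     # restore chronological order
    dropped = len(turns) - len(kept)
    if dropped:
        # Placeholder: a real agent would insert a compressed summary here.
        kept.insert(0, f"[summary of {dropped} earlier turns]")
    return kept
```

For an 8,000-token window, `allocate(8000)` gives 1,600 tokens to the system prompt and tools, 3,200 to the current task, 2,000 to history, and 1,200 to retrieved memory.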

Memory Update Triggers

Trigger	Action
User correction	Update learned patterns
Task completion	Store outcome and approach
Error recovery	Record what failed and what worked
New domain knowledge	Index for future retrieval

Planning Strategies

Hierarchical Task Decomposition

1. Break high-level goal into sub-goals
2. For each sub-goal, identify required actions
3. Order actions by dependencies
4. Execute with checkpoints between phases
5. Re-plan if intermediate results change the approach
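
Ordering actions by dependencies (step 3) and grouping them into phases with a checkpoint between each (step 4) is a topological sort. A minimal sketch using the standard library's `graphlib`; the action names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter

def plan_phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group actions into phases; each phase depends only on earlier phases.

    `deps` maps an action to the set of actions it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    phases: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        phases.append(ready)
        sorter.done(*ready)                 # checkpoint: mark the phase complete
    return phases
```

Given `{"write_report": {"analyze"}, "analyze": {"fetch_sales", "fetch_costs"}, "fetch_sales": set(), "fetch_costs": set()}`, the two fetches form the first phase, `analyze` the second, and `write_report` the third. The boundary between phases is a natural place to validate results and re-plan.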

ReAct Pattern (Reason + Act)

Thought: I need to find the user's recent orders to answer their question.
Action: search_orders(user_id="123", limit=5)
Observation: Found 5 orders, most recent is #456 from yesterday.
Thought: The user asked about order #456. I have the details now.
Action: respond with order details
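
To act on a trace like the one above, the loop has to turn an `Action:` line into a tool name and arguments. One possible sketch, assuming actions are written as keyword-only calls with literal values; the function name `parse_action` is hypothetical.

```python
import ast
from typing import Any

def parse_action(line: str) -> tuple[str, dict[str, Any]]:
    """Parse 'Action: tool(key=literal, ...)' into (tool_name, kwargs)."""
    prefix = "Action:"
    if not line.startswith(prefix):
        raise ValueError("Expected a line starting with 'Action:'")
    try:
        node = ast.parse(line[len(prefix):].strip(), mode="eval").body
    except SyntaxError as exc:
        raise ValueError(f"Not a tool call: {exc.msg}") from exc
    if (not isinstance(node, ast.Call)
            or not isinstance(node.func, ast.Name)
            or node.args):
        raise ValueError("Expected tool_name(key=value, ...) with keyword arguments only")
    # literal_eval accepts literals only, so model output is never executed as code.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs
```

The final `Action: respond with order details` line is free text rather than a call, so it raises `ValueError`; a loop could treat that case as the signal to produce the final response.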

Plan-and-Execute Pattern

1. Create a complete plan before any action
2. Execute each step, checking preconditions
3. After each step, validate the result
4. If a step fails, re-plan from current state
5. Never modify the plan mid-step (finish or abort first)
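
The five rules can be sketched as a small executor. The `Step` fields, the `replan` callback, and the `max_replans` limit are assumptions chosen for illustration; a real planner would generate the steps.

```python
from dataclasses import dataclass
from typing import Callable

State = dict[str, object]

@dataclass
class Step:
    name: str
    precondition: Callable[[State], bool]
    run: Callable[[State], State]
    validate: Callable[[State], bool]

def execute_plan(plan: list[Step], state: State,
                 replan: Callable[[State, Step], list[Step]],
                 max_replans: int = 2) -> State:
    """Run steps in order; on a failed check, re-plan from the current state."""
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        if step.precondition(state):
            # The step works on a copy, so a failed step leaves `state` unchanged
            # and the plan is only replaced between steps, never mid-step.
            new_state = step.run(dict(state))
            if step.validate(new_state):
                state = new_state
                continue
        if replans >= max_replans:
            raise RuntimeError(f"step '{step.name}' failed; replan budget exhausted")
        replans += 1
        queue = replan(state, step)
    return state
```

The replan limit is not in the rules above; it is added here so that a step that can never succeed does not loop forever.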

Reflection Pattern

After completing a task:
1. Was the result correct?
2. Was the approach efficient?
3. What could be improved?
4. Should any memory be updated?

Phase 3: Evaluation and Safety

Build evaluation harness with test scenarios
Measure accuracy, efficiency, and safety metrics
Test edge cases and adversarial inputs
Add monitoring and logging
Implement circuit breakers for runaway behavior

STOP — All safety guardrails must be tested before deployment.

Multi-Agent Coordination

Coordination Pattern Decision Table

Pattern	Description	Use When
Orchestrator	Central agent delegates to specialists	Clear task hierarchy
Pipeline	Agents process in sequence	Linear workflows
Debate	Agents propose and critique	Need diverse perspectives
Voting	Multiple agents, majority wins	Uncertainty in approach
Supervisor	One agent monitors others	Safety-critical tasks
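
As a small illustration of the Voting row, a strict-majority vote might look like the sketch below. Returning `None` when no answer has a majority is an assumption; the table does not say how ties should be handled.

```python
from collections import Counter
from typing import Optional

def majority_vote(answers: list[str]) -> Optional[str]:
    """Return the answer given by more than half the agents, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None
```

A `None` result is a reasonable point to fall back to another pattern, for example Debate or escalation to a human.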

Communication Protocol

Agent-to-Agent message:
{
  "from": "planner",
  "to": "executor",
  "type": "task_assignment",
  "content": { "task": "...", "context": "...", "constraints": "..." },
  "priority": "high",
  "deadline": "2025-01-15T10:00:00Z"
}
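
Structured messages are only useful if the receiver validates them. A minimal sketch of parsing the message above; the allowed priority values are an assumption, since the example shows only `"high"`.

```python
import json
from dataclasses import dataclass
from datetime import datetime

REQUIRED = {"from", "to", "type", "content", "priority", "deadline"}
PRIORITIES = {"low", "normal", "high"}  # assumed set; the example shows only "high"

@dataclass(frozen=True)
class AgentMessage:
    sender: str
    recipient: str
    type: str
    content: dict
    priority: str
    deadline: datetime

def parse_message(raw: str) -> AgentMessage:
    """Parse and validate one agent-to-agent message from its JSON text."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["priority"] not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")
    # fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    deadline = datetime.fromisoformat(data["deadline"].replace("Z", "+00:00"))
    return AgentMessage(data["from"], data["to"], data["type"],
                        data["content"], data["priority"], deadline)
```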

Coordination Rules

Define clear ownership boundaries
Use structured messages between agents
Implement deadlock detection
Set timeouts for inter-agent communication
Log all inter-agent messages for debugging
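
Deadlock detection can be sketched as cycle detection in a wait-for graph. This version assumes each agent waits on at most one other agent at a time; the agent names in the usage note are hypothetical.

```python
from typing import Optional

def find_deadlock(waiting_on: dict[str, str]) -> Optional[list[str]]:
    """Return the agents forming a wait cycle, or None if there is no cycle.

    `waiting_on` maps an agent to the single agent it is blocked on.
    """
    for start in waiting_on:
        path = [start]
        seen = {start}
        current = waiting_on.get(start)
        while current is not None:
            if current in seen:
                # Found a cycle: return only the agents that are part of it.
                return path[path.index(current):]
            path.append(current)
            seen.add(current)
            current = waiting_on.get(current)
    return None
```

For `{"planner": "executor", "executor": "reviewer", "reviewer": "planner"}` this returns all three agents. Pairing the check with the communication timeouts above covers the case where an agent is simply slow rather than deadlocked.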

Evaluation Framework

Metrics Decision Table

Metric	What It Measures	How to Measure	Target
Task Success Rate	Correct completions / total	Automated + human eval	> 90%
Efficiency	Steps vs optimal path	Step count comparison	< 2x optimal
Tool Accuracy	Correct tool calls / total	Log analysis	> 95%
Safety	Violations / total interactions	Guardrail checks	0 violations
Latency	Time to complete task	Wall clock	< SLA
Cost	Token usage per task	API usage tracking	Within budget
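
Four of these metrics can be computed directly from per-run logs. A sketch under assumptions: the `RunLog` fields are hypothetical, and latency and cost are left out because their targets (SLA, budget) are deployment-specific.

```python
from dataclasses import dataclass

@dataclass
class RunLog:
    succeeded: bool
    steps: int
    optimal_steps: int
    tool_calls: int
    correct_tool_calls: int
    violations: int

def summarize(runs: list[RunLog]) -> dict[str, float]:
    """Aggregate per-run logs into the table's metrics."""
    if not runs:
        raise ValueError("need at least one run")
    total_calls = sum(r.tool_calls for r in runs)
    return {
        "task_success_rate": sum(r.succeeded for r in runs) / len(runs),
        "step_ratio": sum(r.steps for r in runs) / sum(r.optimal_steps for r in runs),
        "tool_accuracy": (sum(r.correct_tool_calls for r in runs) / total_calls
                          if total_calls else 1.0),
        "violations": float(sum(r.violations for r in runs)),
    }

def meets_targets(m: dict[str, float]) -> bool:
    """Apply the table's targets: >90% success, <2x optimal, >95% tools, 0 violations."""
    return (m["task_success_rate"] > 0.90 and m["step_ratio"] < 2.0
            and m["tool_accuracy"] > 0.95 and m["violations"] == 0)
```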

Evaluation Dataset Structure

{
  "test_cases": [
    {
      "id": "tc_001",
      "input": "Find all orders over $100 from last week",
      "expected_tools": ["search_orders"],
      "expected_output_contains": ["order_id", "amount"],
      "category": "retrieval",
      "difficulty": "easy"
    }
  ]
}
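
A harness can check one recorded agent run against a test case in this format. A minimal sketch; `check_case` and its arguments (the tools the agent actually called and its final output text) are assumptions about how a harness would capture a run.

```python
def check_case(case: dict, tools_called: list[str], output: str) -> list[str]:
    """Return failure messages for one test case; an empty list means it passed."""
    failures: list[str] = []
    missing_tools = [t for t in case["expected_tools"] if t not in tools_called]
    if missing_tools:
        failures.append(f"{case['id']}: expected tools not called: {missing_tools}")
    missing_text = [s for s in case["expected_output_contains"] if s not in output]
    if missing_text:
        failures.append(f"{case['id']}: output is missing: {missing_text}")
    return failures
```

Returning messages instead of a boolean keeps failures readable when results are grouped by `category` and `difficulty`.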

Safety Guardrails

Input Guardrails

Detect and reject prompt injection attempts
Validate all user inputs before processing
Rate limit requests per user/session
Filter content for harmful requests

Output Guardrails

Validate tool call arguments before execution
Check outputs for sensitive information (PII, secrets)
Enforce response format constraints
Prevent infinite tool call loops
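
The sensitive-information check can be sketched with a few regular expressions. The patterns below are illustrative assumptions and far from exhaustive; a production guardrail would likely need a dedicated PII and secret scanner.

```python
import re

# Illustrative patterns only; real deployments need broader coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "secret_assignment": re.compile(r"(?i)\b(api[_-]?key|password|token)\s*[:=]\s*\S+"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Replace matches with a marker; return the cleaned text and the labels found."""
    found: list[str] = []
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[REDACTED:{label}]", text)
        if hits:
            found.append(label)
    return text, found
```

The returned labels can feed the audit log, so operators see which kind of data an agent tried to emit without the data itself being stored.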

Operational Guardrails

Maximum tool calls per task (circuit breaker)
Maximum tokens per response
Timeout for total task duration
Escalation to human when confidence is low
Audit logging for all actions

Circuit Breaker Thresholds

Condition	Threshold	Action
Max tool calls per task	20	Stop execution, return error
Max consecutive errors	3	Stop, log, return graceful error
Max task duration	5 minutes	Timeout, return partial result
Max tokens generated	10,000	Stop generation
Pattern repeats	5 identical errors	Open circuit, alert operator
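
The first three thresholds can be enforced by a small per-task object. A sketch with the table's values as defaults; the class and method names are hypothetical, and the token limit and repeated-pattern detection are omitted for brevity.

```python
import time
from dataclasses import dataclass, field

class CircuitOpen(Exception):
    """Raised when a threshold is crossed and the task must stop."""

@dataclass
class CircuitBreaker:
    max_tool_calls: int = 20
    max_consecutive_errors: int = 3
    max_duration_s: float = 300.0  # 5 minutes
    started: float = field(default_factory=time.monotonic)
    tool_calls: int = 0
    consecutive_errors: int = 0

    def before_call(self) -> None:
        """Call before every tool call; blocks the call once a limit is reached."""
        if self.tool_calls >= self.max_tool_calls:
            raise CircuitOpen("max tool calls per task reached")
        if time.monotonic() - self.started > self.max_duration_s:
            raise CircuitOpen("max task duration exceeded")
        self.tool_calls += 1

    def after_call(self, success: bool) -> None:
        """Call after every tool call; a success resets the error streak."""
        self.consecutive_errors = 0 if success else self.consecutive_errors + 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            raise CircuitOpen("too many consecutive errors")
```

The agent loop catches `CircuitOpen` once, at the top level, and maps it to the action in the table (return an error, a graceful error, or a partial result).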

Prompt Engineering for Agents

System Prompt Structure

1. Identity and purpose (who the agent is)
2. Available tools (what it can do)
3. Constraints (what it must not do)
4. Output format (how to respond)
5. Examples (few-shot demonstrations)
6. Error handling (what to do when stuck)

Key Prompt Patterns

Scratchpad: encourage step-by-step reasoning before action
Self-correction: "If your first approach fails, try..."
Confidence calibration: "Only proceed if you are confident"
Graceful degradation: "If you cannot complete the task, explain why"

Anti-Patterns / Common Mistakes

Anti-Pattern	Why It Is Wrong	What to Do Instead
Calling tools without reasoning	Wastes calls, misses context	Use ReAct pattern (think first)
No max iteration limit	Infinite loops, runaway costs	Set circuit breaker thresholds
Trusting all tool outputs	Corrupted data propagates	Validate tool results
Hardcoded tool sequences	No adaptability to failures	Dynamic tool selection based on state
No error recovery strategy	Agent gets stuck on first failure	Implement retry with alternatives
Apologizing instead of acting	Wastes user time	Take corrective action, then report
Over-reliance on single tool	Fragile if that tool fails	Provide fallback tools
No evaluation framework	Shipping blind, no quality signal	Build eval harness before deployment
Unlimited context growth	Context overflow, degraded quality	Implement memory management

Integration Points

Skill	Integration
`mcp-builder`	MCP servers provide tools for agents
`planning`	Agent planning uses structured plan generation
`autonomous-loop`	Ralph-style loops are a specialized agent pattern
`dispatching-parallel-agents`	Multi-agent coordination pattern
`circuit-breaker`	Operational safety for agent loops
`verification-before-completion`	Agent output validation
`test-driven-development`	TDD for agent tool implementations

Skill Type

FLEXIBLE — Adapt the agent architecture, memory strategy, and coordination patterns to the specific use case. Safety guardrails and evaluation frameworks are strongly recommended for all production agents.
