agent-development
Agent Development
Overview
Design and build AI agents that effectively use tools, manage memory, plan multi-step tasks, coordinate with other agents, and operate within safety guardrails. This skill covers the full agent development lifecycle from architecture through evaluation, with emphasis on observable, testable, and safe agent behavior.
Phase 1: Agent Design
- Define the agent's purpose and scope
- Identify required tools and capabilities
- Design memory architecture (short-term, long-term)
- Plan agent loop structure (observe, think, act)
- Define safety boundaries and guardrails
STOP — Present agent design to user for approval before implementation.
Agent Architecture Decision Table
| Agent Type | When to Use | Loop Pattern | Complexity |
|---|---|---|---|
| Single-turn tool user | Simple queries with tool calls | Request -> Tool -> Response | Low |
| ReAct agent | Multi-step reasoning tasks | Thought -> Action -> Observation -> loop | Medium |
| Plan-and-execute | Complex tasks with dependencies | Plan -> Execute steps -> Validate | Medium-High |
| Multi-agent orchestrator | Parallel/specialized sub-tasks | Dispatch -> Collect -> Synthesize | High |
| Autonomous loop (Ralph-style) | Long-running iterative development | Plan -> Build -> Verify -> Exit gate | High |
Phase 2: Implementation
- Build the agent loop with tool dispatch
- Implement memory management (context window, persistence)
- Add planning and decomposition logic
- Integrate error recovery and retry patterns
- Implement output validation
STOP — Run smoke tests on the agent loop before adding complexity.
Tool Use Patterns
Tool Definition Best Practices
| Principle | Rule | Example |
|---|---|---|
| Clear naming | verb-noun format | |
| Detailed descriptions | Include when to use AND when NOT to use | "Use for keyword search. Do NOT use for semantic similarity." |
| Well-typed parameters | Descriptions and examples on every param | |
| Predictable returns | Consistent format across tools | Always return |
| Self-correcting errors | Help agent recover | "Invalid date format. Expected ISO 8601: YYYY-MM-DD" |
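The principles in this table can be illustrated with a small sketch. Everything here is hypothetical: the `search_orders` name, the parameter layout (loosely modeled on JSON Schema), and the `{"ok", "error"}` result envelope are assumptions chosen for illustration, not a specific framework's format.

```python
from datetime import date

# Hypothetical tool definition; the field layout loosely follows JSON Schema
# and is an assumption, not a specific vendor's format.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",  # verb-noun naming
    "description": (
        "Use for looking up a user's orders by date. "
        "Do NOT use for product catalog search."
    ),
    "parameters": {
        "user_id": {"type": "string", "description": "Account ID, e.g. '123'"},
        "since": {"type": "string", "description": "ISO 8601 date, e.g. '2025-01-15'"},
    },
}


def validate_since(value: str) -> dict:
    """Return a uniform result envelope with a self-correcting error message."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return {
            "ok": False,
            "error": "Invalid date format. Expected ISO 8601: YYYY-MM-DD",
        }
    return {"ok": True, "error": None}
```

Returning the same envelope shape from every tool lets the agent loop check `ok` once instead of parsing each tool's output differently.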
Tool Selection Strategy
Given a task:
1. Identify required information and actions
2. Map to available tools
3. Determine tool call order (dependencies)
4. Execute with result validation
5. Retry or try alternative tool on failure
Tool Design Principles
- Composable: small tools that combine for complex tasks
- Idempotent: safe to retry without side effects (where possible)
- Observable: return enough context for the agent to verify success
- Bounded: timeouts and size limits on all operations
- Documented: every parameter and return value described
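As a rough sketch of how these principles interact, the helper below retries a tool, falls back to an alternative, and caps the result size. The function name, parameters, and result envelope are assumptions for illustration. Retrying like this is only safe when the tools are idempotent, and timeouts are assumed to be enforced inside each tool rather than here.

```python
import time
from typing import Callable, Sequence


def call_with_fallback(
    tools: Sequence[Callable[[str], str]],
    query: str,
    max_attempts: int = 2,
    backoff_seconds: float = 0.0,
    max_result_chars: int = 2000,
) -> dict:
    """Try each tool in order, retrying failures, and cap the result size."""
    errors = []
    for tool in tools:
        for attempt in range(1, max_attempts + 1):
            try:
                result = tool(query)
            except Exception as exc:  # broad on purpose: tool failures vary widely
                errors.append(f"{tool.__name__} attempt {attempt}: {exc}")
                time.sleep(backoff_seconds * attempt)
                continue
            return {
                "ok": True,
                "tool": tool.__name__,
                "result": result[:max_result_chars],  # bounded output
                "errors": errors,  # observable: earlier failures are kept
            }
    return {"ok": False, "tool": None, "result": None, "errors": errors}
```

The returned `errors` list gives the agent enough context to see which tool finally succeeded and what failed along the way.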
Memory Management
Memory Type Decision Table
| Type | Duration | Storage | Use Case |
|---|---|---|---|
| Working Memory | Current turn | Context window | Active reasoning |
| Short-term Memory | Current session | In-context or buffer | Recent conversation |
| Long-term Memory | Across sessions | Database/file | Learned patterns, user prefs |
| Episodic Memory | Specific events | Indexed store | Past task outcomes |
| Semantic Memory | Knowledge | Vector DB | Domain knowledge retrieval |
Context Window Management
Strategy: Sliding window with importance-based retention
1. Always retain: system prompt, tool definitions, current task
2. Summarize: older conversation turns into compressed summaries
3. Evict: least relevant context when approaching limit
4. Retrieve: pull relevant long-term memory on demand
Budget allocation:
System prompt + tools: ~20%
Current task context: ~40%
Conversation history: ~25%
Retrieved memory: ~15%
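The budget split and the sliding window can be sketched as follows. The region names, the whitespace-based token counter, and both function names are assumptions. A real agent would use its model's tokenizer and would summarize the dropped turns rather than discard them.

```python
# Percent shares from the budget allocation above; region names are illustrative.
BUDGET_PERCENT = {
    "system_and_tools": 20,
    "current_task": 40,
    "history": 25,
    "retrieved_memory": 15,
}


def allocate_budget(context_limit: int) -> dict:
    """Split a token limit across the four regions using integer arithmetic."""
    return {name: context_limit * pct // 100 for name, pct in BUDGET_PERCENT.items()}


def count_words(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(turns: list, budget: int, count_tokens=count_words) -> list:
    """Keep the most recent turns that fit; older ones are candidates for summarizing."""
    kept, used = [], 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))
```

For example, `allocate_budget(8000)` reserves 2000 tokens for conversation history, and `trim_history` then decides which recent turns fit inside that share.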
Memory Update Triggers
| Trigger | Action |
|---|---|
| User correction | Update learned patterns |
| Task completion | Store outcome and approach |
| Error recovery | Record what failed and what worked |
| New domain knowledge | Index for future retrieval |
Planning Strategies
Hierarchical Task Decomposition
1. Break high-level goal into sub-goals
2. For each sub-goal, identify required actions
3. Order actions by dependencies
4. Execute with checkpoints between phases
5. Re-plan if intermediate results change the approach
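Ordering actions by dependencies (step 3) is a topological sort. The sketch below uses Python's standard `graphlib` module. The action names and the `plan_phases` helper are made up for illustration. Each returned phase is a natural place for a checkpoint.

```python
from graphlib import TopologicalSorter

# Hypothetical actions for a bug-fix goal; each maps to the actions it depends on.
dependencies = {
    "reproduce_bug": set(),
    "write_failing_test": {"reproduce_bug"},
    "patch_code": {"write_failing_test"},
    "run_test_suite": {"patch_code"},
    "update_changelog": {"patch_code"},
}


def plan_phases(deps: dict) -> list:
    """Group actions into phases; a checkpoint can sit between any two phases."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        phases.append(ready)
        sorter.done(*ready)
    return phases
```

Actions that land in the same phase have no dependency on each other, so they could also be dispatched in parallel.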
ReAct Pattern (Reason + Act)
Thought: I need to find the user's recent orders to answer their question.
Action: search_orders(user_id="123", limit=5)
Observation: Found 5 orders, most recent is #456 from yesterday.
Thought: The user asked about order #456. I have the details now.
Action: respond with order details
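The trace above can be reproduced with a minimal loop. This is a sketch under stated assumptions: `scripted_policy` is a stand-in for the model call, `search_orders` is a stub tool, and the step dictionary shape is invented for illustration.

```python
from typing import Callable


def search_orders(user_id: str, limit: int) -> str:
    """Stub tool; a real agent would query an order service here."""
    return f"Found {limit} orders for user {user_id}, most recent is #456."


TOOLS: dict = {"search_orders": search_orders}


def scripted_policy(history: list) -> dict:
    """Placeholder for the model: picks the next step from the history so far."""
    if not any(step["type"] == "observation" for step in history):
        return {
            "thought": "I need the user's recent orders.",
            "action": "search_orders",
            "args": {"user_id": "123", "limit": 5},
        }
    return {
        "thought": "I have the details now.",
        "action": "respond",
        "args": {"text": history[-1]["content"]},
    }


def react_loop(policy: Callable, max_steps: int = 5) -> str:
    """Alternate thought, action, and observation until the policy responds."""
    history: list = []
    for _ in range(max_steps):  # hard cap so the loop cannot run away
        step = policy(history)
        history.append({"type": "thought", "content": step["thought"]})
        if step["action"] == "respond":
            return step["args"]["text"]
        observation = TOOLS[step["action"]](**step["args"])
        history.append({"type": "observation", "content": observation})
    return "Stopped: step limit reached."
```

Swapping `scripted_policy` for a real model call leaves the loop itself unchanged, which keeps it easy to smoke-test.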
Plan-and-Execute Pattern
1. Create a complete plan before any action
2. Execute each step, checking preconditions
3. After each step, validate the result
4. If a step fails, re-plan from current state
5. Never modify the plan mid-step (finish or abort first)
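One way to read these rules as code is sketched below. The `Step` fields, the `execute_plan` signature, and the `replan` callback are all assumptions. State is a plain dictionary, and each step runs on a copy so that a failed step leaves the committed state untouched.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    precondition: Callable[[dict], bool]
    run: Callable[[dict], dict]  # returns the updated state
    validate: Callable[[dict], bool]


def execute_plan(plan: list, state: dict,
                 replan: Optional[Callable[[dict], list]] = None,
                 max_replans: int = 1) -> dict:
    """Run steps in order; on failure, ask `replan` for a new plan from the current state."""
    replans = 0
    queue = list(plan)
    while queue:
        step = queue.pop(0)
        ok = step.precondition(state)
        if ok:
            new_state = step.run(dict(state))  # work on a copy: finish or abort
            ok = step.validate(new_state)
        if ok:
            state = new_state  # commit only validated results
            continue
        if replan is None or replans >= max_replans:
            raise RuntimeError(f"step failed: {step.name}")
        replans += 1
        queue = replan(state)  # the plan is only swapped between steps
    return state
```

Capping `max_replans` keeps a repeatedly failing plan from looping forever.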
Reflection Pattern
After completing a task:
1. Was the result correct?
2. Was the approach efficient?
3. What could be improved?
4. Should any memory be updated?
Phase 3: Evaluation and Safety
- Build evaluation harness with test scenarios
- Measure accuracy, efficiency, and safety metrics
- Test edge cases and adversarial inputs
- Add monitoring and logging
- Implement circuit breakers for runaway behavior
STOP — All safety guardrails must be tested before deployment.
Multi-Agent Coordination
Coordination Pattern Decision Table
| Pattern | Description | Use When |
|---|---|---|
| Orchestrator | Central agent delegates to specialists | Clear task hierarchy |
| Pipeline | Agents process in sequence | Linear workflows |
| Debate | Agents propose and critique | Need diverse perspectives |
| Voting | Multiple agents, majority wins | Uncertainty in approach |
| Supervisor | One agent monitors others | Safety-critical tasks |
Communication Protocol
Agent-to-Agent message:

```json
{
  "from": "planner",
  "to": "executor",
  "type": "task_assignment",
  "content": { "task": "...", "context": "...", "constraints": "..." },
  "priority": "high",
  "deadline": "2025-01-15T10:00:00Z"
}
```
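A receiving agent might parse and validate this message before acting on it. The sketch below is one possible shape. The `AgentMessage` class, the set of allowed priorities (the example only shows "high"), and the `is_expired` helper are assumptions.

```python
import json
from dataclasses import dataclass
from datetime import datetime

ALLOWED_PRIORITIES = {"low", "normal", "high"}  # assumed; only "high" appears above


@dataclass(frozen=True)
class AgentMessage:
    sender: str
    recipient: str
    type: str
    content: dict
    priority: str
    deadline: datetime

    @classmethod
    def from_json(cls, raw: str) -> "AgentMessage":
        """Parse a message, rejecting unknown priorities and malformed deadlines."""
        data = json.loads(raw)
        if data["priority"] not in ALLOWED_PRIORITIES:
            raise ValueError(f"unknown priority: {data['priority']}")
        # Replace the trailing Z so fromisoformat also works before Python 3.11.
        deadline = datetime.fromisoformat(data["deadline"].replace("Z", "+00:00"))
        return cls(data["from"], data["to"], data["type"], data["content"],
                   data["priority"], deadline)

    def is_expired(self, now: datetime) -> bool:
        """True once the deadline has passed; `now` must be timezone-aware."""
        return now >= self.deadline
```

Rejecting malformed messages at the boundary keeps one misbehaving agent from passing bad state to the next.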
Coordination Rules
- Define clear ownership boundaries
- Use structured messages between agents
- Implement deadlock detection
- Set timeouts for inter-agent communication
- Log all inter-agent messages for debugging
Evaluation Framework
Metrics Decision Table
| Metric | What It Measures | How to Measure | Target |
|---|---|---|---|
| Task Success Rate | Correct completions / total | Automated + human eval | > 90% |
| Efficiency | Steps vs optimal path | Step count comparison | < 2x optimal |
| Tool Accuracy | Correct tool calls / total | Log analysis | > 95% |
| Safety | Violations / total interactions | Guardrail checks | 0 violations |
| Latency | Time to complete task | Wall clock | < SLA |
| Cost | Token usage per task | API usage tracking | Within budget |
Evaluation Dataset Structure
```json
{
  "test_cases": [
    {
      "id": "tc_001",
      "input": "Find all orders over $100 from last week",
      "expected_tools": ["search_orders"],
      "expected_output_contains": ["order_id", "amount"],
      "category": "retrieval",
      "difficulty": "easy"
    }
  ]
}
```
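A test case in this format could be scored along the following lines. This is a minimal sketch. Both function names are hypothetical, and requiring the tool list to match exactly and in order is an assumption that a looser harness might relax.

```python
def score_case(case: dict, tools_called: list, output: str) -> dict:
    """Check one agent run against a test case in the format above."""
    tools_ok = tools_called == case["expected_tools"]  # strict match: an assumption
    output_ok = all(
        fragment in output for fragment in case["expected_output_contains"]
    )
    return {
        "id": case["id"],
        "tools_ok": tools_ok,
        "output_ok": output_ok,
        "passed": tools_ok and output_ok,
    }


def success_rate(results: list) -> float:
    """Task Success Rate: passed cases divided by total cases."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)
```

Reporting `tools_ok` and `output_ok` separately makes it easier to tell a wrong tool choice from a badly formatted answer.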
Safety Guardrails
Input Guardrails
- Detect and reject prompt injection attempts
- Validate all user inputs before processing
- Rate limit requests per user/session
- Content filtering for harmful requests
Output Guardrails
- Validate tool call arguments before execution
- Check outputs for sensitive information (PII, secrets)
- Enforce response format constraints
- Prevent infinite tool call loops
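As an illustration of the sensitive-information check, the sketch below redacts two kinds of matches from an output string. The patterns are deliberately simple examples and the `redact` helper is hypothetical. A production system would rely on a vetted PII and secret detector instead of two regular expressions.

```python
import re

# Illustrative patterns only; they will miss many real-world cases.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> tuple:
    """Replace matches with a placeholder and report which pattern kinds fired."""
    found = []
    for kind, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{kind}]", text)
        if count:
            found.append(kind)
    return text, found
```

The list of fired pattern kinds can feed the audit log without storing the sensitive value itself.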
Operational Guardrails
- Maximum tool calls per task (circuit breaker)
- Maximum tokens per response
- Timeout for total task duration
- Escalation to human when confidence is low
- Audit logging for all actions
Circuit Breaker Thresholds
| Condition | Threshold | Action |
|---|---|---|
| Max tool calls per task | 20 | Stop execution, return error |
| Max consecutive errors | 3 | Stop, log, return graceful error |
| Max task duration | 5 minutes | Timeout, return partial result |
| Max tokens generated | 10,000 | Stop generation |
| Pattern repeats | 5 identical errors | Open circuit, alert operator |
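Three of these thresholds (tool calls, consecutive errors, and task duration) can be tracked with a small helper like the one sketched below. The class and method names are assumptions. The token limit and repeated-error detection are left out to keep the example short.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Tracks per-task limits from the table and reports when to stop."""

    def __init__(self, max_tool_calls: int = 20, max_consecutive_errors: int = 3,
                 max_seconds: float = 300.0, clock=time.monotonic):
        self.max_tool_calls = max_tool_calls
        self.max_consecutive_errors = max_consecutive_errors
        self.max_seconds = max_seconds
        self.clock = clock  # injectable so tests do not need to sleep
        self.started = clock()
        self.tool_calls = 0
        self.consecutive_errors = 0

    def record(self, success: bool) -> None:
        """Call once after every tool call."""
        self.tool_calls += 1
        self.consecutive_errors = 0 if success else self.consecutive_errors + 1

    def tripped(self) -> Optional[str]:
        """Return the reason to stop, or None if the task may continue."""
        if self.tool_calls >= self.max_tool_calls:
            return "max_tool_calls"
        if self.consecutive_errors >= self.max_consecutive_errors:
            return "max_consecutive_errors"
        if self.clock() - self.started >= self.max_seconds:
            return "max_task_duration"
        return None
```

An agent loop would call `record` after each tool call and check `tripped` before starting the next one, mapping the returned reason to the action column in the table.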
Prompt Engineering for Agents
System Prompt Structure
1. Identity and purpose (who the agent is)
2. Available tools (what it can do)
3. Constraints (what it must not do)
4. Output format (how to respond)
5. Examples (few-shot demonstrations)
6. Error handling (what to do when stuck)
Key Prompt Patterns
- Scratchpad: encourage step-by-step reasoning before action
- Self-correction: "If your first approach fails, try..."
- Confidence calibration: "Only proceed if you are confident"
- Graceful degradation: "If you cannot complete the task, explain why"
Anti-Patterns / Common Mistakes
| Anti-Pattern | Why It Is Wrong | What to Do Instead |
|---|---|---|
| Calling tools without reasoning | Wastes calls, misses context | Use ReAct pattern (think first) |
| No max iteration limit | Infinite loops, runaway costs | Set circuit breaker thresholds |
| Trusting all tool outputs | Corrupted data propagates | Validate tool results |
| Hardcoded tool sequences | No adaptability to failures | Dynamic tool selection based on state |
| No error recovery strategy | Agent gets stuck on first failure | Implement retry with alternatives |
| Apologizing instead of acting | Wastes user time | Take corrective action, then report |
| Over-reliance on single tool | Fragile if that tool fails | Provide fallback tools |
| No evaluation framework | Shipping blind, no quality signal | Build eval harness before deployment |
| Unlimited context growth | Context overflow, degraded quality | Implement memory management |
Integration Points
| Skill | Integration |
|---|---|
| | MCP servers provide tools for agents |
| | Agent planning uses structured plan generation |
| | Ralph-style loops are a specialized agent pattern |
| | Multi-agent coordination pattern |
| | Operational safety for agent loops |
| | Agent output validation |
| | TDD for agent tool implementations |
Skill Type
FLEXIBLE — Adapt the agent architecture, memory strategy, and coordination patterns to the specific use case. Safety guardrails and evaluation frameworks are strongly recommended for all production agents.