agent-development

Agent Development

Overview

Design and build AI agents that effectively use tools, manage memory, plan multi-step tasks, coordinate with other agents, and operate within safety guardrails. This skill covers the full agent development lifecycle from architecture through evaluation, with emphasis on observable, testable, and safe agent behavior.

Phase 1: Agent Design

  1. Define the agent's purpose and scope
  2. Identify required tools and capabilities
  3. Design memory architecture (short-term, long-term)
  4. Plan agent loop structure (observe, think, act)
  5. Define safety boundaries and guardrails
STOP — Present agent design to user for approval before implementation.

Agent Architecture Decision Table

| Agent Type | When to Use | Loop Pattern | Complexity |
|---|---|---|---|
| Single-turn tool user | Simple queries with tool calls | Request -> Tool -> Response | Low |
| ReAct agent | Multi-step reasoning tasks | Thought -> Action -> Observation -> loop | Medium |
| Plan-and-execute | Complex tasks with dependencies | Plan -> Execute steps -> Validate | Medium-High |
| Multi-agent orchestrator | Parallel/specialized sub-tasks | Dispatch -> Collect -> Synthesize | High |
| Autonomous loop (Ralph-style) | Long-running iterative development | Plan -> Build -> Verify -> Exit gate | High |

Phase 2: Implementation

  1. Build the agent loop with tool dispatch
  2. Implement memory management (context window, persistence)
  3. Add planning and decomposition logic
  4. Integrate error recovery and retry patterns
  5. Implement output validation
STOP — Run smoke tests on the agent loop before adding complexity.

Tool Use Patterns

Tool Definition Best Practices

| Principle | Rule | Example |
|---|---|---|
| Clear naming | verb-noun format | search_documents, create_file |
| Detailed descriptions | Include when to use AND when NOT to use | "Use for keyword search. Do NOT use for semantic similarity." |
| Well-typed parameters | Descriptions and examples on every param | query: string // "e.g., 'user authentication'" |
| Predictable returns | Consistent format across tools | Always return { success, data, error } |
| Self-correcting errors | Help agent recover | "Invalid date format. Expected ISO 8601: YYYY-MM-DD" |
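
The principles in this table can be sketched in a few lines of Python. The tool name `search_orders_by_date`, its `day` parameter, and the `tool_result` helper are hypothetical, introduced only for illustration; the envelope and the error text follow the table's examples.

```python
from datetime import date
from typing import Any, Optional


def tool_result(success: bool, data: Any = None, error: Optional[str] = None) -> dict:
    """Build the uniform { success, data, error } envelope shared by all tools."""
    return {"success": success, "data": data, "error": error}


def search_orders_by_date(day: str) -> dict:
    """Hypothetical tool: look up orders placed on one day.

    Use for exact-date lookups. Do NOT use for date ranges.
    day: ISO 8601 date string, e.g. "2025-01-15".
    """
    try:
        parsed = date.fromisoformat(day)
    except ValueError:
        # Self-correcting error: state the expected format so the agent can retry.
        return tool_result(False, error="Invalid date format. Expected ISO 8601: YYYY-MM-DD")
    # Placeholder data; a real tool would query a data store here.
    return tool_result(True, data={"day": parsed.isoformat(), "orders": []})
```

Because every tool returns the same envelope, the agent loop can check `success` once instead of special-casing each tool.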

Tool Selection Strategy

Given a task:
1. Identify required information and actions
2. Map to available tools
3. Determine tool call order (dependencies)
4. Execute with result validation
5. Retry or try alternative tool on failure
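
Steps 4 and 5 above could look roughly like the following sketch. It assumes each tool returns a `{success, data, error}` dictionary; `run_with_fallback`, `flaky_search`, and `backup_search` are hypothetical names, and "non-empty data" is just one possible validation rule.

```python
from typing import Callable

Tool = Callable[[str], dict]  # each tool returns {"success", "data", "error"}


def run_with_fallback(tools: list[Tool], query: str, max_attempts: int = 2) -> dict:
    """Try each tool in order, retrying a failing tool before moving to the next."""
    errors = []
    for tool in tools:
        for _ in range(max_attempts):
            result = tool(query)
            # Result validation: require success and non-empty data.
            if result.get("success") and result.get("data"):
                return result
            errors.append(result.get("error") or "empty result")
    return {"success": False, "data": None, "error": "; ".join(errors)}


def flaky_search(query: str) -> dict:
    """Stub tool that always fails."""
    return {"success": False, "data": None, "error": "index unavailable"}


def backup_search(query: str) -> dict:
    """Stub tool that always succeeds."""
    return {"success": True, "data": [f"doc about {query}"], "error": None}


outcome = run_with_fallback([flaky_search, backup_search], "user authentication")
```

Here `outcome` comes from `backup_search`, after `flaky_search` has failed twice.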

Tool Design Principles

  • Composable: small tools that combine for complex tasks
  • Idempotent: safe to retry without side effects (where possible)
  • Observable: return enough context for the agent to verify success
  • Bounded: timeouts and size limits on all operations
  • Documented: every parameter and return value described

Memory Management

Memory Type Decision Table

| Type | Duration | Storage | Use Case |
|---|---|---|---|
| Working Memory | Current turn | Context window | Active reasoning |
| Short-term Memory | Current session | In-context or buffer | Recent conversation |
| Long-term Memory | Across sessions | Database/file | Learned patterns, user prefs |
| Episodic Memory | Specific events | Indexed store | Past task outcomes |
| Semantic Memory | Knowledge | Vector DB | Domain knowledge retrieval |

Context Window Management

Strategy: Sliding window with importance-based retention

1. Always retain: system prompt, tool definitions, current task
2. Summarize: older conversation turns into compressed summaries
3. Evict: least relevant context when approaching limit
4. Retrieve: pull relevant long-term memory on demand

Budget allocation:
  System prompt + tools: ~20%
  Current task context:  ~40%
  Conversation history:  ~25%
  Retrieved memory:      ~15%
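
As a rough illustration, the budget split above can be turned into token counts, with the history trimmed to fit its share. This sketch simplifies the strategy: it keeps turns by recency only and leaves out summarization and importance scoring. The function names and the whitespace token counter are assumptions, not a real tokenizer.

```python
# Percentages taken from the budget allocation above.
BUDGET_PERCENT = {
    "system_and_tools": 20,
    "current_task": 40,
    "history": 25,
    "retrieved_memory": 15,
}


def allocate_budget(context_limit: int) -> dict[str, int]:
    """Split a token limit across the four budget categories."""
    return {name: context_limit * pct // 100 for name, pct in BUDGET_PERCENT.items()}


def trim_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns that fit within the history budget."""
    kept, used = [], 0
    for turn in reversed(turns):
        cost = len(turn.split())  # crude whitespace count, for illustration only
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))


budget = allocate_budget(8000)
recent = trim_history(["a b c", "d e", "f"], budget=3)
```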

Memory Update Triggers

| Trigger | Action |
|---|---|
| User correction | Update learned patterns |
| Task completion | Store outcome and approach |
| Error recovery | Record what failed and what worked |
| New domain knowledge | Index for future retrieval |

Planning Strategies

Hierarchical Task Decomposition

1. Break high-level goal into sub-goals
2. For each sub-goal, identify required actions
3. Order actions by dependencies
4. Execute with checkpoints between phases
5. Re-plan if intermediate results change the approach
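
Step 3, ordering actions by dependencies, amounts to a topological sort. A minimal sketch using Python's standard `graphlib` module follows; the sub-goal names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-goals, each mapped to the sub-goals it depends on.
dependencies = {
    "deploy": {"test"},
    "test": {"build"},
    "build": {"fetch_source"},
    "fetch_source": set(),
}

# static_order() yields every node after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

For this chain, `execution_order` is `["fetch_source", "build", "test", "deploy"]`. A cyclic dependency raises `graphlib.CycleError`, which is a useful signal that the plan needs rework.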

ReAct Pattern (Reason + Act)

Thought: I need to find the user's recent orders to answer their question.
Action: search_orders(user_id="123", limit=5)
Observation: Found 5 orders, most recent is #456 from yesterday.
Thought: The user asked about order #456. I have the details now.
Action: respond with order details
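
The trace above maps onto a small loop. In this sketch a scripted function stands in for the model call, so the example runs without any LLM; `scripted_policy`, `react_loop`, and the stubbed `search_orders` are illustrative assumptions, not a specific framework's API.

```python
from typing import Callable


def search_orders(user_id: str, limit: int) -> str:
    """Stub tool returning a canned observation."""
    return f"Found {limit} orders, most recent is #456 from yesterday."


TOOLS: dict[str, Callable[..., str]] = {"search_orders": search_orders}


def scripted_policy(history: list[str]) -> dict:
    """Stand-in for a model call: picks the next step from the transcript so far."""
    if not any(line.startswith("Observation:") for line in history):
        return {"thought": "I need the user's recent orders.",
                "action": "search_orders", "args": {"user_id": "123", "limit": 5}}
    return {"thought": "I have the details now.",
            "action": "respond", "args": {"text": "Your most recent order is #456."}}


def react_loop(policy: Callable[[list[str]], dict], max_steps: int = 5) -> tuple[str, list[str]]:
    """Alternate Thought -> Action -> Observation until the policy responds."""
    history: list[str] = []
    for _ in range(max_steps):
        step = policy(history)
        history.append(f"Thought: {step['thought']}")
        if step["action"] == "respond":
            return step["args"]["text"], history
        observation = TOOLS[step["action"]](**step["args"])
        history.append(f"Action: {step['action']}")
        history.append(f"Observation: {observation}")
    # The step limit bounds the loop if the policy never responds.
    return "Stopped: step limit reached.", history


answer, transcript = react_loop(scripted_policy)
```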

Plan-and-Execute Pattern

1. Create a complete plan before any action
2. Execute each step, checking preconditions
3. After each step, validate the result
4. If a step fails, re-plan from current state
5. Never modify the plan mid-step (finish or abort first)

Reflection Pattern

After completing a task:
1. Was the result correct?
2. Was the approach efficient?
3. What could be improved?
4. Should any memory be updated?

Phase 3: Evaluation and Safety

  1. Build evaluation harness with test scenarios
  2. Measure accuracy, efficiency, and safety metrics
  3. Test edge cases and adversarial inputs
  4. Add monitoring and logging
  5. Implement circuit breakers for runaway behavior
STOP — All safety guardrails must be tested before deployment.

Multi-Agent Coordination

Coordination Pattern Decision Table

| Pattern | Description | Use When |
|---|---|---|
| Orchestrator | Central agent delegates to specialists | Clear task hierarchy |
| Pipeline | Agents process in sequence | Linear workflows |
| Debate | Agents propose and critique | Need diverse perspectives |
| Voting | Multiple agents, majority wins | Uncertainty in approach |
| Supervisor | One agent monitors others | Safety-critical tasks |
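
Of these patterns, voting is the simplest to show in code. The sketch below assumes each agent's answer has already been normalized to a comparable string. Returning `None` when there is no strict majority is a design choice made here, not something the table prescribes.

```python
from collections import Counter
from typing import Optional


def majority_vote(answers: list[str]) -> Optional[str]:
    """Return the answer backed by a strict majority, or None if there is none."""
    if not answers:
        return None
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count > len(answers) / 2 else None


decision = majority_vote(["retry", "escalate", "retry"])
```

A `None` result is a natural point to fall back to another pattern, such as debate or escalation to a human.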

Communication Protocol

Agent-to-Agent message:
{
  "from": "planner",
  "to": "executor",
  "type": "task_assignment",
  "content": { "task": "...", "context": "...", "constraints": "..." },
  "priority": "high",
  "deadline": "2025-01-15T10:00:00Z"
}
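
One way to keep such messages well-formed is to build them from a typed structure. In this sketch the class name, the `sender` and `recipient` field names (`from` is a reserved word in Python), and the allowed priority values are assumptions; only the wire format follows the message above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime

ALLOWED_PRIORITIES = {"low", "normal", "high"}  # assumed value set


@dataclass
class AgentMessage:
    sender: str
    recipient: str
    type: str
    content: dict
    priority: str
    deadline: str  # ISO 8601 timestamp

    def validate(self) -> None:
        """Raise ValueError on an unknown priority or a malformed deadline."""
        if self.priority not in ALLOWED_PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")
        # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it.
        datetime.fromisoformat(self.deadline.replace("Z", "+00:00"))

    def to_json(self) -> str:
        """Serialize using the wire names "from" and "to"."""
        body = asdict(self)
        body["from"] = body.pop("sender")
        body["to"] = body.pop("recipient")
        return json.dumps(body)


msg = AgentMessage("planner", "executor", "task_assignment",
                   {"task": "...", "context": "...", "constraints": "..."},
                   "high", "2025-01-15T10:00:00Z")
msg.validate()
wire = msg.to_json()
```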

Coordination Rules

  • Define clear ownership boundaries
  • Use structured messages between agents
  • Implement deadlock detection
  • Set timeouts for inter-agent communication
  • Log all inter-agent messages for debugging

Evaluation Framework

Metrics Decision Table

| Metric | What It Measures | How to Measure | Target |
|---|---|---|---|
| Task Success Rate | Correct completions / total | Automated + human eval | > 90% |
| Efficiency | Steps vs optimal path | Step count comparison | < 2x optimal |
| Tool Accuracy | Correct tool calls / total | Log analysis | > 95% |
| Safety | Violations / total interactions | Guardrail checks | 0 violations |
| Latency | Time to complete task | Wall clock | < SLA |
| Cost | Token usage per task | API usage tracking | Within budget |

Evaluation Dataset Structure

{
  "test_cases": [
    {
      "id": "tc_001",
      "input": "Find all orders over $100 from last week",
      "expected_tools": ["search_orders"],
      "expected_output_contains": ["order_id", "amount"],
      "category": "retrieval",
      "difficulty": "easy"
    }
  ]
}
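
A test case in this shape could be scored along the following lines. The pass rule is an assumption chosen for illustration: the tool calls must match exactly and every expected substring must appear in the output. `score_case` and `success_rate` are hypothetical helpers.

```python
def score_case(case: dict, tools_called: list[str], output: str) -> bool:
    """Pass when the expected tools were called and all expected substrings appear."""
    tools_ok = tools_called == case["expected_tools"]
    output_ok = all(token in output for token in case["expected_output_contains"])
    return tools_ok and output_ok


def success_rate(results: list[bool]) -> float:
    """Task Success Rate: correct completions divided by total."""
    return sum(results) / len(results) if results else 0.0


case = {
    "id": "tc_001",
    "input": "Find all orders over $100 from last week",
    "expected_tools": ["search_orders"],
    "expected_output_contains": ["order_id", "amount"],
}

# Two simulated agent runs: one calls the right tool, one does not.
results = [
    score_case(case, ["search_orders"], '{"order_id": 456, "amount": 120.0}'),
    score_case(case, ["create_file"], '{"order_id": 456, "amount": 120.0}'),
]
rate = success_rate(results)
```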

Safety Guardrails

Input Guardrails

  • Detect and reject prompt injection attempts
  • Validate all user inputs before processing
  • Rate limit requests per user/session
  • Content filtering for harmful requests

Output Guardrails

  • Validate tool call arguments before execution
  • Check outputs for sensitive information (PII, secrets)
  • Enforce response format constraints
  • Prevent infinite tool call loops
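
The sensitive-information check might start as simple pattern-based redaction. Both patterns below are illustrative assumptions (the `sk-` key format in particular is hypothetical); real PII and secret detection needs far broader coverage than two regular expressions.

```python
import re

# Illustrative patterns only; not a complete PII or secret detector.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # assumed key format
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace matches with a placeholder and report which kinds were found."""
    found = []
    for kind, pattern in PATTERNS.items():
        if pattern.search(text):
            found.append(kind)
            text = pattern.sub(f"[REDACTED {kind}]", text)
    return text, found


clean, kinds = redact("Contact alice@example.com with key sk-abcdefghijklmnop1234")
```

Returning the list of detected kinds alongside the cleaned text lets the caller write an audit log entry or block the response outright.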

Operational Guardrails

  • Maximum tool calls per task (circuit breaker)
  • Maximum tokens per response
  • Timeout for total task duration
  • Escalation to human when confidence is low
  • Audit logging for all actions

Circuit Breaker Thresholds

| Condition | Threshold | Action |
|---|---|---|
| Max tool calls per task | 20 | Stop execution, return error |
| Max consecutive errors | 3 | Stop, log, return graceful error |
| Max task duration | 5 minutes | Timeout, return partial result |
| Max tokens generated | 10,000 | Stop generation |
| Pattern repeats | 5 identical errors | Open circuit, alert operator |
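
A minimal sketch of a breaker that enforces three of these thresholds (tool calls, consecutive errors, and duration) is shown below. The class and method names are assumptions; token counting and repeated-error detection are omitted for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CircuitBreaker:
    max_tool_calls: int = 20
    max_consecutive_errors: int = 3
    max_duration_s: float = 300.0  # 5 minutes
    tool_calls: int = 0
    consecutive_errors: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, success: bool) -> None:
        """Call once after every tool call."""
        self.tool_calls += 1
        self.consecutive_errors = 0 if success else self.consecutive_errors + 1

    def tripped(self) -> Optional[str]:
        """Return the reason the circuit is open, or None if execution may continue."""
        if self.tool_calls >= self.max_tool_calls:
            return "max tool calls reached"
        if self.consecutive_errors >= self.max_consecutive_errors:
            return "max consecutive errors reached"
        if time.monotonic() - self.started_at > self.max_duration_s:
            return "task duration exceeded"
        return None


breaker = CircuitBreaker()
for _ in range(3):
    breaker.record(success=False)
reason = breaker.tripped()
```

The agent loop would check `tripped()` before each tool call and stop with the returned reason when it is not `None`.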

Prompt Engineering for Agents

System Prompt Structure

1. Identity and purpose (who the agent is)
2. Available tools (what it can do)
3. Constraints (what it must not do)
4. Output format (how to respond)
5. Examples (few-shot demonstrations)
6. Error handling (what to do when stuck)

Key Prompt Patterns

  • Scratchpad: encourage step-by-step reasoning before action
  • Self-correction: "If your first approach fails, try..."
  • Confidence calibration: "Only proceed if you are confident"
  • Graceful degradation: "If you cannot complete the task, explain why"

Anti-Patterns / Common Mistakes

| Anti-Pattern | Why It Is Wrong | What to Do Instead |
|---|---|---|
| Calling tools without reasoning | Wastes calls, misses context | Use ReAct pattern (think first) |
| No max iteration limit | Infinite loops, runaway costs | Set circuit breaker thresholds |
| Trusting all tool outputs | Corrupted data propagates | Validate tool results |
| Hardcoded tool sequences | No adaptability to failures | Dynamic tool selection based on state |
| No error recovery strategy | Agent gets stuck on first failure | Implement retry with alternatives |
| Apologizing instead of acting | Wastes user time | Take corrective action, then report |
| Over-reliance on single tool | Fragile if that tool fails | Provide fallback tools |
| No evaluation framework | Shipping blind, no quality signal | Build eval harness before deployment |
| Unlimited context growth | Context overflow, degraded quality | Implement memory management |

Integration Points

| Skill | Integration |
|---|---|
| mcp-builder | MCP servers provide tools for agents |
| planning | Agent planning uses structured plan generation |
| autonomous-loop | Ralph-style loops are a specialized agent pattern |
| dispatching-parallel-agents | Multi-agent coordination pattern |
| circuit-breaker | Operational safety for agent loops |
| verification-before-completion | Agent output validation |
| test-driven-development | TDD for agent tool implementations |

Skill Type

FLEXIBLE — Adapt the agent architecture, memory strategy, and coordination patterns to the specific use case. Safety guardrails and evaluation frameworks are strongly recommended for all production agents.