agentic-development

Agentic Development Skill

Load with: base.md + llm-patterns.md + [language].md
For building autonomous AI agents that perform multi-step tasks with tools.

Framework Selection by Language

| Language/Framework | Default | Why |
|---|---|---|
| Python | Pydantic AI | Type-safe, Pydantic validation, multi-model, production-ready |
| Node.js / Next.js | Claude Agent SDK | Official Anthropic SDK, tools, multi-agent, native streaming |

Python: Pydantic AI (Default)

python
from pydantic_ai import Agent
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str
    summary: str

agent = Agent(
    'claude-sonnet-4-20250514',
    result_type=list[SearchResult],
    system_prompt='You are a research assistant.',
)

# Type-safe result
result = await agent.run('Find articles about AI agents')
for item in result.data:
    print(f"{item.title}: {item.url}")

Node.js / Next.js: Claude Agent SDK (Default)

typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Define tools
const tools: Anthropic.Tool[] = [
  {
    name: "web_search",
    description: "Search the web for information",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search query" },
      },
      required: ["query"],
    },
  },
];

// Agentic loop
async function runAgent(prompt: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: prompt },
  ];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 4096,
      tools,
      messages,
    });

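    // Check for tool use
    if (response.stop_reason === "tool_use") {
      // The model may request several tools in one turn; each needs a result
      const toolUses = response.content.filter(
        (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
      );
      messages.push({ role: "assistant", content: response.content });

      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const toolUse of toolUses) {
        // executeTool is your own dispatcher and is expected to return a string
        const result = await executeTool(toolUse.name, toolUse.input);
        toolResults.push({ type: "tool_result", tool_use_id: toolUse.id, content: result });
      }
      messages.push({ role: "user", content: toolResults });
      continue;
    }

    // Done - return final response
    const textBlock = response.content.find(
      (b): b is Anthropic.TextBlock => b.type === "text",
    );
    return textBlock?.text;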
  }
}
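
The loop above calls `executeTool` without defining it. One way to fill that gap is a name-to-handler registry; the sketch below shows the idea in Python for brevity. `TOOL_REGISTRY`, `register`, and the placeholder `web_search` handler are illustrative names, not part of any SDK. Errors are returned as text so the model can see them and adjust.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical registry: tool name -> async handler taking the tool's input dict.
ToolHandler = Callable[[dict[str, Any]], Awaitable[str]]
TOOL_REGISTRY: dict[str, ToolHandler] = {}

def register(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Decorator that adds a handler to the registry under `name`."""
    def wrap(fn: ToolHandler) -> ToolHandler:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register("web_search")
async def web_search(tool_input: dict[str, Any]) -> str:
    # Placeholder: a real handler would call a search API here.
    return f"results for {tool_input['query']!r}"

async def execute_tool(name: str, tool_input: dict[str, Any]) -> str:
    """Run a tool by name; report failures as text instead of raising."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Error: unknown tool {name!r}"
    try:
        return await handler(tool_input)
    except Exception as exc:  # keep the agent loop alive on handler errors
        return f"Error: {type(exc).__name__}: {exc}"

tool_output = asyncio.run(execute_tool("web_search", {"query": "AI agents"}))
unknown_output = asyncio.run(execute_tool("delete_db", {}))
bad_input_output = asyncio.run(execute_tool("web_search", {}))
```

Returning an error string rather than raising keeps the conversation going, at the cost of relying on the model to notice the failure.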

Core Principle

Plan first, act incrementally, verify always.
Agents that research and plan before executing consistently outperform those that jump straight to action. Break complex tasks into verifiable steps, use tools judiciously, and maintain clear state throughout execution.

Agent Architecture

Three Components (OpenAI)

┌─────────────────────────────────────────────────┐
│                    AGENT                        │
├─────────────────────────────────────────────────┤
│  Model (Brain)      │ LLM for reasoning &       │
│                     │ decision-making           │
├─────────────────────┼───────────────────────────┤
│  Tools (Arms/Legs)  │ APIs, functions, external │
│                     │ systems for action        │
├─────────────────────┼───────────────────────────┤
│  Instructions       │ System prompts defining   │
│  (Rules)            │ behavior & boundaries     │
└─────────────────────┴───────────────────────────┘

Project Structure

project/
├── src/
│   ├── agents/
│   │   ├── orchestrator.ts    # Main agent coordinator
│   │   ├── specialized/       # Task-specific agents
│   │   │   ├── researcher.ts
│   │   │   ├── coder.ts
│   │   │   └── reviewer.ts
│   │   └── base.ts            # Shared agent interface
│   ├── tools/
│   │   ├── definitions/       # Tool schemas
│   │   ├── implementations/   # Tool logic
│   │   └── registry.ts        # Tool discovery
│   ├── prompts/
│   │   ├── system/            # Agent instructions
│   │   └── templates/         # Task templates
│   └── memory/
│       ├── conversation.ts    # Short-term context
│       └── persistent.ts      # Long-term storage
├── tests/
│   ├── agents/                # Agent behavior tests
│   ├── tools/                 # Tool unit tests
│   └── evals/                 # End-to-end evaluations
└── skills/                    # Agent skills (Anthropic pattern)
    ├── skill-name/
    │   ├── instructions.md
    │   ├── scripts/
    │   └── resources/

Workflow Pattern: Explore-Plan-Execute-Verify

1. Explore Phase

typescript
// Gather context before acting
async function explore(task: Task): Promise<Context> {
  const relevantFiles = await agent.searchCodebase(task.query);
  const existingPatterns = await agent.analyzePatterns(relevantFiles);
  const dependencies = await agent.identifyDependencies(task);

  return { relevantFiles, existingPatterns, dependencies };
}

2. Plan Phase (Critical)

typescript
// Plan explicitly before execution
async function plan(task: Task, context: Context): Promise<Plan> {
  const prompt = `
    Task: ${task.description}
    Context: ${JSON.stringify(context)}

    Create a step-by-step plan. For each step:
    1. What action to take
    2. What tools to use
    3. How to verify success
    4. What could go wrong

    Output JSON with steps array.
  `;

  return await llmCall({ prompt, schema: PlanSchema });
}
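
`PlanSchema` is not defined in the original. As a rough illustration of what validating the planner's JSON could look like, here is a standard-library Python sketch. The field names (`action`, `tools`, `verification`, `risks`) are assumptions that mirror the four questions in the prompt.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"action", "tools", "verification", "risks"}  # assumed names

@dataclass
class PlanStep:
    action: str
    tools: list
    verification: str
    risks: str

def parse_plan(raw: str) -> list:
    """Parse model output into PlanStep objects, rejecting malformed plans."""
    payload = json.loads(raw)
    steps = payload.get("steps") if isinstance(payload, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' array")
    parsed = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {index} is not an object")
        missing = REQUIRED_FIELDS - step.keys()
        if missing:
            raise ValueError(f"step {index} is missing fields: {sorted(missing)}")
        parsed.append(PlanStep(step["action"], step["tools"],
                               step["verification"], step["risks"]))
    return parsed

raw_plan = json.dumps({"steps": [{
    "action": "Read auth module",
    "tools": ["read_file"],
    "verification": "File contents returned",
    "risks": "File may not exist",
}]})
plan_steps = parse_plan(raw_plan)

try:
    parse_plan('{"steps": [{"action": "Write file"}]}')
    rejected = False
except ValueError:
    rejected = True
```

Rejecting an incomplete plan up front is cheaper than discovering a missing verification criterion halfway through execution.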

3. Execute Phase

typescript
// Execute with verification at each step
async function execute(plan: Plan): Promise<Result[]> {
  const results: Result[] = [];

  for (const step of plan.steps) {
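    // Execute single step
    let result = await executeStep(step);

    // Verify before continuing
    if (!await verify(step, result)) {
      // Self-correct or escalate
      const corrected = await selfCorrect(step, result);
      if (!corrected.success) {
        return handleFailure(step, results);
      }
      // Keep the corrected output, not the one that failed verification
      // (assumes selfCorrect returns the repaired Result as `corrected.result`)
      result = corrected.result;
    }

    results.push(result);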
  }

  return results;
}
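
To make the control flow concrete, here is a small synchronous Python sketch of the same execute-verify loop. It swaps the original `selfCorrect` call for a plain bounded retry, and `run_plan`, `execute_step`, and `verify` are hypothetical stand-ins rather than a real agent API.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    step: str
    output: str
    attempts: int

def run_plan(steps, execute_step, verify, max_retries=1):
    """Run steps in order; retry a failing step, stop at the first one that never passes.

    Returns (results_so_far, failed_step), where failed_step is None on success.
    """
    results = []
    for step in steps:
        for attempt in range(1, max_retries + 2):
            output = execute_step(step, attempt)
            if verify(step, output):
                results.append(StepResult(step, output, attempt))
                break
        else:
            # No attempt passed verification: report what succeeded so far.
            return results, step
    return results, None

# Stub behaviour: step "b" fails on its first attempt and succeeds on the second.
def execute_step(step, attempt):
    return f"{step}-bad" if step == "b" and attempt == 1 else f"{step}-ok"

def verify(step, output):
    return output.endswith("-ok")

results, failed = run_plan(["a", "b", "c"], execute_step, verify)
strict_results, strict_failed = run_plan(["a", "b", "c"], execute_step, verify, max_retries=0)
```

With one retry allowed the plan completes; with none, execution stops at step `b` and later steps never run.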

4. Verify Phase

typescript
// Independent verification prevents overfitting
async function verify(step: Step, result: Result): Promise<boolean> {
  // Run tests if available
  if (step.testCommand) {
    const testResult = await runCommand(step.testCommand);
    if (!testResult.success) return false;
  }

  // Use LLM to verify against criteria
  const verification = await llmCall({
    prompt: `
      Step: ${step.description}
      Expected: ${step.successCriteria}
      Actual: ${JSON.stringify(result)}

      Does the result satisfy the success criteria?
      Respond with { "passes": boolean, "reasoning": string }
    `,
    schema: VerificationSchema
  });

  return verification.passes;
}
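
The `runCommand` helper is left undefined above. A minimal Python sketch follows, assuming a test command passes exactly when it exits with status 0 inside a timeout. The name `run_command` and the 60-second default are illustrative choices.

```python
import subprocess
import sys

def run_command(args: list, timeout: float = 60.0) -> bool:
    """Return True when the command exits with status 0 within the timeout."""
    try:
        completed = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # A hung or missing test runner counts as a failed verification.
        return False
    return completed.returncode == 0

# Use the current interpreter so the example does not depend on a test framework.
passing = run_command([sys.executable, "-c", "assert 1 + 1 == 2"])
failing = run_command([sys.executable, "-c", "assert 1 + 1 == 3"])
```

Passing the command as an argument list, rather than a shell string, avoids shell interpretation of anything the model wrote.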

Tool Design

Tool Definition Pattern

typescript
// tools/definitions/file-operations.ts
import { z } from 'zod';

export const ReadFileTool = {
  name: 'read_file',
  description: 'Read contents of a file. Use before modifying any file.',
  parameters: z.object({
    path: z.string().describe('Absolute path to the file'),
    startLine: z.number().optional().describe('Start line (1-indexed)'),
    endLine: z.number().optional().describe('End line (1-indexed)'),
  }),
  // Risk level for guardrails (OpenAI pattern)
  riskLevel: 'low' as const,
};

export const WriteFileTool = {
  name: 'write_file',
  description: 'Write content to a file. Always read first to understand context.',
  parameters: z.object({
    path: z.string().describe('Absolute path to the file'),
    content: z.string().describe('Complete file content'),
  }),
  riskLevel: 'medium' as const,
  // Require user confirmation before any write, even at medium risk
  requiresConfirmation: true,
};

Tool Implementation

typescript
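// tools/implementations/file-operations.ts
import { promises as fs } from 'fs';
import { z } from 'zod';
import { ReadFileTool } from '../definitions/file-operations';

// Assumed result shape shared by all tools (not defined in the original)
interface ToolResult {
  success: boolean;
  data?: string;
  metadata?: Record<string, unknown>;
  error?: string;
}

export async function readFile(
  params: z.infer<typeof ReadFileTool.parameters>
): Promise<ToolResult> {
  try {
    const content = await fs.readFile(params.path, 'utf-8');
    const lines = content.split('\n');

    // startLine and endLine are 1-indexed and inclusive
    const start = (params.startLine ?? 1) - 1;
    const end = params.endLine ?? lines.length;

    return {
      success: true,
      data: lines.slice(start, end).join('\n'),
      metadata: { totalLines: lines.length }
    };
  } catch (error) {
    // `error` is `unknown` under strict mode, so narrow it before reading .message
    const message = error instanceof Error ? error.message : String(error);
    return {
      success: false,
      error: `Failed to read file: ${message}`
    };
  }
}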
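
The line-range arithmetic is easy to get wrong by one, so here is the same slicing logic isolated in Python with a worked example. `slice_lines` is a hypothetical helper; it treats both bounds as 1-indexed and inclusive, matching the tool description.

```python
from typing import Optional

def slice_lines(content: str, start_line: Optional[int] = None,
                end_line: Optional[int] = None) -> str:
    """Return lines start_line..end_line (1-indexed, inclusive) of content."""
    lines = content.split("\n")
    # Convert the 1-indexed start to a 0-indexed slice offset.
    start = (start_line if start_line is not None else 1) - 1
    # A slice end is exclusive, so the 1-indexed inclusive end needs no adjustment.
    end = end_line if end_line is not None else len(lines)
    return "\n".join(lines[start:end])

sample = "alpha\nbeta\ngamma\ndelta"
middle = slice_lines(sample, start_line=2, end_line=3)
tail = slice_lines(sample, start_line=3)
whole = slice_lines(sample)
```

Only the start bound is shifted by one: an exclusive slice end at index `n` already includes line `n` in 1-indexed terms.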

Prefer Built-in Tools (OpenAI)

typescript
// Use platform-provided tools when available
const agent = createAgent({
  tools: [
    // Built-in tools (handled by platform)
    { type: 'web_search' },
    { type: 'code_interpreter' },

    // Custom tools only when needed
    { type: 'function', function: customDatabaseTool },
  ],
});

Multi-Agent Patterns

Single Agent (Default)

Use one agent for most tasks. Every additional agent adds coordination overhead, so introduce more only when one of the conditions listed under "When to Use Multiple Agents" applies.

Agent-as-Tool Pattern (OpenAI)

typescript
// Expose specialized agents as callable tools
const researchAgent = createAgent({
  name: 'researcher',
  instructions: 'You research topics and return structured findings.',
  tools: [webSearchTool, documentReadTool],
});

const mainAgent = createAgent({
  tools: [
    {
      type: 'function',
      function: {
        name: 'research_topic',
        description: 'Delegate research to specialized agent',
        parameters: ResearchQuerySchema,
        handler: async (query) => researchAgent.run(query),
      },
    },
  ],
});

Handoff Pattern (OpenAI)

typescript
// One-way transfer between agents
const customerServiceAgent = createAgent({
  tools: [
    // Handoff to specialist when needed
    {
      name: 'transfer_to_billing',
      description: 'Transfer to billing specialist for payment issues',
      handler: async (context) => {
        return { handoff: 'billing_agent', context };
      },
    },
  ],
});
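
The handler above only returns a handoff marker; something still has to act on it. A possible routing loop is sketched below in Python, with hypothetical agent functions and a hop limit (an added safeguard, not something the original specifies).

```python
from typing import Callable, Union

# An agent returns either a final answer (str) or a handoff request (dict).
AgentFn = Callable[[str], Union[str, dict]]

def customer_service(message: str) -> Union[str, dict]:
    if "invoice" in message or "payment" in message:
        return {"handoff": "billing_agent", "context": message}
    return "Here is some general help."

def billing_agent(message: str) -> Union[str, dict]:
    return f"Billing is looking into: {message}"

AGENTS: dict[str, AgentFn] = {
    "customer_service": customer_service,
    "billing_agent": billing_agent,
}

def run_with_handoffs(start: str, message: str, max_hops: int = 3):
    """Follow handoffs until an agent answers; cap hops to avoid ping-pong loops."""
    current, path = start, [start]
    for _ in range(max_hops + 1):
        reply = AGENTS[current](message)
        if isinstance(reply, str):
            return reply, path
        # One-way transfer: control moves on and does not return to the sender.
        current, message = reply["handoff"], reply["context"]
        path.append(current)
    raise RuntimeError(f"too many handoffs: {path}")

answer, path = run_with_handoffs("customer_service", "My payment failed")
direct_answer, direct_path = run_with_handoffs("customer_service", "Where is your office?")
```

Recording the path makes it possible to audit which agent actually produced the final answer.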

When to Use Multiple Agents

  • Separate task domains with non-overlapping tools
  • Different authorization levels needed
  • Complex workflows with clear handoff points
  • Parallel execution of independent subtasks
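
For the last case, independent subtasks can run concurrently. A minimal Python sketch using `asyncio.gather` is shown here; `run_subagent` is a stand-in that sleeps instead of calling a model, and the agent names are illustrative.

```python
import asyncio

async def run_subagent(name: str, task: str, delay: float) -> str:
    # Stand-in for a real agent call; the sleep simulates model latency.
    await asyncio.sleep(delay)
    return f"{name} finished: {task}"

async def run_parallel(subtasks: list) -> list:
    """Run independent subtasks concurrently; results keep the input order."""
    return await asyncio.gather(
        *(run_subagent(name, task, delay) for name, task, delay in subtasks)
    )

parallel_results = asyncio.run(run_parallel([
    ("researcher", "collect sources", 0.02),
    ("coder", "draft patch", 0.01),
]))
```

`asyncio.gather` returns results in the order the awaitables were passed, even though the second subtask finishes first here.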

Memory & State

Conversation Memory

typescript
// memory/conversation.ts
interface ConversationMemory {
  messages: Message[];
  maxTokens: number;

  add(message: Message): void;
  getContext(): Message[];
  summarize(): Promise<string>;
}

// Maintain state across tool calls (Gemini pattern)
interface AgentState {
  thoughtSignature?: string;  // Encrypted reasoning state
  conversationId: string;     // For shared memory
  currentPlan?: Plan;
  completedSteps: Step[];
}
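
One simple way to honour `maxTokens` is to keep only the newest messages that fit the budget. The Python sketch below assumes a crude four-characters-per-token estimate in place of a real tokenizer, and omits `summarize()`.

```python
from dataclasses import dataclass, field

def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); an assumption, not a tokenizer.
    return max(1, len(text) // 4)

@dataclass
class ConversationMemory:
    max_tokens: int
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def get_context(self) -> list:
        """Return the most recent messages that fit within the token budget."""
        kept, used = [], 0
        for message in reversed(self.messages):
            cost = estimate_tokens(message["content"])
            if used + cost > self.max_tokens:
                break  # everything older than this is dropped as well
            kept.append(message)
            used += cost
        return list(reversed(kept))

memory = ConversationMemory(max_tokens=10)
memory.add("user", "a" * 40)       # about 10 tokens
memory.add("assistant", "b" * 20)  # about 5 tokens
memory.add("user", "c" * 16)       # about 4 tokens
context = memory.get_context()
```

The oldest message is dropped because the two newer ones already use 9 of the 10 available tokens.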

Persistent Memory

typescript
// memory/persistent.ts
interface PersistentMemory {
  // Store learnings across sessions
  store(key: string, value: any): Promise<void>;
  retrieve(key: string): Promise<any>;

  // Semantic search over past interactions
  search(query: string, limit: number): Promise<Memory[]>;
}
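
As an illustration of the interface, here is a file-backed Python sketch. It is deliberately naive: `search` ranks entries by keyword overlap rather than the semantic search the interface calls for, and `JsonMemory` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path

class JsonMemory:
    """Key-value store persisted to a single JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data = json.loads(path.read_text()) if path.exists() else {}

    def store(self, key: str, value: str) -> None:
        self.data[key] = value
        self.path.write_text(json.dumps(self.data))

    def retrieve(self, key: str):
        return self.data.get(key)

    def search(self, query: str, limit: int) -> list:
        """Rank keys by how many query words appear in the stored text."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(value.lower().split())), key)
            for key, value in self.data.items()
        ]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [key for score, key in ranked if score > 0][:limit]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    first = JsonMemory(path)
    first.store("auth", "login uses jwt tokens")
    first.store("db", "postgres with connection pooling")
    reloaded = JsonMemory(path)  # simulates a new session reading the same file
    recalled = reloaded.retrieve("auth")
    hits = reloaded.search("jwt login", limit=5)
```

A production version would more likely use embeddings and a vector index for `search`; the file format and ranking here are placeholders.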

Guardrails & Safety

Multi-Layer Protection (OpenAI)

typescript
// guards/index.ts
interface GuardrailConfig {
  // Input validation
  inputClassifier: (input: string) => Promise<SafetyResult>;

  // Output validation
  outputValidator: (output: string) => Promise<SafetyResult>;

  // Tool risk assessment
  toolRiskLevels: Record<string, 'low' | 'medium' | 'high'>;

  // Actions requiring human approval
  humanInTheLoop: string[];
}

async function executeWithGuardrails(
  agent: Agent,
  input: string,
  config: GuardrailConfig
): Promise<Result> {
  // 1. Check input safety
  const inputCheck = await config.inputClassifier(input);
  if (!inputCheck.safe) {
    return { blocked: true, reason: inputCheck.reason };
  }

  // 2. Execute with tool monitoring
  const result = await agent.run(input, {
    beforeTool: async (tool, params) => {
      const risk = config.toolRiskLevels[tool.name];
      if (risk === 'high' || config.humanInTheLoop.includes(tool.name)) {
        return await requestHumanApproval(tool, params);
      }
      return { approved: true };
    },
  });

  // 3. Validate output
  const outputCheck = await config.outputValidator(result.output);
  if (!outputCheck.safe) {
    return { blocked: true, reason: outputCheck.reason };
  }

  return result;
}
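
The tool gate is the core of this pattern. A stripped-down Python sketch of just that decision follows; the risk table and the `approve` callback are illustrative, and treating unknown tools as high risk is an assumption that goes beyond the original code.

```python
from typing import Callable

RISK_LEVELS = {"read_file": "low", "write_file": "medium", "delete_file": "high"}
HUMAN_IN_THE_LOOP = {"write_file"}

def needs_approval(tool_name: str) -> bool:
    """High-risk, explicitly listed, and unknown tools all need a human decision."""
    risk = RISK_LEVELS.get(tool_name, "high")  # assumption: unknown means high risk
    return risk == "high" or tool_name in HUMAN_IN_THE_LOOP

def guarded_call(tool_name: str, run: Callable[[], str],
                 approve: Callable[[str], bool]) -> str:
    """Run the tool only if it is low risk or a human approved it."""
    if needs_approval(tool_name) and not approve(tool_name):
        return f"blocked: {tool_name} was not approved"
    return run()

def deny_all(tool_name: str) -> bool:
    return False

def allow_all(tool_name: str) -> bool:
    return True

read_outcome = guarded_call("read_file", lambda: "file contents", deny_all)
delete_outcome = guarded_call("delete_file", lambda: "deleted", deny_all)
write_outcome = guarded_call("write_file", lambda: "written", allow_all)
```

Low-risk reads never reach the approver, so a human is only interrupted for actions that can change state.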

Scope Enforcement (OpenAI)

typescript
// Agent must stay within defined scope
const agentInstructions = `
You are a customer service agent for Acme Corp.

SCOPE BOUNDARIES (non-negotiable):
- Only answer questions about Acme products and services
- Never provide legal, medical, or financial advice
- Never access or modify data outside your authorized scope
- If a request is out of scope, politely decline and explain why

If you cannot complete a task within scope, notify the user
and request explicit approval before proceeding.
`;

Model Selection

Match Model to Task

| Task Complexity | Recommended Model | Notes |
|---|---|---|
| Simple, fast | gpt-5-mini, claude-haiku | Low latency |
| General purpose | gpt-4.1, claude-sonnet | Balance of performance and cost |
| Complex reasoning | o4-mini, claude-opus | Higher accuracy |
| Deep planning | gpt-5 + reasoning, ultrathink | Maximum capability |

Gemini-Specific

typescript
// Use thinking_level for reasoning depth
const response = await gemini.generate({
  model: 'gemini-3',
  thinking_level: 'high',  // For complex planning
  temperature: 1.0,        // Optimized for reasoning engine
});

// Preserve thought state across tool calls
const nextResponse = await gemini.generate({
  thoughtSignature: response.thoughtSignature,  // Required for function calling
  // ... rest of params
});

Claude-Specific (Thinking Modes)

typescript
// In Claude Code, these keywords raise the extended-thinking budget.
// When calling the API directly, set the `thinking` parameter instead.
const thinkingLevels = {
  'think': 'standard analysis',
  'think hard': 'deeper reasoning',
  'think harder': 'extensive analysis',
  'ultrathink': 'maximum reasoning budget',
};

const prompt = `
Think hard about this problem before proposing a solution.

Task: ${task.description}
`;

Testing Agents

Unit Tests (Tools)

typescript
describe('readFile tool', () => {
  it('reads file content correctly', async () => {
    const result = await readFile({ path: '/test/file.txt' });
    expect(result.success).toBe(true);
    expect(result.data).toContain('expected content');
  });
});

Behavior Tests (Agent Decisions)

typescript
describe('agent planning', () => {
  it('creates plan before executing file modifications', async () => {
    const trace = await agent.runWithTrace('Refactor the auth module');

    // Verify planning happened first
    const firstToolCall = trace.toolCalls[0];
    expect(firstToolCall.name).toBe('read_file');

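    // Verify no writes without reads
    const writeIndex = trace.toolCalls.findIndex(t => t.name === 'write_file');
    const readIndex = trace.toolCalls.findIndex(t => t.name === 'read_file');
    // findIndex returns -1 when nothing matches, so confirm a write happened first
    expect(writeIndex).toBeGreaterThanOrEqual(0);
    expect(readIndex).toBeLessThan(writeIndex);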
  });
});

Evaluation Tests

typescript
// Run nightly, not in regular CI
describe('Agent Accuracy (Eval)', () => {
  const testCases = loadTestCases('./evals/coding-tasks.json');

  it.each(testCases)('completes $name correctly', async (testCase) => {
    const result = await agent.run(testCase.input);

    // Verify against expected outcomes
    expect(result.filesModified).toEqual(testCase.expectedFiles);
    expect(await runTests(testCase.testCommand)).toBe(true);
  }, 120000);
});

Pydantic AI Patterns (Python Default)

Project Structure (Python)

project/
├── src/
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── researcher.py       # Research agent
│   │   ├── coder.py            # Coding agent
│   │   └── orchestrator.py     # Main coordinator
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── web.py              # Web search tools
│   │   ├── files.py            # File operations
│   │   └── database.py         # DB queries
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py          # Pydantic models
│   └── deps.py                 # Dependencies
├── tests/
│   ├── test_agents.py
│   └── test_tools.py
└── pyproject.toml

Agent with Tools

python
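from dataclasses import dataclass

from httpx import AsyncClient
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str

# A dataclass rather than a BaseModel: Pydantic cannot validate an
# AsyncClient field unless arbitrary types are explicitly allowed
@dataclass
class ResearchDeps:
    http_client: AsyncClient
    api_key: str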

research_agent = Agent(
    'claude-sonnet-4-20250514',
    deps_type=ResearchDeps,
    result_type=list[SearchResult],
    system_prompt='You are a research assistant. Use tools to find information.',
)

@research_agent.tool
async def web_search(ctx: RunContext[ResearchDeps], query: str) -> list[dict]:
    """Search the web for information."""
    response = await ctx.deps.http_client.get(
        'https://api.search.com/search',
        params={'q': query},
        headers={'Authorization': f'Bearer {ctx.deps.api_key}'},
    )
    return response.json()['results']

@research_agent.tool
async def read_webpage(ctx: RunContext[ResearchDeps], url: str) -> str:
    """Read and extract content from a webpage."""
    response = await ctx.deps.http_client.get(url)
    return response.text[:5000]  # Truncate for context

# Usage
async def main():
    async with AsyncClient() as client:
        deps = ResearchDeps(http_client=client, api_key='...')
        result = await research_agent.run(
            'Find recent articles about LLM agents',
            deps=deps,
        )
        for item in result.data:
            print(f"- {item.title}")

Structured Output with Validation

python
from pydantic import BaseModel, Field
from pydantic_ai import Agent

class CodeReview(BaseModel):
    summary: str = Field(description="Brief summary of the review")
    issues: list[str] = Field(description="List of issues found")
    suggestions: list[str] = Field(description="Improvement suggestions")
    approval: bool = Field(description="Whether code is approved")
    confidence: float = Field(ge=0, le=1, description="Confidence score")

review_agent = Agent(
    'claude-sonnet-4-20250514',
    result_type=CodeReview,
    system_prompt='Review code for quality, security, and best practices.',
)

# Result is a validated Pydantic model
result = await review_agent.run(f"Review this code:\n```python\n{code}\n```")
if result.data.approval:
    print("Code approved!")
else:
    for issue in result.data.issues:
        print(f"Issue: {issue}")

Multi-Agent Coordination

python
from pydantic import BaseModel
from pydantic_ai import Agent

class Plan(BaseModel):
    steps: list[str]  # Structured plan, so that plan.data.steps exists below

Specialized agents

planner = Agent('claude-sonnet-4-20250514', result_type=Plan, system_prompt='Create detailed plans.')
executor = Agent('claude-sonnet-4-20250514', system_prompt='Execute tasks precisely.')
reviewer = Agent('claude-sonnet-4-20250514', system_prompt='Review and verify work.')

async def orchestrate(task: str):
    # 1. Plan
    plan = await planner.run(f"Create a plan for: {task}")

    # 2. Execute each step
    results = []
    for step in plan.data.steps:
        result = await executor.run(f"Execute: {step}")
        results.append(result.data)

    # 3. Review
    review = await reviewer.run(
        f"Review the results:\nTask: {task}\nResults: {results}"
    )

    return review.data
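The plan, execute, review flow above can be dry-run without any model calls. The following is a minimal sketch, not part of Pydantic AI: `StubAgent` and `StubResult` are hypothetical stand-ins that only mimic the `run(...)` / `.data` shape used above, the agents are passed in as arguments so they can be swapped, and the stub planner returns a plain list of steps rather than an object with a `steps` field.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Mimics the `.data` attribute read by the orchestration code."""
    data: object


class StubAgent:
    """Hypothetical stand-in that answers via a plain function, no model call."""

    def __init__(self, respond):
        self.respond = respond
        self.prompts = []  # Every prompt received, kept for later inspection

    async def run(self, prompt: str) -> StubResult:
        self.prompts.append(prompt)
        return StubResult(self.respond(prompt))


async def orchestrate(planner, executor, reviewer, task: str):
    # 1. Plan
    plan = await planner.run(f"Create a plan for: {task}")

    # 2. Execute each step (the stub plan is a plain list of step strings)
    results = []
    for step in plan.data:
        result = await executor.run(f"Execute: {step}")
        results.append(result.data)

    # 3. Review
    review = await reviewer.run(
        f"Review the results:\nTask: {task}\nResults: {results}"
    )
    return review.data


planner = StubAgent(lambda prompt: ["collect sources", "summarize"])
executor = StubAgent(lambda prompt: f"done [{prompt}]")
reviewer = StubAgent(lambda prompt: "approved")

verdict = asyncio.run(orchestrate(planner, executor, reviewer, "write a report"))
```

Recording the prompts each stub receives makes it possible to check that every planned step reached the executor and that the reviewer saw all results, which is the part of the flow most likely to break when the orchestration logic changes.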

Streaming Responses

python
from pydantic_ai import Agent

agent = Agent('claude-sonnet-4-20250514')

async def stream_response(prompt: str):
    async with agent.run_stream(prompt) as response:
        async for chunk in response.stream():
            print(chunk, end='', flush=True)

    # Get final structured result
    result = await response.get_data()
    return result

Testing Agents

python
import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

@pytest.fixture
def test_agent():
    return Agent(
        TestModel(),  # Mock model for testing
        result_type=str,
    )

async def test_agent_response(test_agent):
    result = await test_agent.run('Test prompt')
    assert result.data is not None

Test with specific responses

async def test_with_mock_response():
    # TestModel returns the configured text instead of calling a real model
    model = TestModel(custom_result_text='Expected output')
    agent = Agent(model)
    result = await agent.run('Any prompt')
    assert result.data == 'Expected output'

---

Skills Pattern (Anthropic)

Skill Structure

skills/
└── code-review/
    ├── instructions.md      # How to perform code reviews
    ├── scripts/
    │   └── run-linters.sh   # Supporting scripts
    └── resources/
        └── checklist.md     # Review checklist

instructions.md Example

markdown
Code Review Skill

When to Use

Activate this skill when asked to review code, PRs, or diffs.

Process

  1. Read the changed files completely
  2. Run linters:
    ./scripts/run-linters.sh
  3. Check against resources/checklist.md
  4. Provide structured feedback

Output Format

  • Summary (1-2 sentences)
  • Issues found (severity: critical/major/minor)
  • Suggestions for improvement
  • Approval recommendation

Loading Skills Dynamically

typescript
import { promises as fs } from 'fs';
import path from 'path';
import { glob } from 'glob'; // Assumes the `glob` npm package (v9+), where glob() returns a Promise

// `Skill` and `loadResource` are not defined in this snippet and are assumed to exist elsewhere.
async function loadSkill(skillName: string): Promise<Skill> {
  const skillPath = `./skills/${skillName}`;
  const instructions = await fs.readFile(`${skillPath}/instructions.md`, 'utf-8');
  const scripts = await glob(`${skillPath}/scripts/*`);
  const resources = await glob(`${skillPath}/resources/*`);

  return {
    name: skillName,
    instructions,
    scripts: scripts.map(s => ({ name: path.basename(s), path: s })),
    resources: await Promise.all(resources.map(loadResource)),
  };
}
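For Python agents, a rough counterpart of the loader could look like the sketch below. It assumes the directory layout from the Skill Structure tree; the `Skill` dataclass, the `load_skill` name, and the choice to read resources eagerly into a dict are illustrative assumptions, not part of any SDK.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Skill:
    name: str
    instructions: str
    scripts: list[Path] = field(default_factory=list)       # Paths only; scripts are not executed here
    resources: dict[str, str] = field(default_factory=dict)  # File name -> file contents


def load_skill(skills_root: Path, skill_name: str) -> Skill:
    """Read one skill directory: instructions.md plus optional scripts/ and resources/."""
    skill_dir = skills_root / skill_name
    instructions = (skill_dir / "instructions.md").read_text(encoding="utf-8")

    def files_in(subdir: str) -> list[Path]:
        folder = skill_dir / subdir
        # A skill may omit scripts/ or resources/ entirely
        if not folder.is_dir():
            return []
        return sorted(p for p in folder.iterdir() if p.is_file())

    return Skill(
        name=skill_name,
        instructions=instructions,
        scripts=files_in("scripts"),
        resources={p.name: p.read_text(encoding="utf-8") for p in files_in("resources")},
    )
```

Called as `load_skill(Path("skills"), "code-review")`, this would return the instructions text together with the script paths and the checklist contents, ready to be placed into an agent's system prompt.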

Anti-Patterns

  • No planning before execution - Agents that jump straight to action make more errors
  • Monolithic agents - A single agent with 50 tools becomes confused
  • No verification - Agents must verify their own work
  • Hardcoded tool sequences - Let the model decide tool order
  • Missing guardrails - All agents need safety boundaries
  • No state management - Without it, agents lose context across tool calls
  • Testing only happy paths - Also test failures and edge cases
  • Ignoring model differences - Reasoning models need different prompts
  • No cost tracking - Agentic workflows can be expensive
  • Full automation without oversight - Keep a human in the loop for critical actions
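The cost-tracking item can be made concrete with a small budget guard. This is a minimal sketch under stated assumptions: `CostTracker` and `BudgetExceeded` are hypothetical names, the prices are placeholders rather than real rates, and the token counts would come from whatever usage data your model client reports.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run spends more than its configured budget."""


@dataclass
class CostTracker:
    # Placeholder prices in USD per million tokens, not real rates
    input_price_per_mtok: float
    output_price_per_mtok: float
    budget_usd: float
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one model call to the running total and enforce the budget."""
        cost = (
            input_tokens * self.input_price_per_mtok
            + output_tokens * self.output_price_per_mtok
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.budget_usd:.4f} budget"
            )
        return cost


tracker = CostTracker(input_price_per_mtok=2.0, output_price_per_mtok=10.0, budget_usd=0.05)
first_call_cost = tracker.record(input_tokens=1_000, output_tokens=500)
```

Raising an exception once the budget is crossed stops a runaway tool loop early; a softer variant could log a warning and let the orchestrator decide whether to continue.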

Quick Reference

Agent Development Checklist

  • Define clear agent scope and boundaries
  • Design tools with explicit schemas and risk levels
  • Implement explore-plan-execute-verify workflow
  • Add multi-layer guardrails
  • Set up conversation and persistent memory
  • Write behavior and evaluation tests
  • Configure appropriate model for task complexity
  • Add human-in-the-loop for high-risk operations
  • Monitor token usage and costs
  • Document skills and instructions
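Two checklist items, tool risk levels and human-in-the-loop for high-risk operations, fit together naturally. The sketch below is one possible shape, assuming a two-level `Risk` enum and an `approve` callback; all names here (`Risk`, `Tool`, `call_tool`, `deny_all`) are hypothetical and not taken from any SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class Tool:
    name: str
    risk: Risk
    run: Callable[..., str]


def call_tool(tool: Tool, approve: Callable[[Tool, dict], bool], **kwargs) -> str:
    """Run low-risk tools directly; ask `approve` before any high-risk call."""
    if tool.risk is Risk.HIGH and not approve(tool, kwargs):
        # Return a message instead of raising so the agent can react to the denial
        return f"{tool.name}: blocked, approval denied"
    return tool.run(**kwargs)


read_file = Tool("read_file", Risk.LOW, lambda path: f"contents of {path}")
delete_file = Tool("delete_file", Risk.HIGH, lambda path: f"deleted {path}")


def deny_all(tool: Tool, args: dict) -> bool:
    # Stands in for a real prompt to a human reviewer
    return False
```

With `deny_all` as the approver, `call_tool(read_file, deny_all, path="a.txt")` still runs, while `call_tool(delete_file, deny_all, path="a.txt")` is blocked. In a real deployment the callback would block on a UI prompt, a chat message, or a ticket.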

Thinking Triggers (Claude)

"think"        → Standard analysis
"think hard"   → Deeper reasoning
"think harder" → Extensive analysis
"ultrathink"   → Maximum reasoning

Gemini Settings

thinking_level: "high" | "low"
temperature: 1.0 (keep at 1.0 for reasoning)
thoughtSignature: <pass back for function calling>