AI-Native Development
Overview
AI-Native Development focuses on building applications where AI is a first-class citizen, not an afterthought. This skill provides comprehensive patterns for integrating LLMs, implementing RAG (Retrieval-Augmented Generation), using vector databases, building agentic workflows, and optimizing AI application performance and cost.
When to use this skill:
- Building chatbots, Q&A systems, or conversational interfaces
- Implementing semantic search or recommendation engines
- Creating AI agents that can use tools and take actions
- Integrating LLMs (OpenAI, Anthropic, open-source models) into applications
- Building RAG systems for knowledge retrieval
- Optimizing AI costs and latency
- Implementing AI observability and monitoring
Why AI-Native Development Matters
Traditional software is deterministic; AI-native applications are probabilistic:
- Context is Everything: LLMs need relevant context to provide accurate answers
- RAG Over Fine-Tuning: Retrieval is cheaper and more flexible than fine-tuning
- Embeddings Enable Semantic Search: Move beyond keyword matching to understanding meaning
- Agentic Workflows: LLMs can reason, plan, and use tools autonomously
- Cost Management: Token usage directly impacts operational costs
- Observability: Debugging probabilistic systems requires new approaches
- Prompt Engineering: How you ask matters as much as what you ask
Core Concepts
1. Embeddings & Vector Search
Embeddings are vector representations of text that capture semantic meaning. Similar concepts have similar vectors.
Key Capabilities:
- Convert text to high-dimensional vectors (1536 or 3072 dimensions)
- Measure semantic similarity using cosine similarity
- Find relevant documents through vector search
- Batch process for efficiency
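The similarity measure behind all of this can be sketched in a few lines of plain TypeScript (no library assumed):

```typescript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1 (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```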
Detailed Implementation: See references/vector-databases.md for:
- OpenAI embeddings setup and batch processing
- Cosine similarity algorithms
- Chunking strategies (500-1000 tokens with 10-20% overlap)
2. Vector Databases
Store and retrieve embeddings efficiently at scale.
Popular Options:
- Pinecone: Serverless, managed service ($0.096/hour)
- Chroma: Open source, self-hosted
- Weaviate: Flexible schema, hybrid search
- Qdrant: Rust-based, high performance
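Before committing to one of these services, the core upsert/query surface can be prototyped in memory. A minimal sketch (the class and its API are illustrative, not any vendor's SDK):

```typescript
interface StoredVector {
  id: string
  values: number[]
  metadata?: Record<string, unknown>
}

class InMemoryVectorStore {
  private vectors = new Map<string, StoredVector>()

  upsert(vec: StoredVector): void {
    this.vectors.set(vec.id, vec)
  }

  // Return the topK most similar vectors by cosine similarity.
  query(values: number[], topK: number): Array<{ id: string; score: number }> {
    const scored = [...this.vectors.values()].map(v => ({
      id: v.id,
      score: this.cosine(values, v.values),
    }))
    return scored.sort((a, b) => b.score - a.score).slice(0, topK)
  }

  private cosine(a: number[], b: number[]): number {
    let dot = 0, na = 0, nb = 0
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i]
      na += a[i] * a[i]
      nb += b[i] * b[i]
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb))
  }
}
```

The real services add persistence, metadata filtering, and approximate-nearest-neighbor indexes; the query contract stays essentially the same.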
Detailed Implementation: See references/vector-databases.md for:
- Complete setup guides for each database
- Upsert, query, update, delete operations
- Metadata filtering and hybrid search
- Cost comparison and best practices
3. RAG (Retrieval-Augmented Generation)
RAG combines retrieval systems with LLMs to provide accurate, grounded answers.
Core Pattern:
- Retrieve relevant documents from vector database
- Construct context from top results
- Generate answer with LLM using retrieved context
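The three steps above compose into a small pipeline. A provider-agnostic sketch, with the retriever and LLM injected as plain functions (the names `retrieve` and `generate` are illustrative, not a specific SDK):

```typescript
interface RetrievedDoc {
  text: string
  score: number
}

// retrieve: vector-database query; generate: LLM completion call.
// Both are injected so the pattern stays provider-agnostic.
async function ragQuery(
  question: string,
  retrieve: (q: string, topK: number) => Promise<RetrievedDoc[]>,
  generate: (prompt: string) => Promise<string>
): Promise<string> {
  // 1. Retrieve relevant documents
  const docs = await retrieve(question, 5)
  // 2. Construct context from top results
  const context = docs.map((d, i) => `[${i + 1}] ${d.text}`).join('\n')
  // 3. Generate an answer grounded in the retrieved context
  const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`
  return generate(prompt)
}
```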
Advanced Patterns:
- RAG with citations and source tracking
- Hybrid search (semantic + keyword)
- Multi-query RAG for better recall
- HyDE (Hypothetical Document Embeddings)
- Contextual compression for relevance
Detailed Implementation: See references/rag-patterns.md for:
- Basic and advanced RAG patterns with full code
- Citation strategies
- Hybrid search with Reciprocal Rank Fusion
- Conversation memory patterns
- Error handling and validation
4. Function Calling & Tool Use
Enable LLMs to use external tools and APIs reliably.
Capabilities:
- Define tools with JSON schemas
- Execute functions based on LLM decisions
- Handle parallel tool calls
- Stream responses with tool use
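As an illustration of the JSON-schema style of tool definition, here is a sketch in the OpenAI function-calling shape (the `get_weather` tool and its stub handler are hypothetical):

```typescript
// A tool definition in the OpenAI function-calling shape.
// The model sees name/description/parameters and emits matching arguments.
const weatherTool = {
  type: 'function' as const,
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name, e.g. "Berlin"' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['city'],
    },
  },
}

// The application keeps a dispatch table from tool name to implementation,
// invoked when the model returns a tool call.
const toolHandlers: Record<string, (args: { city: string }) => Promise<string>> = {
  get_weather: async ({ city }) => `Sunny in ${city}`, // stub implementation
}
```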
Detailed Implementation: See references/function-calling.md for:
- Tool definition patterns (OpenAI and Anthropic)
- Function calling loops
- Parallel and streaming tool execution
- Input validation with Zod
- Error handling and fallback strategies
5. Agentic Workflows
Enable LLMs to reason, plan, and take autonomous actions.
Patterns:
- ReAct: Reasoning + Acting loop with observations
- Tree of Thoughts: Explore multiple reasoning paths
- Multi-Agent: Specialized agents collaborating on complex tasks
- Autonomous Agents: Self-directed goal achievement
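The ReAct pattern listed above can be sketched as a small loop, with the LLM and tools injected as plain functions (in a real agent, `llm` would be an actual model request returning a parsed step):

```typescript
interface Step {
  thought: string
  action?: { tool: string; input: string }
  finalAnswer?: string
}

// ReAct: alternate reasoning (thought) and acting (tool call),
// feeding each observation back into the next LLM call.
async function reactLoop(
  goal: string,
  llm: (transcript: string) => Promise<Step>,
  tools: Record<string, (input: string) => Promise<string>>,
  maxSteps = 5
): Promise<string> {
  let transcript = `Goal: ${goal}`
  for (let i = 0; i < maxSteps; i++) {
    const step = await llm(transcript)
    transcript += `\nThought: ${step.thought}`
    if (step.finalAnswer !== undefined) return step.finalAnswer
    if (step.action) {
      const observation = await tools[step.action.tool](step.action.input)
      transcript += `\nAction: ${step.action.tool}(${step.action.input})\nObservation: ${observation}`
    }
  }
  throw new Error('Agent did not finish within maxSteps')
}
```

The `maxSteps` cap is the minimal safety guard: it bounds cost and prevents an agent that never reaches a final answer from looping forever.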
Detailed Implementation: See references/agentic-workflows.md for:
- Complete ReAct loop implementation
- Tree of Thoughts exploration
- Multi-agent coordinator patterns
- Agent memory management
- Error recovery and safety guards
5.1 Multi-Agent Orchestration (Opus 4.5)
Advanced multi-agent patterns leveraging Opus 4.5's extended thinking capabilities.
When to Use Extended Thinking:
- Coordinating 3+ specialized agents
- Complex dependency resolution between agent outputs
- Dynamic task allocation based on agent capabilities
- Conflict resolution when agents produce contradictory results
Orchestrator Pattern:
```typescript
interface AgentTask {
  id: string;
  type: 'research' | 'code' | 'review' | 'design';
  input: unknown;
  dependencies: string[]; // Task IDs that must complete first
}

interface AgentResult {
  taskId: string;
  output: unknown;
  confidence: number;
  reasoning: string;
}

async function orchestrateAgents(
  goal: string,
  availableAgents: Agent[]
): Promise<AgentResult[]> {
  // Step 1: Use extended thinking to decompose goal into tasks
  const taskPlan = await planTasks(goal, availableAgents);

  // Step 2: Build dependency graph
  const dependencyGraph = buildDependencyGraph(taskPlan.tasks);

  // Step 3: Execute tasks respecting dependencies
  const results: AgentResult[] = [];
  const completed = new Set<string>();

  while (completed.size < taskPlan.tasks.length) {
    // Find tasks with satisfied dependencies
    const ready = taskPlan.tasks.filter(task =>
      !completed.has(task.id) &&
      task.dependencies.every(dep => completed.has(dep))
    );

    // Guard against unsatisfiable plans (a dependency cycle would loop forever)
    if (ready.length === 0) {
      throw new Error('Dependency cycle or missing task in plan');
    }

    // Execute ready tasks in parallel
    const batchResults = await Promise.all(
      ready.map(task => executeAgentTask(task, availableAgents))
    );

    // Validate results - use extended thinking for conflicts
    const validatedResults = await validateAndResolveConflicts(
      batchResults,
      results
    );
    results.push(...validatedResults);
    ready.forEach(task => completed.add(task.id));
  }

  return results;
}
```

Task Planning with Extended Thinking:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function planTasks(
  goal: string,
  agents: Agent[]
): Promise<{ tasks: AgentTask[]; rationale: string }> {
  // Extended thinking requires budget_tokens < max_tokens
  // Minimum budget: 1,024 tokens
  const response = await anthropic.messages.create({
    model: 'claude-opus-4-5-20251101', // Or claude-sonnet-4-5-20250929
    max_tokens: 16000,
    thinking: {
      type: 'enabled',
      budget_tokens: 10000 // Extended thinking for complex planning
    },
    messages: [{
      role: 'user',
      content: `
Goal: ${goal}

Available agents and their capabilities:
${agents.map(a => `- ${a.name}: ${a.capabilities.join(', ')}`).join('\n')}

Decompose this goal into tasks. For each task, specify:
1. Which agent should handle it
2. What input it needs
3. Which other tasks it depends on
4. Expected output format

Think carefully about:
- Optimal parallelization opportunities
- Potential conflicts between agent outputs
- Information that needs to flow between tasks
`
    }]
  });

  // Response contains thinking blocks followed by text blocks
  // content: [{ type: 'thinking', thinking: '...' }, { type: 'text', text: '...' }]
  return parseTaskPlan(response);
}
```

Conflict Resolution:
```typescript
async function validateAndResolveConflicts(
  newResults: AgentResult[],
  existingResults: AgentResult[]
): Promise<AgentResult[]> {
  // Check for conflicts with existing results
  const conflicts = detectConflicts(newResults, existingResults);
  if (conflicts.length === 0) {
    return newResults;
  }

  // Use extended thinking to resolve conflicts
  const resolution = await anthropic.messages.create({
    model: 'claude-opus-4-5-20251101',
    max_tokens: 8000,
    thinking: {
      type: 'enabled',
      budget_tokens: 5000
    },
    messages: [{
      role: 'user',
      content: `
The following agent outputs conflict:
${conflicts.map(c => `
Conflict: ${c.description}
Agent A (${c.agentA.name}): ${JSON.stringify(c.resultA)}
Agent B (${c.agentB.name}): ${JSON.stringify(c.resultB)}
`).join('\n\n')}

Analyze each conflict and determine:
1. Which output is more likely correct and why
2. If both have merit, how to synthesize them
3. What additional verification might be needed
`
    }]
  });

  return applyResolutions(newResults, resolution);
}
```

Adaptive Agent Selection:
```typescript
async function selectOptimalAgent(
  task: AgentTask,
  agents: Agent[],
  context: ExecutionContext
): Promise<Agent> {
  // Score each agent based on:
  // - Capability match
  // - Current load
  // - Historical performance on similar tasks
  // - Cost (model tier)
  const scores = agents.map(agent => ({
    agent,
    score: calculateAgentScore(agent, task, context)
  }));

  // For complex tasks, use Opus; for simple tasks, use Haiku
  const complexity = assessTaskComplexity(task);
  if (complexity > 0.7) {
    // Filter to agents that can use Opus
    const opusCapable = scores.filter(s => s.agent.supportsOpus);
    return opusCapable.sort((a, b) => b.score - a.score)[0].agent;
  }
  return scores.sort((a, b) => b.score - a.score)[0].agent;
}
```

Agent Communication Protocol:
```typescript
interface AgentMessage {
  from: string;
  to: string | 'broadcast';
  type: 'request' | 'response' | 'update' | 'conflict';
  payload: unknown;
  timestamp: Date;
}

class AgentCommunicationBus {
  private messages: AgentMessage[] = [];
  private subscribers: Map<string, (msg: AgentMessage) => void> = new Map();

  send(message: AgentMessage): void {
    this.messages.push(message);
    if (message.to === 'broadcast') {
      this.subscribers.forEach(callback => callback(message));
    } else {
      this.subscribers.get(message.to)?.(message);
    }
  }

  subscribe(agentId: string, callback: (msg: AgentMessage) => void): void {
    this.subscribers.set(agentId, callback);
  }

  getHistory(agentId: string): AgentMessage[] {
    return this.messages.filter(
      m => m.from === agentId || m.to === agentId || m.to === 'broadcast'
    );
  }
}
```
6. Streaming Responses
Deliver real-time AI responses for better UX.
Capabilities:
- Stream LLM output token-by-token
- Server-Sent Events (SSE) for web clients
- Streaming with function calls
- Backpressure handling
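Token-by-token delivery can be modeled as an async iterable. A provider-agnostic sketch (the `{ delta }` chunk shape is illustrative; real SDK stream events differ):

```typescript
// Consume a token stream, forwarding each delta to the UI as it arrives
// instead of waiting for the full completion.
async function streamToHandler(
  stream: AsyncIterable<{ delta: string }>,
  onToken: (token: string) => void
): Promise<string> {
  let full = ''
  for await (const chunk of stream) {
    full += chunk.delta
    onToken(chunk.delta) // e.g. write an SSE event: `data: ${chunk.delta}\n\n`
  }
  return full
}

// A fake stream for local testing.
async function* fakeStream(tokens: string[]) {
  for (const t of tokens) yield { delta: t }
}
```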
Detailed Implementation: See ../streaming-api-patterns/SKILL.md for streaming patterns.
7. Cost Optimization
Strategies:
- Use smaller models for simple tasks (GPT-3.5 vs GPT-4)
- Implement prompt caching (Anthropic's ephemeral cache)
- Batch requests when possible
- Set max_tokens to prevent runaway generation
- Monitor usage with alerts
Token Counting:
```typescript
import { encoding_for_model } from 'tiktoken'

function countTokens(text: string, model = 'gpt-4'): number {
  const encoder = encoding_for_model(model)
  const tokens = encoder.encode(text)
  encoder.free()
  return tokens.length
}
```

Detailed Implementation: See references/observability.md for:
- Cost estimation and budget tracking
- Model selection strategies
- Prompt caching patterns
8. Observability & Monitoring
Track LLM performance, costs, and quality in production.
Tools:
- LangSmith: Tracing, evaluation, monitoring
- LangFuse: Open-source observability
- Custom Logging: Structured logs with metrics
Key Metrics:
- Throughput (requests/minute)
- Latency (P50, P95, P99)
- Token usage and cost
- Error rate
- Quality scores (relevance, coherence, factuality)
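The latency percentiles listed above can be computed from recorded request durations. A simple nearest-rank sketch:

```typescript
// Nearest-rank percentile: sort samples, take the value at ceil(p/100 * n) - 1.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('No samples recorded')
  const sorted = [...samplesMs].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}
```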
Detailed Implementation: See references/observability.md for:
- LangSmith and LangFuse integration
- Custom logger implementation
- Performance monitoring
- Quality evaluation
- Debugging and error analysis
Searching References
This skill includes detailed reference material. Use grep to find specific patterns:
```bash
# Find RAG patterns
grep -r "RAG" references/

# Search for a specific vector database
grep -A 10 "Pinecone Setup" references/vector-databases.md

# Find agentic workflow examples
grep -B 5 "ReAct Pattern" references/agentic-workflows.md

# Locate function calling patterns
grep -n "parallel.*tool" references/function-calling.md

# Search for cost optimization
grep -iE "cost|pricing|budget" references/observability.md

# Find all code examples for embeddings
grep -rA 20 "async function.*embedding" references/
```
Best Practices
Context Management
- ✅ Keep context windows under 75% of model limit
- ✅ Use sliding window for long conversations
- ✅ Summarize old messages before they scroll out
- ✅ Remove redundant or irrelevant context
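The sliding-window and summarization items above can be sketched together: keep the most recent messages and fold older ones into a running summary (the injected `summarize` function stands in for an LLM summarization call):

```typescript
interface Msg { role: 'user' | 'assistant' | 'system'; content: string }

// Keep at most `windowSize` recent messages; older ones are folded into
// a summary via the injected `summarize` function (typically an LLM call).
async function slideWindow(
  messages: Msg[],
  summary: string,
  windowSize: number,
  summarize: (summary: string, dropped: Msg[]) => Promise<string>
): Promise<{ messages: Msg[]; summary: string }> {
  if (messages.length <= windowSize) return { messages, summary }
  const dropped = messages.slice(0, messages.length - windowSize)
  const kept = messages.slice(-windowSize)
  return { messages: kept, summary: await summarize(summary, dropped) }
}
```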
Embedding Strategy
- ✅ Chunk documents to 500-1000 tokens
- ✅ Overlap chunks by 10-20% for continuity
- ✅ Include metadata (title, source, date) with chunks
- ✅ Re-embed when source data changes
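The chunking guidance above, sketched with a word-based approximation of token counts (a real pipeline would use an actual tokenizer such as tiktoken):

```typescript
// Split text into chunks of ~chunkSize words with `overlap` words of
// carry-over between consecutive chunks (words stand in for tokens here).
function chunkText(text: string, chunkSize = 200, overlap = 30): string[] {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks: string[] = []
  const step = chunkSize - overlap
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '))
    if (start + chunkSize >= words.length) break
  }
  return chunks
}
```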
RAG Quality
- ✅ Use hybrid search (semantic + keyword)
- ✅ Re-rank results for relevance
- ✅ Include citation/source in context
- ✅ Set temperature low (0.1-0.3) for factual answers
- ✅ Validate answers against retrieved context
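The merge step of hybrid search is often done with Reciprocal Rank Fusion, combining the semantic and keyword result lists. A sketch (k = 60 is the commonly used constant):

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document;
// documents ranked well in multiple lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + index + 1))
    })
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
}
```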
Function Calling
- ✅ Provide clear, concise function descriptions
- ✅ Use strict JSON schema for parameters
- ✅ Handle missing or invalid parameters gracefully
- ✅ Limit to 10-20 tools to avoid confusion
- ✅ Validate function outputs before returning to LLM
Cost Optimization
- ✅ Use smaller models for simple tasks
- ✅ Implement prompt caching for repeated content
- ✅ Batch requests when possible
- ✅ Set max_tokens to prevent runaway generation
- ✅ Monitor usage with alerts for anomalies
Security
- ✅ Validate and sanitize user inputs
- ✅ Never include secrets in prompts
- ✅ Implement rate limiting
- ✅ Filter outputs for harmful content
- ✅ Use separate API keys per environment
Templates
Use the provided templates for common AI patterns:
- templates/rag-pipeline.ts - Basic RAG implementation
- templates/agentic-workflow.ts - ReAct agent pattern
Examples
Complete RAG Chatbot
See examples/chatbot-with-rag/ for a full-stack implementation:
- Vector database setup with document ingestion
- RAG query with citations
- Streaming chat interface
- Cost tracking and monitoring
Checklists
AI Implementation Checklist
See checklists/ai-implementation.md for comprehensive validation covering:
- Vector database setup and configuration
- Embedding generation and chunking strategy
- RAG pipeline with quality validation
- Function calling with error handling
- Streaming response implementation
- Cost monitoring and budget alerts
- Observability and logging
- Security and input validation
Common Patterns
Semantic Caching
Reduce costs by caching similar queries:
```typescript
const cache = new Map<string, { embedding: number[]; response: string }>()

async function cachedRAG(query: string) {
  const queryEmbedding = await createEmbedding(query)

  // Check if a semantically similar query exists in the cache
  for (const cached of cache.values()) {
    const similarity = cosineSimilarity(queryEmbedding, cached.embedding)
    if (similarity > 0.95) {
      return cached.response
    }
  }

  // Not cached, perform RAG
  const response = await ragQuery(query)
  cache.set(query, { embedding: queryEmbedding, response })
  return response
}
```
Conversational Memory
Maintain context across multiple turns:
```typescript
interface ConversationMemory {
  messages: Message[] // Last 10 messages
  summary?: string    // Summary of older messages
}

async function getConversationContext(userId: string): Promise<Message[]> {
  const memory = await db.memory.findUnique({ where: { userId } })
  return [
    { role: 'system', content: `Previous conversation summary: ${memory.summary}` },
    ...memory.messages.slice(-5) // Last 5 messages
  ]
}
```
Prompt Engineering
Few-Shot Learning
Provide examples to guide LLM behavior:
```typescript
const fewShotExamples = `
Example 1:
Input: "I love this product!"
Sentiment: Positive

Example 2:
Input: "It's okay, nothing special"
Sentiment: Neutral
`
// Include in the system prompt
```
Chain of Thought (CoT)
Ask the LLM to show its reasoning:
```typescript
const prompt = `${problem}\n\nLet's think step by step:`
```
Resources
Next Steps
After mastering AI-Native Development:
- Explore Streaming API Patterns skill for real-time AI responses
- Use Type Safety & Validation skill for AI input/output validation
- Apply Edge Computing Patterns skill for global AI deployment
- Reference Observability Patterns for production monitoring