ai-native-development

AI-Native Development

Overview

AI-Native Development focuses on building applications where AI is a first-class citizen, not an afterthought. This skill provides comprehensive patterns for integrating LLMs, implementing RAG (Retrieval-Augmented Generation), using vector databases, building agentic workflows, and optimizing AI application performance and cost.
When to use this skill:
  • Building chatbots, Q&A systems, or conversational interfaces
  • Implementing semantic search or recommendation engines
  • Creating AI agents that can use tools and take actions
  • Integrating LLMs (OpenAI, Anthropic, open-source models) into applications
  • Building RAG systems for knowledge retrieval
  • Optimizing AI costs and latency
  • Implementing AI observability and monitoring

Why AI-Native Development Matters

Traditional software is deterministic; AI-native applications are probabilistic:
  • Context is Everything: LLMs need relevant context to provide accurate answers
  • RAG Over Fine-Tuning: Retrieval is cheaper and more flexible than fine-tuning
  • Embeddings Enable Semantic Search: Move beyond keyword matching to understanding meaning
  • Agentic Workflows: LLMs can reason, plan, and use tools autonomously
  • Cost Management: Token usage directly impacts operational costs
  • Observability: Debugging probabilistic systems requires new approaches
  • Prompt Engineering: How you ask matters as much as what you ask

Core Concepts

1. Embeddings & Vector Search

Embeddings are vector representations of text that capture semantic meaning. Similar concepts have similar vectors.
Key Capabilities:
  • Convert text to high-dimensional vectors (e.g., 1536 or 3072 dimensions for OpenAI's text-embedding-3 models)
  • Measure semantic similarity using cosine similarity
  • Find relevant documents through vector search
  • Batch process for efficiency
Detailed Implementation: See
references/vector-databases.md
for:
  • OpenAI embeddings setup and batch processing
  • Cosine similarity algorithms
  • Chunking strategies (500-1000 tokens with 10-20% overlap)
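The similarity step above is small enough to sketch inline. The retrieval helper here is a minimal in-memory version for illustration; a real system would obtain embeddings from an API (e.g. OpenAI's) and query a vector database instead.

```typescript
// Cosine similarity between two embedding vectors: 1 = same direction, 0 = orthogonal
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored embeddings against a query embedding and return the top-k matches
function topK(
  query: number[],
  docs: { id: string; embedding: number[] }[],
  k: number
): { id: string; score: number }[] {
  return docs
    .map(d => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```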

2. Vector Databases

Store and retrieve embeddings efficiently at scale.
Popular Options:
  • Pinecone: Serverless, managed service ($0.096/hour)
  • Chroma: Open source, self-hosted
  • Weaviate: Flexible schema, hybrid search
  • Qdrant: Rust-based, high performance
Detailed Implementation: See
references/vector-databases.md
for:
  • Complete setup guides for each database
  • Upsert, query, update, delete operations
  • Metadata filtering and hybrid search
  • Cost comparison and best practices
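The operations listed above (upsert, query, metadata filtering) share the same basic shape across these databases. A toy in-memory stand-in makes the interface concrete; the class and method names here are illustrative, not any vendor's actual SDK.

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

// Toy in-memory vector store mirroring the upsert/query interface of hosted options
class InMemoryVectorStore {
  private records = new Map<string, VectorRecord>();

  upsert(records: VectorRecord[]): void {
    for (const r of records) this.records.set(r.id, r); // upsert = insert or overwrite by id
  }

  delete(id: string): void {
    this.records.delete(id);
  }

  query(
    vector: number[],
    topK: number,
    filter?: Record<string, string>
  ): { id: string; score: number; metadata: Record<string, string> }[] {
    // Metadata filter first, then rank by similarity
    const candidates = [...this.records.values()].filter(r =>
      !filter || Object.entries(filter).every(([k, v]) => r.metadata[k] === v)
    );
    return candidates
      .map(r => ({ id: r.id, score: dot(vector, r.values), metadata: r.metadata }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

// Dot product works as a similarity score when vectors are normalized
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}
```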

3. RAG (Retrieval-Augmented Generation)

RAG combines retrieval systems with LLMs to provide accurate, grounded answers.
Core Pattern:
  1. Retrieve relevant documents from vector database
  2. Construct context from top results
  3. Generate answer with LLM using retrieved context
Advanced Patterns:
  • RAG with citations and source tracking
  • Hybrid search (semantic + keyword)
  • Multi-query RAG for better recall
  • HyDE (Hypothetical Document Embeddings)
  • Contextual compression for relevance
Detailed Implementation: See
references/rag-patterns.md
for:
  • Basic and advanced RAG patterns with full code
  • Citation strategies
  • Hybrid search with Reciprocal Rank Fusion
  • Conversation memory patterns
  • Error handling and validation
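The three-step core pattern above can be sketched as a small pipeline with the embedder, vector store, and LLM injected as dependencies. The interfaces here are hypothetical stand-ins for real clients, which keeps the control flow visible:

```typescript
interface RagDeps {
  embed: (text: string) => Promise<number[]>;
  search: (embedding: number[], topK: number) => Promise<{ text: string; source: string }[]>;
  generate: (prompt: string) => Promise<string>;
}

// Retrieve -> build context -> generate: the basic RAG loop
async function ragQuery(question: string, deps: RagDeps): Promise<string> {
  // 1. Retrieve relevant documents from the vector database
  const queryEmbedding = await deps.embed(question);
  const docs = await deps.search(queryEmbedding, 3);

  // 2. Construct context from the top results, keeping sources for citations
  const context = docs
    .map((d, i) => `[${i + 1}] (${d.source}) ${d.text}`)
    .join('\n');

  // 3. Generate a grounded answer; instruct the model to stay within the context
  const prompt = `Answer using ONLY the context below. Cite sources as [n].\n\nContext:\n${context}\n\nQuestion: ${question}`;
  return deps.generate(prompt);
}
```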

4. Function Calling & Tool Use

Enable LLMs to use external tools and APIs reliably.
Capabilities:
  • Define tools with JSON schemas
  • Execute functions based on LLM decisions
  • Handle parallel tool calls
  • Stream responses with tool use
Detailed Implementation: See
references/function-calling.md
for:
  • Tool definition patterns (OpenAI and Anthropic)
  • Function calling loops
  • Parallel and streaming tool execution
  • Input validation with Zod
  • Error handling and fallback strategies
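The function-calling loop described above reduces to: send tool definitions, execute whatever the model requests, feed results back, and repeat until it answers in plain text. This sketch abstracts the model behind a small interface rather than a specific vendor SDK, and bounds the loop to avoid runaway calls:

```typescript
type ModelTurn =
  | { type: 'tool_call'; name: string; args: Record<string, unknown> }
  | { type: 'text'; text: string };

interface ToolLoopDeps {
  // One model step: given the transcript so far, call a tool or answer in text
  step: (transcript: string[]) => Promise<ModelTurn>;
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>;
}

// Run tool calls until the model produces a final text answer
async function toolLoop(userMessage: string, deps: ToolLoopDeps, maxSteps = 5): Promise<string> {
  const transcript = [`user: ${userMessage}`];

  for (let i = 0; i < maxSteps; i++) {
    const turn = await deps.step(transcript);
    if (turn.type === 'text') return turn.text;

    const tool = deps.tools[turn.name];
    if (!tool) {
      // Unknown tool: surface the error to the model instead of crashing
      transcript.push(`tool_error: no such tool "${turn.name}"`);
      continue;
    }
    const result = await tool(turn.args);
    transcript.push(`tool_result(${turn.name}): ${result}`);
  }
  throw new Error('Tool loop exceeded maxSteps');
}
```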

5. Agentic Workflows

Enable LLMs to reason, plan, and take autonomous actions.
Patterns:
  • ReAct: Reasoning + Acting loop with observations
  • Tree of Thoughts: Explore multiple reasoning paths
  • Multi-Agent: Specialized agents collaborating on complex tasks
  • Autonomous Agents: Self-directed goal achievement
Detailed Implementation: See
references/agentic-workflows.md
for:
  • Complete ReAct loop implementation
  • Tree of Thoughts exploration
  • Multi-agent coordinator patterns
  • Agent memory management
  • Error recovery and safety guards
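The ReAct pattern listed above alternates Thought → Action → Observation until the model emits a final answer. A minimal loop, with the reasoning model and tools injected (the interface names are illustrative, not from any SDK):

```typescript
interface ReActStep {
  thought: string;
  action?: { tool: string; input: string }; // absent when the model is done
  finalAnswer?: string;
}

interface ReActDeps {
  reason: (scratchpad: string) => Promise<ReActStep>;
  tools: Record<string, (input: string) => Promise<string>>;
}

async function reactLoop(task: string, deps: ReActDeps, maxIterations = 8): Promise<string> {
  let scratchpad = `Task: ${task}`;

  for (let i = 0; i < maxIterations; i++) {
    const step = await deps.reason(scratchpad);
    scratchpad += `\nThought: ${step.thought}`;

    if (step.finalAnswer !== undefined) return step.finalAnswer;
    if (!step.action) throw new Error('Step has neither action nor final answer');

    // Execute the chosen tool and append the observation for the next reasoning turn
    const observation = await deps.tools[step.action.tool](step.action.input);
    scratchpad += `\nAction: ${step.action.tool}(${step.action.input})\nObservation: ${observation}`;
  }
  throw new Error('ReAct loop exceeded maxIterations');
}
```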

5.1 Multi-Agent Orchestration (Opus 4.5)

Advanced multi-agent patterns leveraging Opus 4.5's extended thinking capabilities.
When to Use Extended Thinking:
  • Coordinating 3+ specialized agents
  • Complex dependency resolution between agent outputs
  • Dynamic task allocation based on agent capabilities
  • Conflict resolution when agents produce contradictory results
Orchestrator Pattern:
typescript
interface AgentTask {
  id: string;
  type: 'research' | 'code' | 'review' | 'design';
  input: unknown;
  dependencies: string[]; // Task IDs that must complete first
}

interface AgentResult {
  taskId: string;
  output: unknown;
  confidence: number;
  reasoning: string;
}

async function orchestrateAgents(
  goal: string,
  availableAgents: Agent[]
): Promise<AgentResult[]> {
  // Step 1: Use extended thinking to decompose goal into tasks
  const taskPlan = await planTasks(goal, availableAgents);

  // Step 2: Build dependency graph
  const dependencyGraph = buildDependencyGraph(taskPlan.tasks);

  // Step 3: Execute tasks respecting dependencies
  const results: AgentResult[] = [];
  const completed = new Set<string>();

  while (completed.size < taskPlan.tasks.length) {
    // Find tasks with satisfied dependencies
    const ready = taskPlan.tasks.filter(task =>
      !completed.has(task.id) &&
      task.dependencies.every(dep => completed.has(dep))
    );

    // Guard against circular dependencies, which would otherwise spin forever
    if (ready.length === 0) {
      throw new Error('Circular dependency detected in task plan');
    }

    // Execute ready tasks in parallel
    const batchResults = await Promise.all(
      ready.map(task => executeAgentTask(task, availableAgents))
    );

    // Validate results - use extended thinking for conflicts
    const validatedResults = await validateAndResolveConflicts(
      batchResults,
      results
    );

    results.push(...validatedResults);
    ready.forEach(task => completed.add(task.id));
  }

  return results;
}
Task Planning with Extended Thinking:
typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function planTasks(
  goal: string,
  agents: Agent[]
): Promise<{ tasks: AgentTask[]; rationale: string }> {
  // Extended thinking requires budget_tokens < max_tokens
  // Minimum budget: 1,024 tokens
  const response = await anthropic.messages.create({
    model: 'claude-opus-4-5-20251101', // Or claude-sonnet-4-5-20250929
    max_tokens: 16000,
    thinking: {
      type: 'enabled',
      budget_tokens: 10000 // Extended thinking for complex planning
    },
    messages: [{
      role: 'user',
      content: `
        Goal: ${goal}

        Available agents and their capabilities:
        ${agents.map(a => `- ${a.name}: ${a.capabilities.join(', ')}`).join('\n')}

        Decompose this goal into tasks. For each task, specify:
        1. Which agent should handle it
        2. What input it needs
        3. Which other tasks it depends on
        4. Expected output format

        Think carefully about:
        - Optimal parallelization opportunities
        - Potential conflicts between agent outputs
        - Information that needs to flow between tasks
      `
    }]
  });

  // Response contains thinking blocks followed by text blocks
  // content: [{ type: 'thinking', thinking: '...' }, { type: 'text', text: '...' }]
  return parseTaskPlan(response);
}
Conflict Resolution:
typescript
async function validateAndResolveConflicts(
  newResults: AgentResult[],
  existingResults: AgentResult[]
): Promise<AgentResult[]> {
  // Check for conflicts with existing results
  const conflicts = detectConflicts(newResults, existingResults);

  if (conflicts.length === 0) {
    return newResults;
  }

  // Use extended thinking to resolve conflicts
  const resolution = await anthropic.messages.create({
    model: 'claude-opus-4-5-20251101',
    max_tokens: 8000,
    thinking: {
      type: 'enabled',
      budget_tokens: 5000
    },
    messages: [{
      role: 'user',
      content: `
        The following agent outputs conflict:

        ${conflicts.map(c => `
          Conflict: ${c.description}
          Agent A (${c.agentA.name}): ${JSON.stringify(c.resultA)}
          Agent B (${c.agentB.name}): ${JSON.stringify(c.resultB)}
        `).join('\n\n')}

        Analyze each conflict and determine:
        1. Which output is more likely correct and why
        2. If both have merit, how to synthesize them
        3. What additional verification might be needed
      `
    }]
  });

  return applyResolutions(newResults, resolution);
}
Adaptive Agent Selection:
typescript
async function selectOptimalAgent(
  task: AgentTask,
  agents: Agent[],
  context: ExecutionContext
): Promise<Agent> {
  // Score each agent based on:
  // - Capability match
  // - Current load
  // - Historical performance on similar tasks
  // - Cost (model tier)

  const scores = agents.map(agent => ({
    agent,
    score: calculateAgentScore(agent, task, context)
  }));

  // For complex tasks, use Opus; for simple tasks, use Haiku
  const complexity = assessTaskComplexity(task);

  if (complexity > 0.7) {
    // Filter to agents that can use Opus; fall back to the best overall if none can
    const opusCapable = scores.filter(s => s.agent.supportsOpus);
    if (opusCapable.length > 0) {
      return opusCapable.sort((a, b) => b.score - a.score)[0].agent;
    }
  }

  return scores.sort((a, b) => b.score - a.score)[0].agent;
}
Agent Communication Protocol:
typescript
interface AgentMessage {
  from: string;
  to: string | 'broadcast';
  type: 'request' | 'response' | 'update' | 'conflict';
  payload: unknown;
  timestamp: Date;
}

class AgentCommunicationBus {
  private messages: AgentMessage[] = [];
  private subscribers: Map<string, (msg: AgentMessage) => void> = new Map();

  send(message: AgentMessage): void {
    this.messages.push(message);

    if (message.to === 'broadcast') {
      this.subscribers.forEach(callback => callback(message));
    } else {
      this.subscribers.get(message.to)?.(message);
    }
  }

  subscribe(agentId: string, callback: (msg: AgentMessage) => void): void {
    this.subscribers.set(agentId, callback);
  }

  getHistory(agentId: string): AgentMessage[] {
    return this.messages.filter(
      m => m.from === agentId || m.to === agentId || m.to === 'broadcast'
    );
  }
}

6. Streaming Responses

Deliver real-time AI responses for better UX.
Capabilities:
  • Stream LLM output token-by-token
  • Server-Sent Events (SSE) for web clients
  • Streaming with function calls
  • Backpressure handling
Detailed Implementation: See
../streaming-api-patterns/SKILL.md
for streaming patterns
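For web clients, each streamed token is typically wrapped in the Server-Sent Events wire format: a `data: <payload>` line followed by a blank line per event. A small adapter sketch (the `[DONE]` terminator is a common convention, not part of the SSE spec):

```typescript
// Convert an async stream of model tokens into SSE frames for a web client
async function* toSSE(tokens: AsyncIterable<string>): AsyncGenerator<string> {
  for await (const token of tokens) {
    // Each SSE event is "data: <payload>" followed by a blank line
    yield `data: ${JSON.stringify({ token })}\n\n`;
  }
  yield 'data: [DONE]\n\n';
}
```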

7. Cost Optimization

Strategies:
  • Use smaller models for simple tasks (GPT-3.5 vs GPT-4)
  • Implement prompt caching (Anthropic's ephemeral cache)
  • Batch requests when possible
  • Set max_tokens to prevent runaway generation
  • Monitor usage with alerts
Token Counting:
typescript
import { encoding_for_model, type TiktokenModel } from 'tiktoken'

function countTokens(text: string, model: TiktokenModel = 'gpt-4'): number {
  const encoder = encoding_for_model(model)
  const tokens = encoder.encode(text)
  encoder.free() // WASM-backed encoders must be freed explicitly
  return tokens.length
}
Detailed Implementation: See
references/observability.md
for:
  • Cost estimation and budget tracking
  • Model selection strategies
  • Prompt caching patterns
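Token counts translate directly into spend, so a small estimator plus a budget alert covers the "monitor usage with alerts" strategy above. The price figures in this sketch are placeholders, not real rates; always read current pricing from your provider:

```typescript
// Per-million-token prices (placeholder values; check your provider's current pricing)
interface ModelPricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMTok +
    (outputTokens / 1_000_000) * pricing.outputPerMTok
  );
}

// Simple budget guard: alert when cumulative spend crosses a threshold
class BudgetTracker {
  private spent = 0;
  constructor(private limitUSD: number, private onAlert: (spent: number) => void) {}

  record(cost: number): void {
    this.spent += cost;
    if (this.spent > this.limitUSD) this.onAlert(this.spent);
  }
}
```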

8. Observability & Monitoring

Track LLM performance, costs, and quality in production.
Tools:
  • LangSmith: Tracing, evaluation, monitoring
  • LangFuse: Open-source observability
  • Custom Logging: Structured logs with metrics
Key Metrics:
  • Throughput (requests/minute)
  • Latency (P50, P95, P99)
  • Token usage and cost
  • Error rate
  • Quality scores (relevance, coherence, factuality)
Detailed Implementation: See
references/observability.md
for:
  • LangSmith and LangFuse integration
  • Custom logger implementation
  • Performance monitoring
  • Quality evaluation
  • Debugging and error analysis
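A thin wrapper around each LLM call is often enough to capture the key metrics above (latency, token usage, errors) as structured log records. The call itself is injected so the sketch stays vendor-neutral:

```typescript
interface CallMetrics {
  name: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  ok: boolean;
}

// Wrap an LLM call, timing it and emitting a structured metrics record either way
async function withMetrics<T>(
  name: string,
  call: () => Promise<{ result: T; inputTokens: number; outputTokens: number }>,
  log: (m: CallMetrics) => void
): Promise<T> {
  const start = Date.now();
  try {
    const { result, inputTokens, outputTokens } = await call();
    log({ name, latencyMs: Date.now() - start, inputTokens, outputTokens, ok: true });
    return result;
  } catch (err) {
    // Record the failure before rethrowing so error rate stays observable
    log({ name, latencyMs: Date.now() - start, inputTokens: 0, outputTokens: 0, ok: false });
    throw err;
  }
}
```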

Searching References

This skill includes detailed reference material. Use grep to find specific patterns:
bash
# Find RAG patterns
grep -r "RAG" references/

# Search for a specific vector database
grep -A 10 "Pinecone Setup" references/vector-databases.md

# Find agentic workflow examples
grep -B 5 "ReAct Pattern" references/agentic-workflows.md

# Locate function calling patterns
grep -n "parallel.*tool" references/function-calling.md

# Search for cost optimization (-E enables | alternation)
grep -iE "cost|pricing|budget" references/observability.md

# Find all code examples for embeddings
grep -A 20 "async function.*embedding" references/

---

Best Practices

最佳实践

Context Management

  • ✅ Keep context windows under 75% of model limit
  • ✅ Use sliding window for long conversations
  • ✅ Summarize old messages before they scroll out
  • ✅ Remove redundant or irrelevant context

Embedding Strategy

  • ✅ Chunk documents to 500-1000 tokens
  • ✅ Overlap chunks by 10-20% for continuity
  • ✅ Include metadata (title, source, date) with chunks
  • ✅ Re-embed when source data changes
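The chunking rules above (500-1000 tokens, 10-20% overlap) can be sketched with a simple word-based splitter. Words stand in for tokens here to keep the example self-contained; a real pipeline would count tokens with a tokenizer such as tiktoken:

```typescript
// Split text into overlapping chunks. Words approximate tokens for simplicity;
// use a real tokenizer (e.g. tiktoken) for accurate counts in production.
function chunkText(text: string, chunkSize = 500, overlapRatio = 0.15): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  // Advance by (1 - overlap) of a chunk so consecutive chunks share context
  const step = Math.max(1, Math.floor(chunkSize * (1 - overlapRatio)));
  const chunks: string[] = [];

  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last chunk reached
  }
  return chunks;
}
```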

RAG Quality

  • ✅ Use hybrid search (semantic + keyword)
  • ✅ Re-rank results for relevance
  • ✅ Include citation/source in context
  • ✅ Set temperature low (0.1-0.3) for factual answers
  • ✅ Validate answers against retrieved context

Function Calling

  • ✅ Provide clear, concise function descriptions
  • ✅ Use strict JSON schema for parameters
  • ✅ Handle missing or invalid parameters gracefully
  • ✅ Limit to 10-20 tools to avoid confusion
  • ✅ Validate function outputs before returning to LLM

Cost Optimization

  • ✅ Use smaller models for simple tasks
  • ✅ Implement prompt caching for repeated content
  • ✅ Batch requests when possible
  • ✅ Set max_tokens to prevent runaway generation
  • ✅ Monitor usage with alerts for anomalies

Security

  • ✅ Validate and sanitize user inputs
  • ✅ Never include secrets in prompts
  • ✅ Implement rate limiting
  • ✅ Filter outputs for harmful content
  • ✅ Use separate API keys per environment
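Rate limiting (third bullet) is commonly a token bucket per user or API key. A minimal version with an injectable clock so it can be tested deterministically:

```typescript
// Token-bucket rate limiter: holds up to `capacity` tokens, refilled at `ratePerSec`,
// spending one token per request
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private ratePerSec: number,
    private now: () => number = () => Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = this.now();
  }

  tryAcquire(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = t;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should reject or queue the request
  }
}
```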

Templates

Use the provided templates for common AI patterns:
  • templates/rag-pipeline.ts
    - Basic RAG implementation
  • templates/agentic-workflow.ts
    - ReAct agent pattern

Examples

Complete RAG Chatbot

See
examples/chatbot-with-rag/
for a full-stack implementation:
  • Vector database setup with document ingestion
  • RAG query with citations
  • Streaming chat interface
  • Cost tracking and monitoring

Checklists

AI Implementation Checklist

See
checklists/ai-implementation.md
for comprehensive validation covering:
  • Vector database setup and configuration
  • Embedding generation and chunking strategy
  • RAG pipeline with quality validation
  • Function calling with error handling
  • Streaming response implementation
  • Cost monitoring and budget alerts
  • Observability and logging
  • Security and input validation

Common Patterns

Semantic Caching

Reduce costs by caching similar queries:
typescript
const cache = new Map<string, { embedding: number[]; response: string }>()

async function cachedRAG(query: string) {
  const queryEmbedding = await createEmbedding(query)

  // Check if similar query exists in cache
  for (const cached of cache.values()) {
    const similarity = cosineSimilarity(queryEmbedding, cached.embedding)
    if (similarity > 0.95) {
      return cached.response
    }
  }

  // Not cached, perform RAG
  const response = await ragQuery(query)
  cache.set(query, { embedding: queryEmbedding, response })
  return response
}

Conversational Memory

Maintain context across multiple turns:
typescript
interface ConversationMemory {
  messages: Message[] // Last 10 messages
  summary?: string // Summary of older messages
}

async function getConversationContext(userId: string): Promise<Message[]> {
  const memory = await db.memory.findUnique({ where: { userId } })
  if (!memory) return [] // No history yet for this user

  return [
    ...(memory.summary
      ? [{ role: 'system', content: `Previous conversation summary: ${memory.summary}` }]
      : []),
    ...memory.messages.slice(-5) // Last 5 messages; older turns live in the summary
  ]
}

Prompt Engineering

Few-Shot Learning

Provide examples to guide LLM behavior:
typescript
const fewShotExamples = `
Example 1:
Input: "I love this product!"
Sentiment: Positive

Example 2:
Input: "It's okay, nothing special"
Sentiment: Neutral
`

// Include the examples in the system prompt ahead of the user's input
const systemPrompt = `Classify the sentiment of the input as Positive, Negative, or Neutral.

${fewShotExamples}`

Chain of Thought (CoT)

Ask LLM to show reasoning:
typescript
const prompt = `${problem}\n\nLet's think step by step:`

Resources

Next Steps

After mastering AI-Native Development:
  1. Explore Streaming API Patterns skill for real-time AI responses
  2. Use Type Safety & Validation skill for AI input/output validation
  3. Apply Edge Computing Patterns skill for global AI deployment
  4. Reference Observability Patterns for production monitoring