AI-Native Development
Overview
AI-Native Development focuses on building applications where AI is a first-class citizen, not an afterthought. This skill provides comprehensive patterns for integrating LLMs, implementing RAG (Retrieval-Augmented Generation), using vector databases, building agentic workflows, and optimizing AI application performance and cost.
When to use this skill:
- Building chatbots, Q&A systems, or conversational interfaces
- Implementing semantic search or recommendation engines
- Creating AI agents that can use tools and take actions
- Integrating LLMs (OpenAI, Anthropic, open-source models) into applications
- Building RAG systems for knowledge retrieval
- Optimizing AI costs and latency
- Implementing AI observability and monitoring
Why AI-Native Development Matters
Traditional software is deterministic; AI-native applications are probabilistic:
- Context is Everything: LLMs need relevant context to provide accurate answers
- RAG Over Fine-Tuning: Retrieval is cheaper and more flexible than fine-tuning
- Embeddings Enable Semantic Search: Move beyond keyword matching to understanding meaning
- Agentic Workflows: LLMs can reason, plan, and use tools autonomously
- Cost Management: Token usage directly impacts operational costs
- Observability: Debugging probabilistic systems requires new approaches
- Prompt Engineering: How you ask matters as much as what you ask
Core Concepts
1. Embeddings & Vector Search
Embeddings are vector representations of text that capture semantic meaning. Similar concepts have similar vectors.
Key Capabilities:
- Convert text to high-dimensional vectors (1536 or 3072 dimensions)
- Measure semantic similarity using cosine similarity
- Find relevant documents through vector search
- Batch process for efficiency
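The similarity measure behind all of this can be sketched in a few lines of plain TypeScript (no library assumed):

```typescript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1 (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```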
Detailed Implementation: See references/vector-databases.md for:
- OpenAI embeddings setup and batch processing
- Cosine similarity algorithms
- Chunking strategies (500-1000 tokens with 10-20% overlap)
2. Vector Databases
Store and retrieve embeddings efficiently at scale.
Popular Options:
- Pinecone: Serverless, managed service ($0.096/hour)
- Chroma: Open source, self-hosted
- Weaviate: Flexible schema, hybrid search
- Qdrant: Rust-based, high performance
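Before committing to one of these services, the core upsert/query surface can be prototyped in memory. A minimal sketch (the class and its API are illustrative, not any vendor's SDK):

```typescript
interface StoredVector {
  id: string
  values: number[]
  metadata?: Record<string, unknown>
}

class InMemoryVectorStore {
  private vectors = new Map<string, StoredVector>()

  upsert(vec: StoredVector): void {
    this.vectors.set(vec.id, vec)
  }

  // Return the topK most similar vectors by cosine similarity.
  query(values: number[], topK: number): Array<{ id: string; score: number }> {
    const scored = [...this.vectors.values()].map(v => ({
      id: v.id,
      score: this.cosine(values, v.values),
    }))
    return scored.sort((a, b) => b.score - a.score).slice(0, topK)
  }

  private cosine(a: number[], b: number[]): number {
    let dot = 0, na = 0, nb = 0
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i]
      na += a[i] * a[i]
      nb += b[i] * b[i]
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb))
  }
}
```

The real services add persistence, metadata filtering, and approximate-nearest-neighbor indexes; the query contract stays essentially the same.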
Detailed Implementation: See references/vector-databases.md for:
- Complete setup guides for each database
- Upsert, query, update, delete operations
- Metadata filtering and hybrid search
- Cost comparison and best practices
3. RAG (Retrieval-Augmented Generation)
RAG combines retrieval systems with LLMs to provide accurate, grounded answers.
Core Pattern:
- Retrieve relevant documents from vector database
- Construct context from top results
- Generate answer with LLM using retrieved context
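The three steps above compose into a small pipeline. A provider-agnostic sketch, with the retriever and LLM injected as plain functions (the names `retrieve` and `generate` are illustrative, not a specific SDK):

```typescript
interface RetrievedDoc {
  text: string
  score: number
}

// retrieve: vector-database query; generate: LLM completion call.
// Both are injected so the pattern stays provider-agnostic.
async function ragQuery(
  question: string,
  retrieve: (q: string, topK: number) => Promise<RetrievedDoc[]>,
  generate: (prompt: string) => Promise<string>
): Promise<string> {
  // 1. Retrieve relevant documents
  const docs = await retrieve(question, 5)
  // 2. Construct context from top results
  const context = docs.map((d, i) => `[${i + 1}] ${d.text}`).join('\n')
  // 3. Generate an answer grounded in the retrieved context
  const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`
  return generate(prompt)
}
```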
Advanced Patterns:
- RAG with citations and source tracking
- Hybrid search (semantic + keyword)
- Multi-query RAG for better recall
- HyDE (Hypothetical Document Embeddings)
- Contextual compression for relevance
Detailed Implementation: See references/rag-patterns.md for:
- Basic and advanced RAG patterns with full code
- Citation strategies
- Hybrid search with Reciprocal Rank Fusion
- Conversation memory patterns
- Error handling and validation
4. Function Calling & Tool Use
Enable LLMs to use external tools and APIs reliably.
Capabilities:
- Define tools with JSON schemas
- Execute functions based on LLM decisions
- Handle parallel tool calls
- Stream responses with tool use
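As an illustration of the JSON-schema style of tool definition, here is a sketch in the OpenAI function-calling shape (the `get_weather` tool and its stub handler are hypothetical):

```typescript
// A tool definition in the OpenAI function-calling shape.
// The model sees name/description/parameters and emits matching arguments.
const weatherTool = {
  type: 'function' as const,
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name, e.g. "Berlin"' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['city'],
    },
  },
}

// The application keeps a dispatch table from tool name to implementation,
// invoked when the model returns a tool call.
const toolHandlers: Record<string, (args: { city: string }) => Promise<string>> = {
  get_weather: async ({ city }) => `Sunny in ${city}`, // stub implementation
}
```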
Detailed Implementation: See references/function-calling.md for:
- Tool definition patterns (OpenAI and Anthropic)
- Function calling loops
- Parallel and streaming tool execution
- Input validation with Zod
- Error handling and fallback strategies
5. Agentic Workflows
Enable LLMs to reason, plan, and take autonomous actions.
Patterns:
- ReAct: Reasoning + Acting loop with observations
- Tree of Thoughts: Explore multiple reasoning paths
- Multi-Agent: Specialized agents collaborating on complex tasks
- Autonomous Agents: Self-directed goal achievement
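The ReAct pattern listed above can be sketched as a small loop, with the LLM and tools injected as plain functions (in a real agent, `llm` would be an actual model request returning a parsed step):

```typescript
interface Step {
  thought: string
  action?: { tool: string; input: string }
  finalAnswer?: string
}

// ReAct: alternate reasoning (thought) and acting (tool call),
// feeding each observation back into the next LLM call.
async function reactLoop(
  goal: string,
  llm: (transcript: string) => Promise<Step>,
  tools: Record<string, (input: string) => Promise<string>>,
  maxSteps = 5
): Promise<string> {
  let transcript = `Goal: ${goal}`
  for (let i = 0; i < maxSteps; i++) {
    const step = await llm(transcript)
    transcript += `\nThought: ${step.thought}`
    if (step.finalAnswer !== undefined) return step.finalAnswer
    if (step.action) {
      const observation = await tools[step.action.tool](step.action.input)
      transcript += `\nAction: ${step.action.tool}(${step.action.input})\nObservation: ${observation}`
    }
  }
  throw new Error('Agent did not finish within maxSteps')
}
```

The `maxSteps` cap is the minimal safety guard: it bounds cost and prevents an agent that never reaches a final answer from looping forever.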
Detailed Implementation: See references/agentic-workflows.md for:
- Complete ReAct loop implementation
- Tree of Thoughts exploration
- Multi-agent coordinator patterns
- Agent memory management
- Error recovery and safety guards
5.1 Multi-Agent Orchestration (Opus 4.5)
Advanced multi-agent patterns leveraging Opus 4.5's extended thinking capabilities.
When to Use Extended Thinking:
- Coordinating 3+ specialized agents
- Complex dependency resolution between agent outputs
- Dynamic task allocation based on agent capabilities
- Conflict resolution when agents produce contradictory results
Orchestrator Pattern:
```typescript
interface AgentTask {
  id: string;
  type: 'research' | 'code' | 'review' | 'design';
  input: unknown;
  dependencies: string[]; // Task IDs that must complete first
}

interface AgentResult {
  taskId: string;
  output: unknown;
  confidence: number;
  reasoning: string;
}

async function orchestrateAgents(
  goal: string,
  availableAgents: Agent[]
): Promise<AgentResult[]> {
  // Step 1: Use extended thinking to decompose goal into tasks
  const taskPlan = await planTasks(goal, availableAgents);

  // Step 2: Build dependency graph
  const dependencyGraph = buildDependencyGraph(taskPlan.tasks);

  // Step 3: Execute tasks respecting dependencies
  const results: AgentResult[] = [];
  const completed = new Set<string>();

  while (completed.size < taskPlan.tasks.length) {
    // Find tasks with satisfied dependencies
    const ready = taskPlan.tasks.filter(task =>
      !completed.has(task.id) &&
      task.dependencies.every(dep => completed.has(dep))
    );

    // Guard against unsatisfiable plans (a dependency cycle would loop forever)
    if (ready.length === 0) {
      throw new Error('Dependency cycle or missing task in plan');
    }

    // Execute ready tasks in parallel
    const batchResults = await Promise.all(
      ready.map(task => executeAgentTask(task, availableAgents))
    );

    // Validate results - use extended thinking for conflicts
    const validatedResults = await validateAndResolveConflicts(
      batchResults,
      results
    );
    results.push(...validatedResults);
    ready.forEach(task => completed.add(task.id));
  }

  return results;
}
```

Task Planning with Extended Thinking:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function planTasks(
  goal: string,
  agents: Agent[]
): Promise<{ tasks: AgentTask[]; rationale: string }> {
  // Extended thinking requires budget_tokens < max_tokens
  // Minimum budget: 1,024 tokens
  const response = await anthropic.messages.create({
    model: 'claude-opus-4-5-20251101', // Or claude-sonnet-4-5-20250929
    max_tokens: 16000,
    thinking: {
      type: 'enabled',
      budget_tokens: 10000 // Extended thinking for complex planning
    },
    messages: [{
      role: 'user',
      content: `
Goal: ${goal}

Available agents and their capabilities:
${agents.map(a => `- ${a.name}: ${a.capabilities.join(', ')}`).join('\n')}

Decompose this goal into tasks. For each task, specify:
1. Which agent should handle it
2. What input it needs
3. Which other tasks it depends on
4. Expected output format

Think carefully about:
- Optimal parallelization opportunities
- Potential conflicts between agent outputs
- Information that needs to flow between tasks
`
    }]
  });

  // Response contains thinking blocks followed by text blocks
  // content: [{ type: 'thinking', thinking: '...' }, { type: 'text', text: '...' }]
  return parseTaskPlan(response);
}
```

Conflict Resolution:
```typescript
async function validateAndResolveConflicts(
  newResults: AgentResult[],
  existingResults: AgentResult[]
): Promise<AgentResult[]> {
  // Check for conflicts with existing results
  const conflicts = detectConflicts(newResults, existingResults);
  if (conflicts.length === 0) {
    return newResults;
  }

  // Use extended thinking to resolve conflicts
  const resolution = await anthropic.messages.create({
    model: 'claude-opus-4-5-20251101',
    max_tokens: 8000,
    thinking: {
      type: 'enabled',
      budget_tokens: 5000
    },
    messages: [{
      role: 'user',
      content: `
The following agent outputs conflict:
${conflicts.map(c => `
Conflict: ${c.description}
Agent A (${c.agentA.name}): ${JSON.stringify(c.resultA)}
Agent B (${c.agentB.name}): ${JSON.stringify(c.resultB)}
`).join('\n\n')}

Analyze each conflict and determine:
1. Which output is more likely correct and why
2. If both have merit, how to synthesize them
3. What additional verification might be needed
`
    }]
  });

  return applyResolutions(newResults, resolution);
}
```

Adaptive Agent Selection:
```typescript
async function selectOptimalAgent(
  task: AgentTask,
  agents: Agent[],
  context: ExecutionContext
): Promise<Agent> {
  // Score each agent based on:
  // - Capability match
  // - Current load
  // - Historical performance on similar tasks
  // - Cost (model tier)
  const scores = agents.map(agent => ({
    agent,
    score: calculateAgentScore(agent, task, context)
  }));

  // For complex tasks, use Opus; for simple tasks, use Haiku
  const complexity = assessTaskComplexity(task);
  if (complexity > 0.7) {
    // Filter to agents that can use Opus
    const opusCapable = scores.filter(s => s.agent.supportsOpus);
    return opusCapable.sort((a, b) => b.score - a.score)[0].agent;
  }
  return scores.sort((a, b) => b.score - a.score)[0].agent;
}
```

Agent Communication Protocol:
```typescript
interface AgentMessage {
  from: string;
  to: string | 'broadcast';
  type: 'request' | 'response' | 'update' | 'conflict';
  payload: unknown;
  timestamp: Date;
}

class AgentCommunicationBus {
  private messages: AgentMessage[] = [];
  private subscribers: Map<string, (msg: AgentMessage) => void> = new Map();

  send(message: AgentMessage): void {
    this.messages.push(message);
    if (message.to === 'broadcast') {
      this.subscribers.forEach(callback => callback(message));
    } else {
      this.subscribers.get(message.to)?.(message);
    }
  }

  subscribe(agentId: string, callback: (msg: AgentMessage) => void): void {
    this.subscribers.set(agentId, callback);
  }

  getHistory(agentId: string): AgentMessage[] {
    return this.messages.filter(
      m => m.from === agentId || m.to === agentId || m.to === 'broadcast'
    );
  }
}
```
6. Streaming Responses
Deliver real-time AI responses for better UX.
Capabilities:
- Stream LLM output token-by-token
- Server-Sent Events (SSE) for web clients
- Streaming with function calls
- Backpressure handling
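Token-by-token delivery can be modeled as an async iterable. A provider-agnostic sketch (the `{ delta }` chunk shape is illustrative; real SDK stream events differ):

```typescript
// Consume a token stream, forwarding each delta to the UI as it arrives
// instead of waiting for the full completion.
async function streamToHandler(
  stream: AsyncIterable<{ delta: string }>,
  onToken: (token: string) => void
): Promise<string> {
  let full = ''
  for await (const chunk of stream) {
    full += chunk.delta
    onToken(chunk.delta) // e.g. write an SSE event: `data: ${chunk.delta}\n\n`
  }
  return full
}

// A fake stream for local testing.
async function* fakeStream(tokens: string[]) {
  for (const t of tokens) yield { delta: t }
}
```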
Detailed Implementation: See ../streaming-api-patterns/SKILL.md for streaming patterns.
7. Cost Optimization
Strategies:
- Use smaller models for simple tasks (GPT-3.5 vs GPT-4)
- Implement prompt caching (Anthropic's ephemeral cache)
- Batch requests when possible
- Set max_tokens to prevent runaway generation
- Monitor usage with alerts
Token Counting:
```typescript
import { encoding_for_model } from 'tiktoken'

function countTokens(text: string, model = 'gpt-4'): number {
  const encoder = encoding_for_model(model)
  const tokens = encoder.encode(text)
  encoder.free()
  return tokens.length
}
```

Detailed Implementation: See references/observability.md for:
- Cost estimation and budget tracking
- Model selection strategies
- Prompt caching patterns
8. Observability & Monitoring
Track LLM performance, costs, and quality in production.
Tools:
- LangSmith: Tracing, evaluation, monitoring
- LangFuse: Open-source observability
- Custom Logging: Structured logs with metrics
Key Metrics:
- Throughput (requests/minute)
- Latency (P50, P95, P99)
- Token usage and cost
- Error rate
- Quality scores (relevance, coherence, factuality)
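The latency percentiles listed above can be computed from recorded request durations. A simple nearest-rank sketch:

```typescript
// Nearest-rank percentile: sort samples, take the value at ceil(p/100 * n) - 1.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('No samples recorded')
  const sorted = [...samplesMs].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}
```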
Detailed Implementation: See references/observability.md for:
- LangSmith and LangFuse integration
- Custom logger implementation
- Performance monitoring
- Quality evaluation
- Debugging and error analysis
Searching References
This skill includes detailed reference material. Use grep to find specific patterns:
```bash
# Find RAG patterns
grep -r "RAG" references/

# Search for a specific vector database
grep -A 10 "Pinecone Setup" references/vector-databases.md

# Find agentic workflow examples
grep -B 5 "ReAct Pattern" references/agentic-workflows.md

# Locate function calling patterns
grep -n "parallel.*tool" references/function-calling.md

# Search for cost optimization
grep -iE "cost|pricing|budget" references/observability.md

# Find all code examples for embeddings
grep -rA 20 "async function.*embedding" references/
```
Best Practices
Context Management
- ✅ Keep context windows under 75% of model limit
- ✅ Use sliding window for long conversations
- ✅ Summarize old messages before they scroll out
- ✅ Remove redundant or irrelevant context
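The sliding-window and summarization items above can be sketched together: keep the most recent messages and fold older ones into a running summary (the injected `summarize` function stands in for an LLM summarization call):

```typescript
interface Msg { role: 'user' | 'assistant' | 'system'; content: string }

// Keep at most `windowSize` recent messages; older ones are folded into
// a summary via the injected `summarize` function (typically an LLM call).
async function slideWindow(
  messages: Msg[],
  summary: string,
  windowSize: number,
  summarize: (summary: string, dropped: Msg[]) => Promise<string>
): Promise<{ messages: Msg[]; summary: string }> {
  if (messages.length <= windowSize) return { messages, summary }
  const dropped = messages.slice(0, messages.length - windowSize)
  const kept = messages.slice(-windowSize)
  return { messages: kept, summary: await summarize(summary, dropped) }
}
```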
Embedding Strategy
- ✅ Chunk documents to 500-1000 tokens
- ✅ Overlap chunks by 10-20% for continuity
- ✅ Include metadata (title, source, date) with chunks
- ✅ Re-embed when source data changes
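The chunking guidance above, sketched with a word-based approximation of token counts (a real pipeline would use an actual tokenizer such as tiktoken):

```typescript
// Split text into chunks of ~chunkSize words with `overlap` words of
// carry-over between consecutive chunks (words stand in for tokens here).
function chunkText(text: string, chunkSize = 200, overlap = 30): string[] {
  const words = text.split(/\s+/).filter(Boolean)
  const chunks: string[] = []
  const step = chunkSize - overlap
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '))
    if (start + chunkSize >= words.length) break
  }
  return chunks
}
```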
RAG Quality
- ✅ Use hybrid search (semantic + keyword)
- ✅ Re-rank results for relevance
- ✅ Include citation/source in context
- ✅ Set temperature low (0.1-0.3) for factual answers
- ✅ Validate answers against retrieved context
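The merge step of hybrid search is often done with Reciprocal Rank Fusion, combining the semantic and keyword result lists. A sketch (k = 60 is the commonly used constant):

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document;
// documents ranked well in multiple lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + index + 1))
    })
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
}
```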
Function Calling
- ✅ Provide clear, concise function descriptions
- ✅ Use strict JSON schema for parameters
- ✅ Handle missing or invalid parameters gracefully
- ✅ Limit to 10-20 tools to avoid confusion
- ✅ Validate function outputs before returning to LLM
Cost Optimization
- ✅ Use smaller models for simple tasks
- ✅ Implement prompt caching for repeated content
- ✅ Batch requests when possible
- ✅ Set max_tokens to prevent runaway generation
- ✅ Monitor usage with alerts for anomalies
Security
- ✅ Validate and sanitize user inputs
- ✅ Never include secrets in prompts
- ✅ Implement rate limiting
- ✅ Filter outputs for harmful content
- ✅ Use separate API keys per environment
Templates
Use the provided templates for common AI patterns:
- templates/rag-pipeline.ts - Basic RAG implementation
- templates/agentic-workflow.ts - ReAct agent pattern
Examples
Complete RAG Chatbot
See examples/chatbot-with-rag/ for a full-stack implementation:
- Vector database setup with document ingestion
- RAG query with citations
- Streaming chat interface
- Cost tracking and monitoring
Checklists
AI Implementation Checklist
See checklists/ai-implementation.md for comprehensive validation covering:
- Vector database setup and configuration
- Embedding generation and chunking strategy
- RAG pipeline with quality validation
- Function calling with error handling
- Streaming response implementation
- Cost monitoring and budget alerts
- Observability and logging
- Security and input validation
Common Patterns
Semantic Caching
Reduce costs by caching similar queries:
```typescript
const cache = new Map<string, { embedding: number[]; response: string }>()

async function cachedRAG(query: string) {
  const queryEmbedding = await createEmbedding(query)

  // Check if a semantically similar query exists in the cache
  for (const cached of cache.values()) {
    const similarity = cosineSimilarity(queryEmbedding, cached.embedding)
    if (similarity > 0.95) {
      return cached.response
    }
  }

  // Not cached, perform RAG
  const response = await ragQuery(query)
  cache.set(query, { embedding: queryEmbedding, response })
  return response
}
```
Conversational Memory
Maintain context across multiple turns:
```typescript
interface ConversationMemory {
  messages: Message[] // Last 10 messages
  summary?: string    // Summary of older messages
}

async function getConversationContext(userId: string): Promise<Message[]> {
  const memory = await db.memory.findUnique({ where: { userId } })
  return [
    { role: 'system', content: `Previous conversation summary: ${memory.summary}` },
    ...memory.messages.slice(-5) // Last 5 messages
  ]
}
```
Prompt Engineering
Few-Shot Learning
Provide examples to guide LLM behavior:
```typescript
const fewShotExamples = `
Example 1:
Input: "I love this product!"
Sentiment: Positive

Example 2:
Input: "It's okay, nothing special"
Sentiment: Neutral
`
// Include in the system prompt
```
Chain of Thought (CoT)
Ask the LLM to show its reasoning:
```typescript
const prompt = `${problem}\n\nLet's think step by step:`
```
Resources
Next Steps
After mastering AI-Native Development:
- Explore Streaming API Patterns skill for real-time AI responses
- Use Type Safety & Validation skill for AI input/output validation
- Apply Edge Computing Patterns skill for global AI deployment
- Reference Observability Patterns for production monitoring