rag-agent-builder


RAG Agent Builder


Build powerful Retrieval-Augmented Generation (RAG) applications that enhance LLM capabilities with external knowledge sources, enabling accurate, contextualized AI responses.

Quick Start


Get started with RAG implementations in the examples and utilities:
  • Examples: See the `examples/` directory for complete implementations:
    • `basic_rag.py` - Simple chunk-embed-retrieve-generate pipeline
    • `retrieval_strategies.py` - Hybrid search, reranking, and filtering
    • `agentic_rag.py` - Agent-controlled retrieval with iterative refinement
  • Utilities: See the `scripts/` directory for helper modules:
    • `embedding_management.py` - Embedding generation, normalization, and caching
    • `vector_db_manager.py` - Vector database abstraction and factory
    • `rag_evaluation.py` - Retrieval and answer quality metrics

Overview


RAG systems combine three key components:
  1. Document Retrieval - Find relevant information from knowledge bases
  2. Context Integration - Pass retrieved context to the LLM
  3. Response Generation - Generate answers grounded in the retrieved information
This skill covers building production-ready RAG applications with various frameworks and approaches.

Core Concepts


What is RAG?


RAG augments LLM knowledge with external data:
  • Without RAG: LLM relies on training data (may be outdated or limited)
  • With RAG: LLM uses real-time, custom knowledge + training knowledge

When to Use RAG


  • Document Q&A: Answer questions about PDFs, books, reports
  • Knowledge Base Search: Query internal documentation, wikis
  • Enterprise Search: Search proprietary company data
  • Context-Specific Assistants: Customer support, HR assistants
  • Fact-Heavy Applications: Legal docs, medical records, financial data

When RAG Might Not Be Needed


  • General knowledge questions (ChatGPT-like)
  • Real-time data that changes constantly (use tools instead)
  • Very simple lookup tasks (use database queries)

Architecture Patterns


Basic RAG Pipeline


Documents → Chunks → Embeddings → Vector DB
User Question → Embedding → Retrieval → LLM → Answer
                              ↑         ↓
                         Vector DB    Context

Advanced RAG Patterns


1. Agentic RAG


  • Agent decides what to retrieve and when
  • Can refine queries iteratively
  • Better for complex reasoning

2. Hierarchical RAG


  • Multi-level document structure
  • Search at different levels of detail
  • More flexible organization

3. Hybrid Search RAG


  • Combines keyword search (BM25) + semantic search (embeddings)
  • Captures both exact matches and meaning
  • Better for mixed query types

4. Corrective RAG (CRAG)


  • Evaluates retrieved documents for relevance
  • Retrieves additional sources if needed
  • Ensures high-quality context

Implementation Components


1. Document Processing


Chunking Strategies:

```python
# Simple fixed-size chunks
chunks = split_text(doc, chunk_size=1000, overlap=100)

# Semantic chunks (group by meaning)
chunks = semantic_chunking(doc, max_tokens=512)

# Hierarchical chunks (different levels)
chapters = split_by_heading(doc)
chunks = split_each_chapter(chapters, size=1000)
```

**Key Considerations**:
- Chunk size affects retrieval quality and cost
- Overlap helps maintain context between chunks
- Semantic chunking preserves meaning better
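A minimal, character-based implementation of the `split_text` helper used in the examples above (a sketch; production splitters also respect sentence and paragraph boundaries):

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks whose ends overlap by `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```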

2. Embedding Generation


Popular Embedding Models:
  • OpenAI: `text-embedding-3-small`, `text-embedding-3-large`
  • Open Source: `all-MiniLM-L6-v2`, `all-mpnet-base-v2`
  • Domain-Specific: Domain-trained embeddings for specialized knowledge
Best Practices:
  • Use the same embedding model for indexing and queries
  • Store embeddings as normalized vectors
  • Update embeddings when documents change
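The normalization practice above can be sketched in plain Python: after scaling vectors to unit length, a plain dot product equals cosine similarity (function names here are illustrative):

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; on unit vectors, dot product == cosine."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return vec
    return [x / norm for x in vec]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))
```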

3. Vector Databases


Popular Options:
  • Pinecone: Managed, serverless, easy to scale
  • Weaviate: Open-source, self-hosted, flexible
  • Milvus: Open-source, high performance
  • Chroma: Lightweight, good for prototypes
  • Qdrant: Production-grade, high-performance
Selection Criteria:
  • Scale requirements (data volume, queries per second)
  • Latency needs (real-time vs batch)
  • Cost considerations
  • Deployment preferences (managed vs self-hosted)
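To make the database abstraction concrete, here is a toy in-memory store with the same `query(vector, k)` shape used in this guide's snippets — a prototyping stand-in, not a substitute for Chroma, Qdrant, or the other options above:

```python
import math

class InMemoryVectorDB:
    """Exact cosine search over an in-memory list; fine for prototypes/tests."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector, payload)

    def add(self, doc_id, vector, payload=None):
        self._items.append((doc_id, vector, payload))

    def query(self, vector, k=5):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(cos(vector, v), doc_id, payload) for doc_id, v, payload in self._items]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]
```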

4. Retrieval Strategies


Retrieval Methods:

```python
# Similarity search (most common)
results = vector_db.query(question_embedding, k=5)

# Hybrid search (keyword + semantic)
keyword_results = bm25.search(question, k=3)
semantic_results = vector_db.query(embedding, k=3)
results = combine_and_rank(keyword_results, semantic_results)

# Reranking (improve relevance)
retrieved = initial_retrieval(query)
reranked = rerank_by_relevance(retrieved, query)
```

**Retrieval Parameters**:
- **k** (number of results): Balance between context and relevance
- **Similarity threshold**: Filter out low-relevance results
- **Diversity**: Return varied results vs best matches
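One common way to implement the `combine_and_rank` step is Reciprocal Rank Fusion (RRF), which merges ranked lists using only ranks, so keyword and semantic scores need no calibration. A sketch (the `k=60` damping constant is the conventional RRF default):

```python
def combine_and_rank(keyword_results, semantic_results, k=60):
    """Merge two ranked lists of doc ids with Reciprocal Rank Fusion (RRF)."""
    scores = {}
    for results in (keyword_results, semantic_results):
        for rank, doc_id in enumerate(results):
            # Each list contributes 1/(k + rank); docs in both lists accumulate.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```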

5. Context Integration


Context Window Management:

```python
# Fit retrieved documents into context window
def prepare_context(retrieved_docs, max_tokens=3000):
    context = ""
    for doc in retrieved_docs:
        if len(tokenize(context + doc)) <= max_tokens:
            context += doc
        else:
            break
    return context
```

**Prompt Design**:

```
You are a helpful assistant. Answer the question based on the provided context.

Context: {retrieved_documents}

Question: {user_question}

Answer:
```
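The `prepare_context` sketch leaves `tokenize` unspecified; a self-contained variant using whitespace splitting as a stand-in for a real tokenizer (e.g. tiktoken):

```python
def tokenize(text: str) -> list[str]:
    # Whitespace split as a stand-in for a real tokenizer.
    return text.split()

def prepare_context(retrieved_docs, max_tokens=3000):
    """Greedily pack docs in retrieval order until the token budget is reached."""
    context = ""
    for doc in retrieved_docs:
        candidate = f"{context}\n\n{doc}" if context else doc
        if len(tokenize(candidate)) <= max_tokens:
            context = candidate
        else:
            break
    return context
```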

6. Response Generation


Generation Strategies:
  • Direct Generation: LLM answers from context
  • Summarization: Summarize multiple retrieved docs first
  • Fact-Grounding: Ensure answer cites sources
  • Iterative Refinement: Refine based on user feedback

Implementation Patterns


Pattern 1: Basic RAG


Simplest RAG implementation:
  1. Split documents into chunks
  2. Generate embeddings for each chunk
  3. Store in vector database
  4. Retrieve top-k similar chunks for query
  5. Pass to LLM with context
Pros: Simple, fast, works well for straightforward QA
Cons: May miss relevant context, no refinement
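The five steps above can be condensed into a toy, dependency-free pipeline. Bag-of-words counts stand in for real embeddings here, and the final LLM call is stubbed out; every name in this sketch is illustrative:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a model such as
    # text-embedding-3-small or all-MiniLM-L6-v2.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(chunks, question, k=2):
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The warranty covers parts and labor for two years.",
    "Returns are accepted within 30 days of purchase.",
    "Our headquarters are located in Berlin.",
]
top = retrieve(chunks, "How long does the warranty last?", k=1)
# top would then be passed to the LLM as grounding context.
```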

Pattern 2: Agentic RAG


Agent controls retrieval:
  1. Agent receives user question
  2. Decides whether to retrieve documents
  3. Formulates retrieval query (may differ from original)
  4. Retrieves relevant documents
  5. Can iterate or use tools
  6. Generates final answer
Pros: Better for complex questions, iterative improvement
Cons: More complex, higher costs
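A minimal sketch of the agentic control loop, with `retrieve`, `grade`, and `generate` injected as stand-ins for real search and LLM calls (the query rewrite is hard-coded here; an agent would use the LLM to reformulate):

```python
def agentic_answer(question, retrieve, grade, generate, max_steps=3):
    """Agent-controlled retrieval: re-query until retrieval looks good enough."""
    query = question
    docs = []
    for _ in range(max_steps):
        docs = retrieve(query)
        if grade(question, docs):  # agent judges retrieval quality
            break
        query = f"{question} (rephrased attempt)"  # stand-in for an LLM rewrite
    return generate(question, docs)
```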

Pattern 3: Corrective RAG (CRAG)


Validates retrieved documents:
  1. Retrieve documents for question
  2. Grade each document for relevance
  3. If poor relevance:
    • Try different retrieval strategy
    • Expand search scope
    • Retrieve from different sources
  4. Generate answer from validated context
Pros: Higher quality answers, adapts to failures
Cons: More API calls, slower
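The CRAG fallback logic can be sketched as a loop over retrieval strategies; all function arguments here are hypothetical stand-ins (in practice `grade_doc` is an LLM or cross-encoder relevance check):

```python
def corrective_rag(question, retrievers, grade_doc, generate):
    """Grade retrieved docs; fall back to the next strategy when none pass."""
    for retrieve in retrievers:  # e.g. [vector_search, hybrid_search, web_search]
        docs = retrieve(question)
        relevant = [d for d in docs if grade_doc(question, d)]
        if relevant:
            return generate(question, relevant)
    return generate(question, [])  # last resort: answer without context
```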

Popular Frameworks


LangChain


```python
from langchain.document_loaders import PDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Load documents
loader = PDFLoader("document.pdf")
docs = loader.load()

# Create RAG chain
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone.from_documents(docs, embeddings)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

answer = qa.run("What is the document about?")
```

LlamaIndex


```python
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index
index = GPTVectorStoreIndex.from_documents(documents)

# Query
response = index.as_query_engine().query("What is the main topic?")
```

CrewAI with RAG


```python
from crewai import Agent, Task, Crew
from tools import retrieval_tool

researcher = Agent(
    role="Research Assistant",
    goal="Research topics using knowledge base",
    tools=[retrieval_tool]
)

research_task = Task(
    description="Research the topic: {topic}",
    agent=researcher
)
```

Best Practices


Document Preparation


  • ✓ Clean and normalize text (remove headers, footers)
  • ✓ Preserve document structure when possible
  • ✓ Add metadata (source, date, category)
  • ✓ Handle PDFs with OCR if scanned
  • ✓ Test chunk sizes for your domain

Embedding Strategy


  • ✓ Use same embedding model for indexing and queries
  • ✓ Fine-tune embeddings for domain-specific needs
  • ✓ Normalize embeddings for consistency
  • ✓ Monitor embedding quality metrics

Retrieval Optimization


  • ✓ Tune k (number of results) for your use case
  • ✓ Use reranking for quality improvement
  • ✓ Implement relevance filtering
  • ✓ Monitor retrieval precision and recall
  • ✓ Cache frequently retrieved documents

Generation Quality


  • ✓ Include source citations in answers
  • ✓ Prompt LLM to indicate confidence
  • ✓ Ask to cite specific documents
  • ✓ Generate summaries for long contexts
  • ✓ Validate answers against context

Monitoring & Evaluation


  • ✓ Track retrieval metrics (precision, recall, MRR)
  • ✓ Monitor answer quality and relevance
  • ✓ Log failed retrievals for improvement
  • ✓ Collect user feedback
  • ✓ Iterate based on failures

Common Challenges & Solutions


Challenge: Irrelevant Retrieval


Solutions:
  • Improve chunking strategy
  • Better embedding model
  • Add document metadata to queries
  • Implement reranking
  • Use hybrid search

Challenge: Context Too Large


Solutions:
  • Reduce chunk size
  • Retrieve fewer results (smaller k)
  • Summarize retrieved context
  • Use hierarchical retrieval
  • Filter by relevance score

Challenge: Missing Information


Solutions:
  • Increase k (retrieve more)
  • Improve embedding model
  • Better preprocessing
  • Use multiple search strategies
  • Add document hierarchy

Challenge: Slow Performance


Solutions:
  • Use managed vector database
  • Cache embeddings
  • Batch process documents
  • Optimize chunk size
  • Use smaller embedding model for speed

Evaluation Metrics


Retrieval Metrics:
  • Precision: % of retrieved docs that are relevant
  • Recall: % of relevant docs that are retrieved
  • MRR (Mean Reciprocal Rank): Rank of first relevant result
  • NDCG (Normalized DCG): Quality of ranking
Answer Quality Metrics:
  • Relevance: Does answer address the question?
  • Correctness: Is the answer factually accurate?
  • Grounding: Is answer supported by context?
  • User Satisfaction: Would user find answer helpful?
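The retrieval metrics above have straightforward single-query implementations; averaging the reciprocal-rank values over a query set gives the "mean" in MRR:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

def reciprocal_rank(ranked_results, relevant):
    """1 / rank of the first relevant result; 0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_results, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```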

Advanced Techniques


1. Query Expansion


```python
# Expand query with related terms
expanded_query = query + " " + synonym_expansion(query)
results = retrieve(expanded_query)
```
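A self-contained sketch of `synonym_expansion` with a toy lookup table (`SYNONYMS` and its entries are purely illustrative; production systems use a thesaurus or an LLM rewriter):

```python
SYNONYMS = {"car": ["automobile", "vehicle"], "price": ["cost"]}  # toy lexicon

def synonym_expansion(query: str) -> str:
    """Return known synonyms for each query term, joined by spaces."""
    extras = []
    for term in query.lower().split():
        extras.extend(SYNONYMS.get(term, []))
    return " ".join(extras)

expanded_query = "car price" + " " + synonym_expansion("car price")
```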

2. Document Compression


```python
# Compress retrieved docs before passing to LLM
compressed = compress_documents(retrieved_docs, query)
context = format_context(compressed)
```

3. Active Retrieval


```python
# Iteratively refine retrieval based on LLM output
query = user_question
for _ in range(max_iterations):
    results = retrieve(query)
    answer = generate_with_context(results)
    if answer_complete(answer):
        break
    query = refine_query(answer)
```

4. Multi-Modal RAG


```python
# Retrieve both text and images
text_results = text_retriever.query(question)
image_results = image_retriever.query(question)
context = combine_multimodal(text_results, image_results)
```

Resources & References


Key Papers


  • "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al.)
  • "REALM: Retrieval-Augmented Language Model Pre-Training" (Guu et al.)
  • "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al.)
  • "REALM: Retrieval-Augmented Language Model Pre-Training" (Guu et al.)

Frameworks


Vector Databases


Embedding Models


Next Steps


  1. Choose your stack: Decide on framework (LangChain, LlamaIndex, etc.)
  2. Prepare documents: Process and chunk your knowledge base
  3. Select embeddings: Choose embedding model for your domain
  4. Pick vector DB: Select storage solution for scale
  5. Build pipeline: Implement retrieval and generation
  6. Evaluate: Test on sample questions and iterate
  7. Monitor: Track quality metrics in production