# RAG Systems
Building Retrieval-Augmented Generation systems.
## RAG Architecture

```
INDEXING (Offline)
Documents → Chunking → Embedding → Vector DB

QUERYING (Online)
Query → Embed → Search → Retrieved Docs
                              ↓
Response ← LLM ← Context + Query
```
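The online path above can be sketched end to end. Here `retrieve` and `llm` are hypothetical stand-ins for whichever retriever and generator are used:

```python
def answer(query, retrieve, llm, k=5):
    # Pack the top-k retrieved chunks into the prompt as context
    context = "\n\n".join(retrieve(query, k=k))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return llm.generate(prompt)
```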
## Retrieval Algorithms

### Term-Based (BM25)
```python
from rank_bm25 import BM25Okapi

tokenized_docs = [doc.split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)
scores = bm25.get_scores(query.split())
```
### Embedding-Based
```python
from sentence_transformers import SentenceTransformer
import faiss

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(documents)

# Inner product over L2-normalized vectors = cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])
faiss.normalize_L2(embeddings)
index.add(embeddings)
```
**Query**
```python
query_emb = model.encode([query])
faiss.normalize_L2(query_emb)
distances, indices = index.search(query_emb, k=5)
```
### Hybrid Retrieval
```python
import numpy as np

def hybrid_retrieve(query, k=5, alpha=0.5):
    # normalize() min-max scales scores; embed() returns the query embedding
    bm25_scores = normalize(bm25.get_scores(query.split()))
    # faiss returns results sorted by score, so scatter them back to document order
    distances, ids = index.search(embed(query), len(documents))
    dense_scores = np.empty(len(documents))
    dense_scores[ids[0]] = distances[0]
    hybrid = alpha * bm25_scores + (1 - alpha) * normalize(dense_scores)
    return [documents[i] for i in np.argsort(hybrid)[::-1][:k]]
```
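`normalize` and `embed` above are assumed helpers, not library functions. A minimal min-max `normalize` so sparse and dense scores share a [0, 1] scale:

```python
import numpy as np

def normalize(scores):
    # Min-max scale to [0, 1]; constant inputs map to all zeros
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span
```

`embed` would simply wrap `model.encode([query])` from the embedding section.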
## Chunking Strategies

### Fixed Size
```python
def fixed_chunk(text, size=500, overlap=50):
    # Step through the text `size - overlap` characters at a time,
    # so consecutive chunks share `overlap` characters of context
    chunks = []
    for i in range(0, len(text), size - overlap):
        chunks.append(text[i:i+size])
    return chunks
```
### Semantic Chunking
```python
from nltk.tokenize import sent_tokenize

def semantic_chunk(text, model, threshold=0.5):
    sentences = sent_tokenize(text)
    chunks, current = [], []
    for sent in sentences:
        current.append(sent)
        if len(current) > 1:
            # similarity() is a helper comparing adjacent sentence embeddings
            sim = similarity(current[-2], current[-1], model)
            if sim < threshold:  # topic shift: close the chunk before this sentence
                chunks.append(" ".join(current[:-1]))
                current = [sent]
    if current:
        chunks.append(" ".join(current))
    return chunks
```
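`similarity` in the sketch above is an assumed helper; with a sentence-transformers model it could be cosine similarity over the two sentence embeddings:

```python
import numpy as np

def similarity(a, b, model):
    # Cosine similarity between the embeddings of two sentences
    va, vb = model.encode([a, b])
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))
```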
## Retrieval Optimization

### Query Expansion
```python
def expand_query(query, model):
    # Search with the original query plus LLM-generated paraphrases
    prompt = f"Generate 3 alternative phrasings:\n{query}"
    return [query] + model.generate(prompt).split("\n")
```
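Results from the original and expanded queries still need to be merged; reciprocal rank fusion (RRF) is a common choice. This sketch assumes each query's retrieval returns an ordered list of document ids:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    # Reciprocal rank fusion: each ranked list votes 1 / (k + rank) for its documents
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```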
### HyDE (Hypothetical Document)
```python
def hyde(query, model):
    # Generate a hypothetical answer, then embed *it* for retrieval
    # instead of the (often much shorter) query
    prompt = f"Write a paragraph answering:\n{query}"
    return model.generate(prompt)
```
### Reranking
```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank(query, docs, k=5):
    # Cross-encoders score query and document jointly: slower but more accurate
    pairs = [(query, doc) for doc in docs]
    scores = reranker.predict(pairs)
    return sorted(zip(docs, scores), key=lambda x: -x[1])[:k]
```
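Reranking is typically the second stage of a pipeline: a cheap retriever casts a wide net, then the cross-encoder narrows it. A generic sketch, with `retrieve` and `rerank` passed in as functions:

```python
def retrieve_then_rerank(query, retrieve, rerank, n_candidates=50, k=5):
    # Stage 1: the fast retriever over-fetches candidates
    candidates = retrieve(query, k=n_candidates)
    # Stage 2: the expensive reranker orders them; keep the final top k
    return [doc for doc, score in rerank(query, candidates, k=k)]
```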
## RAG Evaluation
```python
def rag_metrics(response, context, retrieved, relevant, ground_truth):
    # precision/recall/similarity/check_hallucination are metric helpers
    return {
        "retrieval_precision": precision(retrieved, relevant),
        "retrieval_recall": recall(retrieved, relevant),
        "answer_relevance": similarity(response, ground_truth),
        "faithfulness": check_hallucination(response, context),
    }
```
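The retrieval metrics operate on the sets of retrieved and relevant chunk ids. A minimal sketch of the `precision` and `recall` helpers assumed above:

```python
def precision(retrieved, relevant):
    # Fraction of retrieved chunks that are actually relevant
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    # Fraction of relevant chunks that were retrieved
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)
```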
## Best Practices
- Use hybrid retrieval (BM25 + dense)
- Add reranking for quality
- Chunk with overlap (10-20%)
- Experiment with chunk sizes (200-1000 tokens)
- Evaluate retrieval separately from generation