Hybrid Retrieval for RAG
Combine dense (semantic) and sparse (keyword) retrieval for superior results.
When to Use
- Vector search misses exact keyword matches
- Domain-specific terminology needs exact matching
- Users search with both natural language and specific terms
- Need to balance semantic understanding with precision
The Problem with Vector-Only Search
Query: "Error code E-4521 troubleshooting"
Vector search returns:
- "Common error handling patterns" (semantically similar)
- "Debugging techniques for applications" (related topic)
Missing:
- "E-4521: Database connection timeout" (exact match needed!)
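The miss is easy to reproduce in miniature. A few lines of token-overlap matching (the core signal BM25 builds on) show that the sparse path surfaces the exact-code document; the corpus and scoring here are a toy illustration, not real BM25:

```python
import re

corpus = [
    "Common error handling patterns",         # returned by vector search
    "Debugging techniques for applications",  # returned by vector search
    "E-4521: Database connection timeout",    # the document it missed
]

def sparse_match(query: str, docs: list) -> list:
    """Return docs sharing at least one exact token with the query, best first."""
    q_tokens = set(re.findall(r"[\w-]+", query.lower()))
    scored = [(len(q_tokens & set(re.findall(r"[\w-]+", d.lower()))), d)
              for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

hits = sparse_match("Error code E-4521 troubleshooting", corpus)
# The exact-match document is recovered because "e-4521" is a shared token
```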
Hybrid Architecture
┌─────────────────────────────────────────────────┐
│ User Query │
└─────────────────────┬───────────────────────────┘
│
┌────────────┴────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Dense Search │ │ Sparse Search │
│ (Embeddings) │ │ (BM25/TF-IDF) │
└────────┬────────┘ └────────┬────────┘
│ │
└────────────┬────────────┘
│
▼
┌───────────────┐
│ Fusion │
│ (RRF/Linear) │
└───────┬───────┘
│
▼
┌───────────────┐
│ Reranker │
│ (Optional) │
└───────┬───────┘
│
▼
┌───────────────┐
│ Final Results │
└───────────────┘
Implementation
Basic Hybrid with LangChain
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Dense retriever (vector search)
vectorstore = Chroma.from_documents(docs, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Sparse retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 10

# Combine with ensemble
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, bm25_retriever],
    weights=[0.5, 0.5]  # Adjust based on your data
)

results = hybrid_retriever.invoke("Error code E-4521")
```
Reciprocal Rank Fusion (RRF)
```python
def reciprocal_rank_fusion(results_list: list, k: int = 60) -> list:
    """
    Combine multiple ranked lists using RRF.
    k=60 is the standard constant from the original paper.
    """
    fused_scores = {}
    for results in results_list:
        for rank, doc in enumerate(results):
            doc_id = doc.metadata.get("id", hash(doc.page_content))
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    reranked = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in reranked]

# Usage
dense_results = dense_retriever.invoke(query)
sparse_results = bm25_retriever.invoke(query)
final_results = reciprocal_rank_fusion([dense_results, sparse_results])
```
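The fusion arithmetic is easy to verify by hand. In this simplified sketch, string IDs stand in for Document objects; with k=60, a document near the top of both lists (1/61 + 1/62 for doc_a) outscores one that appears high in only a single list (1/62 for doc_b):

```python
def rrf(rankings: list, k: int = 60) -> list:
    """Reciprocal rank fusion over ranked lists of document IDs."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # semantic ranking
sparse = ["doc_c", "doc_a", "doc_d"]  # keyword ranking
fused = rrf([dense, sparse])
# fused -> ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

doc_a and doc_c rise above doc_b and doc_d because fusion rewards agreement between the two retrievers, even when neither ranks the document first.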
With Pinecone (Native Hybrid)
```python
from pinecone import Pinecone
from pinecone_text.sparse import BM25Encoder

# Initialize
pc = Pinecone(api_key="...")
index = pc.Index("hybrid-index")

# Sparse encoder
bm25 = BM25Encoder()
bm25.fit(corpus)

# Query with both dense and sparse
def hybrid_query(query: str, alpha: float = 0.5):
    # Dense vector
    dense_vec = embeddings.embed_query(query)
    # Sparse vector
    sparse_vec = bm25.encode_queries([query])[0]
    # Pinecone's query() has no alpha parameter: apply the convex weighting
    # client-side (alpha = 0 -> sparse only, alpha = 1 -> dense only)
    dense_scaled = [v * alpha for v in dense_vec]
    sparse_scaled = {
        "indices": sparse_vec["indices"],
        "values": [v * (1 - alpha) for v in sparse_vec["values"]],
    }
    # Hybrid search
    results = index.query(
        vector=dense_scaled,
        sparse_vector=sparse_scaled,
        top_k=10,
        include_metadata=True
    )
    return results
```
With Weaviate (Native Hybrid)
```python
import weaviate

client = weaviate.Client("http://localhost:8080")

result = client.query.get(
    "Document",
    ["content", "title"]
).with_hybrid(
    query="Error code E-4521",
    alpha=0.5,  # Balance between vector and keyword
    fusion_type="rankedFusion"
).with_limit(10).do()
```
Adding a Reranker
```python
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_results(query: str, docs: list, top_k: int = 5) -> list:
    """Rerank documents using cross-encoder."""
    pairs = [[query, doc.page_content] for doc in docs]
    scores = reranker.predict(pairs)
    # Sort by reranker scores
    scored_docs = list(zip(docs, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, score in scored_docs[:top_k]]

# Full pipeline
hybrid_results = hybrid_retriever.invoke(query)  # Get 20 results
final_results = rerank_results(query, hybrid_results, top_k=5)  # Rerank to top 5
```
Weight Tuning Guidelines
| Data Type | Dense Weight | Sparse Weight | Notes |
|---|---|---|---|
| General text | 0.5 | 0.5 | Balanced default |
| Technical docs | 0.4 | 0.6 | Keywords matter more |
| Conversational | 0.7 | 0.3 | Semantic matters more |
| Code/APIs | 0.3 | 0.7 | Exact matches critical |
| Legal/Medical | 0.4 | 0.6 | Terminology precision |
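These weights plug directly into linear (convex) fusion: normalize each retriever's raw scores to [0, 1] so cosine similarities and BM25 scores are comparable, then mix with alpha. A minimal sketch with illustrative scores:

```python
def min_max(scores: dict) -> dict:
    """Normalize raw scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def linear_fusion(dense: dict, sparse: dict, alpha: float = 0.5) -> list:
    """Rank by alpha * dense + (1 - alpha) * sparse; absent docs score 0."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
             for d in set(dense_n) | set(sparse_n)}
    return sorted(fused, key=fused.get, reverse=True)

dense_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}  # cosine sims
sparse_scores = {"doc_c": 11.2, "doc_a": 3.1}                 # BM25 scores
ranked = linear_fusion(dense_scores, sparse_scores, alpha=0.3)
# With the sparse-leaning Code/APIs weights, the exact match doc_c wins
```

Without normalization the BM25 scores (here up to 11.2) would swamp the cosine similarities regardless of alpha, which is the most common bug in hand-rolled linear fusion.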
Evaluation
```python
def evaluate_retrieval(queries: list, ground_truth: dict, retriever) -> dict:
    """Calculate retrieval metrics."""
    metrics = {"mrr": 0, "recall@5": 0, "precision@5": 0}
    for query in queries:
        results = retriever.invoke(query)
        result_ids = [doc.metadata["id"] for doc in results[:5]]
        relevant_ids = ground_truth[query]
        # MRR
        for i, rid in enumerate(result_ids):
            if rid in relevant_ids:
                metrics["mrr"] += 1 / (i + 1)
                break
        # Recall & Precision
        hits = len(set(result_ids) & set(relevant_ids))
        metrics["recall@5"] += hits / len(relevant_ids)
        metrics["precision@5"] += hits / 5
    # Average over all queries
    n = len(queries)
    return {k: v / n for k, v in metrics.items()}
```
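A hand-checkable sanity test of the metric arithmetic, mirroring the per-query logic above but with plain IDs instead of retriever calls (all data is made up for illustration):

```python
def metrics_for_query(result_ids: list, relevant_ids: set) -> dict:
    """MRR, recall@5, precision@5 for a single query's result IDs."""
    top5 = result_ids[:5]
    mrr = 0.0
    for i, rid in enumerate(top5):
        if rid in relevant_ids:
            mrr = 1 / (i + 1)  # reciprocal rank of the first relevant hit
            break
    hits = len(set(top5) & relevant_ids)
    return {"mrr": mrr,
            "recall@5": hits / len(relevant_ids),
            "precision@5": hits / 5}

# First relevant doc at rank 2 -> MRR = 0.5; 2 of 3 relevant docs found
m = metrics_for_query(["d9", "d1", "d7", "d2", "d8"], {"d1", "d2", "d5"})
# m["mrr"] -> 0.5, m["precision@5"] -> 0.4
```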
Best Practices
- Start with 50/50 weights - then tune based on evaluation
- Always add a reranker - significant quality improvement
- Index sparse vectors - BM25 on raw text, not chunks
- Use native hybrid - when available (Pinecone, Weaviate, Qdrant)
- Monitor both paths - log which retriever contributed to final results
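For the last point, one lightweight approach is to tag each candidate with the retriever(s) that proposed it before fusion, then count per-path contributions in the final list. A sketch (IDs and names are illustrative):

```python
from collections import Counter

def tag_sources(dense_ids: list, sparse_ids: list) -> dict:
    """Map each candidate ID to the set of paths that proposed it."""
    sources = {}
    for path, ids in (("dense", dense_ids), ("sparse", sparse_ids)):
        for doc_id in ids:
            sources.setdefault(doc_id, set()).add(path)
    return sources

def contribution_counts(final_ids: list, sources: dict) -> Counter:
    """Count how many final results came from each path (or both)."""
    return Counter(
        "both" if len(sources[d]) == 2 else next(iter(sources[d]))
        for d in final_ids
    )

sources = tag_sources(dense_ids=["a", "b", "c"], sparse_ids=["c", "d"])
stats = contribution_counts(["a", "c", "d"], sources)
# "c" was proposed by both paths; "a" only dense, "d" only sparse
```

Logging these counts per query makes it obvious when one retriever is doing all the work and the fusion weights need revisiting.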