# RAG System Builder Skill
## Overview

This skill creates complete RAG (Retrieval-Augmented Generation) systems that combine semantic search with LLM-powered Q&A. Users can ask natural language questions and receive accurate answers grounded in your document collection.
## Quick Start

```python
from sentence_transformers import SentenceTransformer
import anthropic

# Setup
model = SentenceTransformer('all-MiniLM-L6-v2')
client = anthropic.Anthropic()

# Retrieve context (simplified)
query = "What are the safety requirements?"
query_embedding = model.encode(query, normalize_embeddings=True)
# ... search for similar chunks to build `context` ...

# Generate answer
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}]
)
print(response.content[0].text)
```
## When to Use

- Building AI assistants for technical documentation
- Creating Q&A systems for standards libraries
- Developing chatbots with domain expertise
- Enabling natural language queries over knowledge bases
- Adding AI-powered search to existing document systems
## Architecture

```
User Question
        |
        v
+------------------+
| 1. Embed Query   |  sentence-transformers
+--------+---------+
         |
         v
+------------------+
| 2. Vector Search |  Cosine similarity
+--------+---------+
         |
         v
+------------------+
| 3. Retrieve Top  |  Top-K relevant chunks
+--------+---------+
         |
         v
+------------------+
| 4. Build Prompt  |  Context + Question
+--------+---------+
         |
         v
+------------------+
| 5. LLM Answer    |  Claude/OpenAI
+------------------+
```

## Prerequisites
- Knowledge base with extracted text (see `knowledge-base-builder`)
- Vector embeddings for semantic search (see `semantic-search-setup`)
- API key: `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`
## Implementation

### Step 1: Vector Embeddings Table
```python
import sqlite3

def setup_embeddings_table(db_path):
    conn = sqlite3.connect(db_path, timeout=30)
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS embeddings (
            id INTEGER PRIMARY KEY,
            chunk_id INTEGER UNIQUE,
            embedding BLOB,
            model_name TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (chunk_id) REFERENCES chunks(id)
        )
    ''')
    conn.commit()
    return conn
```
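The `FOREIGN KEY` above, and the joins in later steps, assume `chunks` and `documents` tables created by the `knowledge-base-builder` skill. A minimal compatible schema, reconstructed from the queries in this skill (a sketch; your actual knowledge base likely has more columns):

```python
import sqlite3

# Hypothetical minimal schema the RAG queries below rely on:
# documents(id, filename) and chunks(id, doc_id, chunk_text)
KB_SCHEMA = '''
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    filename TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    doc_id INTEGER,
    chunk_text TEXT,
    FOREIGN KEY (doc_id) REFERENCES documents(id)
);
'''

# Smoke-test the join used by semantic_search against an in-memory database
conn = sqlite3.connect(':memory:')
conn.executescript(KB_SCHEMA)
conn.execute("INSERT INTO documents (id, filename) VALUES (1, 'spec.pdf')")
conn.execute("INSERT INTO chunks (id, doc_id, chunk_text) VALUES (1, 1, 'Safety requirements...')")
row = conn.execute(
    'SELECT c.chunk_text, d.filename FROM chunks c JOIN documents d ON c.doc_id = d.id'
).fetchone()
```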
### Step 2: Generate Embeddings
```python
from sentence_transformers import SentenceTransformer
import numpy as np

class EmbeddingGenerator:
    def __init__(self, model_name='all-MiniLM-L6-v2'):
        self.model = SentenceTransformer(model_name)
        self.dimension = 384  # all-MiniLM-L6-v2 output dimension

    def embed_text(self, text):
        """Generate an embedding for a single text."""
        embedding = self.model.encode(text, normalize_embeddings=True)
        return embedding.astype(np.float32)

    def embed_batch(self, texts, batch_size=100):
        """Generate embeddings for multiple texts."""
        embeddings = self.model.encode(
            texts,
            batch_size=batch_size,
            normalize_embeddings=True,
            show_progress_bar=True
        )
        return embeddings.astype(np.float32)
```
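Persisting the generated vectors is not shown above; a sketch of serializing float32 embeddings into the `embeddings` table's BLOB column (the `np.frombuffer` call in Step 3 depends on this exact dtype, so the round trip is worth checking):

```python
import sqlite3
import numpy as np

def store_embedding(conn, chunk_id, embedding, model_name='all-MiniLM-L6-v2'):
    """Upsert one chunk's vector, serialized as raw float32 bytes."""
    conn.execute(
        'INSERT OR REPLACE INTO embeddings (chunk_id, embedding, model_name) '
        'VALUES (?, ?, ?)',
        (chunk_id, embedding.astype(np.float32).tobytes(), model_name)
    )
    conn.commit()

# Round-trip check against an in-memory table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE embeddings (id INTEGER PRIMARY KEY, '
             'chunk_id INTEGER UNIQUE, embedding BLOB, model_name TEXT)')
vec = np.array([0.1, 0.2, 0.3], dtype=np.float32)
store_embedding(conn, 1, vec)
blob = conn.execute('SELECT embedding FROM embeddings WHERE chunk_id = 1').fetchone()[0]
restored = np.frombuffer(blob, dtype=np.float32)
```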
### Step 3: Semantic Search
```python
def semantic_search(db_path, query, model, top_k=5):
    """Find the chunks most similar to the query."""
    conn = sqlite3.connect(db_path, timeout=30)
    cursor = conn.cursor()
    # Embed query
    query_embedding = model.embed_text(query)
    # Get all embeddings
    cursor.execute('''
        SELECT e.chunk_id, e.embedding, c.chunk_text, d.filename
        FROM embeddings e
        JOIN chunks c ON e.chunk_id = c.id
        JOIN documents d ON c.doc_id = d.id
    ''')
    results = []
    for chunk_id, emb_blob, text, filename in cursor.fetchall():
        embedding = np.frombuffer(emb_blob, dtype=np.float32)
        score = np.dot(query_embedding, embedding)  # Cosine similarity (vectors are normalized)
        results.append({
            'chunk_id': chunk_id,
            'score': float(score),
            'text': text,
            'filename': filename
        })
    conn.close()
    # Sort by similarity
    results.sort(key=lambda x: x['score'], reverse=True)
    return results[:top_k]
```
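The plain dot product in `semantic_search` is only a valid cosine similarity because `encode(..., normalize_embeddings=True)` produces unit-length vectors; a quick numeric check of that equivalence:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two arbitrary (non-normalized) vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0], dtype=np.float32)
b = np.array([4.0, 3.0], dtype=np.float32)
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# Dot product of the unit vectors equals the cosine of the originals: 24 / 25 = 0.96
assert abs(float(np.dot(a_unit, b_unit)) - cosine(a, b)) < 1e-6
```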
### Step 4: RAG Query Engine
```python
import anthropic
import openai

class RAGQueryEngine:
    def __init__(self, db_path, embedding_model):
        self.db_path = db_path
        self.model = embedding_model

    def query(self, question, top_k=5, provider='anthropic'):
        """Answer a question using RAG."""
        # 1. Retrieve relevant context
        results = semantic_search(self.db_path, question, self.model, top_k)
        # 2. Build context string
        context = "\n\n---\n\n".join([
            f"Source: {r['filename']}\n{r['text']}"
            for r in results
        ])
        # 3. Build prompt
        prompt = f"""Based on the following technical documents, answer the question.
If the answer is not in the documents, say so.

DOCUMENTS:
{context}

QUESTION: {question}

ANSWER:"""
        # 4. Get LLM response
        if provider == 'anthropic':
            return self._query_claude(prompt), results
        else:
            return self._query_openai(prompt), results

    def _query_claude(self, prompt):
        client = anthropic.Anthropic()
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

    def _query_openai(self, prompt):
        client = openai.OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
```
### Step 5: CLI Interface
```python
#!/usr/bin/env python3
"""RAG Query CLI - Ask questions about your documents."""
import argparse

def main():
    parser = argparse.ArgumentParser(description='RAG Q&A System')
    parser.add_argument('question', nargs='?', help='Question to ask')
    parser.add_argument('-i', '--interactive', action='store_true')
    parser.add_argument('-k', '--top-k', type=int, default=5)
    parser.add_argument('--provider', choices=['anthropic', 'openai'], default='anthropic')
    args = parser.parse_args()
    # DB_PATH: path to the knowledge-base SQLite file (configure for your setup)
    engine = RAGQueryEngine(DB_PATH, EmbeddingGenerator())
    if args.interactive:
        print("RAG Q&A System (type 'quit' to exit)")
        while True:
            question = input("\nQuestion: ").strip()
            if question.lower() == 'quit':
                break
            answer, sources = engine.query(question, args.top_k, args.provider)
            print(f"\nAnswer: {answer}")
            print(f"\nSources: {[s['filename'] for s in sources]}")
    else:
        answer, sources = engine.query(args.question, args.top_k, args.provider)
        print(f"Answer: {answer}")
        print("\nSources:")
        for s in sources:
            print(f"  - {s['filename']} (score: {s['score']:.3f})")

if __name__ == '__main__':
    main()
```
## Prompt Engineering Tips
### System Prompt Template

```python
SYSTEM_PROMPT = """You are a technical expert assistant. Your role is to:
1. Answer questions based ONLY on the provided documents
2. Cite specific sources when possible
3. Acknowledge when information is not available
4. Be precise with technical terminology
5. Provide practical, actionable answers

If asked about topics not covered in the documents, say:
"I don't have information about that in the available documents."
"""
```
### Multi-Turn Conversations
```python
def query_with_history(self, question, history=None):
    """Support follow-up questions."""
    history = history or []  # avoid a mutable default argument
    context = self.get_relevant_context(question)
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Add conversation history (last 4 turns)
    for h in history[-4:]:
        messages.append({"role": "user", "content": h['question']})
        messages.append({"role": "assistant", "content": h['answer']})
    # Add current question with context
    messages.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}"
    })
    return self.llm.query(messages)
```
## Execution Checklist
- Set up knowledge base with text extraction
- Generate vector embeddings for all chunks
- Configure API keys (ANTHROPIC_API_KEY or OPENAI_API_KEY)
- Test semantic search independently
- Build and test RAG pipeline end-to-end
- Tune top_k parameter for answer quality
- Add source attribution to responses
- Implement error handling for API failures
## Error Handling

### Common Errors
Error: anthropic.APIError (rate limit)
- Cause: Too many API requests
- Solution: Add exponential backoff retry logic

Error: Empty search results
- Cause: No relevant documents in knowledge base
- Solution: Expand search with a lower similarity threshold

Error: Context too long
- Cause: Top-k chunks exceed the model context window
- Solution: Reduce top_k or chunk size

Error: API key not found
- Cause: Environment variable not set
- Solution: Export ANTHROPIC_API_KEY or OPENAI_API_KEY

Error: Low-quality answers
- Cause: Poor retrieval or insufficient context
- Solution: Tune chunk size, overlap, and top_k parameters
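For the rate-limit case, a generic exponential-backoff wrapper (a sketch; pass the SDK's actual rate-limit exception type, e.g. `anthropic.RateLimitError`, and confirm it against your SDK version):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Retry `call()` with exponential backoff plus jitter.

    Pass retryable=(anthropic.RateLimitError,) (or the OpenAI equivalent)
    so that only rate-limit errors are retried.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except retryable:
            if attempt == max_retries - 1:
                raise  # exhausted retries: re-raise the last error
            # delays grow 1x, 2x, 4x, ... plus up to 100 ms of jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Example with a call that fails twice before succeeding
attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise RuntimeError('rate limited')
    return 'ok'

result = with_backoff(flaky, base_delay=0.001, retryable=(RuntimeError,))
```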
## Metrics

| Metric | Typical Value |
|---|---|
| Query latency (end-to-end) | 2-5 seconds |
| Retrieval time | <100 ms |
| LLM response time | 1-4 seconds |
| Token usage per query | 500-2000 tokens |
| Answer relevance | 85-95% with good tuning |
## Performance Optimization

### 1. Cache Embeddings

```python
# Load all embeddings into memory at startup instead of re-reading BLOBs per query
self.embedding_cache = self._load_all_embeddings()
```
### 2. Use FAISS for Large Collections

```python
import faiss

# Build a FAISS index for fast similarity search
index = faiss.IndexFlatIP(dimension)  # Inner product == cosine sim for normalized vectors
index.add(embeddings)
```
### 3. Batch Queries

```python
# Process multiple questions efficiently
questions = ["Q1", "Q2", "Q3"]
query_embeddings = model.embed_batch(questions)
```

## Best Practices
- Chunk size matters - 500-1500 characters is optimal for context
- Retrieve enough context - top_k=5-10 for comprehensive answers
- Include source attribution - always show which documents were used
- Handle edge cases - empty results, API errors, timeouts
- Monitor token usage - track costs and optimize prompts
- Use a SQLite timeout - `timeout=30` for concurrent access
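Chunking itself happens upstream (see `knowledge-base-builder`); for reference, a minimal character-window chunker matching the size and overlap guidance above (a sketch; production pipelines usually split on sentence or section boundaries):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts `step` chars after the previous
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 2500)
# Windows start at 0, 800, 1600, 2400 -> lengths 1000, 1000, 900, 100
```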
## Example Usage

```bash
# Single question
./rag "What are the fatigue design requirements for risers?"

# Interactive mode
./rag -i

# With OpenAI
./rag --provider openai "Explain API 2RD requirements"
```

## Advanced: Hybrid Search (BM25 + Vector)
Combine keyword and semantic search for better results:
```python
import sqlite3

from rank_bm25 import BM25Okapi
import numpy as np

class HybridSearch:
    def __init__(self, db_path, embedding_model):
        self.db_path = db_path
        self.model = embedding_model
        self._build_bm25_index()

    def _build_bm25_index(self):
        """Build a BM25 index from the chunks table."""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('SELECT id, chunk_text FROM chunks')
        self.chunk_ids = []
        tokenized_corpus = []
        for chunk_id, text in cursor.fetchall():
            self.chunk_ids.append(chunk_id)
            tokenized_corpus.append(text.lower().split())
        self.bm25 = BM25Okapi(tokenized_corpus)
        conn.close()

    def search(self, query, top_k=10, alpha=0.5):
        """Hybrid search with alpha weighting.

        alpha=0.0: Pure BM25 (keyword)
        alpha=1.0: Pure vector (semantic)
        alpha=0.5: Balanced hybrid
        """
        # BM25 scores
        tokenized_query = query.lower().split()
        bm25_scores = self.bm25.get_scores(tokenized_query)
        bm25_scores = bm25_scores / (bm25_scores.max() + 1e-6)  # Normalize to [0, 1]
        # Vector scores
        vector_results = semantic_search(self.db_path, query, self.model,
                                         top_k=len(self.chunk_ids))
        vector_scores = {r['chunk_id']: r['score'] for r in vector_results}
        # Combine scores
        combined = []
        for i, chunk_id in enumerate(self.chunk_ids):
            score = (1 - alpha) * bm25_scores[i] + alpha * vector_scores.get(chunk_id, 0)
            combined.append((chunk_id, score))
        combined.sort(key=lambda x: x[1], reverse=True)
        return combined[:top_k]
```
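The `alpha` weighting in `search` is just a convex combination of the two normalized score lists; a self-contained illustration with hypothetical scores (the values below are made up for demonstration):

```python
import numpy as np

def combine(bm25_raw, vector_scores, alpha=0.5):
    """Convex combination of keyword and semantic scores.

    alpha=0.0 -> pure BM25, alpha=1.0 -> pure vector, as in HybridSearch.search.
    """
    bm25 = np.asarray(bm25_raw, dtype=np.float32)
    bm25 = bm25 / (bm25.max() + 1e-6)  # same normalization as above
    vec = np.asarray(vector_scores, dtype=np.float32)
    return (1 - alpha) * bm25 + alpha * vec

# Hypothetical scores for three chunks
bm25_raw = [2.0, 0.0, 1.0]   # raw keyword scores
vec = [0.1, 0.9, 0.5]        # cosine similarities
keyword_best = int(combine(bm25_raw, vec, alpha=0.0).argmax())   # chunk 0 wins on keywords
semantic_best = int(combine(bm25_raw, vec, alpha=1.0).argmax())  # chunk 1 wins semantically
```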
## Advanced: Reranking
Add a reranking step for improved precision:
```python
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2'):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, candidates, top_k=5):
        """Rerank candidates using a cross-encoder."""
        pairs = [(query, c['text']) for c in candidates]
        scores = self.model.predict(pairs)
        for i, score in enumerate(scores):
            candidates[i]['rerank_score'] = float(score)
        reranked = sorted(candidates, key=lambda x: x['rerank_score'], reverse=True)
        return reranked[:top_k]
```
Usage in the RAG pipeline:

```python
def query_with_rerank(self, question, initial_k=20, final_k=5):
    # First pass: retrieve a wider candidate set
    candidates = semantic_search(self.db_path, question, self.model, top_k=initial_k)
    # Second pass: rerank for precision
    reranked = self.reranker.rerank(question, candidates, top_k=final_k)
    return reranked
```

## Streaming Responses
For better UX with long answers:
```python
import anthropic

def query_streaming(self, question, top_k=5):
    """Stream the RAG response for real-time display."""
    context = self.get_context(question, top_k)
    prompt = self.build_prompt(context, question)
    # Anthropic streaming
    with anthropic.Anthropic().messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            yield text
```
## Related Skills
- `knowledge-base-builder` - Build the document database first
- `semantic-search-setup` - Generate vector embeddings
- `pdf-text-extractor` - Extract text from PDFs
- `document-rag-pipeline` - Complete end-to-end pipeline
## Dependencies

```bash
pip install sentence-transformers anthropic openai numpy
```

Optional:
- faiss-cpu (for large-scale vector search)
- rank-bm25 (for hybrid search)
## Version History

- 1.2.0 (2026-01-02): Added Quick Start, Execution Checklist, Error Handling, Metrics sections; updated frontmatter with version, category, related_skills
- 1.1.0 (2025-12-30): Added hybrid search (BM25+vector), reranking, streaming responses
- 1.0.0 (2025-10-15): Initial release with basic RAG implementation