rag-system-builder

RAG System Builder Skill

Overview

This skill creates complete RAG (Retrieval-Augmented Generation) systems that combine semantic search with LLM-powered Q&A. Users can ask natural language questions and receive accurate answers grounded in your document collection.

Quick Start

python
from sentence_transformers import SentenceTransformer
import anthropic

# Setup
model = SentenceTransformer('all-MiniLM-L6-v2')
client = anthropic.Anthropic()

# Retrieve context (simplified)
query = "What are the safety requirements?"
query_embedding = model.encode(query, normalize_embeddings=True)
# ... search for similar chunks to build `context` ...

# Generate answer
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}]
)
print(response.content[0].text)

When to Use

  • Building AI assistants for technical documentation
  • Creating Q&A systems for standards libraries
  • Developing chatbots with domain expertise
  • Enabling natural language queries over knowledge bases
  • Adding AI-powered search to existing document systems

Architecture

User Question
      |
      v
+------------------+
| 1. Embed Query   |  sentence-transformers
+--------+---------+
         v
+------------------+
| 2. Vector Search |  Cosine similarity
+--------+---------+
         v
+------------------+
| 3. Retrieve Top  |  Top-K relevant chunks
+--------+---------+
         v
+------------------+
| 4. Build Prompt  |  Context + Question
+--------+---------+
         v
+------------------+
| 5. LLM Answer    |  Claude/OpenAI
+------------------+

Prerequisites

  • Knowledge base with extracted text (see knowledge-base-builder)
  • Vector embeddings for semantic search (see semantic-search-setup)
  • API key: ANTHROPIC_API_KEY or OPENAI_API_KEY

Implementation

Step 1: Vector Embeddings Table

python
import sqlite3
import numpy as np

def setup_embeddings_table(db_path):
    conn = sqlite3.connect(db_path, timeout=30)
    cursor = conn.cursor()

    cursor.execute('''
        CREATE TABLE IF NOT EXISTS embeddings (
            id INTEGER PRIMARY KEY,
            chunk_id INTEGER UNIQUE,
            embedding BLOB,
            model_name TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (chunk_id) REFERENCES chunks(id)
        )
    ''')

    conn.commit()
    return conn

Step 2: Generate Embeddings

python
from sentence_transformers import SentenceTransformer
import numpy as np

class EmbeddingGenerator:
    def __init__(self, model_name='all-MiniLM-L6-v2'):
        self.model = SentenceTransformer(model_name)
        self.dimension = 384  # all-MiniLM-L6-v2

    def embed_text(self, text):
        """Generate embedding for text."""
        embedding = self.model.encode(text, normalize_embeddings=True)
        return embedding.astype(np.float32)

    def embed_batch(self, texts, batch_size=100):
        """Generate embeddings for multiple texts."""
        embeddings = self.model.encode(
            texts,
            batch_size=batch_size,
            normalize_embeddings=True,
            show_progress_bar=True
        )
        return embeddings.astype(np.float32)
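Bridging Steps 1 and 2, each vector can be persisted into the embeddings table as raw float32 bytes. A minimal sketch, assuming the Step 1 schema; `store_embeddings` is an illustrative helper name, not part of any library:

```python
import sqlite3
import numpy as np

def store_embeddings(conn, chunk_ids, embeddings, model_name):
    """Persist each vector as a float32 BLOB keyed by chunk_id."""
    cursor = conn.cursor()
    for chunk_id, emb in zip(chunk_ids, embeddings):
        cursor.execute(
            'INSERT OR REPLACE INTO embeddings (chunk_id, embedding, model_name) '
            'VALUES (?, ?, ?)',
            (chunk_id, np.asarray(emb, dtype=np.float32).tobytes(), model_name)
        )
    conn.commit()

# Round-trip demo with an in-memory database and stand-in vectors
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY, chunk_id INTEGER UNIQUE,
    embedding BLOB, model_name TEXT)''')
vectors = np.eye(3, 384, dtype=np.float32)
store_embeddings(conn, [1, 2, 3], vectors, 'all-MiniLM-L6-v2')
blob = conn.execute('SELECT embedding FROM embeddings WHERE chunk_id = 2').fetchone()[0]
restored = np.frombuffer(blob, dtype=np.float32)
```

`np.frombuffer` recovers the vector exactly, which is the read path Step 3 relies on.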

Step 3: Semantic Search

python
def semantic_search(db_path, query, model, top_k=5):
    """Find most similar chunks to query."""
    conn = sqlite3.connect(db_path, timeout=30)
    cursor = conn.cursor()

    # Embed query
    query_embedding = model.embed_text(query)

    # Get all embeddings
    cursor.execute('''
        SELECT e.chunk_id, e.embedding, c.chunk_text, d.filename
        FROM embeddings e
        JOIN chunks c ON e.chunk_id = c.id
        JOIN documents d ON c.doc_id = d.id
    ''')

    results = []
    for chunk_id, emb_blob, text, filename in cursor.fetchall():
        embedding = np.frombuffer(emb_blob, dtype=np.float32)
        score = np.dot(query_embedding, embedding)  # Cosine similarity
        results.append({
            'chunk_id': chunk_id,
            'score': float(score),
            'text': text,
            'filename': filename
        })

    conn.close()

    # Sort by similarity
    results.sort(key=lambda x: x['score'], reverse=True)
    return results[:top_k]
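The plain `np.dot` in `semantic_search` is only a cosine similarity because the embeddings were encoded with `normalize_embeddings=True`. A quick sanity check of that identity:

```python
import numpy as np

def cosine(a, b):
    """Full cosine similarity, without assuming unit length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Normalize to unit length, as normalize_embeddings=True does
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

dot_of_units = float(np.dot(a_unit, b_unit))  # 0.6, same as cosine(a, b)
```

If vectors were stored unnormalized, the raw dot product would conflate vector length with similarity.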

Step 4: RAG Query Engine

python
import anthropic
import openai

class RAGQueryEngine:
    def __init__(self, db_path, embedding_model):
        self.db_path = db_path
        self.model = embedding_model

    def query(self, question, top_k=5, provider='anthropic'):
        """Answer question using RAG."""

        # 1. Retrieve relevant context
        results = semantic_search(self.db_path, question, self.model, top_k)

        # 2. Build context string
        context = "\n\n---\n\n".join([
            f"Source: {r['filename']}\n{r['text']}"
            for r in results
        ])

        # 3. Build prompt
        prompt = f"""Based on the following technical documents, answer the question.
If the answer is not in the documents, say so.

DOCUMENTS:
{context}

QUESTION: {question}

ANSWER:"""

        # 4. Get LLM response
        if provider == 'anthropic':
            return self._query_claude(prompt), results
        else:
            return self._query_openai(prompt), results

    def _query_claude(self, prompt):
        client = anthropic.Anthropic()
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

    def _query_openai(self, prompt):
        client = openai.OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

Step 5: CLI Interface

python
#!/usr/bin/env python3
"""RAG Query CLI - Ask questions about your documents."""

import argparse
import os

DB_PATH = os.environ.get('RAG_DB_PATH', 'knowledge.db')  # adjust to your knowledge base location

def main():
    parser = argparse.ArgumentParser(description='RAG Q&A System')
    parser.add_argument('question', nargs='?', help='Question to ask')
    parser.add_argument('-i', '--interactive', action='store_true')
    parser.add_argument('-k', '--top-k', type=int, default=5)
    parser.add_argument('--provider', choices=['anthropic', 'openai'], default='anthropic')

    args = parser.parse_args()
    if not args.interactive and not args.question:
        parser.error('provide a question or use -i/--interactive')

    engine = RAGQueryEngine(DB_PATH, EmbeddingGenerator())

    if args.interactive:
        print("RAG Q&A System (type 'quit' to exit)")
        while True:
            question = input("\nQuestion: ").strip()
            if question.lower() == 'quit':
                break
            answer, sources = engine.query(question, args.top_k, args.provider)
            print(f"\nAnswer: {answer}")
            print(f"\nSources: {[s['filename'] for s in sources]}")
    else:
        answer, sources = engine.query(args.question, args.top_k, args.provider)
        print(f"Answer: {answer}")
        print(f"\nSources:")
        for s in sources:
            print(f"  - {s['filename']} (score: {s['score']:.3f})")

if __name__ == '__main__':
    main()

Prompt Engineering Tips

System Prompt Template

python
SYSTEM_PROMPT = """You are a technical expert assistant. Your role is to:
1. Answer questions based ONLY on the provided documents
2. Cite specific sources when possible
3. Acknowledge when information is not available
4. Be precise with technical terminology
5. Provide practical, actionable answers

If asked about topics not covered in the documents, say:
"I don't have information about that in the available documents."
"""

Multi-Turn Conversations

python
def query_with_history(self, question, history=None):
    """Support follow-up questions."""
    history = history or []  # avoid a mutable default argument
    context = self.get_relevant_context(question)

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    # Add conversation history
    for h in history[-4:]:  # Last 4 turns
        messages.append({"role": "user", "content": h['question']})
        messages.append({"role": "assistant", "content": h['answer']})

    # Add current question with context
    messages.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}"
    })

    return self.llm.query(messages)
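One caveat on the snippet above: the `{"role": "system"}` message follows the OpenAI chat convention, while the Anthropic Messages API takes the system prompt as a separate `system` parameter. A sketch of assembling a Claude-style request; `build_claude_request` is an illustrative helper, not part of either SDK:

```python
SYSTEM_PROMPT = "You are a technical expert assistant."  # abbreviated from above

def build_claude_request(question, context, history=None):
    """Assemble kwargs for client.messages.create, with system kept separate."""
    messages = []
    for h in (history or [])[-4:]:  # last 4 turns
        messages.append({"role": "user", "content": h['question']})
        messages.append({"role": "assistant", "content": h['answer']})
    messages.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}"
    })
    return {"system": SYSTEM_PROMPT, "messages": messages}

# Then: client.messages.create(model="claude-sonnet-4-20250514",
#                              max_tokens=1024, **build_claude_request(q, ctx))
```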

Execution Checklist

  • Set up knowledge base with text extraction
  • Generate vector embeddings for all chunks
  • Configure API keys (ANTHROPIC_API_KEY or OPENAI_API_KEY)
  • Test semantic search independently
  • Build and test RAG pipeline end-to-end
  • Tune top_k parameter for answer quality
  • Add source attribution to responses
  • Implement error handling for API failures

Error Handling

Common Errors

Error: anthropic.APIError (rate limit)
  • Cause: Too many API requests
  • Solution: Add exponential backoff retry logic
Error: Empty search results
  • Cause: No relevant documents in knowledge base
  • Solution: Expand search with lower similarity threshold
Error: Context too long
  • Cause: Top-k chunks exceed model context window
  • Solution: Reduce top_k or chunk size
Error: API key not found
  • Cause: Environment variable not set
  • Solution: Export ANTHROPIC_API_KEY or OPENAI_API_KEY
Error: Low quality answers
  • Cause: Poor retrieval or insufficient context
  • Solution: Tune chunk size, overlap, and top_k parameters
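The rate-limit fix above can be sketched as a generic retry wrapper with exponential backoff and jitter. This is illustrative (`with_backoff` is a hypothetical helper); in practice you would catch the client library's specific rate-limit exception rather than a broad one:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable error, sleep base_delay * 2^attempt (with jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))

# e.g. answer = with_backoff(lambda: engine.query(question))
```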

Metrics

| Metric | Typical Value |
|--------|---------------|
| Query latency (end-to-end) | 2-5 seconds |
| Retrieval time | <100 ms |
| LLM response time | 1-4 seconds |
| Token usage per query | 500-2000 tokens |
| Answer relevance | 85-95% with good tuning |

Performance Optimization

1. Cache Embeddings

python
# Load all embeddings into memory at startup
self.embedding_cache = self._load_all_embeddings()

2. Use FAISS for Large Collections

python
import faiss

# Build FAISS index for fast similarity search
index = faiss.IndexFlatIP(dimension)  # Inner product for cosine sim
index.add(embeddings)

3. Batch Queries

python
# Process multiple questions efficiently
questions = ["Q1", "Q2", "Q3"]
query_embeddings = model.embed_batch(questions)
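When the collection fits in memory, the cache from tip 1 turns retrieval into one matrix-vector product instead of a per-row Python loop. A numpy-only sketch, assuming all vectors are unit-normalized (FAISS provides the same operation at larger scale):

```python
import numpy as np

def top_k_cached(query_vec, cached_matrix, chunk_ids, k=5):
    """Score every cached embedding at once; return the k best (chunk_id, score) pairs."""
    scores = cached_matrix @ query_vec    # cosine similarity for unit vectors
    best = np.argsort(scores)[::-1][:k]   # indices of the highest scores
    return [(chunk_ids[i], float(scores[i])) for i in best]

# Tiny demo: three unit vectors in 2-D
cache = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.7071, 0.7071]], dtype=np.float32)
top2 = top_k_cached(np.array([1.0, 0.0], dtype=np.float32), cache, [10, 20, 30], k=2)
# best match is chunk 10 (identical direction), then chunk 30
```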

Best Practices

  1. Chunk size matters - 500-1500 chars optimal for context
  2. Retrieve enough context - top_k=5-10 for comprehensive answers
  3. Include source attribution - Always show which documents were used
  4. Handle edge cases - Empty results, API errors, timeouts
  5. Monitor token usage - Track costs and optimize prompts
  6. Use SQLite timeout - timeout=30 for concurrent access
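The chunk-size guidance in tip 1 can be sketched as a simple character-window splitter with overlap (illustrative only; production pipelines often split on sentence or section boundaries instead):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping windows of at most chunk_size characters."""
    if overlap >= chunk_size:
        raise ValueError('overlap must be smaller than chunk_size')
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# A 2,000-character document yields windows starting at offsets 0, 800, 1600
```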

Example Usage

bash
# Single question
./rag "What are the fatigue design requirements for risers?"

# Interactive mode
./rag -i

# With OpenAI
./rag --provider openai "Explain API 2RD requirements"

Advanced: Hybrid Search (BM25 + Vector)

Combine keyword and semantic search for better results:
python
import sqlite3
from rank_bm25 import BM25Okapi
import numpy as np

class HybridSearch:
    def __init__(self, db_path, embedding_model):
        self.db_path = db_path
        self.model = embedding_model
        self._build_bm25_index()

    def _build_bm25_index(self):
        """Build BM25 index from chunks."""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('SELECT id, chunk_text FROM chunks')

        self.chunk_ids = []
        tokenized_corpus = []
        for chunk_id, text in cursor.fetchall():
            self.chunk_ids.append(chunk_id)
            tokenized_corpus.append(text.lower().split())

        self.bm25 = BM25Okapi(tokenized_corpus)
        conn.close()

    def search(self, query, top_k=10, alpha=0.5):
        """Hybrid search with alpha weighting.

        alpha=0.0: Pure BM25 (keyword)
        alpha=1.0: Pure vector (semantic)
        alpha=0.5: Balanced hybrid
        """
        # BM25 scores
        tokenized_query = query.lower().split()
        bm25_scores = self.bm25.get_scores(tokenized_query)
        bm25_scores = bm25_scores / (bm25_scores.max() + 1e-6)  # Normalize

        # Vector scores
        vector_results = semantic_search(self.db_path, query, self.model, top_k=len(self.chunk_ids))
        vector_scores = {r['chunk_id']: r['score'] for r in vector_results}

        # Combine scores
        combined = []
        for i, chunk_id in enumerate(self.chunk_ids):
            score = (1 - alpha) * bm25_scores[i] + alpha * vector_scores.get(chunk_id, 0)
            combined.append((chunk_id, score))

        combined.sort(key=lambda x: x[1], reverse=True)
        return combined[:top_k]

Advanced: Reranking

Add a reranking step for improved precision:
python
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2'):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, candidates, top_k=5):
        """Rerank candidates using cross-encoder."""
        pairs = [(query, c['text']) for c in candidates]
        scores = self.model.predict(pairs)

        for i, score in enumerate(scores):
            candidates[i]['rerank_score'] = float(score)

        reranked = sorted(candidates, key=lambda x: x['rerank_score'], reverse=True)
        return reranked[:top_k]

Usage in RAG pipeline

python
def query_with_rerank(self, question, initial_k=20, final_k=5):
    # First pass: retrieve more candidates
    candidates = semantic_search(self.db_path, question, self.model, top_k=initial_k)

    # Second pass: rerank for precision
    reranked = self.reranker.rerank(question, candidates, top_k=final_k)

    return reranked

Advanced: Streaming Responses
For better UX with long answers:
python
def query_streaming(self, question, top_k=5):
    """Stream RAG response for real-time display."""
    context = self.get_context(question, top_k)
    prompt = self.build_prompt(context, question)

    # Anthropic streaming
    with anthropic.Anthropic().messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            yield text

Related Skills

  • knowledge-base-builder - Build the document database first
  • semantic-search-setup - Generate vector embeddings
  • pdf-text-extractor - Extract text from PDFs
  • document-rag-pipeline - Complete end-to-end pipeline

Dependencies

bash
pip install sentence-transformers anthropic openai numpy

Optional:
  • faiss-cpu (for large-scale vector search)
  • rank-bm25 (for hybrid search)

Version History

  • 1.2.0 (2026-01-02): Added Quick Start, Execution Checklist, Error Handling, Metrics sections; updated frontmatter with version, category, related_skills
  • 1.1.0 (2025-12-30): Added hybrid search (BM25+vector), reranking, streaming responses
  • 1.0.0 (2025-10-15): Initial release with basic RAG implementation