rag-system-builder

RAG System Builder Skill

Overview

This skill creates complete RAG (Retrieval-Augmented Generation) systems that combine semantic search with LLM-powered Q&A. Users can ask natural language questions and receive accurate answers grounded in your document collection.

Quick Start

python
from sentence_transformers import SentenceTransformer
import anthropic

# Setup
model = SentenceTransformer('all-MiniLM-L6-v2')
client = anthropic.Anthropic()

# Retrieve context (simplified)
query = "What are the safety requirements?"
query_embedding = model.encode(query, normalize_embeddings=True)
# ... search for similar chunks to build `context` ...

# Generate answer
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}]
)
print(response.content[0].text)

When to Use

  • Building AI assistants for technical documentation
  • Creating Q&A systems for standards libraries
  • Developing chatbots with domain expertise
  • Enabling natural language queries over knowledge bases
  • Adding AI-powered search to existing document systems

Architecture

User Question
      |
      v
+------------------+
| 1. Embed Query   |  sentence-transformers
+--------+---------+
         v
+------------------+
| 2. Vector Search |  Cosine similarity
+--------+---------+
         v
+------------------+
| 3. Retrieve Top  |  Top-K relevant chunks
+--------+---------+
         v
+------------------+
| 4. Build Prompt  |  Context + Question
+--------+---------+
         v
+------------------+
| 5. LLM Answer    |  Claude/OpenAI
+------------------+

Prerequisites

  • Knowledge base with extracted text (see knowledge-base-builder)
  • Vector embeddings for semantic search (see semantic-search-setup)
  • API key: ANTHROPIC_API_KEY or OPENAI_API_KEY

Implementation

Step 1: Vector Embeddings Table

python
import sqlite3
import numpy as np

def setup_embeddings_table(db_path):
    conn = sqlite3.connect(db_path, timeout=30)
    cursor = conn.cursor()

    cursor.execute('''
        CREATE TABLE IF NOT EXISTS embeddings (
            id INTEGER PRIMARY KEY,
            chunk_id INTEGER UNIQUE,
            embedding BLOB,
            model_name TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (chunk_id) REFERENCES chunks(id)
        )
    ''')

    conn.commit()
    return conn

Step 2: Generate Embeddings

python
from sentence_transformers import SentenceTransformer
import numpy as np

class EmbeddingGenerator:
    def __init__(self, model_name='all-MiniLM-L6-v2'):
        self.model = SentenceTransformer(model_name)
        self.dimension = 384  # all-MiniLM-L6-v2

    def embed_text(self, text):
        """Generate embedding for text."""
        embedding = self.model.encode(text, normalize_embeddings=True)
        return embedding.astype(np.float32)

    def embed_batch(self, texts, batch_size=100):
        """Generate embeddings for multiple texts."""
        embeddings = self.model.encode(
            texts,
            batch_size=batch_size,
            normalize_embeddings=True,
            show_progress_bar=True
        )
        return embeddings.astype(np.float32)
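Bridging Steps 1 and 2, each vector can be persisted into the embeddings table as raw float32 bytes. A minimal sketch, assuming the Step 1 schema; `store_embeddings` is an illustrative helper name, not part of any library:

```python
import sqlite3
import numpy as np

def store_embeddings(conn, chunk_ids, embeddings, model_name):
    """Persist each vector as a float32 BLOB keyed by chunk_id."""
    cursor = conn.cursor()
    for chunk_id, emb in zip(chunk_ids, embeddings):
        cursor.execute(
            'INSERT OR REPLACE INTO embeddings (chunk_id, embedding, model_name) '
            'VALUES (?, ?, ?)',
            (chunk_id, np.asarray(emb, dtype=np.float32).tobytes(), model_name)
        )
    conn.commit()

# Round-trip demo with an in-memory database and stand-in vectors
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY, chunk_id INTEGER UNIQUE,
    embedding BLOB, model_name TEXT)''')
vectors = np.eye(3, 384, dtype=np.float32)
store_embeddings(conn, [1, 2, 3], vectors, 'all-MiniLM-L6-v2')
blob = conn.execute('SELECT embedding FROM embeddings WHERE chunk_id = 2').fetchone()[0]
restored = np.frombuffer(blob, dtype=np.float32)
```

`np.frombuffer` recovers the vector exactly, which is the read path Step 3 relies on.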

Step 3: Semantic Search

python
def semantic_search(db_path, query, model, top_k=5):
    """Find most similar chunks to query."""
    conn = sqlite3.connect(db_path, timeout=30)
    cursor = conn.cursor()

    # Embed query
    query_embedding = model.embed_text(query)

    # Get all embeddings
    cursor.execute('''
        SELECT e.chunk_id, e.embedding, c.chunk_text, d.filename
        FROM embeddings e
        JOIN chunks c ON e.chunk_id = c.id
        JOIN documents d ON c.doc_id = d.id
    ''')

    results = []
    for chunk_id, emb_blob, text, filename in cursor.fetchall():
        embedding = np.frombuffer(emb_blob, dtype=np.float32)
        score = np.dot(query_embedding, embedding)  # Cosine similarity
        results.append({
            'chunk_id': chunk_id,
            'score': float(score),
            'text': text,
            'filename': filename
        })

    conn.close()

    # Sort by similarity
    results.sort(key=lambda x: x['score'], reverse=True)
    return results[:top_k]
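The plain `np.dot` in `semantic_search` is only a cosine similarity because the embeddings were encoded with `normalize_embeddings=True`. A quick sanity check of that identity:

```python
import numpy as np

def cosine(a, b):
    """Full cosine similarity, without assuming unit length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Normalize to unit length, as normalize_embeddings=True does
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

dot_of_units = float(np.dot(a_unit, b_unit))  # 0.6, same as cosine(a, b)
```

If vectors were stored unnormalized, the raw dot product would conflate vector length with similarity.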

Step 4: RAG Query Engine

python
import anthropic
import openai

class RAGQueryEngine:
    def __init__(self, db_path, embedding_model):
        self.db_path = db_path
        self.model = embedding_model

    def query(self, question, top_k=5, provider='anthropic'):
        """Answer question using RAG."""

        # 1. Retrieve relevant context
        results = semantic_search(self.db_path, question, self.model, top_k)

        # 2. Build context string
        context = "\n\n---\n\n".join([
            f"Source: {r['filename']}\n{r['text']}"
            for r in results
        ])

        # 3. Build prompt
        prompt = f"""Based on the following technical documents, answer the question.
If the answer is not in the documents, say so.

DOCUMENTS:
{context}

QUESTION: {question}

ANSWER:"""

        # 4. Get LLM response
        if provider == 'anthropic':
            return self._query_claude(prompt), results
        else:
            return self._query_openai(prompt), results

    def _query_claude(self, prompt):
        client = anthropic.Anthropic()
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

    def _query_openai(self, prompt):
        client = openai.OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

Step 5: CLI Interface

python
#!/usr/bin/env python3
"""RAG Query CLI - Ask questions about your documents."""

import argparse
import os

DB_PATH = os.environ.get('RAG_DB_PATH', 'knowledge.db')  # adjust to your knowledge base location

def main():
    parser = argparse.ArgumentParser(description='RAG Q&A System')
    parser.add_argument('question', nargs='?', help='Question to ask')
    parser.add_argument('-i', '--interactive', action='store_true')
    parser.add_argument('-k', '--top-k', type=int, default=5)
    parser.add_argument('--provider', choices=['anthropic', 'openai'], default='anthropic')

    args = parser.parse_args()
    if not args.interactive and not args.question:
        parser.error('provide a question or use -i/--interactive')

    engine = RAGQueryEngine(DB_PATH, EmbeddingGenerator())

    if args.interactive:
        print("RAG Q&A System (type 'quit' to exit)")
        while True:
            question = input("\nQuestion: ").strip()
            if question.lower() == 'quit':
                break
            answer, sources = engine.query(question, args.top_k, args.provider)
            print(f"\nAnswer: {answer}")
            print(f"\nSources: {[s['filename'] for s in sources]}")
    else:
        answer, sources = engine.query(args.question, args.top_k, args.provider)
        print(f"Answer: {answer}")
        print(f"\nSources:")
        for s in sources:
            print(f"  - {s['filename']} (score: {s['score']:.3f})")

if __name__ == '__main__':
    main()

Prompt Engineering Tips

System Prompt Template

python
SYSTEM_PROMPT = """You are a technical expert assistant. Your role is to:
1. Answer questions based ONLY on the provided documents
2. Cite specific sources when possible
3. Acknowledge when information is not available
4. Be precise with technical terminology
5. Provide practical, actionable answers

If asked about topics not covered in the documents, say:
"I don't have information about that in the available documents."
"""

Multi-Turn Conversations

python
def query_with_history(self, question, history=None):
    """Support follow-up questions."""
    history = history or []  # avoid a mutable default argument
    context = self.get_relevant_context(question)

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    # Add conversation history
    for h in history[-4:]:  # Last 4 turns
        messages.append({"role": "user", "content": h['question']})
        messages.append({"role": "assistant", "content": h['answer']})

    # Add current question with context
    messages.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}"
    })

    return self.llm.query(messages)
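One caveat on the snippet above: the `{"role": "system"}` message follows the OpenAI chat convention, while the Anthropic Messages API takes the system prompt as a separate `system` parameter. A sketch of assembling a Claude-style request; `build_claude_request` is an illustrative helper, not part of either SDK:

```python
SYSTEM_PROMPT = "You are a technical expert assistant."  # abbreviated from above

def build_claude_request(question, context, history=None):
    """Assemble kwargs for client.messages.create, with system kept separate."""
    messages = []
    for h in (history or [])[-4:]:  # last 4 turns
        messages.append({"role": "user", "content": h['question']})
        messages.append({"role": "assistant", "content": h['answer']})
    messages.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}"
    })
    return {"system": SYSTEM_PROMPT, "messages": messages}

# Then: client.messages.create(model="claude-sonnet-4-20250514",
#                              max_tokens=1024, **build_claude_request(q, ctx))
```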

Execution Checklist

  • Set up knowledge base with text extraction
  • Generate vector embeddings for all chunks
  • Configure API keys (ANTHROPIC_API_KEY or OPENAI_API_KEY)
  • Test semantic search independently
  • Build and test RAG pipeline end-to-end
  • Tune top_k parameter for answer quality
  • Add source attribution to responses
  • Implement error handling for API failures

Error Handling

Common Errors

Error: anthropic.APIError (rate limit)
  • Cause: Too many API requests
  • Solution: Add exponential backoff retry logic
Error: Empty search results
  • Cause: No relevant documents in knowledge base
  • Solution: Expand search with lower similarity threshold
Error: Context too long
  • Cause: Top-k chunks exceed model context window
  • Solution: Reduce top_k or chunk size
Error: API key not found
  • Cause: Environment variable not set
  • Solution: Export ANTHROPIC_API_KEY or OPENAI_API_KEY
Error: Low quality answers
  • Cause: Poor retrieval or insufficient context
  • Solution: Tune chunk size, overlap, and top_k parameters
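The rate-limit fix above can be sketched as a generic retry wrapper with exponential backoff and jitter. This is illustrative (`with_backoff` is a hypothetical helper); in practice you would catch the client library's specific rate-limit exception rather than a broad one:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable error, sleep base_delay * 2^attempt (with jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))

# e.g. answer = with_backoff(lambda: engine.query(question))
```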

Metrics

| Metric | Typical Value |
|--------|---------------|
| Query latency (end-to-end) | 2-5 seconds |
| Retrieval time | <100 ms |
| LLM response time | 1-4 seconds |
| Token usage per query | 500-2000 tokens |
| Answer relevance | 85-95% with good tuning |

Performance Optimization

1. Cache Embeddings

python
# Load all embeddings into memory at startup
self.embedding_cache = self._load_all_embeddings()

2. Use FAISS for Large Collections

python
import faiss

# Build FAISS index for fast similarity search
index = faiss.IndexFlatIP(dimension)  # Inner product for cosine sim
index.add(embeddings)

3. Batch Queries

python
# Process multiple questions efficiently
questions = ["Q1", "Q2", "Q3"]
query_embeddings = model.embed_batch(questions)
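When the collection fits in memory, the cache from tip 1 turns retrieval into one matrix-vector product instead of a per-row Python loop. A numpy-only sketch, assuming all vectors are unit-normalized (FAISS provides the same operation at larger scale):

```python
import numpy as np

def top_k_cached(query_vec, cached_matrix, chunk_ids, k=5):
    """Score every cached embedding at once; return the k best (chunk_id, score) pairs."""
    scores = cached_matrix @ query_vec    # cosine similarity for unit vectors
    best = np.argsort(scores)[::-1][:k]   # indices of the highest scores
    return [(chunk_ids[i], float(scores[i])) for i in best]

# Tiny demo: three unit vectors in 2-D
cache = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.7071, 0.7071]], dtype=np.float32)
top2 = top_k_cached(np.array([1.0, 0.0], dtype=np.float32), cache, [10, 20, 30], k=2)
# best match is chunk 10 (identical direction), then chunk 30
```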

Best Practices

  1. Chunk size matters - 500-1500 chars optimal for context
  2. Retrieve enough context - top_k=5-10 for comprehensive answers
  3. Include source attribution - Always show which documents were used
  4. Handle edge cases - Empty results, API errors, timeouts
  5. Monitor token usage - Track costs and optimize prompts
  6. Use SQLite timeout - timeout=30 for concurrent access
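The chunk-size guidance in tip 1 can be sketched as a simple character-window splitter with overlap (illustrative only; production pipelines often split on sentence or section boundaries instead):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping windows of at most chunk_size characters."""
    if overlap >= chunk_size:
        raise ValueError('overlap must be smaller than chunk_size')
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# A 2,000-character document yields windows starting at offsets 0, 800, 1600
```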

Example Usage

bash
# Single question
./rag "What are the fatigue design requirements for risers?"

# Interactive mode
./rag -i

# With OpenAI
./rag --provider openai "Explain API 2RD requirements"

Advanced: Hybrid Search (BM25 + Vector)

Combine keyword and semantic search for better results:
python
import sqlite3
from rank_bm25 import BM25Okapi
import numpy as np

class HybridSearch:
    def __init__(self, db_path, embedding_model):
        self.db_path = db_path
        self.model = embedding_model
        self._build_bm25_index()

    def _build_bm25_index(self):
        """Build BM25 index from chunks."""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('SELECT id, chunk_text FROM chunks')

        self.chunk_ids = []
        tokenized_corpus = []
        for chunk_id, text in cursor.fetchall():
            self.chunk_ids.append(chunk_id)
            tokenized_corpus.append(text.lower().split())

        self.bm25 = BM25Okapi(tokenized_corpus)
        conn.close()

    def search(self, query, top_k=10, alpha=0.5):
        """Hybrid search with alpha weighting.

        alpha=0.0: Pure BM25 (keyword)
        alpha=1.0: Pure vector (semantic)
        alpha=0.5: Balanced hybrid
        """
        # BM25 scores
        tokenized_query = query.lower().split()
        bm25_scores = self.bm25.get_scores(tokenized_query)
        bm25_scores = bm25_scores / (bm25_scores.max() + 1e-6)  # Normalize

        # Vector scores
        vector_results = semantic_search(self.db_path, query, self.model, top_k=len(self.chunk_ids))
        vector_scores = {r['chunk_id']: r['score'] for r in vector_results}

        # Combine scores
        combined = []
        for i, chunk_id in enumerate(self.chunk_ids):
            score = (1 - alpha) * bm25_scores[i] + alpha * vector_scores.get(chunk_id, 0)
            combined.append((chunk_id, score))

        combined.sort(key=lambda x: x[1], reverse=True)
        return combined[:top_k]

Advanced: Reranking

Add a reranking step for improved precision:
python
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name='cross-encoder/ms-marco-MiniLM-L-6-v2'):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, candidates, top_k=5):
        """Rerank candidates using cross-encoder."""
        pairs = [(query, c['text']) for c in candidates]
        scores = self.model.predict(pairs)

        for i, score in enumerate(scores):
            candidates[i]['rerank_score'] = float(score)

        reranked = sorted(candidates, key=lambda x: x['rerank_score'], reverse=True)
        return reranked[:top_k]

Usage in RAG pipeline

python
def query_with_rerank(self, question, initial_k=20, final_k=5):
    # First pass: retrieve more candidates
    candidates = semantic_search(self.db_path, question, self.model, top_k=initial_k)

    # Second pass: rerank for precision
    reranked = self.reranker.rerank(question, candidates, top_k=final_k)

    return reranked

Advanced: Streaming Responses
For better UX with long answers:
python
def query_streaming(self, question, top_k=5):
    """Stream RAG response for real-time display."""
    context = self.get_context(question, top_k)
    prompt = self.build_prompt(context, question)

    # Anthropic streaming
    with anthropic.Anthropic().messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            yield text

Related Skills

  • knowledge-base-builder - Build the document database first
  • semantic-search-setup - Generate vector embeddings
  • pdf-text-extractor - Extract text from PDFs
  • document-rag-pipeline - Complete end-to-end pipeline

Dependencies

bash
pip install sentence-transformers anthropic openai numpy

Optional:
  • faiss-cpu (for large-scale vector search)
  • rank-bm25 (for hybrid search)

Version History

  • 1.2.0 (2026-01-02): Added Quick Start, Execution Checklist, Error Handling, Metrics sections; updated frontmatter with version, category, related_skills
  • 1.1.0 (2025-12-30): Added hybrid search (BM25+vector), reranking, streaming responses
  • 1.0.0 (2025-10-15): Initial release with basic RAG implementation