knowledge-base-rag

Knowledge Base RAG

Description

Knowledge Base RAG implements the complete Retrieval-Augmented Generation pipeline: document ingestion, intelligent chunking, embedding generation, vector store indexing, semantic retrieval, and grounded response generation. The agent builds RAG systems that answer questions from private knowledge bases with cited sources and reduced hallucination.
RAG solves the fundamental limitation of large language models: they cannot access information created after their training cutoff or proprietary information they were never trained on. By retrieving relevant documents from a vector store and injecting them into the prompt context, RAG grounds the model's responses in factual, up-to-date, organization-specific knowledge.
The quality of a RAG system depends on chunking strategy more than model choice. This skill encodes production-tested chunking approaches: semantic chunking that preserves paragraph coherence, recursive splitting that respects document structure (headings, code blocks, tables), and overlap windows that maintain context across chunk boundaries. Each strategy is matched to the document type for optimal retrieval quality.
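For intuition, the overlap-window idea reduces to a few lines: with fixed-size windows, each chunk starts `size - overlap` tokens after the previous one, so neighbouring chunks share an `overlap`-token seam. This is a simplified fixed-size sketch; the Implementation section uses structure-aware recursive splitting instead.

```python
def sliding_window_chunks(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Fixed-size chunking: step forward by size - overlap so each chunk
    repeats the last `overlap` tokens of its predecessor."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

With `size=100, overlap=20`, consecutive chunks share a 20-token seam, so a sentence cut at a boundary still appears whole in at least one chunk.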

Use When

  • Building question-answering systems over private documents
  • Creating a searchable knowledge base from documentation, wikis, or PDFs
  • Reducing hallucination by grounding LLM responses in retrieved facts
  • Implementing semantic search across large document collections
  • Building customer support bots with product-specific knowledge
  • The user asks about RAG, vector search, or document embedding

How It Works

```mermaid
graph TD
    A[Documents: PDF, MD, HTML] --> B[Ingestion Pipeline]
    B --> C[Extract Text + Metadata]
    C --> D[Intelligent Chunking]
    D --> E[Generate Embeddings]
    E --> F[Index in Vector Store]
    G[User Query] --> H[Embed Query]
    H --> I[Semantic Search: Top-K]
    I --> J[Re-rank Results]
    J --> K[Construct Prompt with Context]
    K --> L[LLM Generation]
    L --> M[Response with Citations]
```
The pipeline has two phases: offline ingestion (documents to vectors) and online retrieval (query to answer). The re-ranking step applies a cross-encoder to refine the initial vector search results, improving precision before the generation step.
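The re-ranking step can be sketched as follows. A production system would score each (query, passage) pair with a cross-encoder model (for example the `CrossEncoder` class from the `sentence-transformers` library); here a hypothetical `score_pair` stub based on token overlap stands in so the shape of the step is clear:

```python
def score_pair(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens found in the passage.
    A real re-ranker replaces this with a cross-encoder forward pass."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / max(len(q_tokens), 1)

def rerank(query: str, passages: list[str], top_n: int = 3) -> list[str]:
    """Re-score the top-K vector hits and keep the best top_n for the prompt."""
    scored = sorted(passages, key=lambda p: score_pair(query, p), reverse=True)
    return scored[:top_n]
```

The key point is the funnel: vector search casts a wide net (top-K), and the slower, more precise pairwise scorer narrows it to the few passages that actually enter the prompt.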

Implementation

```python
from dataclasses import dataclass
import hashlib

@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None

    @property
    def id(self) -> str:
        # Content-addressed id: identical text always maps to the same id,
        # so re-ingesting the same chunk deduplicates naturally.
        return hashlib.sha256(self.text.encode()).hexdigest()[:16]

class RecursiveChunker:
    def __init__(self, max_tokens: int = 512, overlap: int = 64):
        self.max_tokens = max_tokens
        self.overlap = overlap
        # Try coarse structural separators first (headings), then fall back
        # to progressively finer ones (paragraphs, lines, sentences, words).
        self.separators = ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]

    def chunk(self, text: str, metadata: dict) -> list[Chunk]:
        chunks = self._split(text, self.separators)
        return [
            Chunk(text=c.strip(), metadata={**metadata, "chunk_index": i})
            for i, c in enumerate(chunks) if c.strip()
        ]

    def _split(self, text: str, separators: list[str]) -> list[str]:
        if not separators or self._token_count(text) <= self.max_tokens:
            return [text]

        sep = separators[0]
        parts = text.split(sep)
        chunks, current = [], ""

        for part in parts:
            candidate = current + sep + part if current else part
            if self._token_count(candidate) > self.max_tokens and current:
                chunks.append(current)
                # Carry a tail of the previous chunk forward (~overlap tokens
                # at ~4 chars/token) to preserve context across the boundary.
                overlap_text = current[-self.overlap * 4:]
                current = overlap_text + sep + part
            else:
                current = candidate

        if current:
            chunks.append(current)

        # Any chunk still over budget recurses with the next, finer separator.
        result = []
        for chunk in chunks:
            if self._token_count(chunk) > self.max_tokens:
                result.extend(self._split(chunk, separators[1:]))
            else:
                result.append(chunk)
        return result

    def _token_count(self, text: str) -> int:
        # Cheap heuristic: roughly 4 characters per token for English text.
        return len(text) // 4

class RAGPipeline:
    def __init__(self, embedder, vector_store, llm):
        self.embedder = embedder
        self.store = vector_store
        self.llm = llm
        self.chunker = RecursiveChunker()

    async def ingest(self, documents: list[dict]) -> int:
        all_chunks = []
        for doc in documents:
            chunks = self.chunker.chunk(doc["text"], doc["metadata"])
            for chunk in chunks:
                chunk.embedding = await self.embedder.embed(chunk.text)
            all_chunks.extend(chunks)

        await self.store.upsert(all_chunks)
        return len(all_chunks)

    async def query(self, question: str, top_k: int = 5) -> dict:
        query_embedding = await self.embedder.embed(question)
        results = await self.store.search(query_embedding, top_k=top_k)

        context = "\n\n".join(
            f"[Source: {r.metadata.get('source', 'unknown')}]\n{r.text}" for r in results
        )

        prompt = f"""Answer the question based on the provided context. Cite sources.
If the context does not contain the answer, say so explicitly.

Context:
{context}

Question: {question}"""

        response = await self.llm.generate(prompt)
        return {"answer": response, "sources": [r.metadata for r in results]}
```
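The `vector_store` passed to `RAGPipeline` is assumed to expose async `upsert` and `search` methods. A minimal in-memory stand-in using cosine similarity might look like the sketch below; the `SearchResult` shape mirrors the `r.text` / `r.metadata` access in `query`. This is illustrative only, not a production store:

```python
import math
from dataclasses import dataclass, field

@dataclass
class SearchResult:
    text: str
    metadata: dict
    score: float

@dataclass
class InMemoryVectorStore:
    # chunk id -> (embedding, text, metadata); keyed by the content-hash id,
    # so re-ingesting identical text overwrites in place instead of duplicating.
    _rows: dict = field(default_factory=dict)

    async def upsert(self, chunks) -> None:
        for c in chunks:
            self._rows[c.id] = (c.embedding, c.text, c.metadata)

    async def search(self, query_embedding: list[float], top_k: int = 5) -> list[SearchResult]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm if norm else 0.0

        scored = [
            SearchResult(text=t, metadata=m, score=cosine(query_embedding, e))
            for e, t, m in self._rows.values()
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:top_k]
```

A real deployment would swap this for a dedicated vector database (brute-force cosine over every row does not scale past a few tens of thousands of chunks), but the interface contract stays the same.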

Best Practices

  • Use recursive chunking that respects document structure (headings, paragraphs, code blocks)
  • Set chunk size to 256-512 tokens with 10-15% overlap for most use cases
  • Re-rank vector search results with a cross-encoder before passing to the LLM
  • Include source metadata in every chunk for citation generation
  • Deduplicate chunks by content hash before indexing to avoid retrieval noise
  • Instruct the LLM to say "I don't know" when the context lacks the answer
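The dedup-by-content-hash practice can be sketched in a few lines, using the same sha256-prefix id scheme as the `Chunk` class in the Implementation section:

```python
import hashlib

def dedupe_chunks(texts: list[str]) -> list[str]:
    """Drop chunks whose normalized text hashes to an already-seen id,
    keeping the first occurrence so stable ids survive re-ingestion."""
    seen: set[str] = set()
    unique: list[str] = []
    for t in texts:
        key = hashlib.sha256(t.strip().encode()).hexdigest()[:16]
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique
```

Stripping before hashing means whitespace-only variants collapse to one entry; without dedup, boilerplate repeated across pages (headers, legal footers) crowds out genuinely relevant chunks in the top-K results.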

Platform Compatibility

Platform       Support   Notes
Cursor         Full      Pipeline code generation
VS Code        Full      Python/TS RAG implementation
Windsurf       Full      RAG workflow support
Claude Code    Full      End-to-end RAG building
Cline          Full      Vector store integration
aider          Partial   Code-level support

Related Skills

  • AI Chat Studio
  • Workflow Orchestration
  • Knowledge Base Injection
  • Entity Memory Management

Keywords

rag
retrieval-augmented-generation
vector-search
embeddings
chunking
knowledge-base
semantic-search
document-qa

© 2026 googleadsagent.ai™ | Agent Skills™ | MIT License