knowledge-base-rag

Knowledge Base RAG

Description

Knowledge Base RAG implements the complete Retrieval-Augmented Generation pipeline: document ingestion, intelligent chunking, embedding generation, vector store indexing, semantic retrieval, and grounded response generation. The agent builds RAG systems that answer questions from private knowledge bases with cited sources and reduced hallucination.
RAG solves the fundamental limitation of large language models: they cannot access information created after their training cutoff or proprietary information they were never trained on. By retrieving relevant documents from a vector store and injecting them into the prompt context, RAG grounds the model's responses in factual, up-to-date, organization-specific knowledge.
The quality of a RAG system depends on chunking strategy more than model choice. This skill encodes production-tested chunking approaches: semantic chunking that preserves paragraph coherence, recursive splitting that respects document structure (headings, code blocks, tables), and overlap windows that maintain context across chunk boundaries. Each strategy is matched to the document type for optimal retrieval quality.
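For intuition, the overlap-window idea reduces to a few lines: with fixed-size windows, each chunk starts `size - overlap` tokens after the previous one, so neighbouring chunks share an `overlap`-token seam. This is a simplified fixed-size sketch; the Implementation section uses structure-aware recursive splitting instead.

```python
def sliding_window_chunks(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Fixed-size chunking: step forward by size - overlap so each chunk
    repeats the last `overlap` tokens of its predecessor."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

With `size=100, overlap=20`, consecutive chunks share a 20-token seam, so a sentence cut at a boundary still appears whole in at least one chunk.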

Use When

  • Building question-answering systems over private documents
  • Creating a searchable knowledge base from documentation, wikis, or PDFs
  • Reducing hallucination by grounding LLM responses in retrieved facts
  • Implementing semantic search across large document collections
  • Building customer support bots with product-specific knowledge
  • The user asks about RAG, vector search, or document embedding

How It Works

```mermaid
graph TD
    A[Documents: PDF, MD, HTML] --> B[Ingestion Pipeline]
    B --> C[Extract Text + Metadata]
    C --> D[Intelligent Chunking]
    D --> E[Generate Embeddings]
    E --> F[Index in Vector Store]
    G[User Query] --> H[Embed Query]
    H --> I[Semantic Search: Top-K]
    I --> J[Re-rank Results]
    J --> K[Construct Prompt with Context]
    K --> L[LLM Generation]
    L --> M[Response with Citations]
```
The pipeline has two phases: offline ingestion (documents to vectors) and online retrieval (query to answer). The re-ranking step applies a cross-encoder to refine the initial vector search results, improving precision before the generation step.
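The re-ranking step can be sketched as follows. A production system would score each (query, passage) pair with a cross-encoder model (for example the `CrossEncoder` class from the `sentence-transformers` library); here a hypothetical `score_pair` stub based on token overlap stands in so the shape of the step is clear:

```python
def score_pair(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens found in the passage.
    A real re-ranker replaces this with a cross-encoder forward pass."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / max(len(q_tokens), 1)

def rerank(query: str, passages: list[str], top_n: int = 3) -> list[str]:
    """Re-score the top-K vector hits and keep the best top_n for the prompt."""
    scored = sorted(passages, key=lambda p: score_pair(query, p), reverse=True)
    return scored[:top_n]
```

The key point is the funnel: vector search casts a wide net (top-K), and the slower, more precise pairwise scorer narrows it to the few passages that actually enter the prompt.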

Implementation

```python
from dataclasses import dataclass
import hashlib

@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None

    @property
    def id(self) -> str:
        # Content-addressed id: identical text always maps to the same id,
        # so re-ingesting the same chunk deduplicates naturally.
        return hashlib.sha256(self.text.encode()).hexdigest()[:16]

class RecursiveChunker:
    def __init__(self, max_tokens: int = 512, overlap: int = 64):
        self.max_tokens = max_tokens
        self.overlap = overlap
        # Try coarse structural separators first (headings), then fall back
        # to progressively finer ones (paragraphs, lines, sentences, words).
        self.separators = ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]

    def chunk(self, text: str, metadata: dict) -> list[Chunk]:
        chunks = self._split(text, self.separators)
        return [
            Chunk(text=c.strip(), metadata={**metadata, "chunk_index": i})
            for i, c in enumerate(chunks) if c.strip()
        ]

    def _split(self, text: str, separators: list[str]) -> list[str]:
        if not separators or self._token_count(text) <= self.max_tokens:
            return [text]

        sep = separators[0]
        parts = text.split(sep)
        chunks, current = [], ""

        for part in parts:
            candidate = current + sep + part if current else part
            if self._token_count(candidate) > self.max_tokens and current:
                chunks.append(current)
                # Carry a tail of the previous chunk forward (~overlap tokens
                # at ~4 chars/token) to preserve context across the boundary.
                overlap_text = current[-self.overlap * 4:]
                current = overlap_text + sep + part
            else:
                current = candidate

        if current:
            chunks.append(current)

        # Any chunk still over budget recurses with the next, finer separator.
        result = []
        for chunk in chunks:
            if self._token_count(chunk) > self.max_tokens:
                result.extend(self._split(chunk, separators[1:]))
            else:
                result.append(chunk)
        return result

    def _token_count(self, text: str) -> int:
        # Cheap heuristic: roughly 4 characters per token for English text.
        return len(text) // 4

class RAGPipeline:
    def __init__(self, embedder, vector_store, llm):
        self.embedder = embedder
        self.store = vector_store
        self.llm = llm
        self.chunker = RecursiveChunker()

    async def ingest(self, documents: list[dict]) -> int:
        all_chunks = []
        for doc in documents:
            chunks = self.chunker.chunk(doc["text"], doc["metadata"])
            for chunk in chunks:
                chunk.embedding = await self.embedder.embed(chunk.text)
            all_chunks.extend(chunks)

        await self.store.upsert(all_chunks)
        return len(all_chunks)

    async def query(self, question: str, top_k: int = 5) -> dict:
        query_embedding = await self.embedder.embed(question)
        results = await self.store.search(query_embedding, top_k=top_k)

        context = "\n\n".join(
            f"[Source: {r.metadata.get('source', 'unknown')}]\n{r.text}" for r in results
        )

        prompt = f"""Answer the question based on the provided context. Cite sources.
If the context does not contain the answer, say so explicitly.

Context:
{context}

Question: {question}"""

        response = await self.llm.generate(prompt)
        return {"answer": response, "sources": [r.metadata for r in results]}
```
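The `vector_store` passed to `RAGPipeline` is assumed to expose async `upsert` and `search` methods. A minimal in-memory stand-in using cosine similarity might look like the sketch below; the `SearchResult` shape mirrors the `r.text` / `r.metadata` access in `query`. This is illustrative only, not a production store:

```python
import math
from dataclasses import dataclass, field

@dataclass
class SearchResult:
    text: str
    metadata: dict
    score: float

@dataclass
class InMemoryVectorStore:
    # chunk id -> (embedding, text, metadata); keyed by the content-hash id,
    # so re-ingesting identical text overwrites in place instead of duplicating.
    _rows: dict = field(default_factory=dict)

    async def upsert(self, chunks) -> None:
        for c in chunks:
            self._rows[c.id] = (c.embedding, c.text, c.metadata)

    async def search(self, query_embedding: list[float], top_k: int = 5) -> list[SearchResult]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm if norm else 0.0

        scored = [
            SearchResult(text=t, metadata=m, score=cosine(query_embedding, e))
            for e, t, m in self._rows.values()
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:top_k]
```

A real deployment would swap this for a dedicated vector database (brute-force cosine over every row does not scale past a few tens of thousands of chunks), but the interface contract stays the same.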

Best Practices

  • Use recursive chunking that respects document structure (headings, paragraphs, code blocks)
  • Set chunk size to 256-512 tokens with 10-15% overlap for most use cases
  • Re-rank vector search results with a cross-encoder before passing to the LLM
  • Include source metadata in every chunk for citation generation
  • Deduplicate chunks by content hash before indexing to avoid retrieval noise
  • Instruct the LLM to say "I don't know" when the context lacks the answer
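The dedup-by-content-hash practice can be sketched in a few lines, using the same sha256-prefix id scheme as the `Chunk` class in the Implementation section:

```python
import hashlib

def dedupe_chunks(texts: list[str]) -> list[str]:
    """Drop chunks whose normalized text hashes to an already-seen id,
    keeping the first occurrence so stable ids survive re-ingestion."""
    seen: set[str] = set()
    unique: list[str] = []
    for t in texts:
        key = hashlib.sha256(t.strip().encode()).hexdigest()[:16]
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique
```

Stripping before hashing means whitespace-only variants collapse to one entry; without dedup, boilerplate repeated across pages (headers, legal footers) crowds out genuinely relevant chunks in the top-K results.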

Platform Compatibility

Platform       Support   Notes
Cursor         Full      Pipeline code generation
VS Code        Full      Python/TS RAG implementation
Windsurf       Full      RAG workflow support
Claude Code    Full      End-to-end RAG building
Cline          Full      Vector store integration
aider          Partial   Code-level support

Related Skills

  • AI Chat Studio
  • Workflow Orchestration
  • Knowledge Base Injection
  • Entity Memory Management

Keywords

rag
retrieval-augmented-generation
vector-search
embeddings
chunking
knowledge-base
semantic-search
document-qa

© 2026 googleadsagent.ai™ | Agent Skills™ | MIT License