
Agentic RAG

Build RAG systems that reason, plan, and adaptively retrieve information.

When to Use

  • Questions require multiple retrieval steps
  • Need to combine information from different sources
  • Query needs decomposition into sub-queries
  • Results need validation or refinement
  • Complex reasoning over retrieved documents

Simple RAG vs Agentic RAG

Simple RAG:
Query → Retrieve → Generate → Answer

Agentic RAG:
Query → Plan → [Retrieve → Analyze → Decide]*n → Synthesize → Answer
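The agentic flow can be written as a plain control loop. A minimal sketch, with `plan`, `retrieve`, `analyze`, `decide`, and `synthesize` as hypothetical injected callables standing in for the LLM and retriever calls:

```python
def agentic_rag(question, plan, retrieve, analyze, decide, synthesize):
    """Shape of the flow: Query -> Plan -> [Retrieve -> Analyze -> Decide]*n
    -> Synthesize -> Answer. All five callables are injected stand-ins."""
    findings = []
    for step in plan(question):                   # Plan: break into steps
        while True:
            docs = retrieve(step)                 # Retrieve for this step
            findings.append(analyze(step, docs))  # Analyze what came back
            if decide(step, findings) == "done":  # Decide: loop or move on
                break
    return synthesize(question, findings)         # Synthesize final answer
```

The inner `while` is what distinguishes this from simple RAG: the loop can retrieve again for the same step until the decision function is satisfied.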

Core Architecture

核心架构

┌─────────────────────────────────────────────────────────┐
│                     User Question                        │
└─────────────────────────┬───────────────────────────────┘
                ┌───────────────────┐
                │   Query Analyzer  │
                │   (Decompose?)    │
                └─────────┬─────────┘
         ┌────────────────┼────────────────┐
         │                │                │
         ▼                ▼                ▼
   ┌──────────┐    ┌──────────┐    ┌──────────┐
   │ Sub-Q 1  │    │ Sub-Q 2  │    │ Sub-Q 3  │
   └────┬─────┘    └────┬─────┘    └────┬─────┘
        │               │               │
        ▼               ▼               ▼
   ┌──────────┐    ┌──────────┐    ┌──────────┐
   │ Retrieve │    │ Retrieve │    │ Retrieve │
   └────┬─────┘    └────┬─────┘    └────┬─────┘
        │               │               │
        └───────────────┼───────────────┘
              ┌───────────────────┐
              │    Synthesizer    │
              │  (Combine & Cite) │
              └─────────┬─────────┘
              ┌───────────────────┐
              │   Final Answer    │
              └───────────────────┘

Implementation with LangGraph

python
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, List, Annotated
import operator

class AgentState(TypedDict):
    question: str
    sub_questions: List[str]
    retrieved_docs: Annotated[List, operator.add]
    current_step: int
    final_answer: str
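The `Annotated[List, operator.add]` annotation registers `operator.add` as the reducer for `retrieved_docs`: each node returns a partial state update, and annotated fields are merged with the reducer rather than overwritten. A framework-free simulation of that merge (`apply_update` is an illustrative stand-in for what LangGraph does internally):

```python
import operator

def apply_update(state: dict, update: dict) -> dict:
    """Simulate LangGraph's reducer semantics: fields annotated with a
    reducer (here operator.add) are merged; plain fields are replaced."""
    merged = dict(state)
    for key, value in update.items():
        if key == "retrieved_docs":  # the field annotated with operator.add
            merged[key] = operator.add(merged.get(key, []), value)
        else:
            merged[key] = value
    return merged

state = {"retrieved_docs": [], "current_step": 0}
state = apply_update(state, {"retrieved_docs": ["doc-a"], "current_step": 1})
state = apply_update(state, {"retrieved_docs": ["doc-b"], "current_step": 2})
print(state["retrieved_docs"])  # ['doc-a', 'doc-b'] — appended, not replaced
```

This is why each retrieval node below can simply return its new documents without reading or copying the documents accumulated so far.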

Nodes

def analyze_query(state: AgentState) -> AgentState:
    """Decompose a complex query into sub-questions."""
    llm = ChatOpenAI(model="gpt-4")
    prompt = f"""Analyze this question and break it into sub-questions if needed.
Question: {state['question']}

Return a JSON list of sub-questions, or just the original if simple."""
    response = llm.invoke(prompt)
    sub_questions = parse_questions(response.content)
    return {"sub_questions": sub_questions, "current_step": 0}


def retrieve_for_subquery(state: AgentState) -> AgentState:
    """Retrieve documents for the current sub-question."""
    current_q = state["sub_questions"][state["current_step"]]
    docs = retriever.invoke(current_q)
    return {
        "retrieved_docs": docs,
        "current_step": state["current_step"] + 1,
    }


def should_continue(state: AgentState) -> str:
    """Route back to retrieval while sub-questions remain."""
    if state["current_step"] < len(state["sub_questions"]):
        return "retrieve"
    return "synthesize"


def synthesize_answer(state: AgentState) -> AgentState:
    """Combine all retrieved info into the final answer."""
    llm = ChatOpenAI(model="gpt-4")
    context = "\n\n".join(doc.page_content for doc in state["retrieved_docs"])
    prompt = f"""Based on the following context, answer the question.
Cite sources using [1], [2], etc.

Question: {state['question']}

Context:
{context}
"""
    response = llm.invoke(prompt)
    return {"final_answer": response.content}
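The helper `parse_questions` is referenced above but not defined. A hedged best-effort version, assuming the model usually returns the requested JSON list, with a line-based fallback:

```python
import json
import re

def parse_questions(text: str) -> list:
    """Best-effort parse of the LLM response into sub-questions.

    The prompt asked for a JSON list, so try that first; otherwise
    fall back to treating each non-empty line as one question.
    """
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if isinstance(parsed, list):
                return [str(q) for q in parsed]
        except json.JSONDecodeError:
            pass
    # Fallback: one question per non-empty line, bullets stripped
    return [line.strip("-•* ").strip() for line in text.splitlines() if line.strip()]
```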

Build graph

workflow = StateGraph(AgentState)
workflow.add_node("analyze", analyze_query)
workflow.add_node("retrieve", retrieve_for_subquery)
workflow.add_node("synthesize", synthesize_answer)

workflow.set_entry_point("analyze")
workflow.add_edge("analyze", "retrieve")
workflow.add_conditional_edges("retrieve", should_continue, {
    "retrieve": "retrieve",
    "synthesize": "synthesize",
})
workflow.add_edge("synthesize", END)

agent = workflow.compile()

Run

result = agent.invoke({"question": "Compare AWS and GCP pricing for ML workloads"})

Self-RAG: Retrieve When Needed

python
def self_rag_node(state: AgentState) -> AgentState:
    """Decide whether retrieval is needed (sets needs_retrieval, which
    AgentState must also declare as a bool field)."""
    llm = ChatOpenAI(model="gpt-4")

    prompt = f"""Given this question, do you need to retrieve external information?
    Question: {state['question']}

    Consider:
    - Is this factual or requires current data? → RETRIEVE
    - Is this reasoning/math/coding? → NO RETRIEVE
    - Do you have high confidence? → NO RETRIEVE

    Answer: RETRIEVE or NO_RETRIEVE"""

    response = llm.invoke(prompt)

    if "RETRIEVE" in response.content and "NO" not in response.content:
        return {"needs_retrieval": True}
    return {"needs_retrieval": False}
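One robustness note: `NO_RETRIEVE` contains `RETRIEVE` as a substring, so parsing the verdict deserves care. A sketch of a stricter parser that checks the negative token first:

```python
def parse_retrieval_decision(answer: str) -> bool:
    """Parse the model's RETRIEVE / NO_RETRIEVE verdict.

    Check the negative token first: "NO_RETRIEVE" contains "RETRIEVE",
    so a bare substring test for RETRIEVE would misread it.
    """
    verdict = answer.strip().upper()
    if "NO_RETRIEVE" in verdict or "NO RETRIEVE" in verdict:
        return False
    return "RETRIEVE" in verdict
```

Anything the parser cannot recognize falls through to `False`, a conservative default that skips retrieval rather than triggering it on garbage output.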

Tool-Using RAG Agent

python
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool

Define retrieval tools

tools = [
    Tool(
        name="search_docs",
        func=lambda q: retriever.invoke(q),
        description="Search internal documentation",
    ),
    Tool(
        name="search_code",
        func=lambda q: code_retriever.invoke(q),
        description="Search codebase for examples",
    ),
    Tool(
        name="search_tickets",
        func=lambda q: jira_retriever.invoke(q),
        description="Search JIRA tickets and issues",
    ),
    Tool(
        name="calculator",
        func=lambda x: eval(x),  # demo only: eval is unsafe on untrusted input
        description="Perform calculations",
    ),
]

Create agent

llm = ChatOpenAI(model="gpt-4")
# `prompt` must be a ChatPromptTemplate that includes an `agent_scratchpad`
# placeholder, which create_openai_tools_agent requires.
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Agent decides which tools to use

result = executor.invoke({
    "input": "What's the average response time mentioned in our API docs, and how does it compare to ticket #1234?"
})

Adaptive Retrieval

python
def adaptive_retrieve(query: str, min_score: float = 0.7) -> list:
    """Retrieve with quality check, expand if needed."""

    # Initial retrieval
    results = retriever.invoke(query)
    scores = [doc.metadata.get("score", 0) for doc in results]

    # Check quality (guard against empty results before calling max)
    if not scores or max(scores) < min_score:
        # Try query expansion
        expanded = expand_query(query)
        for eq in expanded:
            more_results = retriever.invoke(eq)
            results.extend(more_results)

        # Deduplicate
        results = deduplicate(results)

    # Rerank
    results = rerank(query, results)

    return results[:5]

def expand_query(query: str) -> list:
    """Generate alternative phrasings."""
    llm = ChatOpenAI(model="gpt-4")
    prompt = f"Generate 3 alternative phrasings for: {query}"
    response = llm.invoke(prompt)
    return parse_alternatives(response.content)
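The helpers `deduplicate` and `rerank` are referenced but not shown. An order-preserving dedup keyed on page content, assuming LangChain-style Document objects with a `.page_content` attribute (plain strings also work, for illustration):

```python
def deduplicate(docs: list) -> list:
    """Drop duplicate documents while preserving retrieval order.

    Keys on .page_content when present (LangChain-style Documents);
    falls back to the object itself for plain strings.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = getattr(doc, "page_content", doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

Preserving order matters here because the initial, higher-confidence results come first and query-expansion results are appended after them.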

Patterns Summary

| Pattern | When to Use | Complexity |
| --- | --- | --- |
| Query Decomposition | Multi-part questions | Medium |
| Self-RAG | Uncertain if retrieval needed | Low |
| Tool-Using Agent | Multiple data sources | High |
| Adaptive Retrieval | Variable quality needs | Medium |
| Iterative Refinement | Research tasks | High |
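Iterative Refinement appears in the table but is not implemented above. The loop shape, with `retrieve`, `draft`, and `critique` as hypothetical injected stand-ins for the retriever and LLM calls:

```python
def iterative_refine(question, retrieve, draft, critique, max_iters=3):
    """Retrieve -> draft -> critique loop: the critic either accepts the
    draft or proposes a follow-up query; max_iters bounds the loop."""
    answer, query = "", question
    for _ in range(max_iters):
        docs = retrieve(query)                    # fetch evidence for the current query
        answer = draft(question, docs, answer)    # revise the draft with new evidence
        done, query = critique(question, answer)  # accept, or ask a follow-up
        if done:
            break
    return answer
```

Because the critic supplies the next query, each iteration can chase a different gap in the draft rather than re-running the original question.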

Best Practices

  1. Start simple - add agency only when needed
  2. Limit iterations - set max steps to prevent loops
  3. Log decisions - track when/why agent retrieves
  4. Validate outputs - agent can hallucinate tool usage
  5. Cost awareness - more steps = more LLM calls
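Practices 2 and 5 can be enforced with a small step budget; LangGraph also supports a native cap via `config={"recursion_limit": N}` on `invoke`. A framework-free sketch:

```python
class StepBudget:
    """Guard for practices 2 and 5: count agent steps and stop past the
    cap, preventing runaway loops (and runaway LLM spend)."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.used = 0

    def tick(self) -> None:
        """Call once at the top of every node; raises when over budget."""
        self.used += 1
        if self.used > self.max_steps:
            raise RuntimeError(f"exceeded step budget of {self.max_steps}")
```

Each node calls `budget.tick()` before doing work, so an agent stuck re-retrieving fails fast with a clear error instead of looping until a timeout.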