Agentic RAG
Build RAG systems that reason, plan, and adaptively retrieve information.
When to Use
- Questions require multiple retrieval steps
- Need to combine information from different sources
- Query needs decomposition into sub-queries
- Results need validation or refinement
- Complex reasoning over retrieved documents
Simple RAG vs Agentic RAG
```
Simple RAG:
Query → Retrieve → Generate → Answer

Agentic RAG:
Query → Plan → [Retrieve → Analyze → Decide]*n → Synthesize → Answer
```
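The same contrast in code, as a rough sketch: every function below is a toy stand-in (not a real planner, retriever, or LLM), kept only to make the two control flows concrete.

```python
# Toy stand-ins so the sketch runs; a real system would call an LLM + retriever.
def plan(query): return [query]
def retrieve(step): return [f"doc about {step}"]
def analyze(step, docs): return docs[0]
def decide_done(query, evidence): return bool(evidence)
def generate(query, docs): return f"answer from {docs[0]}"
def synthesize(query, evidence): return f"answer from {evidence[0]}"

def simple_rag(query: str) -> str:
    # Query → Retrieve → Generate → Answer
    docs = retrieve(query)
    return generate(query, docs)

def agentic_rag(query: str, max_steps: int = 5) -> str:
    # Query → Plan → [Retrieve → Analyze → Decide]*n → Synthesize → Answer
    evidence = []
    for step in plan(query)[:max_steps]:
        docs = retrieve(step)
        evidence.append(analyze(step, docs))
        if decide_done(query, evidence):  # stop early once evidence suffices
            break
    return synthesize(query, evidence)
```

The difference is the loop: the agentic version decides how many retrievals to make and when to stop, at the cost of extra LLM calls.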
Core Architecture
```
┌─────────────────────────────────────────────────────────┐
│                      User Question                      │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
                ┌───────────────────┐
                │  Query Analyzer   │
                │   (Decompose?)    │
                └─────────┬─────────┘
                          │
         ┌────────────────┼────────────────┐
         │                │                │
         ▼                ▼                ▼
    ┌──────────┐     ┌──────────┐     ┌──────────┐
    │ Sub-Q 1  │     │ Sub-Q 2  │     │ Sub-Q 3  │
    └────┬─────┘     └────┬─────┘     └────┬─────┘
         │                │                │
         ▼                ▼                ▼
    ┌──────────┐     ┌──────────┐     ┌──────────┐
    │ Retrieve │     │ Retrieve │     │ Retrieve │
    └────┬─────┘     └────┬─────┘     └────┬─────┘
         │                │                │
         └────────────────┼────────────────┘
                          │
                          ▼
                ┌───────────────────┐
                │    Synthesizer    │
                │ (Combine & Cite)  │
                └─────────┬─────────┘
                          │
                          ▼
                ┌───────────────────┐
                │   Final Answer    │
                └───────────────────┘
```

Implementation with LangGraph
```python
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, List, Annotated
import operator

class AgentState(TypedDict):
    question: str
    sub_questions: List[str]
    retrieved_docs: Annotated[List, operator.add]
    current_step: int
    final_answer: str
```
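The `Annotated[List, operator.add]` line is what lets each retrieval step append rather than overwrite: LangGraph treats the second argument as a reducer and folds every node's partial update into the existing channel value. In plain Python the reducer behaves like this:

```python
import operator

# Each node returns only its new docs; the framework merges them into the
# existing channel value with the reducer (operator.add = list concatenation).
state_docs: list = []
for node_update in (["doc-a"], ["doc-b", "doc-c"]):
    state_docs = operator.add(state_docs, node_update)
# state_docs is now ["doc-a", "doc-b", "doc-c"]
```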
Nodes
```python
def analyze_query(state: AgentState) -> AgentState:
    """Decompose a complex query into sub-questions."""
    llm = ChatOpenAI(model="gpt-4")
    prompt = f"""Analyze this question and break it into sub-questions if needed.
Question: {state['question']}
Return a JSON list of sub-questions, or just the original if simple."""
    response = llm.invoke(prompt)
    sub_questions = parse_questions(response.content)
    return {"sub_questions": sub_questions, "current_step": 0}

def retrieve_for_subquery(state: AgentState) -> AgentState:
    """Retrieve documents for the current sub-question."""
    current_q = state["sub_questions"][state["current_step"]]
    docs = retriever.invoke(current_q)
    return {
        "retrieved_docs": docs,
        "current_step": state["current_step"] + 1,
    }

def should_continue(state: AgentState) -> str:
    """Check whether more sub-questions remain to process."""
    if state["current_step"] < len(state["sub_questions"]):
        return "retrieve"
    return "synthesize"

def synthesize_answer(state: AgentState) -> AgentState:
    """Combine all retrieved info into a final answer."""
    llm = ChatOpenAI(model="gpt-4")
    context = "\n\n".join(doc.page_content for doc in state["retrieved_docs"])
    prompt = f"""Based on the following context, answer the question.
Cite sources using [1], [2], etc.

Question: {state['question']}

Context:
{context}
"""
    response = llm.invoke(prompt)
    return {"final_answer": response.content}
```
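`parse_questions` is referenced above but never defined. One possible sketch, assuming the model was asked for a JSON array (the name and the fallback behavior are illustrative, not part of any library):

```python
import json

def parse_questions(text: str) -> list:
    """Extract a JSON list of sub-questions from an LLM reply."""
    try:
        start, end = text.index("["), text.rindex("]") + 1
        questions = json.loads(text[start:end])
        if isinstance(questions, list) and questions:
            return [str(q) for q in questions]
    except ValueError:  # no brackets found, or invalid JSON between them
        pass
    # Fallback: treat the whole reply as a single question
    return [text.strip()]
```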
Build graph
```python
workflow = StateGraph(AgentState)
workflow.add_node("analyze", analyze_query)
workflow.add_node("retrieve", retrieve_for_subquery)
workflow.add_node("synthesize", synthesize_answer)

workflow.set_entry_point("analyze")
workflow.add_edge("analyze", "retrieve")
workflow.add_conditional_edges("retrieve", should_continue, {
    "retrieve": "retrieve",
    "synthesize": "synthesize",
})
workflow.add_edge("synthesize", END)

agent = workflow.compile()
```
Run
```python
result = agent.invoke({"question": "Compare AWS and GCP pricing for ML workloads"})
```

Self-RAG: Retrieve When Needed
```python
def self_rag_node(state: AgentState) -> AgentState:
    """Decide whether retrieval is needed."""
    llm = ChatOpenAI(model="gpt-4")
    prompt = f"""Given this question, do you need to retrieve external information?
Question: {state['question']}
Consider:
- Is this factual, or does it require current data? → RETRIEVE
- Is this reasoning/math/coding? → NO_RETRIEVE
- Do you have high confidence? → NO_RETRIEVE
Answer: RETRIEVE or NO_RETRIEVE"""
    response = llm.invoke(prompt)
    if "RETRIEVE" in response.content and "NO" not in response.content:
        return {"needs_retrieval": True}
    return {"needs_retrieval": False}
```
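The node only sets the flag; to act on it, it would be wired into the graph with a conditional edge. A minimal sketch, assuming a `needs_retrieval: bool` field is added to the state schema (the node and router names here are illustrative):

```python
# Hypothetical router keyed on the flag self_rag_node sets.
def route_after_check(state) -> str:
    return "retrieve" if state.get("needs_retrieval") else "synthesize"

# Wiring (sketch):
# workflow.add_node("check", self_rag_node)
# workflow.add_conditional_edges("check", route_after_check,
#                                {"retrieve": "retrieve", "synthesize": "synthesize"})
```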
Tool-Using RAG Agent
```python
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool
```

Define retrieval tools
```python
tools = [
    Tool(
        name="search_docs",
        func=lambda q: retriever.invoke(q),
        description="Search internal documentation",
    ),
    Tool(
        name="search_code",
        func=lambda q: code_retriever.invoke(q),
        description="Search codebase for examples",
    ),
    Tool(
        name="search_tickets",
        func=lambda q: jira_retriever.invoke(q),
        description="Search JIRA tickets and issues",
    ),
    Tool(
        name="calculator",
        func=lambda x: eval(x),  # fine for a demo; use a safe math parser in production
        description="Perform calculations",
    ),
]
```
Create agent
```python
llm = ChatOpenAI(model="gpt-4")
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
Agent decides which tools to use
```python
result = executor.invoke({
    "input": "What's the average response time mentioned in our API docs, and how does it compare to ticket #1234?"
})
```

Adaptive Retrieval
```python
def adaptive_retrieve(query: str, min_score: float = 0.7) -> list:
    """Retrieve with a quality check; expand the query if results are weak."""
    # Initial retrieval
    results = retriever.invoke(query)
    scores = [doc.metadata.get("score", 0) for doc in results]

    # Check quality (guard against an empty result set)
    if not scores or max(scores) < min_score:
        # Try query expansion
        for eq in expand_query(query):
            results.extend(retriever.invoke(eq))

    # Deduplicate and rerank
    results = deduplicate(results)
    results = rerank(query, results)
    return results[:5]

def expand_query(query: str) -> list:
    """Generate alternative phrasings."""
    llm = ChatOpenAI(model="gpt-4")
    prompt = f"Generate 3 alternative phrasings for: {query}"
    response = llm.invoke(prompt)
    return parse_alternatives(response.content)
```
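`deduplicate` and `rerank` are likewise undefined helpers. One possible sketch, assuming documents expose `page_content` and a retrieval score in `metadata`; a production reranker would instead score (query, doc) pairs with a cross-encoder.

```python
from types import SimpleNamespace as Doc  # stand-in for a retrieved document

def deduplicate(docs: list) -> list:
    """Drop documents with identical page_content, keeping the first seen."""
    seen, unique = set(), []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique

def rerank(query: str, docs: list) -> list:
    """Placeholder rerank: sort by stored retrieval score, highest first."""
    return sorted(docs, key=lambda d: d.metadata.get("score", 0), reverse=True)

docs = [
    Doc(page_content="a", metadata={"score": 0.4}),
    Doc(page_content="a", metadata={"score": 0.4}),
    Doc(page_content="b", metadata={"score": 0.9}),
]
top = rerank("q", deduplicate(docs))  # b first, then the single copy of a
```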
Patterns Summary
| Pattern | When to Use | Complexity |
|---|---|---|
| Query Decomposition | Multi-part questions | Medium |
| Self-RAG | Uncertain if retrieval needed | Low |
| Tool-Using Agent | Multiple data sources | High |
| Adaptive Retrieval | Variable quality needs | Medium |
| Iterative Refinement | Research tasks | High |
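Iterative Refinement is the only pattern in the table without an example above. A minimal sketch with the LLM-backed pieces injected as plain callables; all names and the toy run are illustrative, not a library API.

```python
# Sketch of the pattern: draft, critique, retrieve more, redraft.
def iterative_refine(question, retrieve, draft, critique, refine, max_rounds=3):
    docs = retrieve(question)
    answer = draft(question, docs)
    for _ in range(max_rounds):               # bound rounds to bound cost
        gaps = critique(question, answer)     # what is the draft still missing?
        if not gaps:
            break
        docs += retrieve(refine(question, gaps))  # targeted follow-up retrieval
        answer = draft(question, docs)
    return answer

# Toy run: the critique is satisfied once the follow-up doc is included.
demo = iterative_refine(
    "q",
    retrieve=lambda q: [q],
    draft=lambda q, docs: " ".join(docs),
    critique=lambda q, a: "" if "follow-up" in a else "needs more",
    refine=lambda q, gaps: "follow-up",
)
```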
Best Practices
- Start simple - add agency only when needed
- Limit iterations - set max steps to prevent loops
- Log decisions - track when/why agent retrieves
- Validate outputs - agent can hallucinate tool usage
- Cost awareness - more steps = more LLM calls
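"Limit iterations" can be as simple as a hard cap inside the routing function; the sketch below adapts the earlier `should_continue` with an assumed `max_steps` parameter. LangGraph also enforces a graph-level `recursion_limit` (default 25) that can be set via the `config` argument to `invoke`.

```python
# Hard cap on retrieval steps, regardless of how many sub-questions the
# planner produced; adapts should_continue from earlier.
def should_continue_capped(state: dict, max_steps: int = 5) -> str:
    if state["current_step"] >= max_steps:  # safety valve against loops
        return "synthesize"
    if state["current_step"] < len(state["sub_questions"]):
        return "retrieve"
    return "synthesize"
```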