
LLMs & Generative AI

Production-grade LLM applications with prompt engineering, RAG systems, and modern AI development patterns.

Quick Start

```python
# Production RAG System with LangChain (2024-2025)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Initialize components
llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Document processing
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
documents = text_splitter.split_documents(raw_documents)

# Vector store
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Maximum Marginal Relevance
    search_kwargs={"k": 5, "fetch_k": 10},
)

# RAG chain
template = """Answer the question based only on the following context:
Context: {context}
Question: {question}
Answer thoughtfully and cite specific parts of the context."""
prompt = ChatPromptTemplate.from_template(template)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query
response = rag_chain.invoke("What are the key features?")
print(response)
```
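The `chunk_size`/`chunk_overlap` parameters above control a sliding window over the text. A dependency-free sketch of that arithmetic (`chunk_text` is illustrative, not the LangChain implementation, which additionally splits on the separator list):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive sliding-window chunking: each chunk starts chunk_size - overlap
    characters after the previous one, so consecutive chunks share `overlap`
    characters of context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("A" * 250, chunk_size=100, overlap=20)
# 250 chars -> 3 chunks: [0:100], [80:180], [160:250]
```

The overlap matters because a fact split across a chunk boundary would otherwise be unretrievable as a unit.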

Core Concepts

1. Prompt Engineering Patterns

```python
import asyncio

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# System prompt design
system_prompt = """You are an expert data analyst assistant.
CAPABILITIES:
  • Analyze data patterns and trends
  • Generate SQL queries
  • Explain statistical concepts
CONSTRAINTS:
  • Only use information provided in the context
  • Acknowledge uncertainty when relevant
  • Format outputs in a clear, structured way
OUTPUT FORMAT:
  • Start with a brief summary
  • Use bullet points for key findings
  • Include a confidence level (high/medium/low)"""

# Few-shot prompting
examples = [
    {
        "input": "What's the average order value?",
        "output": "SELECT AVG(total_amount) AS avg_order_value\nFROM orders\nWHERE status = 'completed';",
    },
    {
        "input": "Show top customers by revenue",
        "output": "SELECT customer_id, SUM(total_amount) AS revenue\nFROM orders\nGROUP BY customer_id\nORDER BY revenue DESC\nLIMIT 10;",
    },
]
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# Chain-of-Thought prompting
cot_prompt = """Let's solve this step by step:
Question: {question}
Step 1: Identify the key components
Step 2: Break down the problem
Step 3: Apply relevant knowledge
Step 4: Synthesize the answer
Reasoning:"""

# Self-consistency (sample multiple reasoning paths, then aggregate)
async def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    responses = await asyncio.gather(*[
        llm.ainvoke(question) for _ in range(n_samples)
    ])
    # Majority voting or aggregation
    return aggregate_responses(responses)
```
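`aggregate_responses` is referenced above but never defined. A minimal majority-vote sketch (the function and its normalization are assumptions, not a LangChain API):

```python
from collections import Counter

def aggregate_responses(responses: list[str]) -> str:
    """Majority vote over case/whitespace-normalized answers; returns the
    first original response whose normalized form matches the winner."""
    normalized = [r.strip().lower() for r in responses]
    winner, _count = Counter(normalized).most_common(1)[0]
    for original, norm in zip(responses, normalized):
        if norm == winner:
            return original
    return responses[0]  # unreachable for non-empty input

print(aggregate_responses(["Paris", "paris ", "London", "Paris"]))  # → Paris
```

Exact-string voting only works for short, canonical answers (classifications, numbers); for free-form text you would cluster semantically similar responses instead.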

2. Advanced RAG Patterns

```python
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_community.retrievers import BM25Retriever

# Hybrid search (dense + sparse)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5
chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, chroma_retriever],
    weights=[0.4, 0.6],
)

# Contextual compression
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=ensemble_retriever,
)

# Parent document retriever (for better context)
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
store = InMemoryStore()
parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Self-querying retriever
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(name="source", description="Document source", type="string"),
    AttributeInfo(name="date", description="Creation date", type="date"),
    AttributeInfo(name="category", description="Document category", type="string"),
]
self_query_retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Technical documentation",
    metadata_field_info=metadata_field_info,
)
```

3. Agents and Tool Use

```python
from typing import Optional

from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import Tool, StructuredTool
from langchain_core.pydantic_v1 import BaseModel, Field

# Define tools with Pydantic schemas
class SQLQueryInput(BaseModel):
    query: str = Field(description="SQL query to execute")
    limit: Optional[int] = Field(default=100, description="Max rows to return")

def execute_sql(query: str, limit: int = 100) -> str:
    """Execute a SQL query against the database."""
    # Validate query (block destructive statements)
    if any(kw in query.upper() for kw in ["DROP", "DELETE", "UPDATE", "INSERT"]):
        return "Error: Only SELECT queries allowed"
    result = db.execute(f"{query} LIMIT {limit}")
    return result.to_markdown()

sql_tool = StructuredTool.from_function(
    func=execute_sql,
    name="sql_executor",
    description="Execute SQL queries against the data warehouse",
    args_schema=SQLQueryInput,
)

# Calculator tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Restricted eval: no builtins, only a small whitelist of names
        allowed_names = {"abs": abs, "round": round, "sum": sum}
        return str(eval(expression, {"__builtins__": {}}, allowed_names))
    except Exception as e:
        return f"Error: {e}"

calc_tool = Tool.from_function(
    func=calculate,
    name="calculator",
    description="Evaluate mathematical expressions",
)

# Create agent
tools = [sql_tool, calc_tool]
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,
    early_stopping_method="generate",
)
result = agent_executor.invoke({"input": "What's the total revenue for Q4 2024?"})
```
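Even with builtins stripped, `eval` on model-generated strings is fragile. A stricter alternative walks the parsed AST and whitelists only arithmetic operators, rejecting everything else (`safe_calculate` is a sketch, not part of LangChain):

```python
import ast
import operator

# Whitelisted arithmetic operators; any other node type is rejected
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calculate(expression: str) -> float:
    """Evaluate a pure-arithmetic expression by recursing over its AST."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("2 + 3 * 4"))  # → 14
```

A call like `safe_calculate("__import__('os')")` raises `ValueError` because function calls are not in the whitelist.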

4. Structured Output

```python
from typing import List, Optional

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

# Define output schema
class DataInsight(BaseModel):
    title: str = Field(description="Brief title of the insight")
    description: str = Field(description="Detailed explanation")
    confidence: float = Field(description="Confidence score 0-1")
    data_points: List[str] = Field(description="Supporting data points")
    recommendations: Optional[List[str]] = Field(description="Action items")

class AnalysisReport(BaseModel):
    summary: str = Field(description="Executive summary")
    insights: List[DataInsight] = Field(description="Key insights found")
    methodology: str = Field(description="Analysis approach used")

# Parser
parser = PydanticOutputParser(pydantic_object=AnalysisReport)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Analyze the data and provide structured insights."),
    ("human", "{input}\n\n{format_instructions}"),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser
report: AnalysisReport = chain.invoke({"input": "Analyze Q4 sales trends"})
print(report.summary)
for insight in report.insights:
    print(f"- {insight.title}: {insight.confidence:.0%} confidence")
```
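Models often wrap their JSON in a markdown fence, which breaks naive `json.loads` calls. A minimal stdlib parse-and-validate step in the same spirit as the parser above (`parse_insights` and the `Insight` fields are illustrative, not the library's behavior):

```python
import json
from dataclasses import dataclass

@dataclass
class Insight:
    title: str
    confidence: float

def parse_insights(raw: str) -> list[Insight]:
    """Parse a JSON array the model was asked to emit, tolerating a
    surrounding ```json fence, and range-check each confidence score."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = (cleaned.removeprefix("```json").removeprefix("```")
                          .removesuffix("```").strip())
    items = json.loads(cleaned)
    insights = []
    for item in items:
        conf = float(item["confidence"])
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        insights.append(Insight(title=str(item["title"]), confidence=conf))
    return insights
```

Validating ranges at the boundary catches the common failure where a model emits a confidence of 8 on an implied 0-10 scale.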

5. Evaluation and Monitoring

```python
from langchain.evaluation import load_evaluator
from langsmith import Client

# LangSmith for tracing
client = Client()

# Create evaluation dataset
examples = [
    {"input": "What is RAG?", "output": "Retrieval Augmented Generation..."},
    {"input": "How does chunking work?", "output": "Chunking splits documents..."},
]
dataset = client.create_dataset("rag-evaluation")
for ex in examples:
    client.create_example(
        inputs={"question": ex["input"]},
        outputs={"answer": ex["output"]},
        dataset_id=dataset.id,
    )

# Evaluators
faithfulness_evaluator = load_evaluator("labeled_criteria", criteria="correctness")
relevance_evaluator = load_evaluator("embedding_distance")

# Custom evaluator for RAG
def evaluate_rag_response(question: str, context: str, response: str) -> dict:
    """Evaluate RAG response quality."""
    # Faithfulness: is the response grounded in the context?
    faithfulness_prompt = f"""
    Context: {context}
    Response: {response}

    Is the response fully supported by the context?
    Score 1-5 and explain.
    """

    # Relevance: does the response answer the question?
    relevance_prompt = f"""
    Question: {question}
    Response: {response}

    Does the response adequately answer the question?
    Score 1-5 and explain.
    """

    # Get scores
    faithfulness_score = llm.invoke(faithfulness_prompt)
    relevance_score = llm.invoke(relevance_prompt)

    return {
        "faithfulness": parse_score(faithfulness_score),
        "relevance": parse_score(relevance_score),
    }

# Production monitoring
from prometheus_client import Counter, Histogram

llm_requests = Counter("llm_requests_total", "Total LLM requests", ["model", "status"])
llm_latency = Histogram("llm_latency_seconds", "LLM request latency")
token_usage = Counter("llm_tokens_total", "Total tokens used", ["type"])

@llm_latency.time()
def monitored_llm_call(prompt: str) -> str:
    try:
        response = llm.invoke(prompt)
        llm_requests.labels(model="gpt-4", status="success").inc()
        token_usage.labels(type="input").inc(count_tokens(prompt))
        token_usage.labels(type="output").inc(count_tokens(response))
        return response
    except Exception:
        llm_requests.labels(model="gpt-4", status="error").inc()
        raise
```

Tools & Technologies

工具与技术

ToolPurposeVersion (2025)
LangChainLLM application framework0.2+
LlamaIndexData framework for LLMs0.10+
OpenAI APIGPT-4, embeddingsLatest
Anthropic APIClaude modelsLatest
ChromaVector database0.4+
PineconeManaged vector DBLatest
LangSmithLLM observabilityLatest
OllamaLocal LLM running0.1+
vLLMHigh-perf LLM serving0.3+
工具用途版本(2025年)
LangChainLLM应用框架0.2+
LlamaIndexLLM数据框架0.10+
OpenAI APIGPT-4、嵌入模型最新版
Anthropic APIClaude模型最新版
Chroma向量数据库0.4+
Pinecone托管式向量数据库最新版
LangSmithLLM可观测性工具最新版
Ollama本地LLM运行工具0.1+
vLLM高性能LLM服务框架0.3+

Learning Path

Phase 1: Foundations (Weeks 1-3)

Week 1: LLM concepts, tokenization, prompting basics
Week 2: OpenAI/Anthropic APIs, prompt engineering
Week 3: LangChain basics, chains, output parsers

Phase 2: RAG Systems (Weeks 4-7)

Week 4: Embeddings, vector databases
Week 5: Document processing, chunking strategies
Week 6: Retrieval strategies (hybrid, reranking)
Week 7: Advanced RAG patterns

Phase 3: Agents (Weeks 8-10)

Week 8: Tool calling, function calling
Week 9: Agent architectures, planning
Week 10: Multi-agent systems

Phase 4: Production (Weeks 11-14)

Week 11: Evaluation frameworks
Week 12: Guardrails, safety
Week 13: Deployment, scaling
Week 14: Monitoring, optimization

Troubleshooting Guide

Common Failure Modes

| Issue | Symptoms | Root Cause | Fix |
| --- | --- | --- | --- |
| Hallucination | Incorrect facts | No grounding | Better RAG, fact-checking |
| Context Overflow | Truncated response | Too much context | Summarize, filter |
| Poor Retrieval | Irrelevant chunks | Bad embeddings/chunking | Tune chunk size, reranking |
| Slow Response | High latency | Large context, no cache | Streaming, caching |
| Rate Limits | 429 errors | Too many requests | Backoff, batch requests |
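The "Backoff, batch requests" fix for rate limits can be sketched as exponential backoff with jitter (`with_backoff` is illustrative; production code would typically reach for a library such as tenacity):

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep, is_retryable=lambda e: "429" in str(e)):
    """Retry `call` on retryable errors (e.g. HTTP 429), doubling the
    delay each attempt with jitter so concurrent clients do not retry
    in lockstep. `sleep` is injectable so tests can skip real waiting."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
```

Non-retryable errors (bad requests, auth failures) are re-raised immediately; retrying those only burns quota.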

Debug Checklist

```python
# 1. Check retrieval quality
retrieved_docs = retriever.get_relevant_documents("test query")
for doc in retrieved_docs:
    print(f"Score: {doc.metadata.get('score')}")
    print(f"Content: {doc.page_content[:200]}...")

# 2. Validate the prompt
print(prompt.format(context="test", question="test"))

# 3. Count tokens
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4")
tokens = len(enc.encode(full_prompt))
print(f"Token count: {tokens}")

# 4. Test the LLM directly
response = llm.invoke("Simple test prompt")
print(response)

# 5. Check embeddings
embedding = embeddings.embed_query("test")
print(f"Embedding dim: {len(embedding)}")
```
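Step 5 only confirms that embeddings are produced; comparing two embeddings with cosine similarity is the natural next check (similar texts should score close to 1.0). A stdlib sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: 1.0 means same
    direction, 0.0 means orthogonal (unrelated), -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

If `cosine_similarity(embeddings.embed_query("refund policy"), embeddings.embed_query("how do I get my money back"))` is not clearly higher than for an unrelated pair, the embedding model, not the retriever, is the problem.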

Unit Test Template

```python
import pytest
from unittest.mock import Mock, patch

from langchain_core.documents import Document

from your_rag_system import RAGPipeline, DocumentProcessor


class TestRAGPipeline:

    @pytest.fixture
    def mock_llm(self):
        llm = Mock()
        llm.invoke.return_value = "Mocked response"
        return llm

    @pytest.fixture
    def rag_pipeline(self, mock_llm):
        return RAGPipeline(llm=mock_llm)

    def test_retrieves_relevant_documents(self, rag_pipeline):
        query = "What is machine learning?"
        docs = rag_pipeline.retrieve(query)

        assert len(docs) > 0
        assert all("machine learning" in doc.page_content.lower()
                   for doc in docs[:3])

    def test_generates_grounded_response(self, rag_pipeline, mock_llm):
        response = rag_pipeline.query("Test question")

        mock_llm.invoke.assert_called_once()
        assert response is not None

    def test_handles_empty_retrieval(self, rag_pipeline):
        with patch.object(rag_pipeline.retriever, 'get_relevant_documents',
                          return_value=[]):
            response = rag_pipeline.query("Obscure question")
            assert "no information" in response.lower()


class TestDocumentProcessor:

    def test_chunks_documents_correctly(self):
        processor = DocumentProcessor(chunk_size=100, chunk_overlap=20)
        text = "A" * 250  # 250-character document

        chunks = processor.split(text)

        assert len(chunks) >= 2
        assert all(len(c) <= 100 for c in chunks)

    def test_preserves_metadata(self):
        processor = DocumentProcessor()
        doc = Document(page_content="Test", metadata={"source": "test.pdf"})

        chunks = processor.split_documents([doc])

        assert all(c.metadata["source"] == "test.pdf" for c in chunks)
```

Best Practices

Prompt Engineering

```python
# ✅ DO: Be specific and structured
prompt = """Task: Summarize the document.
Format: 3 bullet points
Constraints: Max 50 words per point
Tone: Professional"""

# ✅ DO: Include examples
# ✅ DO: Set a clear output format
# ✅ DO: Handle edge cases in the prompt

# ❌ DON'T: Write vague prompts
# ❌ DON'T: Assume the LLM knows your context
# ❌ DON'T: Trust LLM output without validation
```

RAG Systems

```python
# ✅ DO: Tune chunk size for your domain
# ✅ DO: Use hybrid retrieval
# ✅ DO: Implement reranking
# ✅ DO: Add metadata filtering

# ❌ DON'T: Use one-size-fits-all chunking
# ❌ DON'T: Skip evaluation
# ❌ DON'T: Ignore retrieval quality
```

Resources

Official Documentation

Courses

Research

Next Skills

After mastering LLMs & Generative AI:
  • deep-learning
    - Understand transformer internals
  • mlops
    - Deploy LLM applications at scale
  • big-data
    - Process training data

Skill Certification Checklist:
  • Can build production RAG systems
  • Can implement effective prompt engineering
  • Can create tool-using agents
  • Can evaluate and monitor LLM applications
  • Can optimize for latency and cost