
LLMs & Generative AI

Production-grade LLM applications with prompt engineering, RAG systems, and modern AI development patterns.

Quick Start

```python
# Production RAG System with LangChain (2024-2025)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Initialize components
llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Document processing
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
documents = text_splitter.split_documents(raw_documents)

# Vector store
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Maximum Marginal Relevance
    search_kwargs={"k": 5, "fetch_k": 10},
)

# RAG chain
template = """Answer the question based only on the following context:
Context: {context}
Question: {question}
Answer thoughtfully and cite specific parts of the context."""
prompt = ChatPromptTemplate.from_template(template)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query
response = rag_chain.invoke("What are the key features?")
print(response)
```
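The `chunk_size`/`chunk_overlap` parameters above control a sliding window over the text. A dependency-free sketch of that arithmetic (`chunk_text` is illustrative, not the LangChain implementation, which additionally splits on the separator list):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive sliding-window chunking: each chunk starts chunk_size - overlap
    characters after the previous one, so consecutive chunks share `overlap`
    characters of context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("A" * 250, chunk_size=100, overlap=20)
# 250 chars -> 3 chunks: [0:100], [80:180], [160:250]
```

The overlap matters because a fact split across a chunk boundary would otherwise be unretrievable as a unit.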

Core Concepts

1. Prompt Engineering Patterns

```python
import asyncio

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# System prompt design
system_prompt = """You are an expert data analyst assistant.
CAPABILITIES:
  • Analyze data patterns and trends
  • Generate SQL queries
  • Explain statistical concepts
CONSTRAINTS:
  • Only use information provided in the context
  • Acknowledge uncertainty when relevant
  • Format outputs in a clear, structured way
OUTPUT FORMAT:
  • Start with a brief summary
  • Use bullet points for key findings
  • Include a confidence level (high/medium/low)"""

# Few-shot prompting
examples = [
    {
        "input": "What's the average order value?",
        "output": "SELECT AVG(total_amount) AS avg_order_value\nFROM orders\nWHERE status = 'completed';",
    },
    {
        "input": "Show top customers by revenue",
        "output": "SELECT customer_id, SUM(total_amount) AS revenue\nFROM orders\nGROUP BY customer_id\nORDER BY revenue DESC\nLIMIT 10;",
    },
]
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# Chain-of-Thought prompting
cot_prompt = """Let's solve this step by step:
Question: {question}
Step 1: Identify the key components
Step 2: Break down the problem
Step 3: Apply relevant knowledge
Step 4: Synthesize the answer
Reasoning:"""

# Self-consistency (sample multiple reasoning paths, then aggregate)
async def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    responses = await asyncio.gather(*[
        llm.ainvoke(question) for _ in range(n_samples)
    ])
    # Majority voting or aggregation
    return aggregate_responses(responses)
```
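`aggregate_responses` is referenced above but never defined. A minimal majority-vote sketch (the function and its normalization are assumptions, not a LangChain API):

```python
from collections import Counter

def aggregate_responses(responses: list[str]) -> str:
    """Majority vote over case/whitespace-normalized answers; returns the
    first original response whose normalized form matches the winner."""
    normalized = [r.strip().lower() for r in responses]
    winner, _count = Counter(normalized).most_common(1)[0]
    for original, norm in zip(responses, normalized):
        if norm == winner:
            return original
    return responses[0]  # unreachable for non-empty input

print(aggregate_responses(["Paris", "paris ", "London", "Paris"]))  # → Paris
```

Exact-string voting only works for short, canonical answers (classifications, numbers); for free-form text you would cluster semantically similar responses instead.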

2. Advanced RAG Patterns

```python
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_community.retrievers import BM25Retriever

# Hybrid search (dense + sparse)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5
chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, chroma_retriever],
    weights=[0.4, 0.6],
)

# Contextual compression
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=ensemble_retriever,
)

# Parent document retriever (for better context)
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
store = InMemoryStore()
parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Self-querying retriever
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(name="source", description="Document source", type="string"),
    AttributeInfo(name="date", description="Creation date", type="date"),
    AttributeInfo(name="category", description="Document category", type="string"),
]
self_query_retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Technical documentation",
    metadata_field_info=metadata_field_info,
)
```

3. Agents and Tool Use

```python
from typing import Optional

from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import Tool, StructuredTool
from langchain_core.pydantic_v1 import BaseModel, Field

# Define tools with Pydantic schemas
class SQLQueryInput(BaseModel):
    query: str = Field(description="SQL query to execute")
    limit: Optional[int] = Field(default=100, description="Max rows to return")

def execute_sql(query: str, limit: int = 100) -> str:
    """Execute a SQL query against the database."""
    # Validate query (block destructive statements)
    if any(kw in query.upper() for kw in ["DROP", "DELETE", "UPDATE", "INSERT"]):
        return "Error: Only SELECT queries allowed"
    result = db.execute(f"{query} LIMIT {limit}")
    return result.to_markdown()

sql_tool = StructuredTool.from_function(
    func=execute_sql,
    name="sql_executor",
    description="Execute SQL queries against the data warehouse",
    args_schema=SQLQueryInput,
)

# Calculator tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Restricted eval: no builtins, only a small whitelist of names
        allowed_names = {"abs": abs, "round": round, "sum": sum}
        return str(eval(expression, {"__builtins__": {}}, allowed_names))
    except Exception as e:
        return f"Error: {e}"

calc_tool = Tool.from_function(
    func=calculate,
    name="calculator",
    description="Evaluate mathematical expressions",
)

# Create agent
tools = [sql_tool, calc_tool]
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,
    early_stopping_method="generate",
)
result = agent_executor.invoke({"input": "What's the total revenue for Q4 2024?"})
```
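Even with builtins stripped, `eval` on model-generated strings is fragile. A stricter alternative walks the parsed AST and whitelists only arithmetic operators, rejecting everything else (`safe_calculate` is a sketch, not part of LangChain):

```python
import ast
import operator

# Whitelisted arithmetic operators; any other node type is rejected
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calculate(expression: str) -> float:
    """Evaluate a pure-arithmetic expression by recursing over its AST."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("2 + 3 * 4"))  # → 14
```

A call like `safe_calculate("__import__('os')")` raises `ValueError` because function calls are not in the whitelist.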

4. Structured Output

```python
from typing import List, Optional

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

# Define output schema
class DataInsight(BaseModel):
    title: str = Field(description="Brief title of the insight")
    description: str = Field(description="Detailed explanation")
    confidence: float = Field(description="Confidence score 0-1")
    data_points: List[str] = Field(description="Supporting data points")
    recommendations: Optional[List[str]] = Field(description="Action items")

class AnalysisReport(BaseModel):
    summary: str = Field(description="Executive summary")
    insights: List[DataInsight] = Field(description="Key insights found")
    methodology: str = Field(description="Analysis approach used")

# Parser
parser = PydanticOutputParser(pydantic_object=AnalysisReport)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Analyze the data and provide structured insights."),
    ("human", "{input}\n\n{format_instructions}"),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser
report: AnalysisReport = chain.invoke({"input": "Analyze Q4 sales trends"})
print(report.summary)
for insight in report.insights:
    print(f"- {insight.title}: {insight.confidence:.0%} confidence")
```
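Models often wrap their JSON in a markdown fence, which breaks naive `json.loads` calls. A minimal stdlib parse-and-validate step in the same spirit as the parser above (`parse_insights` and the `Insight` fields are illustrative, not the library's behavior):

```python
import json
from dataclasses import dataclass

@dataclass
class Insight:
    title: str
    confidence: float

def parse_insights(raw: str) -> list[Insight]:
    """Parse a JSON array the model was asked to emit, tolerating a
    surrounding ```json fence, and range-check each confidence score."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = (cleaned.removeprefix("```json").removeprefix("```")
                          .removesuffix("```").strip())
    items = json.loads(cleaned)
    insights = []
    for item in items:
        conf = float(item["confidence"])
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        insights.append(Insight(title=str(item["title"]), confidence=conf))
    return insights
```

Validating ranges at the boundary catches the common failure where a model emits a confidence of 8 on an implied 0-10 scale.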

5. Evaluation and Monitoring

```python
from langchain.evaluation import load_evaluator
from langsmith import Client

# LangSmith for tracing
client = Client()

# Create evaluation dataset
examples = [
    {"input": "What is RAG?", "output": "Retrieval Augmented Generation..."},
    {"input": "How does chunking work?", "output": "Chunking splits documents..."},
]
dataset = client.create_dataset("rag-evaluation")
for ex in examples:
    client.create_example(
        inputs={"question": ex["input"]},
        outputs={"answer": ex["output"]},
        dataset_id=dataset.id,
    )

# Evaluators
faithfulness_evaluator = load_evaluator("labeled_criteria", criteria="correctness")
relevance_evaluator = load_evaluator("embedding_distance")

# Custom evaluator for RAG
def evaluate_rag_response(question: str, context: str, response: str) -> dict:
    """Evaluate RAG response quality."""
    # Faithfulness: is the response grounded in the context?
    faithfulness_prompt = f"""
    Context: {context}
    Response: {response}

    Is the response fully supported by the context?
    Score 1-5 and explain.
    """

    # Relevance: does the response answer the question?
    relevance_prompt = f"""
    Question: {question}
    Response: {response}

    Does the response adequately answer the question?
    Score 1-5 and explain.
    """

    # Get scores
    faithfulness_score = llm.invoke(faithfulness_prompt)
    relevance_score = llm.invoke(relevance_prompt)

    return {
        "faithfulness": parse_score(faithfulness_score),
        "relevance": parse_score(relevance_score),
    }

# Production monitoring
from prometheus_client import Counter, Histogram

llm_requests = Counter("llm_requests_total", "Total LLM requests", ["model", "status"])
llm_latency = Histogram("llm_latency_seconds", "LLM request latency")
token_usage = Counter("llm_tokens_total", "Total tokens used", ["type"])

@llm_latency.time()
def monitored_llm_call(prompt: str) -> str:
    try:
        response = llm.invoke(prompt)
        llm_requests.labels(model="gpt-4", status="success").inc()
        token_usage.labels(type="input").inc(count_tokens(prompt))
        token_usage.labels(type="output").inc(count_tokens(response))
        return response
    except Exception:
        llm_requests.labels(model="gpt-4", status="error").inc()
        raise
```

Tools & Technologies

工具与技术

ToolPurposeVersion (2025)
LangChainLLM application framework0.2+
LlamaIndexData framework for LLMs0.10+
OpenAI APIGPT-4, embeddingsLatest
Anthropic APIClaude modelsLatest
ChromaVector database0.4+
PineconeManaged vector DBLatest
LangSmithLLM observabilityLatest
OllamaLocal LLM running0.1+
vLLMHigh-perf LLM serving0.3+
工具用途版本(2025年)
LangChainLLM应用框架0.2+
LlamaIndexLLM数据框架0.10+
OpenAI APIGPT-4、嵌入模型最新版
Anthropic APIClaude模型最新版
Chroma向量数据库0.4+
Pinecone托管式向量数据库最新版
LangSmithLLM可观测性工具最新版
Ollama本地LLM运行工具0.1+
vLLM高性能LLM服务框架0.3+

Learning Path

Phase 1: Foundations (Weeks 1-3)

Week 1: LLM concepts, tokenization, prompting basics
Week 2: OpenAI/Anthropic APIs, prompt engineering
Week 3: LangChain basics, chains, output parsers

Phase 2: RAG Systems (Weeks 4-7)

Week 4: Embeddings, vector databases
Week 5: Document processing, chunking strategies
Week 6: Retrieval strategies (hybrid, reranking)
Week 7: Advanced RAG patterns

Phase 3: Agents (Weeks 8-10)

Week 8: Tool calling, function calling
Week 9: Agent architectures, planning
Week 10: Multi-agent systems

Phase 4: Production (Weeks 11-14)

Week 11: Evaluation frameworks
Week 12: Guardrails, safety
Week 13: Deployment, scaling
Week 14: Monitoring, optimization

Troubleshooting Guide

Common Failure Modes

| Issue | Symptoms | Root Cause | Fix |
| --- | --- | --- | --- |
| Hallucination | Incorrect facts | No grounding | Better RAG, fact-checking |
| Context Overflow | Truncated response | Too much context | Summarize, filter |
| Poor Retrieval | Irrelevant chunks | Bad embeddings/chunking | Tune chunk size, reranking |
| Slow Response | High latency | Large context, no cache | Streaming, caching |
| Rate Limits | 429 errors | Too many requests | Backoff, batch requests |
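The "Backoff, batch requests" fix for rate limits can be sketched as exponential backoff with jitter (`with_backoff` is illustrative; production code would typically reach for a library such as tenacity):

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep, is_retryable=lambda e: "429" in str(e)):
    """Retry `call` on retryable errors (e.g. HTTP 429), doubling the
    delay each attempt with jitter so concurrent clients do not retry
    in lockstep. `sleep` is injectable so tests can skip real waiting."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
```

Non-retryable errors (bad requests, auth failures) are re-raised immediately; retrying those only burns quota.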

Debug Checklist

```python
# 1. Check retrieval quality
retrieved_docs = retriever.get_relevant_documents("test query")
for doc in retrieved_docs:
    print(f"Score: {doc.metadata.get('score')}")
    print(f"Content: {doc.page_content[:200]}...")

# 2. Validate the prompt
print(prompt.format(context="test", question="test"))

# 3. Count tokens
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4")
tokens = len(enc.encode(full_prompt))
print(f"Token count: {tokens}")

# 4. Test the LLM directly
response = llm.invoke("Simple test prompt")
print(response)

# 5. Check embeddings
embedding = embeddings.embed_query("test")
print(f"Embedding dim: {len(embedding)}")
```
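Step 5 only confirms that embeddings are produced; comparing two embeddings with cosine similarity is the natural next check (similar texts should score close to 1.0). A stdlib sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: 1.0 means same
    direction, 0.0 means orthogonal (unrelated), -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

If `cosine_similarity(embeddings.embed_query("refund policy"), embeddings.embed_query("how do I get my money back"))` is not clearly higher than for an unrelated pair, the embedding model, not the retriever, is the problem.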

Unit Test Template

```python
import pytest
from unittest.mock import Mock, patch

from langchain_core.documents import Document

from your_rag_system import RAGPipeline, DocumentProcessor


class TestRAGPipeline:

    @pytest.fixture
    def mock_llm(self):
        llm = Mock()
        llm.invoke.return_value = "Mocked response"
        return llm

    @pytest.fixture
    def rag_pipeline(self, mock_llm):
        return RAGPipeline(llm=mock_llm)

    def test_retrieves_relevant_documents(self, rag_pipeline):
        query = "What is machine learning?"
        docs = rag_pipeline.retrieve(query)

        assert len(docs) > 0
        assert all("machine learning" in doc.page_content.lower()
                   for doc in docs[:3])

    def test_generates_grounded_response(self, rag_pipeline, mock_llm):
        response = rag_pipeline.query("Test question")

        mock_llm.invoke.assert_called_once()
        assert response is not None

    def test_handles_empty_retrieval(self, rag_pipeline):
        with patch.object(rag_pipeline.retriever, 'get_relevant_documents',
                          return_value=[]):
            response = rag_pipeline.query("Obscure question")
            assert "no information" in response.lower()


class TestDocumentProcessor:

    def test_chunks_documents_correctly(self):
        processor = DocumentProcessor(chunk_size=100, chunk_overlap=20)
        text = "A" * 250  # 250-character document

        chunks = processor.split(text)

        assert len(chunks) >= 2
        assert all(len(c) <= 100 for c in chunks)

    def test_preserves_metadata(self):
        processor = DocumentProcessor()
        doc = Document(page_content="Test", metadata={"source": "test.pdf"})

        chunks = processor.split_documents([doc])

        assert all(c.metadata["source"] == "test.pdf" for c in chunks)
```

Best Practices

Prompt Engineering

```python
# ✅ DO: Be specific and structured
prompt = """Task: Summarize the document.
Format: 3 bullet points
Constraints: Max 50 words per point
Tone: Professional"""

# ✅ DO: Include examples
# ✅ DO: Set a clear output format
# ✅ DO: Handle edge cases in the prompt

# ❌ DON'T: Write vague prompts
# ❌ DON'T: Assume the LLM knows your context
# ❌ DON'T: Trust LLM output without validation
```

RAG Systems

```python
# ✅ DO: Tune chunk size for your domain
# ✅ DO: Use hybrid retrieval
# ✅ DO: Implement reranking
# ✅ DO: Add metadata filtering

# ❌ DON'T: Use one-size-fits-all chunking
# ❌ DON'T: Skip evaluation
# ❌ DON'T: Ignore retrieval quality
```

Resources

Official Documentation

Courses

Research

Next Skills

After mastering LLMs & Generative AI:
  • deep-learning
    - Understand transformer internals
  • mlops
    - Deploy LLM applications at scale
  • big-data
    - Process training data

Skill Certification Checklist:
  • Can build production RAG systems
  • Can implement effective prompt engineering
  • Can create tool-using agents
  • Can evaluate and monitor LLM applications
  • Can optimize for latency and cost