# Instructor: Structured LLM Outputs
## When to Use This Skill
Use Instructor when you need to:
- Extract structured data from LLM responses reliably
- Validate outputs against Pydantic schemas automatically
- Retry failed extractions with automatic error handling
- Parse complex JSON with type safety and validation
- Stream partial results for real-time processing
- Support multiple LLM providers with consistent API
GitHub Stars: 15,000+ | Battle-tested: 100,000+ developers
## Installation

```bash
# Base installation
pip install instructor

# With specific providers
pip install "instructor[anthropic]"  # Anthropic Claude
pip install "instructor[openai]"     # OpenAI
pip install "instructor[all]"        # All providers
```

## Quick Start
### Basic Example: Extract User Data

```python
import instructor
from pydantic import BaseModel
from anthropic import Anthropic

# Define output structure
class User(BaseModel):
    name: str
    age: int
    email: str

# Create instructor client
client = instructor.from_anthropic(Anthropic())

# Extract structured data
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John Doe is 30 years old. His email is john@example.com"
    }],
    response_model=User
)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```

### With OpenAI
```python
from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]
)
```

## Core Concepts

### 1. Response Models (Pydantic)
Response models define the structure and validation rules for LLM outputs.
#### Basic Model

```python
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Analyze this article: [article text]"
    }],
    response_model=Article
)
```

Benefits:

- Type safety with Python type hints
- Automatic validation (word_count > 0)
- Self-documenting with Field descriptions
- IDE autocomplete support
#### Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

person = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John lives at 123 Main St, Boston, USA"
    }],
    response_model=Person
)

print(person.address.city)  # "Boston"
```

#### Optional Fields
```python
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None  # Optional
    description: str = Field(default="No description")  # Default value

# The LLM doesn't need to provide discount or description
```

#### Enums for Constraints
```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    text: str
    sentiment: Sentiment  # Only these 3 values allowed

review = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "This product is amazing!"
    }],
    response_model=Review
)

print(review.sentiment)  # Sentiment.POSITIVE
```

### 2. Validation
Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.
#### Built-in Validators

```python
from pydantic import BaseModel, Field, EmailStr, HttpUrl

class Contact(BaseModel):
    name: str = Field(min_length=2, max_length=100)
    age: int = Field(ge=0, le=120)  # 0 <= age <= 120
    email: EmailStr  # Validates email format (requires: pip install "pydantic[email]")
    website: HttpUrl  # Validates URL format

# If the LLM provides invalid data, Instructor retries automatically
```

#### Custom Validators
```python
from pydantic import BaseModel, field_validator

class Event(BaseModel):
    name: str
    date: str
    attendees: int

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        """Ensure date is in YYYY-MM-DD format."""
        import re
        if not re.match(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

    @field_validator('attendees')
    @classmethod
    def validate_attendees(cls, v):
        """Ensure a positive attendee count."""
        if v < 1:
            raise ValueError('Must have at least 1 attendee')
        return v
```
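Because these validators run on any model construction, they fire on hand-built data too, which is exactly the signal Instructor uses to decide whether to retry. A quick check with a trimmed-down version of the model above (no LLM involved):

```python
import re
from pydantic import BaseModel, ValidationError, field_validator

class Event(BaseModel):
    name: str
    date: str

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        # Same rule as above: date must be YYYY-MM-DD
        if not re.match(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

ok = Event(name="Launch", date="2025-01-15")

caught = False
try:
    Event(name="Launch", date="Jan 15")
except ValidationError:
    caught = True  # the bad date was rejected
print(ok.date, caught)  # 2025-01-15 True
```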
#### Model-Level Validation
```python
from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode='after')
    def check_dates(self):
        """Ensure end_date is after start_date."""
        from datetime import datetime
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')
        if end < start:
            raise ValueError('end_date must be after start_date')
        return self
```

### 3. Automatic Retrying
When validation fails, Instructor retries automatically, feeding the validation error back to the LLM.

```python
# Retries up to 3 times if validation fails
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract user from: John, age unknown"
    }],
    response_model=User,
    max_retries=3  # Default is 3
)

# If age can't be extracted, Instructor tells the LLM:
#   "Validation error: age - field required"
# The LLM then tries again with better extraction
```

**How it works:**

1. The LLM generates output
2. Pydantic validates it
3. If invalid, the error message is sent back to the LLM
4. The LLM tries again with the error feedback
5. This repeats up to max_retries
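The loop above is easy to see without any API call. Below is a minimal sketch where a hypothetical `fake_llm` stand-in returns a bad payload first and a corrected one once it sees the validation error (plain Pydantic, no Instructor involved):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

def fake_llm(prompt: str, error: Optional[str] = None) -> dict:
    # Hypothetical stand-in for the model call: fails first,
    # then corrects itself once error feedback is supplied.
    if error is None:
        return {"name": "John"}  # "age" missing -> validation fails
    return {"name": "John", "age": 30}

def extract(prompt: str, max_retries: int = 3) -> User:
    error = None
    for _ in range(max_retries):
        raw = fake_llm(prompt, error)        # 1. generate output
        try:
            return User.model_validate(raw)  # 2. Pydantic validates
        except ValidationError as e:
            error = str(e)                   # 3-4. feed error back, retry
    raise RuntimeError("extraction failed after retries")

user = extract("John Doe is 30 years old")
print(user.age)  # 30
```

Instructor wires this loop into the client for you; `max_retries` bounds the attempts just as in the sketch.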
### 4. Streaming
Stream partial results for real-time processing.
#### Streaming Partial Objects

```python
from instructor import Partial

class Story(BaseModel):
    title: str
    content: str
    tags: list[str]

# Stream partial updates as the LLM generates
for partial_story in client.messages.create_partial(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a short sci-fi story"
    }],
    response_model=Story
):
    print(f"Title: {partial_story.title}")
    print(f"Content so far: {partial_story.content[:100]}...")
    # Update UI in real-time
```

#### Streaming Iterables
```python
class Task(BaseModel):
    title: str
    priority: str

# Stream list items as they're generated
tasks = client.messages.create_iterable(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Generate 10 project tasks"
    }],
    response_model=Task
)

for task in tasks:
    print(f"- {task.title} ({task.priority})")
    # Process each task as it arrives
```

## Provider Configuration
### Anthropic Claude

```python
import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(
    Anthropic(api_key="your-api-key")
)

# Use with Claude models
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=YourModel
)
```

### OpenAI
```python
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(api_key="your-api-key")
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=YourModel,
    messages=[...]
)
```

### Local Models (Ollama)

```python
from openai import OpenAI

# Point to the local Ollama server
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Required but ignored
    ),
    mode=instructor.Mode.JSON
)

response = client.chat.completions.create(
    model="llama3.1",
    response_model=YourModel,
    messages=[...]
)
```

## Common Patterns
### Pattern 1: Data Extraction from Text

```python
class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int
    headquarters: str

text = """
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
industry with approximately 140,000 employees. The company is headquartered
in Austin, Texas.
"""

company = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract company information from: {text}"
    }],
    response_model=CompanyInfo
)
```

### Pattern 2: Classification
```python
class Category(str, Enum):
    TECHNOLOGY = "technology"
    FINANCE = "finance"
    HEALTHCARE = "healthcare"
    EDUCATION = "education"
    OTHER = "other"

class ArticleClassification(BaseModel):
    category: Category
    confidence: float = Field(ge=0.0, le=1.0)
    keywords: list[str]

classification = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Classify this article: [article text]"
    }],
    response_model=ArticleClassification
)
```

### Pattern 3: Multi-Entity Extraction
```python
class Person(BaseModel):
    name: str
    role: str

class Organization(BaseModel):
    name: str
    industry: str

class Entities(BaseModel):
    people: list[Person]
    organizations: list[Organization]
    locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract all entities from: {text}"
    }],
    response_model=Entities
)

for person in entities.people:
    print(f"{person.name} - {person.role}")
```

### Pattern 4: Structured Analysis
```python
class SentimentAnalysis(BaseModel):
    overall_sentiment: Sentiment
    positive_aspects: list[str]
    negative_aspects: list[str]
    suggestions: list[str]
    score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Analyze this review: {review}"
    }],
    response_model=SentimentAnalysis
)
```

### Pattern 5: Batch Processing
```python
def extract_person(text: str) -> Person:
    return client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract person from: {text}"
        }],
        response_model=Person
    )

texts = [
    "John Doe is a 30-year-old engineer",
    "Jane Smith, 25, works in marketing",
    "Bob Johnson, age 40, software developer"
]

people = [extract_person(text) for text in texts]
```

## Advanced Features
### Union Types

```python
from typing import Union
from pydantic import BaseModel, HttpUrl

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]  # Either type
```
The LLM chooses the appropriate type based on content.
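Pydantic's union resolution can be checked without an LLM. In this sketch the image payload fails `TextContent` (no `content` field) and validates as `ImageContent`:

```python
from typing import Union
from pydantic import BaseModel, HttpUrl

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]

# The payload matches ImageContent's fields, so Pydantic picks that branch
post = Post.model_validate({
    "title": "Launch day",
    "content": {"type": "image", "url": "https://example.com/a.png", "caption": "Rocket"},
})
print(type(post.content).__name__)  # ImageContent
```

For stricter routing, a discriminated union (`Literal` type tags plus `Field(discriminator='type')`) removes any ambiguity.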
### Dynamic Models
```python
from pydantic import create_model

# Create a model at runtime
DynamicUser = create_model(
    'User',
    name=(str, ...),
    age=(int, Field(ge=0)),
    email=(EmailStr, ...)
)

user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=DynamicUser
)
```

### Custom Modes
```python
# For providers without native structured outputs
client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.JSON  # JSON mode
)
```

Available modes:

- Mode.ANTHROPIC_TOOLS (recommended for Claude)
- Mode.JSON (fallback)
- Mode.TOOLS (OpenAI tools)

### Context Management
```python
# Single-use client
with instructor.from_anthropic(Anthropic()) as client:
    result = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=YourModel
    )
# Client closed automatically
```

## Error Handling
### Handling Validation Errors

```python
from pydantic import ValidationError

try:
    user = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=User,
        max_retries=3
    )
except ValidationError as e:
    print(f"Failed after retries: {e}")
    # Handle gracefully
except Exception as e:
    print(f"API error: {e}")
```

### Custom Error Messages
```python
class ValidatedUser(BaseModel):
    name: str = Field(description="Full name, 2-100 characters")
    age: int = Field(description="Age between 0 and 120", ge=0, le=120)
    email: EmailStr = Field(description="Valid email address")

    class Config:
        # Schema examples guide the LLM toward valid output
        json_schema_extra = {
            "examples": [
                {
                    "name": "John Doe",
                    "age": 30,
                    "email": "john@example.com"
                }
            ]
        }
```

## Best Practices
### 1. Clear Field Descriptions

```python
# ❌ Bad: vague
class Product(BaseModel):
    name: str
    price: float

# ✅ Good: descriptive
class Product(BaseModel):
    name: str = Field(description="Product name from the text")
    price: float = Field(description="Price in USD, without currency symbol")
```

### 2. Use Appropriate Validation
```python
# ✅ Good: constrain values
class Rating(BaseModel):
    score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
    review: str = Field(min_length=10, description="Review text, at least 10 chars")
```

### 3. Provide Examples in Prompts
```python
messages = [{
    "role": "user",
    "content": """Extract person info from: "John, 30, engineer"

Example format:
{
    "name": "John Doe",
    "age": 30,
    "occupation": "engineer"
}"""
}]
```

### 4. Use Enums for Fixed Categories
```python
# ✅ Good: an enum ensures valid values
class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    status: Status  # LLM must choose from the enum
```

### 5. Handle Missing Data Gracefully
```python
class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# The LLM only needs to provide required_field
```
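The claim above is easy to verify with plain Pydantic: constructing the model with only the required field succeeds, and the other fields fall back to their defaults:

```python
from typing import Optional
from pydantic import BaseModel

class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# Nothing but the required field supplied
data = PartialData(required_field="present")
print(data.optional_field, data.default_field)  # None default_value
```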
## Comparison to Alternatives
| Feature | Instructor | Manual JSON | LangChain | DSPy |
|---|---|---|---|---|
| Type Safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Auto Validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited |
| Auto Retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
| Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Multi-Provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes |
| Learning Curve | Low | Low | Medium | High |
When to choose Instructor:
- Need structured, validated outputs
- Want type safety and IDE support
- Require automatic retries
- Building data extraction systems
When to choose alternatives:
- DSPy: Need prompt optimization
- LangChain: Building complex chains
- Manual: Simple, one-off extractions
## Resources
- Documentation: https://python.useinstructor.com
- GitHub: https://github.com/jxnl/instructor (15k+ stars)
- Cookbook: https://python.useinstructor.com/examples
- Discord: Community support available
## See Also

- references/validation.md - Advanced validation patterns
- references/providers.md - Provider-specific configuration
- references/examples.md - Real-world use cases