# Instructor: Structured LLM Outputs

## When to Use This Skill

Use Instructor when you need to:

- Extract structured data from LLM responses reliably
- Validate outputs against Pydantic schemas automatically
- Retry failed extractions with automatic error handling
- Parse complex JSON with type safety and validation
- Stream partial results for real-time processing
- Support multiple LLM providers with a consistent API

GitHub Stars: 15,000+ | Battle-tested by 100,000+ developers

## Installation

```bash
# Base installation
pip install instructor

# With specific providers
pip install "instructor[anthropic]"  # Anthropic Claude
pip install "instructor[openai]"     # OpenAI
pip install "instructor[all]"        # All providers
```

## Quick Start

### Basic Example: Extract User Data

```python
import instructor
from pydantic import BaseModel
from anthropic import Anthropic

# Define output structure
class User(BaseModel):
    name: str
    age: int
    email: str

# Create instructor client
client = instructor.from_anthropic(Anthropic())

# Extract structured data
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John Doe is 30 years old. His email is john@example.com"
    }],
    response_model=User
)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```

### With OpenAI

```python
from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]
)
```

## Core Concepts

### 1. Response Models (Pydantic)

Response models define the structure and validation rules for LLM outputs.

#### Basic Model

```python
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Analyze this article: [article text]"
    }],
    response_model=Article
)
```

**Benefits:**

- Type safety with Python type hints
- Automatic validation (`word_count > 0`)
- Self-documenting with Field descriptions
- IDE autocomplete support

#### Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

person = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John lives at 123 Main St, Boston, USA"
    }],
    response_model=Person
)

print(person.address.city)  # "Boston"
```

#### Optional Fields

```python
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None  # Optional
    description: str = Field(default="No description")  # Default value
```

The LLM doesn't need to provide `discount` or `description`.

#### Enums for Constraints

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    text: str
    sentiment: Sentiment  # Only these 3 values allowed

review = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "This product is amazing!"
    }],
    response_model=Review
)

print(review.sentiment)  # Sentiment.POSITIVE
```

### 2. Validation

Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.

#### Built-in Validators

```python
from pydantic import Field, EmailStr, HttpUrl

class Contact(BaseModel):
    name: str = Field(min_length=2, max_length=100)
    age: int = Field(ge=0, le=120)  # 0 <= age <= 120
    email: EmailStr  # Validates email format
    website: HttpUrl  # Validates URL format
```

If the LLM provides invalid data, Instructor retries automatically.

#### Custom Validators

```python
from pydantic import field_validator

class Event(BaseModel):
    name: str
    date: str
    attendees: int

    @field_validator('date')
    def validate_date(cls, v):
        """Ensure date is in YYYY-MM-DD format."""
        import re
        if not re.match(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

    @field_validator('attendees')
    def validate_attendees(cls, v):
        """Ensure a positive attendee count."""
        if v < 1:
            raise ValueError('Must have at least 1 attendee')
        return v
```

#### Model-Level Validation

```python
from pydantic import model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode='after')
    def check_dates(self):
        """Ensure end_date is after start_date."""
        from datetime import datetime
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')

        if end < start:
            raise ValueError('end_date must be after start_date')
        return self
```

### 3. Automatic Retrying

Instructor retries automatically when validation fails, providing error feedback to the LLM.

```python
# Retries up to 3 times if validation fails
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract user from: John, age unknown"
    }],
    response_model=User,
    max_retries=3
)

# If age can't be extracted, Instructor tells the LLM:
#   "Validation error: age - field required"
# and the LLM tries again with better extraction.
```

**How it works:**

1. LLM generates output
2. Pydantic validates
3. If invalid: the error message is sent back to the LLM
4. LLM tries again with the error feedback
5. Repeats up to `max_retries`
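Conceptually, the loop resembles the sketch below, built on plain Pydantic. `retry_with_feedback` and `call_llm` are illustrative stand-ins, not Instructor APIs; Instructor performs this wiring internally.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

def retry_with_feedback(call_llm, prompt: str, max_retries: int = 3) -> User:
    """Validate-and-retry loop: append the validation error to the next
    prompt so the model can correct its output."""
    feedback = ""
    for _ in range(max_retries):
        raw = call_llm(prompt + feedback)  # hypothetical LLM call
        try:
            return User.model_validate_json(raw)
        except ValidationError as e:
            feedback = f"\nPrevious output failed validation: {e}"
    raise RuntimeError("Validation failed after retries")

# Stub "LLM" that fails once (missing age), then returns valid JSON
attempts = iter(['{"name": "John"}', '{"name": "John", "age": 30}'])
user = retry_with_feedback(lambda prompt: next(attempts), "Extract user")
print(user.age)  # 30
```

The first attempt fails validation (no `age`), the error text is appended to the prompt, and the second attempt succeeds.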

### 4. Streaming

Stream partial results for real-time processing.

#### Streaming Partial Objects

```python
from instructor import Partial

class Story(BaseModel):
    title: str
    content: str
    tags: list[str]

# Stream partial updates as the LLM generates
for partial_story in client.messages.create_partial(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a short sci-fi story"
    }],
    response_model=Story
):
    print(f"Title: {partial_story.title}")
    print(f"Content so far: {partial_story.content[:100]}...")
    # Update UI in real-time
```

#### Streaming Iterables

```python
class Task(BaseModel):
    title: str
    priority: str

# Stream list items as they're generated
tasks = client.messages.create_iterable(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Generate 10 project tasks"
    }],
    response_model=Task
)

# Process each task as it arrives
for task in tasks:
    print(f"- {task.title} ({task.priority})")
```

## Provider Configuration

### Anthropic Claude

```python
import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(
    Anthropic(api_key="your-api-key")
)

# Use with Claude models
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=YourModel
)
```

### OpenAI

```python
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(api_key="your-api-key")
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=YourModel,
    messages=[...]
)
```

### Local Models (Ollama)

```python
from openai import OpenAI

# Point to the local Ollama server
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Required but ignored
    ),
    mode=instructor.Mode.JSON
)

response = client.chat.completions.create(
    model="llama3.1",
    response_model=YourModel,
    messages=[...]
)
```

## Common Patterns

### Pattern 1: Data Extraction from Text

```python
class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int
    headquarters: str

text = """
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
industry with approximately 140,000 employees. The company is headquartered
in Austin, Texas.
"""

company = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract company information from: {text}"
    }],
    response_model=CompanyInfo
)
```

### Pattern 2: Classification

```python
class Category(str, Enum):
    TECHNOLOGY = "technology"
    FINANCE = "finance"
    HEALTHCARE = "healthcare"
    EDUCATION = "education"
    OTHER = "other"

class ArticleClassification(BaseModel):
    category: Category
    confidence: float = Field(ge=0.0, le=1.0)
    keywords: list[str]

classification = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Classify this article: [article text]"
    }],
    response_model=ArticleClassification
)
```

### Pattern 3: Multi-Entity Extraction

```python
class Person(BaseModel):
    name: str
    role: str

class Organization(BaseModel):
    name: str
    industry: str

class Entities(BaseModel):
    people: list[Person]
    organizations: list[Organization]
    locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract all entities from: {text}"
    }],
    response_model=Entities
)

for person in entities.people:
    print(f"{person.name} - {person.role}")
```

### Pattern 4: Structured Analysis

```python
class SentimentAnalysis(BaseModel):
    overall_sentiment: Sentiment
    positive_aspects: list[str]
    negative_aspects: list[str]
    suggestions: list[str]
    score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Analyze this review: {review}"
    }],
    response_model=SentimentAnalysis
)
```

### Pattern 5: Batch Processing

```python
def extract_person(text: str) -> Person:
    return client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract person from: {text}"
        }],
        response_model=Person
    )

texts = [
    "John Doe is a 30-year-old engineer",
    "Jane Smith, 25, works in marketing",
    "Bob Johnson, age 40, software developer"
]

people = [extract_person(text) for text in texts]
```

## Advanced Features

### Union Types

```python
from typing import Union

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]  # Either type
```

The LLM chooses the appropriate type based on the content.
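Union resolution is handled by Pydantic itself and can be exercised without an LLM call. This sketch validates a plain dict and checks which branch was chosen; for stricter routing, Pydantic also supports discriminated unions via a `Literal` type field plus `Field(discriminator="type")`.

```python
from typing import Union
from pydantic import BaseModel, HttpUrl

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]

# Validation picks the branch that matches the data:
# this payload has no url/caption, so TextContent wins
post = Post.model_validate({
    "title": "Hello",
    "content": {"type": "text", "content": "Just words"},
})
print(isinstance(post.content, TextContent))  # True
```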

### Dynamic Models

```python
from pydantic import create_model

# Create a model at runtime
DynamicUser = create_model(
    'User',
    name=(str, ...),
    age=(int, Field(ge=0)),
    email=(EmailStr, ...)
)

user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=DynamicUser
)
```

### Custom Modes

```python
# For providers without native structured outputs
client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.JSON  # JSON mode
)
```

Available modes:

- `Mode.ANTHROPIC_TOOLS` (recommended for Claude)
- `Mode.JSON` (fallback)
- `Mode.TOOLS` (OpenAI tools)

### Context Management

```python
# Single-use client
with instructor.from_anthropic(Anthropic()) as client:
    result = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=YourModel
    )
# Client closed automatically
```

## Error Handling

### Handling Validation Errors

```python
from pydantic import ValidationError

try:
    user = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=User,
        max_retries=3
    )
except ValidationError as e:
    print(f"Failed after retries: {e}")
    # Handle gracefully
except Exception as e:
    print(f"API error: {e}")
```

### Custom Error Messages

```python
class ValidatedUser(BaseModel):
    name: str = Field(description="Full name, 2-100 characters")
    age: int = Field(description="Age between 0 and 120", ge=0, le=120)
    email: EmailStr = Field(description="Valid email address")

    class Config:
        # Provide schema examples to guide the LLM
        json_schema_extra = {
            "examples": [
                {
                    "name": "John Doe",
                    "age": 30,
                    "email": "john@example.com"
                }
            ]
        }
```

## Best Practices

### 1. Clear Field Descriptions

```python
# ❌ Bad: vague
class Product(BaseModel):
    name: str
    price: float

# ✅ Good: descriptive
class Product(BaseModel):
    name: str = Field(description="Product name from the text")
    price: float = Field(description="Price in USD, without currency symbol")
```

### 2. Use Appropriate Validation

```python
# ✅ Good: constrain values
class Rating(BaseModel):
    score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
    review: str = Field(min_length=10, description="Review text, at least 10 chars")
```

### 3. Provide Examples in Prompts

```python
messages = [{
    "role": "user",
    "content": """Extract person info from: "John, 30, engineer"

Example format:
{
  "name": "John Doe",
  "age": 30,
  "occupation": "engineer"
}"""
}]
```

### 4. Use Enums for Fixed Categories

```python
# ✅ Good: an Enum ensures valid values
class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    status: Status  # LLM must choose from the enum
```

### 5. Handle Missing Data Gracefully

```python
class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"
```

The LLM only needs to provide `required_field`.
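The fallback behavior can be checked without an LLM call by validating a minimal payload directly; this is plain Pydantic, not an Instructor API:

```python
from typing import Optional
from pydantic import BaseModel

class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# Only the required field is supplied; the others fall back to defaults
data = PartialData.model_validate({"required_field": "present"})
print(data.optional_field)  # None
print(data.default_field)   # "default_value"
```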

## Comparison to Alternatives

| Feature | Instructor | Manual JSON | LangChain | DSPy |
|---|---|---|---|---|
| Type Safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Auto Validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited |
| Auto Retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
| Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Multi-Provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes |
| Learning Curve | Low | Low | Medium | High |

**When to choose Instructor:**

- Need structured, validated outputs
- Want type safety and IDE support
- Require automatic retries
- Building data extraction systems

**When to choose alternatives:**

- **DSPy**: need prompt optimization
- **LangChain**: building complex chains
- **Manual JSON**: simple, one-off extractions

## Resources

### See Also

- `references/validation.md` - Advanced validation patterns
- `references/providers.md` - Provider-specific configuration
- `references/examples.md` - Real-world use cases