# Instructor: Structured LLM Outputs

## When to Use This Skill

Use Instructor when you need to:

- Extract structured data from LLM responses reliably
- Validate outputs against Pydantic schemas automatically
- Retry failed extractions with automatic error handling
- Parse complex JSON with type safety and validation
- Stream partial results for real-time processing
- Support multiple LLM providers with a consistent API

GitHub Stars: 15,000+ | Battle-tested by 100,000+ developers

## Installation

```bash
# Base installation
pip install instructor

# With specific providers
pip install "instructor[anthropic]"  # Anthropic Claude
pip install "instructor[openai]"     # OpenAI
pip install "instructor[all]"        # All providers
```

## Quick Start

### Basic Example: Extract User Data

```python
import instructor
from pydantic import BaseModel
from anthropic import Anthropic

# Define output structure
class User(BaseModel):
    name: str
    age: int
    email: str

# Create instructor client
client = instructor.from_anthropic(Anthropic())

# Extract structured data
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John Doe is 30 years old. His email is john@example.com"
    }],
    response_model=User
)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```

### With OpenAI

```python
from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]
)
```

## Core Concepts

### 1. Response Models (Pydantic)

Response models define the structure and validation rules for LLM outputs.

#### Basic Model

```python
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Analyze this article: [article text]"
    }],
    response_model=Article
)
```

**Benefits:**

- Type safety with Python type hints
- Automatic validation (`word_count > 0`)
- Self-documenting with Field descriptions
- IDE autocomplete support

#### Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

person = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John lives at 123 Main St, Boston, USA"
    }],
    response_model=Person
)

print(person.address.city)  # "Boston"
```

#### Optional Fields

```python
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None  # Optional
    description: str = Field(default="No description")  # Default value
```

The LLM doesn't need to provide `discount` or `description`.

#### Enums for Constraints

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    text: str
    sentiment: Sentiment  # Only these 3 values allowed

review = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "This product is amazing!"
    }],
    response_model=Review
)

print(review.sentiment)  # Sentiment.POSITIVE
```

### 2. Validation

Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.

#### Built-in Validators

```python
from pydantic import Field, EmailStr, HttpUrl

class Contact(BaseModel):
    name: str = Field(min_length=2, max_length=100)
    age: int = Field(ge=0, le=120)  # 0 <= age <= 120
    email: EmailStr  # Validates email format
    website: HttpUrl  # Validates URL format
```

If the LLM provides invalid data, Instructor retries automatically.

#### Custom Validators

```python
from pydantic import field_validator

class Event(BaseModel):
    name: str
    date: str
    attendees: int

    @field_validator('date')
    def validate_date(cls, v):
        """Ensure date is in YYYY-MM-DD format."""
        import re
        if not re.match(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

    @field_validator('attendees')
    def validate_attendees(cls, v):
        """Ensure a positive attendee count."""
        if v < 1:
            raise ValueError('Must have at least 1 attendee')
        return v
```

#### Model-Level Validation

```python
from pydantic import model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode='after')
    def check_dates(self):
        """Ensure end_date is after start_date."""
        from datetime import datetime
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')

        if end < start:
            raise ValueError('end_date must be after start_date')
        return self
```

### 3. Automatic Retrying

Instructor retries automatically when validation fails, providing error feedback to the LLM.

```python
# Retries up to 3 times if validation fails
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract user from: John, age unknown"
    }],
    response_model=User,
    max_retries=3
)

# If age can't be extracted, Instructor tells the LLM:
#   "Validation error: age - field required"
# and the LLM tries again with better extraction.
```

**How it works:**

1. LLM generates output
2. Pydantic validates
3. If invalid: the error message is sent back to the LLM
4. LLM tries again with the error feedback
5. Repeats up to `max_retries`
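Conceptually, the loop resembles the sketch below, built on plain Pydantic. `retry_with_feedback` and `call_llm` are illustrative stand-ins, not Instructor APIs; Instructor performs this wiring internally.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

def retry_with_feedback(call_llm, prompt: str, max_retries: int = 3) -> User:
    """Validate-and-retry loop: append the validation error to the next
    prompt so the model can correct its output."""
    feedback = ""
    for _ in range(max_retries):
        raw = call_llm(prompt + feedback)  # hypothetical LLM call
        try:
            return User.model_validate_json(raw)
        except ValidationError as e:
            feedback = f"\nPrevious output failed validation: {e}"
    raise RuntimeError("Validation failed after retries")

# Stub "LLM" that fails once (missing age), then returns valid JSON
attempts = iter(['{"name": "John"}', '{"name": "John", "age": 30}'])
user = retry_with_feedback(lambda prompt: next(attempts), "Extract user")
print(user.age)  # 30
```

The first attempt fails validation (no `age`), the error text is appended to the prompt, and the second attempt succeeds.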

### 4. Streaming

Stream partial results for real-time processing.

#### Streaming Partial Objects

```python
from instructor import Partial

class Story(BaseModel):
    title: str
    content: str
    tags: list[str]

# Stream partial updates as the LLM generates
for partial_story in client.messages.create_partial(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a short sci-fi story"
    }],
    response_model=Story
):
    print(f"Title: {partial_story.title}")
    print(f"Content so far: {partial_story.content[:100]}...")
    # Update UI in real-time
```

#### Streaming Iterables

```python
class Task(BaseModel):
    title: str
    priority: str

# Stream list items as they're generated
tasks = client.messages.create_iterable(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Generate 10 project tasks"
    }],
    response_model=Task
)

# Process each task as it arrives
for task in tasks:
    print(f"- {task.title} ({task.priority})")
```

## Provider Configuration

### Anthropic Claude

```python
import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(
    Anthropic(api_key="your-api-key")
)

# Use with Claude models
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=YourModel
)
```

### OpenAI

```python
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(api_key="your-api-key")
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=YourModel,
    messages=[...]
)
```

### Local Models (Ollama)

```python
from openai import OpenAI

# Point to the local Ollama server
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Required but ignored
    ),
    mode=instructor.Mode.JSON
)

response = client.chat.completions.create(
    model="llama3.1",
    response_model=YourModel,
    messages=[...]
)
```

## Common Patterns

### Pattern 1: Data Extraction from Text

```python
class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int
    headquarters: str

text = """
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
industry with approximately 140,000 employees. The company is headquartered
in Austin, Texas.
"""

company = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract company information from: {text}"
    }],
    response_model=CompanyInfo
)
```

### Pattern 2: Classification

```python
class Category(str, Enum):
    TECHNOLOGY = "technology"
    FINANCE = "finance"
    HEALTHCARE = "healthcare"
    EDUCATION = "education"
    OTHER = "other"

class ArticleClassification(BaseModel):
    category: Category
    confidence: float = Field(ge=0.0, le=1.0)
    keywords: list[str]

classification = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Classify this article: [article text]"
    }],
    response_model=ArticleClassification
)
```

### Pattern 3: Multi-Entity Extraction

```python
class Person(BaseModel):
    name: str
    role: str

class Organization(BaseModel):
    name: str
    industry: str

class Entities(BaseModel):
    people: list[Person]
    organizations: list[Organization]
    locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract all entities from: {text}"
    }],
    response_model=Entities
)

for person in entities.people:
    print(f"{person.name} - {person.role}")
```

### Pattern 4: Structured Analysis

```python
class SentimentAnalysis(BaseModel):
    overall_sentiment: Sentiment
    positive_aspects: list[str]
    negative_aspects: list[str]
    suggestions: list[str]
    score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Analyze this review: {review}"
    }],
    response_model=SentimentAnalysis
)
```

### Pattern 5: Batch Processing

```python
def extract_person(text: str) -> Person:
    return client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract person from: {text}"
        }],
        response_model=Person
    )

texts = [
    "John Doe is a 30-year-old engineer",
    "Jane Smith, 25, works in marketing",
    "Bob Johnson, age 40, software developer"
]

people = [extract_person(text) for text in texts]
```

## Advanced Features

### Union Types

```python
from typing import Union

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]  # Either type
```

The LLM chooses the appropriate type based on the content.
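Union resolution is handled by Pydantic itself and can be exercised without an LLM call. This sketch validates a plain dict and checks which branch was chosen; for stricter routing, Pydantic also supports discriminated unions via a `Literal` type field plus `Field(discriminator="type")`.

```python
from typing import Union
from pydantic import BaseModel, HttpUrl

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]

# Validation picks the branch that matches the data:
# this payload has no url/caption, so TextContent wins
post = Post.model_validate({
    "title": "Hello",
    "content": {"type": "text", "content": "Just words"},
})
print(isinstance(post.content, TextContent))  # True
```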

### Dynamic Models

```python
from pydantic import create_model

# Create a model at runtime
DynamicUser = create_model(
    'User',
    name=(str, ...),
    age=(int, Field(ge=0)),
    email=(EmailStr, ...)
)

user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=DynamicUser
)
```

### Custom Modes

```python
# For providers without native structured outputs
client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.JSON  # JSON mode
)
```

Available modes:

- `Mode.ANTHROPIC_TOOLS` (recommended for Claude)
- `Mode.JSON` (fallback)
- `Mode.TOOLS` (OpenAI tools)

### Context Management

```python
# Single-use client
with instructor.from_anthropic(Anthropic()) as client:
    result = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=YourModel
    )
# Client closed automatically
```

## Error Handling

### Handling Validation Errors

```python
from pydantic import ValidationError

try:
    user = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=User,
        max_retries=3
    )
except ValidationError as e:
    print(f"Failed after retries: {e}")
    # Handle gracefully
except Exception as e:
    print(f"API error: {e}")
```

### Custom Error Messages

```python
class ValidatedUser(BaseModel):
    name: str = Field(description="Full name, 2-100 characters")
    age: int = Field(description="Age between 0 and 120", ge=0, le=120)
    email: EmailStr = Field(description="Valid email address")

    class Config:
        # Provide schema examples to guide the LLM
        json_schema_extra = {
            "examples": [
                {
                    "name": "John Doe",
                    "age": 30,
                    "email": "john@example.com"
                }
            ]
        }
```

## Best Practices

### 1. Clear Field Descriptions

```python
# ❌ Bad: vague
class Product(BaseModel):
    name: str
    price: float

# ✅ Good: descriptive
class Product(BaseModel):
    name: str = Field(description="Product name from the text")
    price: float = Field(description="Price in USD, without currency symbol")
```

### 2. Use Appropriate Validation

```python
# ✅ Good: constrain values
class Rating(BaseModel):
    score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
    review: str = Field(min_length=10, description="Review text, at least 10 chars")
```

### 3. Provide Examples in Prompts

```python
messages = [{
    "role": "user",
    "content": """Extract person info from: "John, 30, engineer"

Example format:
{
  "name": "John Doe",
  "age": 30,
  "occupation": "engineer"
}"""
}]
```

### 4. Use Enums for Fixed Categories

```python
# ✅ Good: an Enum ensures valid values
class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    status: Status  # LLM must choose from the enum
```

### 5. Handle Missing Data Gracefully

```python
class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"
```

The LLM only needs to provide `required_field`.
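The fallback behavior can be checked without an LLM call by validating a minimal payload directly; this is plain Pydantic, not an Instructor API:

```python
from typing import Optional
from pydantic import BaseModel

class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# Only the required field is supplied; the others fall back to defaults
data = PartialData.model_validate({"required_field": "present"})
print(data.optional_field)  # None
print(data.default_field)   # "default_value"
```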

## Comparison to Alternatives

| Feature | Instructor | Manual JSON | LangChain | DSPy |
|---|---|---|---|---|
| Type Safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Auto Validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited |
| Auto Retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
| Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Multi-Provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes |
| Learning Curve | Low | Low | Medium | High |

**When to choose Instructor:**

- Need structured, validated outputs
- Want type safety and IDE support
- Require automatic retries
- Building data extraction systems

**When to choose alternatives:**

- **DSPy**: need prompt optimization
- **LangChain**: building complex chains
- **Manual JSON**: simple, one-off extractions

## Resources

### See Also

- `references/validation.md` - Advanced validation patterns
- `references/providers.md` - Provider-specific configuration
- `references/examples.md` - Real-world use cases