ai-generating-data

Generate Synthetic Training Data

Guide the user through generating high-quality synthetic training data with DSPy. This solves the "I don't have data" problem that blocks every other AI workflow.

When you need synthetic data

  • Cold start: You're building a new feature and have zero labeled examples
  • Not enough for optimization: You have 10-30 examples but optimizers need 200+
  • Privacy/compliance: You can't use real customer data for training
  • Edge cases: Your AI works on common inputs but fails on rare ones
  • Unbalanced categories: Some categories have 500 examples, others have 10
  • New categories: You added a category and have no examples for it
  • Schema changed: Your input/output format changed, old data doesn't fit
  • Proof of concept: PM wants a demo by Friday, no time to collect real data

The core idea

Define a generator signature whose outputs match your task's input/output fields. Use an LM to produce examples. Filter for quality. Use for optimization.
Research shows this works surprisingly well:
  • Optimized generator prompts match models trained on 100K+ human labels using only 10 gold labels (arXiv 2406.11706)
  • DSPy-optimized Chain-of-Thought generation outperforms hand-written static templates (arXiv 2508.13930)
The key insight: the prompt used to generate data is a critical hyperparameter — optimizing it matters more than generating more data.

Step 1: Define what an example looks like

Your generator's outputs should match your task's inputs and expected outputs.
python
import dspy

# Your task: what the AI will do in production
class ClassifyTicket(dspy.Signature):
    """Classify a support ticket into a category."""
    ticket_text: str = dspy.InputField()
    category: str = dspy.OutputField()

# Generator: produces examples for your task
class GenerateTicketExample(dspy.Signature):
    """Generate a realistic support ticket with its correct category."""
    category: str = dspy.InputField(desc="the target category to generate an example for")
    ticket_text: str = dspy.OutputField(desc="a realistic support ticket for this category")

The generator's output fields become inputs to your task. Think of it as: "given what I want the answer to be, generate a realistic input."

Multi-field tasks

If your task has multiple inputs or outputs, mirror all of them:
python
# Task: extract structured data from text
class ExtractContact(dspy.Signature):
    """Extract contact info from a message."""
    message: str = dspy.InputField()
    name: str = dspy.OutputField()
    email: str = dspy.OutputField()
    phone: str = dspy.OutputField()

# Generator: produce realistic messages with known contact info
class GenerateContactExample(dspy.Signature):
    """Generate a realistic message that contains contact information."""
    name: str = dspy.InputField(desc="the person's name to embed in the message")
    email: str = dspy.InputField(desc="the email address to embed in the message")
    phone: str = dspy.InputField(desc="the phone number to embed in the message")
    message: str = dspy.OutputField(desc="a realistic message containing this contact info")

Step 2: Write seed examples

Start with 5-10 hand-written examples. These anchor the generator's understanding of what "realistic" means for your domain.
python
seeds = [
    dspy.Example(
        ticket_text="I was charged twice for my subscription this month. Order #4521.",
        category="billing"
    ).with_inputs("ticket_text"),
    dspy.Example(
        ticket_text="The app crashes when I try to upload a profile photo on Android.",
        category="bug"
    ).with_inputs("ticket_text"),
    dspy.Example(
        ticket_text="How do I export my data to CSV? I can't find the option anywhere.",
        category="how-to"
    ).with_inputs("ticket_text"),
    dspy.Example(
        ticket_text="I'd love to see dark mode added. The white background hurts my eyes.",
        category="feature-request"
    ).with_inputs("ticket_text"),
    dspy.Example(
        ticket_text="My account got locked after too many login attempts. Please help.",
        category="account"
    ).with_inputs("ticket_text"),
]
Even 5 seeds dramatically improve generation quality over zero.
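Before generating, it can help to confirm that the seeds cover every category you plan to generate for. The sketch below is plain Python and assumes the seeds have been reduced to (ticket_text, category) pairs. The helper name `seed_coverage` and the sample data are hypothetical and not part of DSPy.

```python
from collections import Counter

def seed_coverage(seed_pairs, categories):
    """Return per-category seed counts and the categories that have no seed."""
    counts = Counter(category for _, category in seed_pairs)
    missing = [c for c in categories if counts[c] == 0]
    return dict(counts), missing

# Illustrative data only: three seeds for a five-category task
seed_pairs = [
    ("I was charged twice for my subscription this month.", "billing"),
    ("The app crashes when I upload a profile photo.", "bug"),
    ("How do I export my data to CSV?", "how-to"),
]
categories = ["billing", "bug", "how-to", "feature-request", "account"]

counts, missing = seed_coverage(seed_pairs, categories)
print(counts)
print(missing)
```

With `dspy.Example` seeds, the pairs could be built as `[(s.ticket_text, s.category) for s in seeds]`.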

Step 3: Generate in batches

Two patterns depending on your LM provider:

Pattern A: n=N batch generation

When your provider supports the n parameter (OpenAI does), this generates multiple completions in one call, which is faster and often more diverse:
python
generator = dspy.Predict(GenerateTicketExample, n=20)
response = generator(category="billing")
examples = [
    dspy.Example(ticket_text=t, category="billing").with_inputs("ticket_text")
    for t in response.completions.ticket_text
]

Pattern B: Loop generation

Works with any provider. More control over each example:
python
examples = []
categories = ["billing", "bug", "how-to", "feature-request", "account"]

for category in categories:
    generator = dspy.Predict(GenerateTicketExample)
    for i in range(40):
        result = generator(category=category)
        examples.append(
            dspy.Example(ticket_text=result.ticket_text, category=category)
            .with_inputs("ticket_text")
        )

print(f"Generated {len(examples)} examples")
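The n parameter isn't supported by all providers, so use the loop pattern as a reliable fallback. DSPy caches LM responses by default, so repeated calls with identical inputs can return the same text; vary an input field (see the diversity trick below) or create the LM with cache=False.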

Generation strategies

Pick the strategy that fits your gap:
Category-driven — generate N per category (fixes imbalance):
python
for category in categories:
    for i in range(50):
        result = generator(category=category)
        examples.append(dspy.Example(ticket_text=result.ticket_text, category=category).with_inputs("ticket_text"))
Seed-and-vary — pass a seed example with a variation instruction:
python
class GenerateVariation(dspy.Signature):
    """Generate a variation of this support ticket with a different tone and phrasing."""
    original_ticket: str = dspy.InputField(desc="the original ticket to vary")
    variation_type: str = dspy.InputField(desc="how to vary it: tone, length, complexity, or language")
    ticket_text: str = dspy.OutputField(desc="a new ticket with the same meaning but different style")

vary = dspy.Predict(GenerateVariation)
for seed in seeds:
    for variation in ["angry tone", "very brief", "verbose and detailed", "non-native English"]:
        result = vary(original_ticket=seed.ticket_text, variation_type=variation)
        examples.append(dspy.Example(ticket_text=result.ticket_text, category=seed.category).with_inputs("ticket_text"))
Scenario-driven — specify edge case scenarios:
python
class GenerateScenarioTicket(dspy.Signature):
    """Generate a support ticket matching a specific scenario."""
    category: str = dspy.InputField()
    scenario: str = dspy.InputField(desc="the specific scenario to generate")
    ticket_text: str = dspy.OutputField()

gen = dspy.Predict(GenerateScenarioTicket)
scenarios = [
    ("billing", "customer charged in wrong currency"),
    ("billing", "refund for a cancelled subscription"),
    ("bug", "issue only happens on slow network connections"),
    ("bug", "multi-step reproduction involving two features"),
    ("how-to", "customer is non-technical and confused by jargon"),
]
for category, scenario in scenarios:
    result = gen(category=category, scenario=scenario)
    examples.append(dspy.Example(ticket_text=result.ticket_text, category=category).with_inputs("ticket_text"))
Difficulty-driven — generate easy, medium, hard examples separately:
python
class GenerateByDifficulty(dspy.Signature):
    """Generate a support ticket at a specific difficulty level for classification."""
    category: str = dspy.InputField()
    difficulty: str = dspy.InputField(desc="easy (clear-cut), medium (some ambiguity), or hard (could be multiple categories)")
    ticket_text: str = dspy.OutputField()

gen = dspy.Predict(GenerateByDifficulty)
for category in categories:
    for difficulty in ["easy", "medium", "hard"]:
        for i in range(15):
            result = gen(category=category, difficulty=difficulty)
            examples.append(dspy.Example(ticket_text=result.ticket_text, category=category).with_inputs("ticket_text"))
Diversity trick: add a random sindex field to push the LM toward varied outputs:
python
import random

class GenerateDiverse(dspy.Signature):
    """Generate a unique and realistic support ticket."""
    category: str = dspy.InputField()
    sindex: str = dspy.InputField(desc="a unique seed index for diversity")
    ticket_text: str = dspy.OutputField()

gen = dspy.Predict(GenerateDiverse)
for category in categories:
    for i in range(50):
        result = gen(category=category, sindex=str(random.randint(0, 1_000_000)))
        examples.append(dspy.Example(ticket_text=result.ticket_text, category=category).with_inputs("ticket_text"))
The random sindex changes the prompt on every call, which helps keep the LM from falling into repetitive patterns.
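One rough way to see whether generation is still repetitive is to check how many tickets share the same opening words. The following is a minimal sketch in plain Python; `opening_repetition`, the three-word window, and the sample texts are illustrative assumptions, not a DSPy feature.

```python
from collections import Counter

def opening_repetition(texts, n_words=3):
    """Return the most common n-word opening and the share of texts that use it."""
    if not texts:
        return "", 0.0
    openings = Counter(" ".join(t.lower().split()[:n_words]) for t in texts)
    opening, count = openings.most_common(1)[0]
    return opening, count / len(texts)

# Illustrative data only
texts = [
    "I was charged twice this month.",
    "I was charged for a plan I cancelled.",
    "My invoice shows the wrong amount.",
    "I was charged in the wrong currency.",
]
opening, share = opening_repetition(texts)
print(opening, share)
```

A high share for a single opening suggests the generator needs more variation in its inputs or instructions.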

Step 4: Filter for quality

Generated data always contains some bad examples. Filter aggressively — aim to generate 2-3x what you need and keep ~50%.
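The arithmetic behind that rule of thumb can be sketched as follows. The function name `generation_budget` and its parameters are hypothetical; the 50% keep rate is the estimate from the text and should be replaced with the rate you actually observe after filtering.

```python
import math

def generation_budget(target_total, num_categories, keep_rate=0.5):
    """Examples to generate per category so roughly target_total survive filtering."""
    if not 0 < keep_rate <= 1:
        raise ValueError("keep_rate must be in (0, 1]")
    per_category_target = math.ceil(target_total / num_categories)
    return math.ceil(per_category_target / keep_rate)

# 200 usable examples across 5 categories, assuming half are filtered out
per_category = generation_budget(target_total=200, num_categories=5, keep_rate=0.5)
print(per_category, per_category * 5)
```

Under these assumptions the plan is 80 generations per category, 400 in total, which is twice the target.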

Simple: metric-based filtering

Run each generated example through your task program and check with your metric:
python
program = dspy.ChainOfThought(ClassifyTicket)
filtered = []

for ex in examples:
    pred = program(**ex.inputs())
    if metric(ex, pred):
        filtered.append(ex)

print(f"Kept {len(filtered)}/{len(examples)} ({100*len(filtered)//len(examples)}%)")
This works when your program is already decent — it filters out examples that are confusing or mislabeled.
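The snippet above assumes a `metric` function. A minimal sketch for the ticket classifier, assuming exact match on the category after normalizing case and whitespace, could look like this. It follows the (example, prediction, trace=None) argument order used elsewhere in this guide; the `SimpleNamespace` objects are stand-ins for `dspy.Example` and the program's prediction, used only to exercise the function.

```python
from types import SimpleNamespace

def metric(example, prediction, trace=None):
    """Exact match on category, ignoring case and surrounding whitespace."""
    expected = example.category.strip().lower()
    predicted = (prediction.category or "").strip().lower()
    return expected == predicted

# Stand-in objects; in practice these would be a dspy.Example and a program output
example = SimpleNamespace(category="billing")
print(metric(example, SimpleNamespace(category=" Billing ")))
print(metric(example, SimpleNamespace(category="bug")))
```

Exact match is a reasonable default for a closed label set; tasks with free-text outputs would need a looser comparison.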

Robust: LM-based assessment

Use a separate assessment step to check realism and correctness:
python
class AssessExample(dspy.Signature):
    """Is this a realistic and correctly labeled example?"""
    ticket_text: str = dspy.InputField()
    category: str = dspy.InputField()
    is_realistic: bool = dspy.OutputField(desc="true if this looks like a real support ticket")
    is_correctly_labeled: bool = dspy.OutputField(desc="true if the category matches the ticket")

assessor = dspy.Predict(AssessExample)
filtered = []

for ex in examples:
    result = assessor(ticket_text=ex.ticket_text, category=ex.category)
    if result.is_realistic and result.is_correctly_labeled:
        filtered.append(ex)

print(f"Kept {len(filtered)}/{len(examples)} ({100*len(filtered)//len(examples)}%)")

Quality gates with dspy.Suggest

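For tighter integration, build quality checks into the generator itself. When a Suggest constraint fails, DSPy retries the generation. Note that dspy.Suggest belongs to the assertions API of older DSPy releases; newer releases replace it with dspy.Refine and dspy.BestOfN, so check which one your installed version provides: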
python
class QualityGenerator(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(GenerateTicketExample)
        self.assess = dspy.Predict(AssessExample)

    def forward(self, category):
        result = self.generate(category=category)
        assessment = self.assess(ticket_text=result.ticket_text, category=category)
        dspy.Suggest(assessment.is_realistic, "Generated ticket should be realistic")
        dspy.Suggest(assessment.is_correctly_labeled, "Category label should be correct")
        return result

generator = QualityGenerator()
# DSPy retries generation when Suggest constraints fail

Check for duplicates

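Remove duplicates to keep your dataset diverse. The check below normalizes case and surrounding whitespace, so it only catches tickets that are otherwise identical: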
python
seen = set()
unique = []
for ex in filtered:
    # Normalize and check
    key = ex.ticket_text.strip().lower()
    if key not in seen:
        seen.add(key)
        unique.append(ex)

print(f"Removed {len(filtered) - len(unique)} near-duplicates")
filtered = unique
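The set-based check only removes tickets that are identical after normalization. To also drop tickets that differ by a few characters, one option is a pairwise similarity test. The sketch below uses `difflib.SequenceMatcher` from the standard library; the helper name `drop_near_duplicates` and the 0.9 threshold are assumptions to tune for your data.

```python
from difflib import SequenceMatcher

def drop_near_duplicates(texts, threshold=0.9):
    """Keep a text only if it is less than `threshold` similar to every kept text."""
    kept = []  # (original, normalized) pairs
    for text in texts:
        normalized = " ".join(text.lower().split())
        if all(SequenceMatcher(None, normalized, k).ratio() < threshold for _, k in kept):
            kept.append((text, normalized))
    return [original for original, _ in kept]

# Illustrative data only: the second ticket differs from the first by one character
texts = [
    "I was charged twice for my subscription this month.",
    "I was charged twice for my subscription this month!",
    "The app crashes when I upload a photo.",
]
print(drop_near_duplicates(texts))
```

The comparison is quadratic in the number of kept texts, which is fine for a few thousand examples but slow beyond that.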

Step 5: Optimize the generator itself (advanced)

Research (arXiv 2406.11706) shows that optimizing the prompt used to generate data dramatically improves downstream quality. This is meta-optimization: optimizing the generator so it produces better training data.
python
class DataGenerator(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(GenerateTicketExample)

    def forward(self, category):
        return self.generate(category=category)

# Define a metric that measures generated data quality
def generator_metric(example, prediction, trace=None):
    # Check if a downstream classifier gets the right answer on this generated example
    classifier = dspy.Predict(ClassifyTicket)
    task_example = dspy.Example(ticket_text=prediction.ticket_text, category=example.category).with_inputs("ticket_text")
    task_pred = classifier(**task_example.inputs())
    return task_pred.category.lower() == example.category.lower()

# The generator takes a category as input, so mark category as the seeds' input field
generator_trainset = [seed.with_inputs("category") for seed in seeds]

# Optimize the generator's prompts
optimizer = dspy.BootstrapFewShot(metric=generator_metric)
optimized_generator = optimizer.compile(DataGenerator(), trainset=generator_trainset)

# Now generate with the optimized generator
better_examples = []
for category in categories:
    for i in range(50):
        result = optimized_generator(category=category)
        better_examples.append(
            dspy.Example(ticket_text=result.ticket_text, category=category).with_inputs("ticket_text")
        )

This closes the loop: better generator prompts produce better data, which produces better task programs.

Step 6: Use generated data for optimization

Full pipeline: generate, filter, split, optimize, evaluate.
python
import random
from dspy.evaluate import Evaluate

# Shuffle and split
random.shuffle(filtered)
split = int(len(filtered) * 0.8)
trainset = filtered[:split]
devset = filtered[split:]
print(f"Train: {len(trainset)}, Dev: {len(devset)}")

# Configure your task LM (can be cheaper than the generator LM)
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Build and optimize your task program
program = dspy.ChainOfThought(ClassifyTicket)
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized = optimizer.compile(program, trainset=trainset)

# Evaluate
evaluator = Evaluate(devset=devset, metric=metric, num_threads=4, display_progress=True)
# Depending on the DSPy version, this may be a float or a result object exposing .score
score = evaluator(optimized)
print(f"Score on synthetic dev set: {score:.1f}%")

# Save
optimized.save("optimized_program.json")

If you have even a small number of real examples, use them as the dev set instead — real data gives more trustworthy evaluation.
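When the dev set does have to come from synthetic data, a plain shuffle-and-split can leave a small category out of it. A stratified split avoids that by splitting each category separately. This is a minimal sketch in plain Python; `stratified_split` and `label_of` are hypothetical names, and the tuples stand in for `dspy.Example` objects.

```python
import random
from collections import defaultdict

def stratified_split(items, label_of, train_fraction=0.8, seed=0):
    """Split items into train/dev per label so labels with 2+ items appear in both."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)

    train, dev = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * train_fraction)
        # Keep at least one item on each side for labels with two or more items
        cut = max(1, min(cut, len(group) - 1)) if len(group) > 1 else len(group)
        train.extend(group[:cut])
        dev.extend(group[cut:])
    return train, dev

# Illustrative data only: 10 billing tickets and 5 bug tickets
items = [(f"ticket {i}", "billing") for i in range(10)] + [(f"ticket {i}", "bug") for i in range(5)]
train, dev = stratified_split(items, label_of=lambda item: item[1])
print(len(train), len(dev))
```

With `dspy.Example` objects, `label_of=lambda ex: ex.category` would play the same role.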

Common scenarios

Cold start: zero real data. Write 5-10 seeds. Generate 200+ synthetic examples across all categories. Filter and optimize. See examples.md for a full walkthrough.
Edge case gaps: your AI is about 85% accurate overall but fails on specific scenarios. Run error analysis, identify the failure patterns, then use scenario-driven generation targeting those gaps. Re-optimize with the augmented dataset.
Privacy/compliance: can't use real customer data. Generate synthetic examples with realistic patterns but no PII. Validate with domain-specific assessments. The dspy.Suggest quality gate pattern helps keep generated data within your standards.
New categories: added a category with no examples. Use category-driven generation to produce 50+ examples for the new category, then re-optimize.
Rebalancing: some categories have 500 examples, others have 10. Generate more for underrepresented categories until all are roughly balanced.
Schema changed: your input/output format changed. Generate new examples matching the new schema rather than manually converting old data.
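For the rebalancing scenario, the number of examples to generate per category can be derived from the current label counts. The sketch below is plain Python; `rebalance_plan` and the sample counts are hypothetical, and the result is the number of examples needed after filtering, so generate more than this to allow for rejects.

```python
from collections import Counter

def rebalance_plan(labels, target_per_category=None):
    """Return how many examples each category needs to reach the target count."""
    counts = Counter(labels)
    # Default target: match the largest category
    target = target_per_category or max(counts.values())
    return {label: target - n for label, n in counts.items() if n < target}

# Illustrative counts only
labels = ["billing"] * 500 + ["bug"] * 120 + ["account"] * 10
plan = rebalance_plan(labels, target_per_category=200)
print(plan)
```

Categories already at or above the target are left out of the plan.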

Tips and pitfalls

  • Always validate generated data — LMs produce plausible but wrong labels. Filter aggressively.
  • Mix synthetic with real data when available — even 20 real examples mixed in improve quality significantly.
  • Use a stronger model to generate, cheaper model for your task — e.g., generate with GPT-4o, run your task on GPT-4o-mini.
  • Generate more than you need — aim for 2-3x your target, keep ~50% after filtering.
  • Check for duplicates — LMs tend to repeat themselves, especially without the diversity trick.
  • Iterate — generate, optimize, evaluate, identify gaps, generate more for gaps.
  • Don't trust synthetic eval scores blindly — if possible, validate final quality on real data.
  • The n parameter for batch generation isn't supported by all providers, so use the loop pattern as a reliable fallback.

Additional resources

  • For end-to-end worked examples (cold start, edge cases, privacy), see examples.md
  • Use /ai-improving-accuracy to measure and improve your optimized program
  • Use /ai-fine-tuning once you have enough generated data for weight optimization
  • Use /ai-kickoff to scaffold a project, then fill data with this skill