dspy-bootstrap-fewshot


DSPy Bootstrap Few-Shot Optimizer


Goal


Automatically generate and select optimal few-shot demonstrations for your DSPy program using a teacher model.

When to Use


  • You have 10-50 labeled examples
  • Manual example selection is tedious or suboptimal
  • You want demonstrations with reasoning traces
  • Quick optimization without extensive compute

Related Skills


  • For more data (200+ examples): dspy-miprov2-optimizer
  • For agentic systems: dspy-gepa-reflective
  • Measure improvements: dspy-evaluation-suite

Inputs


| Input | Type | Description |
| --- | --- | --- |
| `program` | `dspy.Module` | Your DSPy program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `callable` | Evaluation function |
| `metric_threshold` | `float` | Numerical threshold for accepting demos (optional) |
| `max_bootstrapped_demos` | `int` | Max teacher-generated demos (default: 4) |
| `max_labeled_demos` | `int` | Max direct labeled demos (default: 16) |
| `max_rounds` | `int` | Max bootstrapping attempts per example (default: 1) |
| `teacher_settings` | `dict` | Configuration for teacher model (optional) |
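The interaction between `metric` and `metric_threshold` can be sketched in plain Python. The `accepts_demo` helper below is illustrative, not DSPy internals: without a threshold, any truthy metric value keeps the teacher's trace as a demo; with a threshold, a numeric metric must meet it.

```python
def accepts_demo(metric_value, metric_threshold=None):
    """Illustrative acceptance rule (hypothetical helper, not DSPy code):
    keep a teacher-generated trace as a demo only if the metric passes."""
    if metric_threshold is None:
        # Boolean or truthy metrics: any truthy result accepts the demo.
        return bool(metric_value)
    # Numeric metrics: accept only at or above the threshold.
    return metric_value >= metric_threshold

assert accepts_demo(True) is True
assert accepts_demo(0.6, metric_threshold=0.8) is False
assert accepts_demo(0.9, metric_threshold=0.8) is True
```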

Outputs


| Output | Type | Description |
| --- | --- | --- |
| `compiled_program` | `dspy.Module` | Optimized program with demos |

Workflow


Phase 1: Setup


```python
import dspy
from dspy.teleprompt import BootstrapFewShot
```

Configure LMs

```python
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
```

Phase 2: Define Program and Metric


```python
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

def validate_answer(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()
```
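The containment metric above can be exercised without calling an LM; here `SimpleNamespace` stands in for the `dspy.Example` and prediction objects:

```python
from types import SimpleNamespace

def validate_answer(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

gold = SimpleNamespace(answer="Paris")
# Case-insensitive containment: extra words around the gold answer still pass.
assert validate_answer(gold, SimpleNamespace(answer="The capital is paris."))
# A wrong answer fails.
assert not validate_answer(gold, SimpleNamespace(answer="Lyon"))
```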

Phase 3: Compile


```python
optimizer = BootstrapFewShot(
    metric=validate_answer,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    teacher_settings={'lm': dspy.LM("openai/gpt-4o")}
)

compiled_qa = optimizer.compile(QA(), trainset=trainset)
```

Phase 4: Use and Save


Use optimized program

```python
result = compiled_qa(question="What is photosynthesis?")
```

Save for production (state-only, recommended)

```python
compiled_qa.save("qa_optimized.json", save_program=False)
```

Production Example


```python
import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ProductionQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question: str):
        try:
            return self.cot(question=question)
        except Exception as e:
            logger.error(f"Generation failed: {e}")
            return dspy.Prediction(answer="Unable to answer")

def robust_metric(example, pred, trace=None):
    if not pred.answer or pred.answer == "Unable to answer":
        return 0.0
    return float(example.answer.lower() in pred.answer.lower())

def optimize_with_bootstrap(trainset, devset):
    """Full optimization pipeline with validation."""

    # Baseline
    baseline = ProductionQA()
    evaluator = Evaluate(devset=devset, metric=robust_metric, num_threads=4)
    baseline_score = evaluator(baseline)
    logger.info(f"Baseline: {baseline_score:.2%}")

    # Optimize
    optimizer = BootstrapFewShot(
        metric=robust_metric,
        max_bootstrapped_demos=4,
        max_labeled_demos=4
    )

    compiled = optimizer.compile(baseline, trainset=trainset)
    optimized_score = evaluator(compiled)
    logger.info(f"Optimized: {optimized_score:.2%}")

    if optimized_score > baseline_score:
        compiled.save("production_qa.json", save_program=False)
        return compiled

    logger.warning("Optimization didn't improve; keeping baseline")
    return baseline
```

Best Practices


  1. Quality over quantity - 10 excellent examples beat 100 noisy ones
  2. Use stronger teacher - GPT-4 as teacher for GPT-3.5 student
  3. Validate with held-out set - Always test on unseen data
  4. Start with 4 demos - More isn't always better
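Practice 3 (validate on a held-out set) amounts to splitting your labeled examples before compiling, so the optimizer never sees the data you score it on. A minimal sketch, where the 80/20 ratio and fixed seed are illustrative choices:

```python
import random

def train_dev_split(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split off a held-out dev set."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded for reproducible splits
    n_dev = max(1, int(len(items) * dev_fraction))
    return items[n_dev:], items[:n_dev]  # (trainset, devset)

trainset, devset = train_dev_split(range(50))
assert len(trainset) == 40 and len(devset) == 10
assert not set(trainset) & set(devset)  # no leakage between splits
```

Pass `trainset` to `optimizer.compile(...)` and reserve `devset` for `Evaluate`, as in the production example above.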

Limitations


  • Requires labeled training data
  • Teacher model costs can add up
  • May not generalize to very different inputs
  • Limited exploration compared to MIPROv2

Official Documentation
