dspy-bootstrap-fewshot
DSPy Bootstrap Few-Shot Optimizer
Goal
Automatically generate and select optimal few-shot demonstrations for your DSPy program using a teacher model.
When to Use
- You have 10-50 labeled examples
- Manual example selection is tedious or suboptimal
- You want demonstrations with reasoning traces
- Quick optimization without extensive compute
Related Skills
- For more data (200+ examples): dspy-miprov2-optimizer
- For agentic systems: dspy-gepa-reflective
- Measure improvements: dspy-evaluation-suite
Inputs
| Input | Type | Description |
|---|---|---|
| `student` | `dspy.Module` | Your DSPy program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `Callable` | Evaluation function |
| `metric_threshold` | `float` | Numerical threshold for accepting demos (optional) |
| `max_bootstrapped_demos` | `int` | Max teacher-generated demos (default: 4) |
| `max_labeled_demos` | `int` | Max direct labeled demos (default: 16) |
| `max_rounds` | `int` | Max bootstrapping attempts per example (default: 1) |
| `teacher_settings` | `dict` | Configuration for teacher model (optional) |
Outputs
| Output | Type | Description |
|---|---|---|
| Compiled program | `dspy.Module` | Optimized program with demos |
Workflow
Phase 1: Setup
```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
```
Phase 2: Define Program and Metric
```python
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

def validate_answer(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()
```
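Because `validate_answer` only checks case-insensitive substring containment, it is worth sanity-checking what it accepts and rejects before compiling; a quick sketch, using `SimpleNamespace` stand-ins in place of real DSPy examples and predictions:

```python
from types import SimpleNamespace

def validate_answer(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

# Stand-in objects: only the .answer attribute matters to the metric
gold = SimpleNamespace(answer="Paris")
hit = SimpleNamespace(answer="The capital of France is Paris.")
miss = SimpleNamespace(answer="Lyon")

print(validate_answer(gold, hit))   # True: gold answer appears in the prediction
print(validate_answer(gold, miss))  # False: no substring match
```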
Phase 3: Compile
```python
optimizer = BootstrapFewShot(
    metric=validate_answer,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    teacher_settings={'lm': dspy.LM("openai/gpt-4o")}
)
compiled_qa = optimizer.compile(QA(), trainset=trainset)
```
Phase 4: Use and Save
```python
# Use the optimized program
result = compiled_qa(question="What is photosynthesis?")

# Save for production (state-only, recommended)
compiled_qa.save("qa_optimized.json", save_program=False)
```
Production Example
```python
import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ProductionQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question: str):
        try:
            return self.cot(question=question)
        except Exception as e:
            logger.error(f"Generation failed: {e}")
            return dspy.Prediction(answer="Unable to answer")

def robust_metric(example, pred, trace=None):
    if not pred.answer or pred.answer == "Unable to answer":
        return 0.0
    return float(example.answer.lower() in pred.answer.lower())

def optimize_with_bootstrap(trainset, devset):
    """Full optimization pipeline with validation."""
    # Baseline
    baseline = ProductionQA()
    evaluator = Evaluate(devset=devset, metric=robust_metric, num_threads=4)
    baseline_score = evaluator(baseline)
    logger.info(f"Baseline: {baseline_score:.2%}")

    # Optimize
    optimizer = BootstrapFewShot(
        metric=robust_metric,
        max_bootstrapped_demos=4,
        max_labeled_demos=4
    )
    compiled = optimizer.compile(baseline, trainset=trainset)
    optimized_score = evaluator(compiled)
    logger.info(f"Optimized: {optimized_score:.2%}")

    # Keep the optimized program only if it beats the baseline
    if optimized_score > baseline_score:
        compiled.save("production_qa.json", save_program=False)
        return compiled
    logger.warning("Optimization didn't improve; keeping baseline")
    return baseline
```
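The graded `robust_metric` can likewise be spot-checked in isolation; a small sketch with `SimpleNamespace` stand-ins, showing that the fallback answer scores 0.0 instead of being treated as a weak match:

```python
from types import SimpleNamespace

def robust_metric(example, pred, trace=None):
    if not pred.answer or pred.answer == "Unable to answer":
        return 0.0
    return float(example.answer.lower() in pred.answer.lower())

gold = SimpleNamespace(answer="chlorophyll")
good = SimpleNamespace(answer="Plants use chlorophyll to capture light.")
fallback = SimpleNamespace(answer="Unable to answer")

print(robust_metric(gold, good))      # 1.0: substring match
print(robust_metric(gold, fallback))  # 0.0: error fallback is rejected outright
```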
Best Practices
- Quality over quantity - 10 excellent examples beat 100 noisy ones
- Use a stronger teacher - e.g., GPT-4 as the teacher for a GPT-3.5 student
- Validate with a held-out set - always test on unseen data
- Start with 4 demos - more isn't always better
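To follow the held-out-set advice, labeled data can be split before any optimization runs; a minimal sketch (a hypothetical 80/20 split helper with a fixed seed for reproducibility):

```python
import random

def train_dev_split(examples, dev_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into train and dev sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]

# Toy labeled data; in practice these would be dspy.Example objects
labeled = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]
trainset, devset = train_dev_split(labeled)
print(len(trainset), len(devset))  # 16 4
```

Only `trainset` is passed to `optimizer.compile`; `devset` is reserved for the evaluator so the reported improvement reflects unseen data.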
Limitations
- Requires labeled training data
- Teacher model costs can add up
- May not generalize to very different inputs
- Limited exploration compared to MIPROv2
Official Documentation
- DSPy Documentation: https://dspy.ai/
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- BootstrapFewShot API: https://dspy.ai/api/optimizers/BootstrapFewShot/
- Optimization Guide: https://dspy.ai/learn/optimization/optimizers/