dspy-simba-optimizer
DSPy SIMBA Optimizer
Goal
Optimize DSPy programs using stochastic introspective mini-batch ascent with statistical analysis of feedback signals.
When to Use
- Need lighter-weight alternative to GEPA
- Have custom feedback metrics (not just accuracy)
- Agentic tasks with rich failure signals
- Budget-conscious optimization (fewer eval calls)
- Programs where few-shot examples aren't critical
Related Skills
- Alternative optimizers: dspy-miprov2-optimizer, dspy-gepa-reflective
- Agent optimization: dspy-react-agent-builder
- Evaluation: dspy-evaluation-suite
Inputs

| Input | Type | Description |
|---|---|---|
| `program` | `dspy.Module` | Program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `Callable` | Returns a float or `dspy.Prediction(score=..., feedback=...)` |
| `max_steps` | `int` | Number of optimization steps |
| `bsize` | `int` | Mini-batch size |
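The two accepted metric return shapes can be sketched in plain Python. Here `Prediction` is a namedtuple stand-in (an illustration-only assumption) so the sketch runs without DSPy installed; a real metric would return `dspy.Prediction`:

```python
from collections import namedtuple

# Stand-in for dspy.Prediction(score=..., feedback=...) -- illustration only
Prediction = namedtuple("Prediction", ["score", "feedback"])

def plain_metric(example, pred, trace=None):
    # Simplest shape: a bare float score
    return 1.0 if example["answer"] == pred["answer"] else 0.0

def feedback_metric(example, pred, trace=None):
    # Richer shape: score plus a textual feedback signal for SIMBA
    if example["answer"] == pred["answer"]:
        return Prediction(score=1.0, feedback="Exact match")
    return Prediction(score=0.0, feedback=f"Expected '{example['answer']}'")

ex = {"answer": "Paris"}
print(plain_metric(ex, {"answer": "Paris"}))               # 1.0
print(feedback_metric(ex, {"answer": "London"}).feedback)  # Expected 'Paris'
```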
Outputs

| Output | Type | Description |
|---|---|---|
| `compiled` | `dspy.Module` | SIMBA-optimized program |
Workflow
Phase 1: Understand SIMBA
SIMBA (Stochastic Introspective Mini-Batch Ascent):
- Iterative prompt optimization with mini-batch sampling
- Identifies challenging examples with high output variability
- Generates self-reflective rules or adds successful demonstrations
- Lighter than GEPA (no reflection LM)
- More flexible than Bootstrap (uses feedback)
Comparison:
- MIPROv2: Best accuracy, lots of data
- GEPA: Agentic systems, expensive
- SIMBA: Custom feedback, budget-friendly
- Bootstrap: Simplest, demo-based
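The loop described above can be illustrated with a toy sketch. This is NOT DSPy's implementation — candidate "programs" are plain functions, and the real rule/demo generation step is reduced to a comment — but it shows the mini-batch sampling, variability check, and greedy ascent:

```python
import random
import statistics

# Toy illustration only -- not DSPy's actual SIMBA implementation.
def simba_sketch(programs, trainset, metric, max_steps=3, bsize=4, seed=0):
    best = programs[0]
    rng = random.Random(seed)
    for _ in range(max_steps):
        batch = rng.sample(trainset, bsize)  # stochastic mini-batch sampling
        # Score every candidate program on the mini-batch
        scores = [[metric(ex, prog(ex)) for ex in batch] for prog in programs]
        # "Challenging" examples show high score variability across candidates
        hard = [batch[i] for i in range(bsize)
                if statistics.pstdev([row[i] for row in scores]) > 0]
        # Real SIMBA uses `hard` to generate self-reflective rules or demos;
        # here we just keep the candidate with the best mean batch score.
        means = [statistics.mean(row) for row in scores]
        best = programs[means.index(max(means))]
    return best

trainset = list(range(10))
metric = lambda ex, out: 1.0 if out == ex * 2 else 0.0
double = lambda ex: ex * 2   # candidate that satisfies the metric
ident = lambda ex: ex        # candidate that usually fails it
best = simba_sketch([ident, double], trainset, metric)  # selects `double`
```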
Phase 2: Basic SIMBA Optimization

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Program to optimize
class QAPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Metric (can return just a score, or a score plus feedback)
def qa_metric(example, pred, trace=None):
    correct = example.answer.lower() in pred.answer.lower()
    return 1.0 if correct else 0.0

# SIMBA optimizer
optimizer = dspy.SIMBA(
    metric=qa_metric,
    max_steps=10,  # Optimization iterations
    bsize=5        # Mini-batch size
)

program = QAPipeline()
compiled = optimizer.compile(program, trainset=trainset)
compiled.save("qa_simba.json")
```
Phase 3: SIMBA with Feedback Signals
SIMBA works best with rich feedback:
```python
import dspy

def detailed_metric(example, pred, trace=None):
    """Metric with feedback signal."""
    expected = example.answer.lower()
    actual = pred.answer.lower()
    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Perfect match")
    elif expected in actual:
        return dspy.Prediction(score=0.7, feedback=f"Contains answer but verbose: '{actual}'")
    else:
        overlap = len(set(expected.split()) & set(actual.split()))
        if overlap > 0:
            return dspy.Prediction(score=0.3, feedback=f"Partial overlap: {overlap} words")
        return dspy.Prediction(score=0.0, feedback=f"No match. Expected '{expected}'")

optimizer = dspy.SIMBA(
    metric=detailed_metric,
    max_steps=20,  # Optimization iterations
    bsize=8        # Mini-batch size
)
compiled = optimizer.compile(program, trainset=trainset)
```
Phase 4: Production Agent Optimization
```python
import dspy
import logging

logger = logging.getLogger(__name__)

# Define tools as functions
def search(query: str) -> str:
    """Search knowledge base for relevant information."""
    retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    results = retriever(query, k=3)
    return "\n".join([r['text'] for r in results])

def calculate(expr: str) -> str:
    """Evaluate Python expressions safely."""
    try:
        with dspy.PythonInterpreter() as interp:
            return str(interp.execute(expr))
    except Exception as e:
        return f"Error: {e}"

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            "question -> answer",
            tools=[search, calculate]
        )

    def forward(self, question):
        return self.agent(question=question)

def agent_metric(example, pred, trace=None):
    """Rich metric for agent optimization."""
    expected = example.answer.lower().strip()
    actual = pred.answer.lower().strip() if pred.answer else ""
    # Exact match
    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Correct answer")
    # Partial match
    if expected in actual:
        return dspy.Prediction(score=0.7, feedback="Answer contains expected result")
    # Check key terms
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms)
    if overlap >= len(expected_terms) * 0.5:
        return dspy.Prediction(score=0.5, feedback="50%+ term overlap")
    return dspy.Prediction(score=0.0, feedback=f"Incorrect: expected '{example.answer}'")

def optimize_agent(trainset, devset):
    """Full SIMBA optimization pipeline."""
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
    agent = ResearchAgent()

    # Baseline evaluation
    eval_metric = lambda ex, pred, trace=None: agent_metric(ex, pred, trace).score
    evaluator = dspy.Evaluate(devset=devset, metric=eval_metric, num_threads=4)
    baseline = evaluator(agent)
    logger.info(f"Baseline: {baseline:.2%}")

    # SIMBA optimization
    optimizer = dspy.SIMBA(
        metric=agent_metric,
        max_steps=25,  # Optimization iterations
        bsize=6        # Mini-batch size
    )
    compiled = optimizer.compile(agent, trainset=trainset)

    # Evaluate optimized
    optimized = evaluator(compiled)
    logger.info(f"SIMBA optimized: {optimized:.2%}")

    compiled.save("research_agent_simba.json")
    return compiled
```

Configuration
```python
optimizer = dspy.SIMBA(
    metric=metric_fn,
    max_steps=20,                    # Optimization iterations
    bsize=32,                        # Mini-batch size (default: 32)
    num_candidates=6,                # Candidates per iteration (default: 6)
    max_demos=4,                     # Max demos per predictor (default: 4)
    temperature_for_sampling=0.2,    # Sampling temperature (default: 0.2)
    temperature_for_candidates=0.2   # Candidate selection temperature (default: 0.2)
)
```

Best Practices
- Use feedback signals - SIMBA benefits from `dspy.Prediction(score=..., feedback=...)` objects
- Balance parameters - Adjust `bsize` (default 32) and `max_steps` (default 8) based on dataset size
- Patience - SIMBA is slower than Bootstrap but faster than GEPA
- Custom metrics - Best for scenarios with nuanced scoring (not binary)
- Tune temperatures - Lower temperatures (0.1-0.3) favor exploitation, higher (0.5-1.0) favor exploration
Limitations
- Newer optimizer, less battle-tested than MIPROv2
- Requires thoughtful metric design (garbage in, garbage out)
- Not as thorough as GEPA for agent optimization
- Mini-batch sampling adds variance to results
- No automatic prompt reflection like GEPA
Official Documentation
- DSPy Documentation: https://dspy.ai/
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- SIMBA Optimizer: https://dspy.ai/api/optimizers/SIMBA/
- Optimizers Guide: https://dspy.ai/learn/optimization/optimizers/