dspy-simba-optimizer


DSPy SIMBA Optimizer


Goal


Optimize DSPy programs with SIMBA's stochastic mini-batch ascent, using statistical analysis of feedback signals to guide prompt and demonstration updates.

When to Use


  • Need lighter-weight alternative to GEPA
  • Have custom feedback metrics (not just accuracy)
  • Agentic tasks with rich failure signals
  • Budget-conscious optimization (fewer eval calls)
  • Programs where few-shot examples aren't critical

Related Skills


  • Alternative optimizers: dspy-miprov2-optimizer, dspy-gepa-reflective
  • Agent optimization: dspy-react-agent-builder
  • Evaluation: dspy-evaluation-suite

Inputs


Input      Type                Description
---------  ------------------  ----------------------------------------------------------
program    dspy.Module         Program to optimize
trainset   list[dspy.Example]  Training examples
metric     callable            Returns float or dspy.Prediction(score=..., feedback=...)
max_steps  int                 Number of optimization steps
bsize      int                 Mini-batch size

Outputs


Output             Type         Description
-----------------  -----------  -----------------------
optimized_program  dspy.Module  SIMBA-optimized program

Workflow


Phase 1: Understand SIMBA


SIMBA (Stochastic Introspective Mini-Batch Ascent):
  • Iterative prompt optimization with mini-batch sampling
  • Identifies challenging examples with high output variability
  • Generates self-reflective rules or adds successful demonstrations
  • Lighter than GEPA (no reflection LM)
  • More flexible than Bootstrap (uses feedback)
Comparison:
  • MIPROv2: Best accuracy, lots of data
  • GEPA: Agentic systems, expensive
  • SIMBA: Custom feedback, budget-friendly
  • Bootstrap: Simplest, demo-based
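The ascent loop behind SIMBA can be sketched in plain Python. This is a conceptual sketch only, not DSPy's implementation: `propose_candidates` is a hypothetical stand-in for SIMBA's rule-generation and demo-addition step.

```python
import random

def simba_sketch(program, trainset, metric, propose_candidates,
                 max_steps=8, bsize=4, seed=0):
    """Conceptual sketch of stochastic mini-batch ascent:
    sample a mini-batch, propose variants, keep the best scorer."""
    rng = random.Random(seed)
    best = program
    for _ in range(max_steps):
        batch = rng.sample(trainset, min(bsize, len(trainset)))
        # Score the incumbent and each proposed variant on the mini-batch
        candidates = [best] + propose_candidates(best, batch)
        def avg_score(cand):
            return sum(metric(ex, cand(ex)) for ex in batch) / len(batch)
        best = max(candidates, key=avg_score)
    return best
```

Because each step scores only a mini-batch rather than the full trainset, evaluation cost stays bounded even as `trainset` grows, at the price of some variance in which candidate wins each step.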

Phase 2: Basic SIMBA Optimization


python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Program to optimize
class QAPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Metric (can return just a score, or a Prediction with score and feedback)
def qa_metric(example, pred, trace=None):
    correct = example.answer.lower() in pred.answer.lower()
    return 1.0 if correct else 0.0

# SIMBA optimizer
optimizer = dspy.SIMBA(
    metric=qa_metric,
    max_steps=10,  # Optimization iterations
    bsize=5        # Mini-batch size
)

program = QAPipeline()
compiled = optimizer.compile(program, trainset=trainset)
compiled.save("qa_simba.json")

Phase 3: SIMBA with Feedback Signals


SIMBA works best with rich feedback:
python
import dspy

def detailed_metric(example, pred, trace=None):
    """Metric with feedback signal."""
    expected = example.answer.lower()
    actual = pred.answer.lower()

    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Perfect match")
    elif expected in actual:
        return dspy.Prediction(score=0.7, feedback=f"Contains answer but verbose: '{actual}'")
    else:
        overlap = len(set(expected.split()) & set(actual.split()))
        if overlap > 0:
            return dspy.Prediction(score=0.3, feedback=f"Partial overlap: {overlap} words")
        return dspy.Prediction(score=0.0, feedback=f"No match. Expected '{expected}'")

optimizer = dspy.SIMBA(
    metric=detailed_metric,
    max_steps=20,  # Optimization iterations
    bsize=8  # Mini-batch size
)

compiled = optimizer.compile(program, trainset=trainset)
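The same tiered scoring can be exercised without DSPy by returning plain `(score, feedback)` tuples; `tiered_score` below is an illustrative stand-in for the `dspy.Prediction`-returning metric, not part of the DSPy API.

```python
def tiered_score(expected: str, actual: str) -> tuple[float, str]:
    """Tiered match scoring: exact > substring > word overlap > miss."""
    expected, actual = expected.lower(), actual.lower()
    if expected == actual:
        return 1.0, "Perfect match"
    if expected in actual:
        return 0.7, f"Contains answer but verbose: '{actual}'"
    overlap = len(set(expected.split()) & set(actual.split()))
    if overlap > 0:
        return 0.3, f"Partial overlap: {overlap} words"
    return 0.0, f"No match. Expected '{expected}'"
```

Graded tiers like these give SIMBA a gradient to climb: a candidate that moves an example from "no match" to "partial overlap" is rewarded even though a binary metric would score both attempts 0.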

Phase 4: Production Agent Optimization


python
import dspy
from dspy.evaluate import Evaluate
import logging

logger = logging.getLogger(__name__)

# Define tools as functions
def search(query: str) -> str:
    """Search knowledge base for relevant information."""
    retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
    results = retriever(query, k=3)
    return "\n".join([r['text'] for r in results])

def calculate(expr: str) -> str:
    """Evaluate Python expressions safely."""
    try:
        with dspy.PythonInterpreter() as interp:
            return str(interp.execute(expr))
    except Exception as e:
        return f"Error: {e}"

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            "question -> answer",
            tools=[search, calculate]
        )

    def forward(self, question):
        return self.agent(question=question)

def agent_metric(example, pred, trace=None):
    """Rich metric for agent optimization."""
    expected = example.answer.lower().strip()
    actual = pred.answer.lower().strip() if pred.answer else ""

    # Exact match
    if expected == actual:
        return dspy.Prediction(score=1.0, feedback="Correct answer")

    # Partial match
    if expected in actual:
        return dspy.Prediction(score=0.7, feedback="Answer contains expected result")

    # Check key terms
    expected_terms = set(expected.split())
    actual_terms = set(actual.split())
    overlap = len(expected_terms & actual_terms)

    if overlap >= len(expected_terms) * 0.5:
        return dspy.Prediction(score=0.5, feedback="50%+ term overlap")

    return dspy.Prediction(score=0.0, feedback=f"Incorrect: expected '{example.answer}'")

def optimize_agent(trainset, devset):
    """Full SIMBA optimization pipeline."""
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    agent = ResearchAgent()

    # Baseline evaluation
    eval_metric = lambda ex, pred, trace=None: agent_metric(ex, pred, trace).score
    evaluator = Evaluate(devset=devset, metric=eval_metric, num_threads=4)
    baseline = evaluator(agent)
    logger.info(f"Baseline: {baseline:.2%}")

    # SIMBA optimization
    optimizer = dspy.SIMBA(
        metric=agent_metric,
        max_steps=25,  # Optimization iterations
        bsize=6        # Mini-batch size
    )

    compiled = optimizer.compile(agent, trainset=trainset)

    # Evaluate the optimized agent
    optimized = evaluator(compiled)
    logger.info(f"SIMBA optimized: {optimized:.2%}")

    compiled.save("research_agent_simba.json")
    return compiled

Configuration


python
optimizer = dspy.SIMBA(
    metric=metric_fn,
    max_steps=20,                          # Optimization iterations (default: 8)
    bsize=32,                              # Mini-batch size (default: 32)
    num_candidates=6,                      # Candidates per iteration (default: 6)
    max_demos=4,                           # Max demos per predictor (default: 4)
    temperature_for_sampling=0.2,          # Sampling temperature (default: 0.2)
    temperature_for_candidates=0.2         # Candidate selection temperature (default: 0.2)
)

Best Practices


  1. Use feedback signals - SIMBA benefits from dspy.Prediction(score=..., feedback=...) objects
  2. Balance parameters - Adjust bsize (default 32) and max_steps (default 8) based on dataset size
  3. Patience - SIMBA is slower than Bootstrap but faster than GEPA
  4. Custom metrics - Best for scenarios with nuanced scoring (not binary)
  5. Tune temperatures - Lower temperatures (0.1-0.3) favor exploitation, higher (0.5-1.0) favor exploration
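When balancing bsize and max_steps, a rough cost model helps: each step scores the incumbent plus num_candidates variants on one mini-batch of examples. The helper below is an order-of-magnitude sketch under that assumption, not SIMBA's exact call accounting.

```python
def rough_metric_calls(max_steps: int, bsize: int, num_candidates: int) -> int:
    """Order-of-magnitude estimate of metric calls: every step scores
    (num_candidates + 1) programs on one mini-batch of bsize examples."""
    return max_steps * bsize * (num_candidates + 1)

# With the defaults (max_steps=8, bsize=32, num_candidates=6),
# this estimates 8 * 32 * 7 = 1792 metric calls.
```

If that budget is too high, shrinking bsize reduces cost linearly at the price of noisier per-step comparisons.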

Limitations


  • Newer optimizer, less battle-tested than MIPROv2
  • Requires thoughtful metric design (garbage in, garbage out)
  • Not as thorough as GEPA for agent optimization
  • Mini-batch sampling adds variance to results
  • No automatic prompt reflection like GEPA

Official Documentation
