ai-switching-models

Switch Models Without Breaking Things


Guide the user through switching AI models or providers safely. The key insight: optimized prompts don't transfer between models (arxiv 2402.10949v2 — "The Unreasonable Effectiveness of Eccentric Automatic Prompts"). DSPy solves this by separating your task definition (signatures + modules) from model-specific prompts (compiled by optimizers).

Why switching models breaks things


Hand-tuned prompts are model-specific. A prompt engineered for GPT-4o will perform differently on Claude, Llama, or even GPT-4o-mini. Research shows optimized prompts for one model can actually hurt performance on another.
DSPy makes switching safe because:
  • Signatures define what the task is (inputs, outputs, types) — model-independent
  • Modules define how to solve it (chain of thought, ReAct, etc.) — model-independent
  • Compiled prompts (few-shot examples, instructions) are model-specific — but re-generated automatically by optimizers
The workflow: keep your program the same, swap the model, re-optimize. Done.

When to switch models


  • Cost reduction — "GPT-4o is too expensive, can we use something cheaper?"
  • New model release — "A better model just came out, let's try it"
  • Vendor diversification — "We can't depend on one provider"
  • Data privacy / compliance — "We need to run models on our own infrastructure"
  • Performance regression — "The provider updated their model and our outputs got worse"
  • Capability needs — "We need better code generation / longer context / faster responses"

Step 1: Configure any provider


DSPy uses LiteLLM under the hood, so you can use any supported provider with a simple string:
```python
import dspy

# OpenAI
lm = dspy.LM("openai/gpt-4o")
lm = dspy.LM("openai/gpt-4o-mini")

# Anthropic
lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
lm = dspy.LM("anthropic/claude-haiku-4-5-20251001")

# Azure OpenAI
lm = dspy.LM("azure/my-gpt4-deployment")

# Google
lm = dspy.LM("gemini/gemini-2.0-flash")

# Together AI (open-source models)
lm = dspy.LM("together_ai/meta-llama/Llama-3-70b-chat-hf")

# Local models (via Ollama)
lm = dspy.LM("ollama_chat/llama3.1", api_base="http://localhost:11434")

# Any OpenAI-compatible server (vLLM, TGI, etc.)
lm = dspy.LM("openai/my-model", api_base="http://localhost:8000/v1", api_key="none")

dspy.configure(lm=lm)
```

Environment variables


Set API keys as environment variables — don't hardcode them:
```bash
# .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
TOGETHER_API_KEY=...
AZURE_API_KEY=...
AZURE_API_BASE=https://your-resource.openai.azure.com/
```

See [LiteLLM provider docs](https://docs.litellm.ai/docs/providers) for the full list of 100+ supported providers.

Step 2: Benchmark your current model


Before changing anything, measure your baseline. You need a metric and test data.
```python
from dspy.evaluate import Evaluate

# Your existing program and metric
program = MyProgram()
program.load("current_optimized.json")  # load your production prompts

evaluator = Evaluate(
    devset=devset,
    metric=metric,
    num_threads=4,
    display_progress=True,
    display_table=5,
)

# Benchmark with your current model
current_lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=current_lm)
baseline_score = evaluator(program)
print(f"Current model baseline: {baseline_score:.1f}%")
```

If you don't have a metric or test data yet, use `/ai-improving-accuracy` to set them up first.
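
A metric in DSPy is just a function that scores one prediction against one example. As a placeholder while you set that up properly, here is a minimal exact-match sketch; the `.answer` attribute and the stand-in objects are illustrative assumptions, not part of this guide's program:

```python
from types import SimpleNamespace

def exact_match_metric(example, pred, trace=None):
    """Score 1.0 when the predicted answer matches the gold answer, else 0.0."""
    return float(example.answer.strip().lower() == pred.answer.strip().lower())

# Quick self-check with duck-typed stand-ins for dspy.Example / dspy.Prediction:
gold = SimpleNamespace(answer="Paris")
print(exact_match_metric(gold, SimpleNamespace(answer=" paris ")))  # 1.0
print(exact_match_metric(gold, SimpleNamespace(answer="London")))   # 0.0
```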

Step 3: Try the new model (quick test)


Swap the model and run your evaluation without re-optimizing. This demonstrates the problem — your old prompts don't transfer.
```python
# Try the new model with your OLD optimized prompts
new_lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
dspy.configure(lm=new_lm)

naive_score = evaluator(program)
print(f"Old model (optimized): {baseline_score:.1f}%")
print(f"New model (old prompts): {naive_score:.1f}%")
print(f"Drop: {baseline_score - naive_score:.1f}%")
```

You'll typically see a quality drop — this is expected. The optimized prompts were tuned for the old model.

Step 4: Re-optimize for the new model


Now re-optimize your program for the new model. Use the same signatures and modules — only the compiled prompts change.
```python
# Configure the new model
new_lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
dspy.configure(lm=new_lm)

# Start from a fresh (unoptimized) program
fresh_program = MyProgram()

# Re-optimize for the new model
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized_for_new = optimizer.compile(fresh_program, trainset=trainset)

# Evaluate
reoptimized_score = evaluator(optimized_for_new)
print(f"Old model (optimized): {baseline_score:.1f}%")
print(f"New model (old prompts): {naive_score:.1f}%")
print(f"New model (re-optimized): {reoptimized_score:.1f}%")
```

The re-optimized score should recover most or all of the quality. If it doesn't, either:
- The new model genuinely can't handle this task as well
- Try a heavier optimization (`auto="heavy"`)
- Try BootstrapFewShot first for a quick sanity check

Quick re-optimization (fast test)


For a quick check before committing to a full MIPROv2 run:
```python
optimizer = dspy.BootstrapFewShot(
    metric=metric,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
)
quick_optimized = optimizer.compile(fresh_program, trainset=trainset)
quick_score = evaluator(quick_optimized)
```

Step 5: Compare models systematically


Loop over candidate models, optimize each, and build a comparison table:
```python
candidates = [
    ("openai/gpt-4o", "GPT-4o"),
    ("openai/gpt-4o-mini", "GPT-4o-mini"),
    ("anthropic/claude-sonnet-4-5-20250929", "Claude Sonnet"),
    ("together_ai/meta-llama/Llama-3-70b-chat-hf", "Llama 3 70B"),
]

results = []
for model_id, label in candidates:
    lm = dspy.LM(model_id)
    dspy.configure(lm=lm)

    # Optimize for this model
    fresh = MyProgram()
    optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
    optimized = optimizer.compile(fresh, trainset=trainset)

    # Evaluate
    score = evaluator(optimized)

    # Save the optimized program
    optimized.save(f"optimized_{label.lower().replace(' ', '_')}.json")

    results.append({"model": label, "score": score})
    print(f"{label}: {score:.1f}%")

# Print comparison table
print("\n--- Model Comparison ---")
print(f"{'Model':<25} {'Score':>8}")
print("-" * 35)
for r in sorted(results, key=lambda x: x["score"], reverse=True):
    print(f"{r['model']:<25} {r['score']:>7.1f}%")
```

For a more thorough comparison with MIPROv2 and cost/latency tracking, see [examples.md](examples.md).
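
When several candidates land within a point or two of each other, cost can break the tie. A small helper like the hypothetical `pick_model` below selects the cheapest model within a tolerance of the best score; the scores and per-1K-request costs are made-up illustration values, not benchmarks or real prices:

```python
def pick_model(results, tolerance=2.0):
    """Cheapest model whose score is within `tolerance` points of the best."""
    best = max(r["score"] for r in results)
    viable = [r for r in results if best - r["score"] <= tolerance]
    return min(viable, key=lambda r: r["cost_per_1k"])

# Illustrative numbers only:
results = [
    {"model": "GPT-4o",      "score": 91.0, "cost_per_1k": 2.50},
    {"model": "GPT-4o-mini", "score": 89.5, "cost_per_1k": 0.15},
    {"model": "Llama 3 70B", "score": 85.0, "cost_per_1k": 0.90},
]
print(pick_model(results)["model"])  # GPT-4o-mini
```

With `tolerance=0.0` the helper degenerates to "pick the top scorer", so the single knob encodes how much quality you are willing to trade for cost.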

Step 6: Mix models in one pipeline


You don't have to use one model for everything. Assign different models to different steps — cheap for simple tasks, expensive for hard ones.

Using `dspy.context` (temporary, per-call)

```python
cheap_lm = dspy.LM("openai/gpt-4o-mini")
expensive_lm = dspy.LM("openai/gpt-4o")

dspy.configure(lm=expensive_lm)  # default

class MyPipeline(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(ClassifySignature)
        self.generate = dspy.ChainOfThought(GenerateSignature)

    def forward(self, text):
        # Cheap model for simple classification
        with dspy.context(lm=cheap_lm):
            category = self.classify(text=text)

        # Expensive model for complex generation
        return self.generate(text=text, category=category.label)
```

Using `set_lm` (permanent, per-module)

```python
pipeline = MyPipeline()
pipeline.classify.set_lm(cheap_lm)
pipeline.generate.set_lm(expensive_lm)
```
See /ai-cutting-costs for more cost optimization patterns with per-module LM assignment.

Step 7: Save and deploy


Save a separate optimized program for each model you might use in production:
```python
# Save per-model optimized programs
optimized_gpt4o.save("optimized_gpt4o.json")
optimized_claude.save("optimized_claude.json")
optimized_llama.save("optimized_llama.json")

# In production — load the right one
import os

model_name = os.environ.get("AI_MODEL", "openai/gpt-4o")
lm = dspy.LM(model_name)
dspy.configure(lm=lm)

program = MyProgram()
program.load(f"optimized_{model_name.split('/')[-1]}.json")
```
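
The `split('/')[-1]` convention is fine for simple ids like `openai/gpt-4o`, but two providers can serve identically named models, which would collide on disk once the prefix is stripped. A slug helper (the name `artifact_path` is hypothetical) keeps the whole id in the filename:

```python
import re

def artifact_path(model_id: str) -> str:
    """Map a LiteLLM model string to a filesystem-safe artifact filename."""
    slug = re.sub(r"[^a-z0-9]+", "_", model_id.lower()).strip("_")
    return f"optimized_{slug}.json"

print(artifact_path("openai/gpt-4o"))
# optimized_openai_gpt_4o.json
print(artifact_path("together_ai/meta-llama/Llama-3-70b-chat-hf"))
# optimized_together_ai_meta_llama_llama_3_70b_chat_hf.json
```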

Common scenarios


GPT-4o to GPT-4o-mini (cost reduction)


  1. Benchmark GPT-4o baseline (Step 2)
  2. Try GPT-4o-mini with old prompts — see the drop (Step 3)
  3. Re-optimize for GPT-4o-mini with MIPROv2 (Step 4)
  4. Compare scores — if quality is close enough, ship it
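
Whether the trade-off in step 4 is worth taking usually comes down to volume. A back-of-envelope estimate helps frame the decision; every number below (volumes and per-million-token prices) is an illustrative placeholder, not a current rate:

```python
def monthly_savings(requests_per_month, tokens_per_request,
                    old_price_per_mtok, new_price_per_mtok):
    """Rough monthly saving in dollars from switching models."""
    mtok = requests_per_month * tokens_per_request / 1_000_000
    return mtok * (old_price_per_mtok - new_price_per_mtok)

# e.g. 100K requests/month at ~2K tokens each, $10 vs $0.60 per million tokens:
print(monthly_savings(100_000, 2_000, 10.0, 0.60))  # roughly $1880/month
```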

OpenAI to Anthropic (vendor diversification)


  1. Set up Anthropic API key in environment
  2. Change the model string from "openai/gpt-4o" to "anthropic/claude-sonnet-4-5-20250929"
  3. Re-optimize — different models need different prompts
  4. Keep both optimized programs, switch via environment variable

Cloud to local (data privacy)


  1. Set up local model server (Ollama, vLLM, or TGI)
  2. Point DSPy at it: dspy.LM("ollama_chat/llama3.1", api_base="http://localhost:11434")
  3. Re-optimize — local models especially need re-optimization
  4. Expect some quality trade-off vs large cloud models; use heavier optimization

Model version update broke things


When a provider updates their model (e.g., GPT-4o version bump):
  1. Run your evaluation to confirm the regression
  2. Re-optimize against the updated model
  3. Save the new optimized program
  4. This is why having evaluation + optimization in your workflow matters — version updates become routine, not emergencies
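
Step 1 of that routine is easy to automate: store the baseline score alongside the optimized program and fail CI when a re-run drops below it by more than a tolerance. A minimal gate sketch (the threshold and scores are illustrative, not recommendations):

```python
def regression_gate(baseline: float, current: float, tolerance: float = 1.0) -> bool:
    """True when `current` has dropped more than `tolerance` points below `baseline`."""
    return (baseline - current) > tolerance

print(regression_gate(87.0, 84.0))  # True: a 3-point drop trips the gate
print(regression_gate(87.0, 86.5))  # False: a 0.5-point dip is within tolerance
```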

Checklist


  1. Set up evaluation and metric before switching (use /ai-improving-accuracy)
  2. Benchmark your current model
  3. Try the new model with old prompts (expect a drop)
  4. Re-optimize for the new model
  5. Compare scores — decide if the trade-off is acceptable
  6. Save per-model optimized programs
  7. Deploy with model selection via environment variable

Additional resources


  • For worked examples (cost migration, vendor switch, model shootout), see examples.md
  • Use /ai-improving-accuracy to set up metrics and evaluation before switching
  • Use /ai-cutting-costs for per-module model assignment and cost optimization
  • Use /ai-building-pipelines for multi-step pipelines with mixed models
  • Use /ai-fine-tuning to distill from an expensive model to a cheap one