ai-switching-models

Switch Models Without Breaking Things


Guide the user through switching AI models or providers safely. The key insight: optimized prompts don't transfer between models (arxiv 2402.10949v2 — "The Unreasonable Effectiveness of Eccentric Automatic Prompts"). DSPy solves this by separating your task definition (signatures + modules) from model-specific prompts (compiled by optimizers).

Why switching models breaks things


Hand-tuned prompts are model-specific. A prompt engineered for GPT-4o will perform differently on Claude, Llama, or even GPT-4o-mini. Research shows optimized prompts for one model can actually hurt performance on another.
DSPy makes switching safe because:
  • Signatures define what the task is (inputs, outputs, types) — model-independent
  • Modules define how to solve it (chain of thought, ReAct, etc.) — model-independent
  • Compiled prompts (few-shot examples, instructions) are model-specific — but re-generated automatically by optimizers
The workflow: keep your program the same, swap the model, re-optimize. Done.

When to switch models


  • Cost reduction — "GPT-4o is too expensive, can we use something cheaper?"
  • New model release — "A better model just came out, let's try it"
  • Vendor diversification — "We can't depend on one provider"
  • Data privacy / compliance — "We need to run models on our own infrastructure"
  • Performance regression — "The provider updated their model and our outputs got worse"
  • Capability needs — "We need better code generation / longer context / faster responses"

Step 1: Configure any provider


DSPy uses LiteLLM under the hood, so you can use any supported provider with a simple string:
```python
import dspy

# OpenAI
lm = dspy.LM("openai/gpt-4o")
lm = dspy.LM("openai/gpt-4o-mini")

# Anthropic
lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
lm = dspy.LM("anthropic/claude-haiku-4-5-20251001")

# Azure OpenAI
lm = dspy.LM("azure/my-gpt4-deployment")

# Google
lm = dspy.LM("gemini/gemini-2.0-flash")

# Together AI (open-source models)
lm = dspy.LM("together_ai/meta-llama/Llama-3-70b-chat-hf")

# Local models (via Ollama)
lm = dspy.LM("ollama_chat/llama3.1", api_base="http://localhost:11434")

# Any OpenAI-compatible server (vLLM, TGI, etc.)
lm = dspy.LM("openai/my-model", api_base="http://localhost:8000/v1", api_key="none")

dspy.configure(lm=lm)
```

Environment variables


Set API keys as environment variables — don't hardcode them:
```bash
# .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
TOGETHER_API_KEY=...
AZURE_API_KEY=...
AZURE_API_BASE=https://your-resource.openai.azure.com/
```

See [LiteLLM provider docs](https://docs.litellm.ai/docs/providers) for the full list of 100+ supported providers.

Step 2: Benchmark your current model


Before changing anything, measure your baseline. You need a metric and test data.
```python
from dspy.evaluate import Evaluate

# Your existing program and metric
program = MyProgram()
program.load("current_optimized.json")  # load your production prompts

evaluator = Evaluate(
    devset=devset,
    metric=metric,
    num_threads=4,
    display_progress=True,
    display_table=5,
)

# Benchmark with your current model
current_lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=current_lm)
baseline_score = evaluator(program)
print(f"Current model baseline: {baseline_score:.1f}%")
```

If you don't have a metric or test data yet, use `/ai-improving-accuracy` to set them up first.
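
A metric in DSPy is just a function that scores one prediction against one example. As a placeholder while you set that up properly, here is a minimal exact-match sketch; the `.answer` attribute and the stand-in objects are illustrative assumptions, not part of this guide's program:

```python
from types import SimpleNamespace

def exact_match_metric(example, pred, trace=None):
    """Score 1.0 when the predicted answer matches the gold answer, else 0.0."""
    return float(example.answer.strip().lower() == pred.answer.strip().lower())

# Quick self-check with duck-typed stand-ins for dspy.Example / dspy.Prediction:
gold = SimpleNamespace(answer="Paris")
print(exact_match_metric(gold, SimpleNamespace(answer=" paris ")))  # 1.0
print(exact_match_metric(gold, SimpleNamespace(answer="London")))   # 0.0
```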

Step 3: Try the new model (quick test)


Swap the model and run your evaluation without re-optimizing. This demonstrates the problem — your old prompts don't transfer.
```python
# Try the new model with your OLD optimized prompts
new_lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
dspy.configure(lm=new_lm)

naive_score = evaluator(program)
print(f"Old model (optimized): {baseline_score:.1f}%")
print(f"New model (old prompts): {naive_score:.1f}%")
print(f"Drop: {baseline_score - naive_score:.1f}%")
```

You'll typically see a quality drop — this is expected. The optimized prompts were tuned for the old model.

Step 4: Re-optimize for the new model


Now re-optimize your program for the new model. Use the same signatures and modules — only the compiled prompts change.
```python
# Configure the new model
new_lm = dspy.LM("anthropic/claude-sonnet-4-5-20250929")
dspy.configure(lm=new_lm)

# Start from a fresh (unoptimized) program
fresh_program = MyProgram()

# Re-optimize for the new model
optimizer = dspy.MIPROv2(metric=metric, auto="medium")
optimized_for_new = optimizer.compile(fresh_program, trainset=trainset)

# Evaluate
reoptimized_score = evaluator(optimized_for_new)
print(f"Old model (optimized): {baseline_score:.1f}%")
print(f"New model (old prompts): {naive_score:.1f}%")
print(f"New model (re-optimized): {reoptimized_score:.1f}%")
```

The re-optimized score should recover most or all of the quality. If it doesn't, either:
- The new model genuinely can't handle this task as well
- Try a heavier optimization (`auto="heavy"`)
- Try BootstrapFewShot first for a quick sanity check

Quick re-optimization (fast test)


For a quick check before committing to a full MIPROv2 run:
```python
optimizer = dspy.BootstrapFewShot(
    metric=metric,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
)
quick_optimized = optimizer.compile(fresh_program, trainset=trainset)
quick_score = evaluator(quick_optimized)
```

Step 5: Compare models systematically


Loop over candidate models, optimize each, and build a comparison table:
```python
candidates = [
    ("openai/gpt-4o", "GPT-4o"),
    ("openai/gpt-4o-mini", "GPT-4o-mini"),
    ("anthropic/claude-sonnet-4-5-20250929", "Claude Sonnet"),
    ("together_ai/meta-llama/Llama-3-70b-chat-hf", "Llama 3 70B"),
]

results = []
for model_id, label in candidates:
    lm = dspy.LM(model_id)
    dspy.configure(lm=lm)

    # Optimize for this model
    fresh = MyProgram()
    optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
    optimized = optimizer.compile(fresh, trainset=trainset)

    # Evaluate
    score = evaluator(optimized)

    # Save the optimized program
    optimized.save(f"optimized_{label.lower().replace(' ', '_')}.json")

    results.append({"model": label, "score": score})
    print(f"{label}: {score:.1f}%")

# Print comparison table
print("\n--- Model Comparison ---")
print(f"{'Model':<25} {'Score':>8}")
print("-" * 35)
for r in sorted(results, key=lambda x: x["score"], reverse=True):
    print(f"{r['model']:<25} {r['score']:>7.1f}%")
```

For a more thorough comparison with MIPROv2 and cost/latency tracking, see [examples.md](examples.md).
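
When several candidates land within a point or two of each other, cost can break the tie. A small helper like the hypothetical `pick_model` below selects the cheapest model within a tolerance of the best score; the scores and per-1K-request costs are made-up illustration values, not benchmarks or real prices:

```python
def pick_model(results, tolerance=2.0):
    """Cheapest model whose score is within `tolerance` points of the best."""
    best = max(r["score"] for r in results)
    viable = [r for r in results if best - r["score"] <= tolerance]
    return min(viable, key=lambda r: r["cost_per_1k"])

# Illustrative numbers only:
results = [
    {"model": "GPT-4o",      "score": 91.0, "cost_per_1k": 2.50},
    {"model": "GPT-4o-mini", "score": 89.5, "cost_per_1k": 0.15},
    {"model": "Llama 3 70B", "score": 85.0, "cost_per_1k": 0.90},
]
print(pick_model(results)["model"])  # GPT-4o-mini
```

With `tolerance=0.0` the helper degenerates to "pick the top scorer", so the single knob encodes how much quality you are willing to trade for cost.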

Step 6: Mix models in one pipeline


You don't have to use one model for everything. Assign different models to different steps — cheap for simple tasks, expensive for hard ones.

Using `dspy.context` (temporary, per-call)

```python
cheap_lm = dspy.LM("openai/gpt-4o-mini")
expensive_lm = dspy.LM("openai/gpt-4o")

dspy.configure(lm=expensive_lm)  # default

class MyPipeline(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(ClassifySignature)
        self.generate = dspy.ChainOfThought(GenerateSignature)

    def forward(self, text):
        # Cheap model for simple classification
        with dspy.context(lm=cheap_lm):
            category = self.classify(text=text)

        # Expensive model for complex generation
        return self.generate(text=text, category=category.label)
```

Using `set_lm` (permanent, per-module)

```python
pipeline = MyPipeline()
pipeline.classify.set_lm(cheap_lm)
pipeline.generate.set_lm(expensive_lm)
```
See /ai-cutting-costs for more cost optimization patterns with per-module LM assignment.

Step 7: Save and deploy


Save a separate optimized program for each model you might use in production:
```python
# Save per-model optimized programs
optimized_gpt4o.save("optimized_gpt4o.json")
optimized_claude.save("optimized_claude.json")
optimized_llama.save("optimized_llama.json")

# In production — load the right one
import os

model_name = os.environ.get("AI_MODEL", "openai/gpt-4o")
lm = dspy.LM(model_name)
dspy.configure(lm=lm)

program = MyProgram()
program.load(f"optimized_{model_name.split('/')[-1]}.json")
```
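
The `split('/')[-1]` convention is fine for simple ids like `openai/gpt-4o`, but two providers can serve identically named models, which would collide on disk once the prefix is stripped. A slug helper (the name `artifact_path` is hypothetical) keeps the whole id in the filename:

```python
import re

def artifact_path(model_id: str) -> str:
    """Map a LiteLLM model string to a filesystem-safe artifact filename."""
    slug = re.sub(r"[^a-z0-9]+", "_", model_id.lower()).strip("_")
    return f"optimized_{slug}.json"

print(artifact_path("openai/gpt-4o"))
# optimized_openai_gpt_4o.json
print(artifact_path("together_ai/meta-llama/Llama-3-70b-chat-hf"))
# optimized_together_ai_meta_llama_llama_3_70b_chat_hf.json
```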

Common scenarios


GPT-4o to GPT-4o-mini (cost reduction)


  1. Benchmark GPT-4o baseline (Step 2)
  2. Try GPT-4o-mini with old prompts — see the drop (Step 3)
  3. Re-optimize for GPT-4o-mini with MIPROv2 (Step 4)
  4. Compare scores — if quality is close enough, ship it
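
Whether the trade-off in step 4 is worth taking usually comes down to volume. A back-of-envelope estimate helps frame the decision; every number below (volumes and per-million-token prices) is an illustrative placeholder, not a current rate:

```python
def monthly_savings(requests_per_month, tokens_per_request,
                    old_price_per_mtok, new_price_per_mtok):
    """Rough monthly saving in dollars from switching models."""
    mtok = requests_per_month * tokens_per_request / 1_000_000
    return mtok * (old_price_per_mtok - new_price_per_mtok)

# e.g. 100K requests/month at ~2K tokens each, $10 vs $0.60 per million tokens:
print(monthly_savings(100_000, 2_000, 10.0, 0.60))  # roughly $1880/month
```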

OpenAI to Anthropic (vendor diversification)


  1. Set up Anthropic API key in environment
  2. Change the model string from "openai/gpt-4o" to "anthropic/claude-sonnet-4-5-20250929"
  3. Re-optimize — different models need different prompts
  4. Keep both optimized programs, switch via environment variable

Cloud to local (data privacy)


  1. Set up local model server (Ollama, vLLM, or TGI)
  2. Point DSPy at it: dspy.LM("ollama_chat/llama3.1", api_base="http://localhost:11434")
  3. Re-optimize — local models especially need re-optimization
  4. Expect some quality trade-off vs large cloud models; use heavier optimization

Model version update broke things


When a provider updates their model (e.g., GPT-4o version bump):
  1. Run your evaluation to confirm the regression
  2. Re-optimize against the updated model
  3. Save the new optimized program
  4. This is why having evaluation + optimization in your workflow matters — version updates become routine, not emergencies
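
Step 1 of that routine is easy to automate: store the baseline score alongside the optimized program and fail CI when a re-run drops below it by more than a tolerance. A minimal gate sketch (the threshold and scores are illustrative, not recommendations):

```python
def regression_gate(baseline: float, current: float, tolerance: float = 1.0) -> bool:
    """True when `current` has dropped more than `tolerance` points below `baseline`."""
    return (baseline - current) > tolerance

print(regression_gate(87.0, 84.0))  # True: a 3-point drop trips the gate
print(regression_gate(87.0, 86.5))  # False: a 0.5-point dip is within tolerance
```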

Checklist


  1. Set up evaluation and metric before switching (use /ai-improving-accuracy)
  2. Benchmark your current model
  3. Try the new model with old prompts (expect a drop)
  4. Re-optimize for the new model
  5. Compare scores — decide if the trade-off is acceptable
  6. Save per-model optimized programs
  7. Deploy with model selection via environment variable

Additional resources


  • For worked examples (cost migration, vendor switch, model shootout), see examples.md
  • Use /ai-improving-accuracy to set up metrics and evaluation before switching
  • Use /ai-cutting-costs for per-module model assignment and cost optimization
  • Use /ai-building-pipelines for multi-step pipelines with mixed models
  • Use /ai-fine-tuning to distill from an expensive model to a cheap one