# Model Merging: Combining Pre-trained Models
## When to Use This Skill
Use Model Merging when you need to:
- Combine capabilities from multiple fine-tuned models without retraining
- Create specialized models by blending domain-specific expertise (math + coding + chat)
- Improve performance beyond single models (often +5-10% on benchmarks)
- Reduce training costs - no GPUs needed, merges run on CPU
- Experiment rapidly - create new model variants in minutes, not days
- Preserve multiple skills - merge without catastrophic forgetting
Success Stories: Marcoro14-7B-slerp (best on Open LLM Leaderboard 02/2024), many top HuggingFace models use merging
Tools: mergekit (Arcee AI), LazyMergekit, Model Soup
## Installation
```bash
# Install mergekit from source
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Or via pip
pip install mergekit

# Optional: transformers library
pip install transformers torch
```
## Quick Start
### Simple Linear Merge
```yaml
# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16
```

```bash
# Run merge
mergekit-yaml config.yml ./merged-model --cuda
```

```python
# Use merged model
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")
```
### SLERP Merge (Best for 2 Models)
```yaml
# config.yml - Spherical interpolation
merge_method: slerp
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # Interpolation factor (0 = first model, 1 = second model)
dtype: bfloat16
```
## Core Concepts

### 1. Merge Methods
**Linear (Model Soup)**
- Simple weighted average of parameters
- Fast, works well for similar models
- Can merge 2+ models

```python
# Weighted average, where w1 + w2 + w3 = 1
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
```
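As a concrete toy example, this weighted average can be applied per parameter of a state dict. The `linear_merge` helper below is illustrative only (mergekit does the equivalent per tensor across full checkpoints):

```python
def linear_merge(state_dicts, weights):
    """Weighted average of parameter dicts (toy stand-in for a model soup)."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Each merged parameter is the weight-blended sum across models
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two "models" with a single scalar parameter each
a = {"layer.weight": 1.0}
b = {"layer.weight": 3.0}
print(linear_merge([a, b], [0.5, 0.5]))  # {'layer.weight': 2.0}
```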
**SLERP (Spherical Linear Interpolation)**
- Interpolates along a sphere in weight space
- Preserves the magnitude of weight vectors
- Best for merging 2 models
- Smoother than linear

```python
# SLERP formula, with t ∈ [0, 1]:
#   merged = (sin((1-t)θ) / sin(θ)) * model1 + (sin(tθ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2))
```
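The formula can be sketched as a small function on plain vectors. This is an illustrative sketch, not mergekit's implementation (which applies the interpolation per weight tensor, with its own handling of degenerate cases):

```python
import math

def slerp(v1, v2, t):
    """Spherical linear interpolation between two vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # θ = arccos of the normalized dot product, clamped for numerical safety
    theta = math.acos(max(-1.0, min(1.0, dot / (norm1 * norm2))))
    if theta < 1e-6:  # nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v1, v2)]
    s1 = math.sin((1 - t) * theta) / math.sin(theta)
    s2 = math.sin(t * theta) / math.sin(theta)
    return [s1 * a + s2 * b for a, b in zip(v1, v2)]

# Halfway between two orthogonal unit vectors stays on the unit sphere
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))  # [0.7071..., 0.7071...]
```

Note that at `t=0.5` the result still has norm 1, which is the magnitude-preserving property that plain linear averaging lacks.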
**Task Arithmetic**
- Extract "task vectors" (fine-tuned - base)
- Combine task vectors, add to base
- Good for merging multiple specialized models

```python
# Task vector
task_vector = finetuned_model - base_model

# Merge multiple task vectors
merged = base_model + α₁ * task_vector₁ + α₂ * task_vector₂
```
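On flat parameter dicts, the two steps above can be sketched as follows (a toy illustration; the `task_arithmetic` helper name is made up for this example):

```python
def task_arithmetic(base, finetuned_models, alphas):
    """Add scaled task vectors (finetuned - base) back onto the base model."""
    merged = dict(base)
    for ft, alpha in zip(finetuned_models, alphas):
        for name, value in ft.items():
            merged[name] += alpha * (value - base[name])  # α · task_vector
    return merged

base = {"w": 1.0}
math_model = {"w": 2.0}  # task vector of +1.0
chat_model = {"w": 0.0}  # task vector of -1.0
print(task_arithmetic(base, [math_model, chat_model], [0.5, 0.5]))  # {'w': 1.0}
```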
**TIES-Merging**
- Task arithmetic + sparsification
- Resolves sign conflicts in parameters
- Best for merging many task-specific models
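The sign-conflict resolution can be illustrated with a simplified sketch of TIES's elect-and-merge step (the real method also trims low-magnitude deltas before electing signs; this helper is for intuition only):

```python
def ties_elect_and_merge(task_vectors):
    """Per parameter: elect the majority sign by total magnitude,
    keep only agreeing deltas, and average the survivors."""
    merged = []
    for deltas in zip(*task_vectors):
        pos = sum(d for d in deltas if d > 0)
        neg = -sum(d for d in deltas if d < 0)
        sign = 1.0 if pos >= neg else -1.0
        kept = [d for d in deltas if d * sign > 0]
        merged.append(sum(kept) / len(kept) if kept else 0.0)
    return merged

# Two task vectors that disagree on the sign of the second parameter:
# the -0.1 delta loses the sign election and is discarded, not averaged in.
print(ties_elect_and_merge([[0.5, -0.1], [0.25, 0.5]]))  # [0.375, 0.5]
```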
**DARE (Drop And REscale)**
- Randomly drops fine-tuned parameters
- Rescales remaining parameters
- Reduces redundancy, maintains performance
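A minimal sketch of the DARE idea on a single delta vector (illustrative; the real method operates on full task vectors with a tuned drop rate):

```python
import random

def dare(delta, drop_rate, seed=0):
    """Drop And REscale: zero each delta entry with probability drop_rate,
    then rescale survivors by 1/(1 - drop_rate) so the expected value is preserved."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * scale for d in delta]

delta = [0.1, -0.2, 0.3, 0.05]
# Survivors are rescaled x2 at drop_rate=0.5; dropped entries become 0.0
print(dare(delta, drop_rate=0.5))
```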
### 2. Configuration Structure
```yaml
# Basic structure
merge_method: <method>   # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path>       # Optional: base model for task arithmetic
models:
  - model: <path/to/model1>
    parameters:
      weight: <float>    # Merge weight
      density: <float>   # For TIES/DARE
  - model: <path/to/model2>
    parameters:
      weight: <float>
parameters:
  # Method-specific parameters
dtype: <dtype>           # bfloat16, float16, float32

# Optional
slices:     # Layer-wise merging
tokenizer:  # Tokenizer configuration
```
## Merge Methods Guide

### Linear Merge
Best for: Simple model combinations, equal weighting

```yaml
merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16
```

### SLERP Merge
Best for: Two models, smooth interpolation

```yaml
merge_method: slerp
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5  # 0.0 = first model, 1.0 = second model
dtype: bfloat16
```

Layer-specific SLERP:

```yaml
merge_method: slerp
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn  # Attention layers
      value: 0.3
    - filter: mlp        # MLP layers
      value: 0.7
    - value: 0.5         # Default for other layers
dtype: bfloat16
```

### Task Arithmetic
Best for: Combining specialized skills

```yaml
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1        # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B  # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B       # Code
    parameters:
      weight: 0.2
dtype: bfloat16
```

### TIES-Merging
Best for: Many models, resolving conflicts

```yaml
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5  # Keep top 50% of parameters
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 1.0
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      density: 0.5
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
```

### DARE Merge
Best for: Reducing redundancy

```yaml
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5  # Drop 50% of deltas
      weight: 0.6
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
parameters:
  int8_mask: true  # Use int8 for masks (saves memory)
dtype: bfloat16
```

## Advanced Patterns
### Layer-wise Merging
```yaml
# Different models for different layers
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16]   # First half
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [16, 32]  # Second half
dtype: bfloat16
```
### MoE from Merged Models
```yaml
# Create a Mixture of Experts
merge_method: moe
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "math"
      - "calculate"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: ajibawa-2023/Code-Mistral-7B
    positive_prompts:
      - "code"
      - "python"
dtype: bfloat16
```
### Tokenizer Merging

```yaml
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: custom/specialized-model
tokenizer:
  source: "union"  # Combine vocabularies from both models
  tokens:
    <|special_token|>:
      source: "custom/specialized-model"
```

## Best Practices
### 1. Model Compatibility
```python
# ✅ Good: Same architecture
models = [
    "mistralai/Mistral-7B-v0.1",
    "teknium/OpenHermes-2.5-Mistral-7B",  # Both Mistral 7B
]

# ❌ Bad: Different architectures
models = [
    "meta-llama/Llama-2-7b-hf",   # Llama
    "mistralai/Mistral-7B-v0.1",  # Mistral (incompatible!)
]
```
### 2. Weight Selection
```yaml
# ✅ Good: Weights sum to 1.0
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4  # 0.6 + 0.4 = 1.0

# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
  - model: model_a
    parameters:
      weight: 0.8
  - model: model_b
    parameters:
      weight: 0.8  # May boost performance
```
### 3. Method Selection
```python
# Choose merge method based on use case:

# 2 models, smooth blend → SLERP
merge_method = "slerp"

# 3+ models, simple average → Linear
merge_method = "linear"

# Multiple task-specific models → Task Arithmetic or TIES
merge_method = "ties"

# Want to reduce redundancy → DARE
merge_method = "dare_ties"
```
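These rules of thumb can be folded into a small helper. The function name and flags below are hypothetical, purely to make the decision table executable:

```python
def choose_merge_method(num_models, task_specific=False, reduce_redundancy=False):
    """Heuristic method picker mirroring the guidance above (illustrative only)."""
    if reduce_redundancy:
        return "dare_ties"   # drop-and-rescale to cut redundancy
    if task_specific:
        return "ties"        # resolve sign conflicts across task vectors
    if num_models == 2:
        return "slerp"       # smooth two-model blend
    return "linear"          # simple average for 3+ similar models

print(choose_merge_method(2))                      # slerp
print(choose_merge_method(3))                      # linear
print(choose_merge_method(4, task_specific=True))  # ties
```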
### 4. Density Tuning (TIES/DARE)
```yaml
# Start conservative (keep more parameters)
parameters:
  density: 0.8  # Keep 80%

# If performance is good, increase sparsity
parameters:
  density: 0.5  # Keep 50%

# If performance degrades, reduce sparsity
parameters:
  density: 0.9  # Keep 90%
```
### 5. Layer-specific Merging
```yaml
# Preserve base model's beginning and end
merge_method: passthrough
slices:
  - sources:
      - model: base_model
        layer_range: [0, 2]    # Keep first layers
  - sources:
      - model: merged_middle   # Merge middle layers
        layer_range: [2, 30]
  - sources:
      - model: base_model
        layer_range: [30, 32]  # Keep last layers
```
## Evaluation & Testing

### Benchmark Merged Models
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Test on various tasks
test_prompts = {
    "math": "Calculate: 25 * 17 =",
    "code": "Write a Python function to reverse a string:",
    "chat": "What is the capital of France?",
}

for task, prompt in test_prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=100)
    print(f"{task}: {tokenizer.decode(outputs[0])}")
```
### Common Benchmarks
- Open LLM Leaderboard: General capabilities
- MT-Bench: Multi-turn conversation
- MMLU: Multitask accuracy
- HumanEval: Code generation
- GSM8K: Math reasoning
## Production Deployment

### Save and Upload
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Upload to HuggingFace Hub
model.push_to_hub("username/my-merged-model")
tokenizer.push_to_hub("username/my-merged-model")
```
### Quantize Merged Model

```bash
# Quantize with GGUF
python convert.py ./merged-model --outtype f16 --outfile merged-model.gguf

# Quantize with GPTQ
python quantize_gptq.py ./merged-model --bits 4 --group_size 128
```
## Common Pitfalls
### ❌ Pitfall 1: Merging Incompatible Models

```yaml
# Wrong: Different architectures
models:
  - model: meta-llama/Llama-2-7b   # Llama architecture
  - model: mistralai/Mistral-7B    # Mistral architecture
```

**Fix**: Only merge models with the same architecture.

### ❌ Pitfall 2: Over-weighting One Model
```yaml
# Suboptimal: One model dominates
models:
  - model: model_a
    parameters:
      weight: 0.95  # Too high
  - model: model_b
    parameters:
      weight: 0.05  # Too low
```

**Fix**: Use more balanced weights (0.3-0.7 range).

### ❌ Pitfall 3: Not Evaluating
```bash
# Wrong: Merge and deploy without testing
mergekit-yaml config.yml ./merged-model
# Deploy immediately (risky!)
```

**Fix**: Always benchmark before deploying.

## Resources
- mergekit GitHub: https://github.com/arcee-ai/mergekit
- HuggingFace Tutorial: https://huggingface.co/blog/mlabonne/merge-models
- LazyMergekit: Automated merging notebook
- TIES Paper: https://arxiv.org/abs/2306.01708
- DARE Paper: https://arxiv.org/abs/2311.03099
## See Also

- references/methods.md - Deep dive into merge algorithms
- references/examples.md - Real-world merge configurations
- references/evaluation.md - Benchmarking and testing strategies