# LLM Training
Frameworks and techniques for training and finetuning large language models.
## Framework Comparison
| Framework | Best For | Multi-GPU | Memory Efficient |
|---|---|---|---|
| Accelerate | Simple distributed | Yes | Basic |
| DeepSpeed | Large models, ZeRO | Yes | Excellent |
| PyTorch Lightning | Clean training loops | Yes | Good |
| Ray Train | Scalable, multi-node | Yes | Good |
| TRL | RLHF, reward modeling | Yes | Good |
| Unsloth | Fast LoRA finetuning | Limited | Excellent |
## Accelerate (HuggingFace)
Minimal wrapper for distributed training. Run `accelerate config` for interactive setup.

Key concept: Wrap the model, optimizer, and dataloader with `accelerator.prepare()`, and use `accelerator.backward()` for the loss.
## DeepSpeed (Large Models)
Microsoft's optimization library for training massive models.
ZeRO Stages:
- Stage 1: Optimizer states partitioned across GPUs
- Stage 2: + Gradients partitioned
- Stage 3: + Parameters partitioned (for largest models, 100B+)
Key concept: Configure via a JSON file; higher stages save more memory but add more communication overhead.
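A minimal illustrative ZeRO-3 config fragment (the values are example placeholders, not tuned recommendations):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "overlap_comm": true
  }
}
```

Dropping `"stage"` to 2 or 1 reduces communication at the cost of less partitioning; optimizer offload trades GPU memory for CPU-GPU transfer time.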
## TRL (RLHF/DPO)
HuggingFace library for reinforcement learning from human feedback.
Training types:
- SFT (Supervised Finetuning): Standard instruction tuning
- DPO (Direct Preference Optimization): Simpler than RLHF, uses preference pairs
- PPO: Classic RLHF with reward model
Key concept: DPO is often preferred over PPO: it is simpler, needs no reward model, and trains directly on chosen/rejected response pairs.
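To make the "no reward model" point concrete, here is a sketch of the DPO objective in plain PyTorch (TRL's `DPOTrainer` handles this internally; the function name and `beta` value here are illustrative). It assumes you already have summed log-probabilities of each response under the policy and a frozen reference model:

```python
# DPO loss sketch: the implicit "reward" is the policy-vs-reference
# log-ratio, and the loss pushes the chosen response's margin above
# the rejected one's via a log-sigmoid.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize P(chosen preferred over rejected) under a Bradley-Terry model.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

When policy and reference agree exactly, the margin is zero and the loss sits at ln 2; it falls as the policy favors the chosen responses.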
## Unsloth (Fast LoRA)
Optimized LoRA finetuning - 2x faster, 60% less memory.
Key concept: Drop-in replacement for standard LoRA with automatic optimizations. Best for 7B-13B models.
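For context, a toy illustration of the LoRA decomposition that Unsloth optimizes: a frozen weight plus a trainable low-rank update scaled by `alpha / r` (all dimensions and values here are illustrative, not tied to any real model):

```python
# LoRA idea in miniature: y = x W^T + (alpha/r) * x A^T B^T, where only
# A and B train. B is zero-initialized, so the adapter starts as a no-op.
import torch

d, r, alpha = 8, 2, 16
W = torch.randn(d, d)            # frozen pretrained weight
A = torch.randn(r, d) * 0.01     # trainable down-projection (d -> r)
B = torch.zeros(d, r)            # trainable up-projection (r -> d), zero-init

def lora_forward(x):
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

Because only `A` and `B` (2·d·r parameters) receive gradients instead of the full d×d weight, optimizer state and gradient memory shrink dramatically; Unsloth adds fused kernels on top of this same structure.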
## Memory Optimization Techniques
| Technique | Memory Savings | Trade-off |
|---|---|---|
| Gradient checkpointing | ~30-50% | Slower training |
| Mixed precision (fp16/bf16) | ~50% | Minor precision loss |
| 4-bit quantization (QLoRA) | ~75% | Some quality loss |
| Flash Attention | ~20-40% | Requires compatible GPU |
| Gradient accumulation | Larger effective batch at fixed memory | Slower wall-clock per update |
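Gradient accumulation from the table is simple enough to sketch directly (toy model and batch sizes are illustrative): with 4 accumulation steps and micro-batches of 8, the optimizer sees an effective batch of 32 without ever materializing it in memory.

```python
# Gradient accumulation: scale each micro-batch loss by the number of
# accumulation steps so the summed gradients average correctly, and only
# step the optimizer every `accumulation_steps` micro-batches.
import torch
from torch import nn

model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters())
loss_fn = nn.CrossEntropyLoss()
accumulation_steps = 4

optimizer.zero_grad()
for step in range(8):
    x = torch.randn(8, 16)
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / accumulation_steps  # average over micro-batches
    loss.backward()                                   # gradients accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This pairs naturally with the other techniques in the table: checkpointing and mixed precision shrink the per-micro-batch footprint, while accumulation recovers the large effective batch.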
## Decision Guide
| Scenario | Recommendation |
|---|---|
| Simple finetuning | Accelerate + PEFT |
| 7B-13B models | Unsloth (fastest) |
| 70B+ models | DeepSpeed ZeRO-3 |
| RLHF/DPO alignment | TRL |
| Multi-node cluster | Ray Train |
| Clean code structure | PyTorch Lightning |
## Resources
- Accelerate: https://huggingface.co/docs/accelerate
- DeepSpeed: https://www.deepspeed.ai/
- TRL: https://huggingface.co/docs/trl
- Unsloth: https://github.com/unslothai/unsloth