llm-training


LLM Training


Frameworks and techniques for training and finetuning large language models.

Framework Comparison


| Framework | Best For | Multi-GPU | Memory Efficiency |
|---|---|---|---|
| Accelerate | Simple distributed training | Yes | Basic |
| DeepSpeed | Large models, ZeRO | Yes | Excellent |
| PyTorch Lightning | Clean training loops | Yes | Good |
| Ray Train | Scalable, multi-node | Yes | Good |
| TRL | RLHF, reward modeling | Yes | Good |
| Unsloth | Fast LoRA finetuning | Limited | Excellent |


Accelerate (HuggingFace)


Minimal wrapper for distributed training. Run `accelerate config` for interactive setup.
Key concept: wrap the model, optimizer, and dataloader with `accelerator.prepare()`, and use `accelerator.backward()` for the loss.


DeepSpeed (Large Models)


Microsoft's optimization library for training massive models.
ZeRO Stages:
  • Stage 1: Optimizer states partitioned across GPUs
  • Stage 2: + Gradients partitioned
  • Stage 3: + Parameters partitioned (for largest models, 100B+)
Key concept: Configure via JSON, higher stages = more memory savings but more communication overhead.

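A minimal config sketch for ZeRO Stage 2, assuming bf16 hardware support; batch sizes and offload settings are illustrative placeholders:

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8
}
```

Switching `stage` to 3 additionally partitions parameters, trading more inter-GPU communication for the memory headroom that 100B+ models require.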

TRL (RLHF/DPO)


HuggingFace library for reinforcement learning from human feedback.
Training types:
  • SFT (Supervised Finetuning): Standard instruction tuning
  • DPO (Direct Preference Optimization): Simpler than RLHF, uses preference pairs
  • PPO: Classic RLHF with reward model
Key concept: DPO is often preferred over PPO: it is simpler, needs no reward model, and trains directly on chosen/rejected response pairs.

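To see why no reward model is needed, here is a toy sketch of the per-pair DPO objective: the loss depends only on log-probabilities of the chosen and rejected responses under the policy and a frozen reference model (the numeric log-probs below are made up for illustration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (policy_chosen_logp - ref_chosen_logp) \
           - (policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy widens the chosen-vs-rejected gap relative to the reference,
# the loss falls below the neutral value -log(0.5) ≈ 0.693.
improved = dpo_loss(-5.0, -9.0, -6.0, -8.0)  # policy widened the gap
neutral = dpo_loss(-6.0, -8.0, -6.0, -8.0)   # identical to reference
```

In practice TRL's trainers compute these log-probs from the model; this sketch only shows the shape of the objective.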

Unsloth (Fast LoRA)


Optimized LoRA finetuning - 2x faster, 60% less memory.
Key concept: Drop-in replacement for standard LoRA with automatic optimizations. Best for 7B-13B models.


Memory Optimization Techniques


| Technique | Memory Savings | Trade-off |
|---|---|---|
| Gradient checkpointing | ~30-50% | Slower training |
| Mixed precision (fp16/bf16) | ~50% | Minor precision loss |
| 4-bit quantization (QLoRA) | ~75% | Some quality loss |
| Flash Attention | ~20-40% | Requires compatible GPU |
| Gradient accumulation | Larger effective batch | No memory cost |

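Why gradient accumulation costs no extra memory: averaging per-micro-batch gradients reproduces the full-batch gradient, so only one micro-batch is ever in memory. A pure-Python sketch with a toy one-parameter model:

```python
def grad(w, xs, ys):
    """Mean gradient of (w*x - y)^2 over a batch."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full = grad(w, xs, ys)  # one batch of 4

# Two micro-batches of 2: accumulate, then average
accum = (grad(w, xs[:2], ys[:2]) + grad(w, xs[2:], ys[2:])) / 2
```

The accumulated gradient equals the full-batch one exactly; in real frameworks this is a loop of `backward()` calls with an `optimizer.step()` every N micro-batches.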

Decision Guide


| Scenario | Recommendation |
|---|---|
| Simple finetuning | Accelerate + PEFT |
| 7B-13B models | Unsloth (fastest) |
| 70B+ models | DeepSpeed ZeRO-3 |
| RLHF/DPO alignment | TRL |
| Multi-node cluster | Ray Train |
| Clean code structure | PyTorch Lightning |

Resources
