pytorch-patterns
PyTorch Development Patterns
Idiomatic PyTorch patterns and best practices for building robust, efficient, and reproducible deep learning applications.
When to Activate
- Writing new PyTorch models or training scripts
- Reviewing deep learning code
- Debugging training loops or data pipelines
- Optimizing GPU memory usage or training speed
- Setting up reproducible experiments
Core Principles
1. Device-Agnostic Code
Always write code that works on both CPU and GPU without hardcoding devices.
```python
# Good: device-agnostic
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
data = data.to(device)

# Bad: hardcoded device
model = MyModel().cuda()  # Crashes if no GPU
data = data.cuda()
```
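Real batches are often nested structures (dicts of tensors, lists of labels) rather than a single tensor. A small recursive helper keeps the `.to(device)` call device-agnostic for those too; this is a sketch, and `to_device` is a name invented here, not a torch API:

```python
import torch

def to_device(obj, device: torch.device):
    """Recursively move tensors in nested dicts/lists/tuples to a device."""
    if torch.is_tensor(obj):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(v, device) for v in obj)
    return obj  # non-tensor leaves (ints, strings, ...) pass through unchanged

batch = {"pixels": torch.zeros(2, 3), "labels": [1, 2]}
batch = to_device(batch, torch.device("cpu"))
```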
2. Reproducibility First

Set all random seeds for reproducible results.

```python
# Good: full reproducibility setup
def set_seed(seed: int = 42) -> None:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Bad: no seed control
model = MyModel()  # Different weights every run
```
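Seeding the main process does not cover DataLoader shuffling or worker subprocesses, which draw from their own RNG streams. A sketch of the commonly recommended pattern: pass a seeded `torch.Generator` to control shuffle order, and a `worker_init_fn` to seed each worker's NumPy/`random` state (`seed_worker` is a name chosen here):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id: int) -> None:
    # Derive each worker's NumPy/random seed from torch's per-worker seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

dataset = TensorDataset(torch.arange(8, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=2, shuffle=True,
                    worker_init_fn=seed_worker, generator=g)
epoch_order = [batch[0].tolist() for batch in loader]  # reproducible across runs
```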
3. Explicit Shape Management

Always document and verify tensor shapes.

```python
# Good: shape-annotated forward pass
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch_size, channels, height, width)
    x = self.conv1(x)          # -> (batch_size, 32, H, W)
    x = self.pool(x)           # -> (batch_size, 32, H//2, W//2)
    x = x.view(x.size(0), -1)  # -> (batch_size, 32 * H//2 * W//2)
    return self.fc(x)          # -> (batch_size, num_classes)

# Bad: no shape tracking
def forward(self, x):
    x = self.conv1(x)
    x = self.pool(x)
    x = x.view(x.size(0), -1)  # What size is this?
    return self.fc(x)          # Will this even work?
```
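Shape comments can silently drift out of date; a tiny runtime check makes the assumptions executable. `assert_shape` below is a hypothetical helper, not part of torch (libraries like `jaxtyping` offer richer versions of the same idea):

```python
import torch

def assert_shape(t: torch.Tensor, expected: tuple) -> None:
    """Raise if t's shape doesn't match; None entries act as wildcards."""
    actual = tuple(t.shape)
    if len(actual) != len(expected) or any(
        e is not None and a != e for a, e in zip(actual, expected)
    ):
        raise ValueError(f"expected shape {expected}, got {actual}")

x = torch.randn(4, 3, 32, 32)
assert_shape(x, (None, 3, 32, 32))  # passes: batch dimension is a wildcard
```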
Model Architecture Patterns

Clean nn.Module Structure

```python
# Good: well-organized module
class ImageClassifier(nn.Module):
    def __init__(self, num_classes: int, dropout: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(64 * 16 * 16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)

# Bad: no registered submodules, everything improvised in forward
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = F.conv2d(x, weight=self.make_weight())  # Creates weight each call!
        return x
```

Proper Weight Initialization
```python
# Good: explicit initialization
def init_weights(module: nn.Module) -> None:
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

model = MyModel()
model.apply(init_weights)
```
Training Loop Patterns

Standard Training Loop

```python
# Good: complete training loop with best practices
def train_one_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device,
    scaler: torch.amp.GradScaler | None = None,
) -> float:
    model.train()  # Always set train mode
    total_loss = 0.0
    for data, target in dataloader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad(set_to_none=True)  # More efficient than zero_grad()

        # Mixed precision forward pass
        with torch.amp.autocast("cuda", enabled=scaler is not None):
            output = model(data)
            loss = criterion(output, target)

        if scaler is not None:
            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)  # Unscale before clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(optimizer)
            scaler.update()
        else:
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()

        total_loss += loss.item()
    return total_loss / len(dataloader)
```
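The loop above leaves out learning-rate scheduling. The usual placement is one `scheduler.step()` per epoch, after that epoch's optimizer updates; a minimal sketch with `StepLR` (the model, optimizer, and the bare `optimizer.step()` stand in for a real inner loop):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 2 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(4):
    # ... train_one_epoch(model, dataloader, optimizer, criterion, device) ...
    optimizer.step()   # stand-in for the inner loop's per-batch updates
    scheduler.step()   # once per epoch, after the optimizer steps
    lrs.append(optimizer.param_groups[0]["lr"])
```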
Validation Loop
```python
# Good: proper evaluation
@torch.no_grad()  # Disables gradient tracking for the whole function
def evaluate(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    device: torch.device,
) -> tuple[float, float]:
    model.eval()  # Always set eval mode: disables dropout, uses running BN stats
    total_loss = 0.0
    correct = 0
    total = 0
    for data, target in dataloader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        total_loss += criterion(output, target).item()
        correct += (output.argmax(1) == target).sum().item()
        total += target.size(0)
    return total_loss / len(dataloader), correct / total
```
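Validation loss is commonly used to stop training before overfitting sets in. A minimal early-stopping tracker; this is a sketch, and `EarlyStopping`, `patience`, and `min_delta` are names chosen here, not a torch API:

```python
class EarlyStopping:
    """Stop when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training

stopper = EarlyStopping(patience=2)
```

Call `stopper.step(val_loss)` after each evaluation pass and break out of the epoch loop when it returns `True`, typically alongside saving the best checkpoint so far.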
Data Pipeline Patterns
Custom Dataset

```python
# Good: clean Dataset with type hints
class ImageDataset(Dataset):
    def __init__(
        self,
        image_dir: str,
        labels: dict[str, int],
        transform: transforms.Compose | None = None,
    ) -> None:
        self.image_paths = list(Path(image_dir).glob("*.jpg"))
        self.labels = labels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, int]:
        img = Image.open(self.image_paths[idx]).convert("RGB")
        label = self.labels[self.image_paths[idx].stem]
        if self.transform:
            img = self.transform(img)
        return img, label
```
Efficient DataLoader Configuration
```python
# Good: optimized DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,             # Shuffle for training
    num_workers=4,            # Parallel data loading
    pin_memory=True,          # Faster CPU->GPU transfer
    persistent_workers=True,  # Keep workers alive between epochs
    drop_last=True,           # Consistent batch sizes for BatchNorm
)

# Bad: slow defaults
dataloader = DataLoader(dataset, batch_size=32)  # num_workers=0, no pin_memory
```
Custom Collate for Variable-Length Data

```python
# Good: pad sequences in collate_fn
def collate_fn(batch: list[tuple[torch.Tensor, int]]) -> tuple[torch.Tensor, torch.Tensor]:
    sequences, labels = zip(*batch)
    # Pad to the max length in the batch
    padded = nn.utils.rnn.pad_sequence(sequences, batch_first=True, padding_value=0)
    return padded, torch.tensor(labels)

dataloader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
```
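Downstream layers (attention, masked losses) usually also need to know which positions are padding. The same collate idea can return a boolean mask alongside the padded batch; a sketch, where `collate_with_mask` is a name invented here:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_with_mask(batch):
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in sequences])
    padded = pad_sequence(sequences, batch_first=True, padding_value=0)
    # True where a position holds real data, False where it is padding
    mask = torch.arange(padded.size(1))[None, :] < lengths[:, None]
    return padded, mask, torch.tensor(labels)

batch = [(torch.ones(3), 0), (torch.ones(5), 1)]
padded, mask, labels = collate_with_mask(batch)  # padded: (2, 5)
```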
Checkpointing Patterns

Save and Load Checkpoints

```python
# Good: complete checkpoint with all training state
def save_checkpoint(
    model: nn.Module,
    optimizer: torch.optim.Optimizer,
    epoch: int,
    loss: float,
    path: str,
) -> None:
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": loss,
    }, path)

def load_checkpoint(
    path: str,
    model: nn.Module,
    optimizer: torch.optim.Optimizer | None = None,
) -> dict:
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    model.load_state_dict(checkpoint["model_state_dict"])
    if optimizer:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint

# Bad: only saving model weights (can't resume training)
torch.save(model.state_dict(), "model.pt")
```
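The save/resume round trip can be exercised end to end without touching disk by serializing into an in-memory buffer; a sketch of the flow (the `io.BytesIO` target is only for illustration, real training would pass a file path):

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

buffer = io.BytesIO()
torch.save({
    "epoch": 5,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": 0.42,
}, buffer)

# Resume: restore weights and optimizer state, continue from the next epoch
buffer.seek(0)
ckpt = torch.load(buffer, map_location="cpu", weights_only=True)
resumed = nn.Linear(4, 2)
resumed.load_state_dict(ckpt["model_state_dict"])
start_epoch = ckpt["epoch"] + 1
```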
Performance Optimization

Mixed Precision Training

```python
# Good: AMP with GradScaler
scaler = torch.amp.GradScaler("cuda")
for data, target in dataloader:
    with torch.amp.autocast("cuda"):
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
```
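On hardware with bfloat16 support (Ampere-class and newer GPUs, modern CPUs), autocast can run in bf16 without a `GradScaler`, since bf16 keeps float32's exponent range and rarely underflows. A sketch, shown on CPU so it runs anywhere:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
x = torch.randn(4, 8)

with torch.amp.autocast("cpu", dtype=torch.bfloat16):
    out = model(x)           # runs in bfloat16 inside the region
loss = out.float().sum()     # accumulate the loss in float32
loss.backward()              # no GradScaler needed with bf16
```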
Gradient Checkpointing for Large Models
```python
# Good: trade compute for memory
from torch.utils.checkpoint import checkpoint

class LargeModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recompute activations during backward to save memory
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)
```
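A minimal runnable illustration of the same mechanism: gradients still flow through a checkpointed block, its activations are just recomputed during backward instead of being kept alive (toy shapes chosen here):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
x = torch.randn(2, 8, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # activations not stored
y.sum().backward()                             # block's forward is re-run here
```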
torch.compile for Speed
```python
# Good: compile the model for faster execution (PyTorch 2.0+)
model = MyModel().to(device)
model = torch.compile(model, mode="reduce-overhead")
```

Modes: `"default"` (safe), `"reduce-overhead"` (faster), `"max-autotune"` (fastest, longest compile time).

Quick Reference: PyTorch Idioms
| Idiom | Description |
|---|---|
| `model.train()` / `model.eval()` | Always set mode before train/eval |
| `@torch.no_grad()` | Disable gradients for inference |
| `optimizer.zero_grad(set_to_none=True)` | More efficient gradient clearing |
| `.to(device)` | Device-agnostic tensor/model placement |
| `torch.amp.autocast` + `GradScaler` | Mixed precision for up to ~2x speed |
| `pin_memory=True` | Faster CPU→GPU data transfer |
| `torch.compile(model)` | JIT compilation for speed (2.0+) |
| `torch.load(..., weights_only=True)` | Secure model loading |
| `set_seed(42)` | Reproducible experiments |
| `torch.utils.checkpoint` | Trade compute for memory |
Anti-Patterns to Avoid
```python
# Bad: forgetting model.eval() during validation
model.train()
with torch.no_grad():
    output = model(val_data)  # Dropout still active! BatchNorm uses batch stats!

# Good: always set eval mode
model.eval()
with torch.no_grad():
    output = model(val_data)

# Bad: in-place operations that can break autograd
x = F.relu(x, inplace=True)  # Can break gradient computation
x += residual                # In-place add can break the autograd graph

# Good: out-of-place operations
x = F.relu(x)
x = x + residual

# Bad: moving the model to the GPU inside the training loop
for data, target in dataloader:
    model = model.cuda()  # Moves the model EVERY iteration!

# Good: move the model once before the loop
model = model.to(device)
for data, target in dataloader:
    data, target = data.to(device), target.to(device)

# Bad: calling .item() before backward
loss = criterion(output, target).item()  # Detaches from the graph!
loss.backward()  # Error: can't backprop through .item()

# Good: call .item() only for logging
loss = criterion(output, target)
loss.backward()
print(f"Loss: {loss.item():.4f}")  # .item() after backward is fine

# Bad: saving the whole model object (fragile, not portable)
torch.save(model, "model.pt")

# Good: save the state_dict
torch.save(model.state_dict(), "model.pt")
```

__Remember__: PyTorch code should be device-agnostic, reproducible, and memory-conscious. When in doubt, profile with `torch.profiler` and check GPU memory with `torch.cuda.memory_summary()`.