pytorch-patterns

PyTorch Development Patterns

Idiomatic PyTorch patterns and best practices for building robust, efficient, and reproducible deep learning applications.

When to Activate

  • Writing new PyTorch models or training scripts
  • Reviewing deep learning code
  • Debugging training loops or data pipelines
  • Optimizing GPU memory usage or training speed
  • Setting up reproducible experiments

Core Principles

1. Device-Agnostic Code

Always write code that works on both CPU and GPU without hardcoding devices.

```python
# Good: Device-agnostic
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
data = data.to(device)

# Bad: Hardcoded device
model = MyModel().cuda()  # Crashes if no GPU
data = data.cuda()
```
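
The same pattern extends beyond CUDA. A minimal sketch that also covers Apple-silicon GPUs, assuming a PyTorch build with MPS support:

```python
# Sketch: device selection that also covers Apple-silicon MPS (assumes an MPS-enabled build)
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
```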

2. Reproducibility First

Set all random seeds for reproducible results.

```python
# Good: Full reproducibility setup
def set_seed(seed: int = 42) -> None:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Bad: No seed control
model = MyModel()  # Different weights every run
```
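
Seeding the main process is not always enough: each DataLoader worker carries its own random state. A sketch of the worker-seeding pattern from the PyTorch reproducibility notes; `seed_worker` and `g` are illustrative names:

```python
# Sketch: extending reproducibility to DataLoader workers
def seed_worker(worker_id: int) -> None:
    worker_seed = torch.initial_seed() % 2**32  # Derive a per-worker seed
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

dataloader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    worker_init_fn=seed_worker,  # Seed numpy/random inside each worker
    generator=g,                 # Make the shuffle order reproducible
)
```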

3. Explicit Shape Management

Always document and verify tensor shapes.

```python
# Good: Shape-annotated forward pass
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch_size, channels, height, width)
    x = self.conv1(x)          # -> (batch_size, 32, H, W)
    x = self.pool(x)           # -> (batch_size, 32, H//2, W//2)
    x = x.view(x.size(0), -1)  # -> (batch_size, 32 * (H//2) * (W//2))
    return self.fc(x)          # -> (batch_size, num_classes)

# Bad: No shape tracking
def forward(self, x):
    x = self.conv1(x)
    x = self.pool(x)
    x = x.view(x.size(0), -1)  # What size is this?
    return self.fc(x)          # Will this even work?
```
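
To actually verify shapes rather than just document them, a lightweight assertion at the top of `forward` fails fast with a readable message instead of a cryptic matmul error later. A minimal sketch; the expected rank and the message text are illustrative:

```python
# Sketch: runtime shape verification (expected rank is illustrative)
def forward(self, x: torch.Tensor) -> torch.Tensor:
    assert x.dim() == 4, f"expected (batch, channels, H, W), got {tuple(x.shape)}"
    ...
```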

Model Architecture Patterns

Clean nn.Module Structure

```python
# Good: Well-organized module
class ImageClassifier(nn.Module):
    def __init__(self, num_classes: int, dropout: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(64 * 16 * 16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)

# Bad: Everything in forward
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = F.conv2d(x, weight=self.make_weight())  # Creates weight each call!
        return x
```

Proper Weight Initialization

```python
# Good: Explicit initialization (defined as a method on the model class)
def _init_weights(self, module: nn.Module) -> None:
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

model = MyModel()
model.apply(model._init_weights)
```

Training Loop Patterns

Standard Training Loop

```python
# Good: Complete training loop with best practices
def train_one_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device,
    scaler: torch.amp.GradScaler | None = None,
) -> float:
    model.train()  # Always set train mode
    total_loss = 0.0

    for batch_idx, (data, target) in enumerate(dataloader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad(set_to_none=True)  # More efficient than zero_grad()

        # Mixed precision training
        with torch.amp.autocast("cuda", enabled=scaler is not None):
            output = model(data)
            loss = criterion(output, target)

        if scaler is not None:
            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(optimizer)
            scaler.update()
        else:
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()

        total_loss += loss.item()

    return total_loss / len(dataloader)
```

Validation Loop

```python
# Good: Proper evaluation
@torch.no_grad()  # More efficient than wrapping the body in a torch.no_grad() block
def evaluate(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    device: torch.device,
) -> tuple[float, float]:
    model.eval()  # Always set eval mode: disables dropout, uses running BN stats
    total_loss = 0.0
    correct = 0
    total = 0

    for data, target in dataloader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        total_loss += criterion(output, target).item()
        correct += (output.argmax(1) == target).sum().item()
        total += target.size(0)

    return total_loss / len(dataloader), correct / total
```

Data Pipeline Patterns

Custom Dataset

```python
# Good: Clean Dataset with type hints
class ImageDataset(Dataset):
    def __init__(
        self,
        image_dir: str,
        labels: dict[str, int],
        transform: transforms.Compose | None = None,
    ) -> None:
        self.image_paths = list(Path(image_dir).glob("*.jpg"))
        self.labels = labels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.image_paths)

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, int]:
        img = Image.open(self.image_paths[idx]).convert("RGB")
        label = self.labels[self.image_paths[idx].stem]

        if self.transform:
            img = self.transform(img)

        return img, label
```

Efficient DataLoader Configuration

```python
# Good: Optimized DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,             # Shuffle for training
    num_workers=4,            # Parallel data loading
    pin_memory=True,          # Faster CPU->GPU transfer
    persistent_workers=True,  # Keep workers alive between epochs
    drop_last=True,           # Consistent batch sizes for BatchNorm
)

# Bad: Slow defaults
dataloader = DataLoader(dataset, batch_size=32)  # num_workers=0, no pin_memory
```

Custom Collate for Variable-Length Data

```python
# Good: Pad sequences in collate_fn
def collate_fn(batch: list[tuple[torch.Tensor, int]]) -> tuple[torch.Tensor, torch.Tensor]:
    sequences, labels = zip(*batch)
    # Pad to max length in batch
    padded = nn.utils.rnn.pad_sequence(sequences, batch_first=True, padding_value=0)
    return padded, torch.tensor(labels)

dataloader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
```

Checkpointing Patterns

Save and Load Checkpoints

```python
# Good: Complete checkpoint with all training state
def save_checkpoint(
    model: nn.Module,
    optimizer: torch.optim.Optimizer,
    epoch: int,
    loss: float,
    path: str,
) -> None:
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": loss,
    }, path)

def load_checkpoint(
    path: str,
    model: nn.Module,
    optimizer: torch.optim.Optimizer | None = None,
) -> dict:
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    model.load_state_dict(checkpoint["model_state_dict"])
    if optimizer:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint

# Bad: Only saving model weights (can't resume training)
torch.save(model.state_dict(), "model.pt")
```
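
Putting the two helpers together, resuming training restores both states and continues from the saved epoch. A sketch; the model, optimizer, and `num_epochs` names are placeholders:

```python
# Sketch: resuming training from a saved checkpoint (names are placeholders)
model = MyModel().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

checkpoint = load_checkpoint("checkpoint.pt", model, optimizer)
start_epoch = checkpoint["epoch"] + 1  # Continue after the last completed epoch

for epoch in range(start_epoch, num_epochs):
    train_one_epoch(model, dataloader, optimizer, criterion, device)
```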

Performance Optimization

Mixed Precision Training

```python
# Good: AMP with GradScaler
scaler = torch.amp.GradScaler("cuda")

for data, target in dataloader:
    with torch.amp.autocast("cuda"):
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
```
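
On hardware with bfloat16 support, autocast can run in bf16 instead of fp16; since bf16 keeps fp32's exponent range, the GradScaler is typically unnecessary. A sketch, assuming a bf16-capable GPU:

```python
# Sketch: bf16 autocast without a GradScaler (assumes bf16-capable hardware)
for data, target in dataloader:
    with torch.amp.autocast("cuda", dtype=torch.bfloat16):
        output = model(data)
        loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```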

Gradient Checkpointing for Large Models

```python
# Good: Trade compute for memory
from torch.utils.checkpoint import checkpoint

class LargeModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recompute activations during backward to save memory
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)
```

torch.compile for Speed

```python
# Good: Compile the model for faster execution (PyTorch 2.0+)
model = MyModel().to(device)
model = torch.compile(model, mode="reduce-overhead")

# Modes: "default" (safe), "reduce-overhead" (faster), "max-autotune" (fastest)
```
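
Note that `torch.compile` is lazy: the first call with a given input shape triggers compilation, so warm the model up before benchmarking or serving. A minimal sketch; `sample_batch` is a placeholder:

```python
# Sketch: warm up a compiled model before timing it (sample_batch is a placeholder)
compiled_model = torch.compile(model, mode="reduce-overhead")
with torch.no_grad():
    _ = compiled_model(sample_batch)  # First call compiles; subsequent calls are fast
```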

Quick Reference: PyTorch Idioms

| Idiom | Description |
| --- | --- |
| `model.train()` / `model.eval()` | Always set mode before train/eval |
| `torch.no_grad()` | Disable gradients for inference |
| `optimizer.zero_grad(set_to_none=True)` | More efficient gradient clearing |
| `.to(device)` | Device-agnostic tensor/model placement |
| `torch.amp.autocast` | Mixed precision for 2x speed |
| `pin_memory=True` | Faster CPU→GPU data transfer |
| `torch.compile` | JIT compilation for speed (2.0+) |
| `weights_only=True` | Secure model loading |
| `torch.manual_seed` | Reproducible experiments |
| `gradient_checkpointing` | Trade compute for memory |

Anti-Patterns to Avoid

```python
# Bad: Forgetting model.eval() during validation
model.train()
with torch.no_grad():
    output = model(val_data)  # Dropout still active! BatchNorm uses batch stats!

# Good: Always set eval mode
model.eval()
with torch.no_grad():
    output = model(val_data)

# Bad: In-place operations breaking autograd
x = F.relu(x, inplace=True)  # Can break gradient computation
x += residual  # In-place add can break the autograd graph

# Good: Out-of-place operations
x = F.relu(x)
x = x + residual

# Bad: Moving the model to GPU inside the training loop
for data, target in dataloader:
    model = model.cuda()  # Moves the model EVERY iteration!

# Good: Move the model once before the loop
model = model.to(device)
for data, target in dataloader:
    data, target = data.to(device), target.to(device)

# Bad: Calling .item() before backward
loss = criterion(output, target).item()  # Detaches from graph!
loss.backward()  # AttributeError: a plain float has no .backward()

# Good: Call .item() only for logging
loss = criterion(output, target)
loss.backward()
print(f"Loss: {loss.item():.4f}")  # .item() after backward is fine

# Bad: Saving the entire model object (fragile, not portable)
torch.save(model, "model.pt")

# Good: Save the state_dict
torch.save(model.state_dict(), "model.pt")
```

__Remember__: PyTorch code should be device-agnostic, reproducible, and memory-conscious. When in doubt, profile with `torch.profiler` and check GPU memory with `torch.cuda.memory_summary()`.
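
For the profiling step mentioned above, here is a minimal `torch.profiler` sketch; the model and batch names are placeholders:

```python
# Sketch: profiling a forward pass with torch.profiler (names are placeholders)
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    output = model(data)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```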