# TensorBoard: Visualization Toolkit for ML
## When to Use This Skill
Use TensorBoard when you need to:
- Visualize training metrics like loss and accuracy over time
- Debug models with histograms and distributions
- Compare experiments across multiple runs
- Visualize model graphs and architecture
- Project embeddings to lower dimensions (t-SNE, PCA)
- Track hyperparameter experiments
- Profile performance and identify bottlenecks
- Visualize images and text during training
Users: 20M+ downloads/year | GitHub Stars: 27k+ | License: Apache 2.0
## Installation

```bash
# Install TensorBoard
pip install tensorboard

# PyTorch integration
pip install torch torchvision tensorboard

# TensorFlow integration (TensorBoard included)
pip install tensorflow

# Launch TensorBoard
tensorboard --logdir=runs
# Access at http://localhost:6006
```
## Quick Start
### PyTorch
```python
from torch.utils.tensorboard import SummaryWriter

# Create writer
writer = SummaryWriter('runs/experiment_1')

# Training loop
for epoch in range(10):
    train_loss = train_epoch()
    val_acc = validate()

    # Log metrics
    writer.add_scalar('Loss/train', train_loss, epoch)
    writer.add_scalar('Accuracy/val', val_acc, epoch)

# Close writer
writer.close()
```

Launch: `tensorboard --logdir=runs`
### TensorFlow/Keras
```python
import tensorflow as tf

# Create callback
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='logs/fit',
    histogram_freq=1
)

# Train model
model.fit(
    x_train, y_train,
    epochs=10,
    validation_data=(x_val, y_val),
    callbacks=[tensorboard_callback]
)
```

Launch: `tensorboard --logdir=logs`
## Core Concepts
### 1. SummaryWriter (PyTorch)
```python
from torch.utils.tensorboard import SummaryWriter

# Default directory: runs/CURRENT_DATETIME
writer = SummaryWriter()

# Custom directory
writer = SummaryWriter('runs/experiment_1')

# Custom comment (appended to the default directory name)
writer = SummaryWriter(comment='baseline')

# Log data (tag, value, global_step)
writer.add_scalar('Loss/train', 0.5, 0)
writer.add_scalar('Loss/train', 0.3, 1)

# Flush and close
writer.flush()
writer.close()
```
### 2. Logging Scalars
**PyTorch:**

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

for epoch in range(100):
    train_loss, train_acc = train()
    val_loss, val_acc = validate()

    # Log individual metrics
    writer.add_scalar('Loss/train', train_loss, epoch)
    writer.add_scalar('Loss/val', val_loss, epoch)
    writer.add_scalar('Accuracy/train', train_acc, epoch)
    writer.add_scalar('Accuracy/val', val_acc, epoch)

    # Learning rate
    lr = optimizer.param_groups[0]['lr']
    writer.add_scalar('Learning_rate', lr, epoch)

writer.close()
```

**TensorFlow:**
```python
import tensorflow as tf

train_summary_writer = tf.summary.create_file_writer('logs/train')
val_summary_writer = tf.summary.create_file_writer('logs/val')

for epoch in range(100):
    with train_summary_writer.as_default():
        tf.summary.scalar('loss', train_loss, step=epoch)
        tf.summary.scalar('accuracy', train_acc, step=epoch)

    with val_summary_writer.as_default():
        tf.summary.scalar('loss', val_loss, step=epoch)
        tf.summary.scalar('accuracy', val_acc, step=epoch)
```

### 3. Logging Multiple Scalars
```python
# PyTorch: group related metrics in one chart
writer.add_scalars('Loss', {
    'train': train_loss,
    'validation': val_loss,
    'test': test_loss
}, epoch)

writer.add_scalars('Metrics', {
    'accuracy': accuracy,
    'precision': precision,
    'recall': recall,
    'f1': f1_score
}, epoch)
```

### 4. Logging Images
**PyTorch:**

```python
import torch
from torchvision.utils import make_grid

# Single image
writer.add_image('Input/sample', img_tensor, epoch)

# Multiple images as a grid
img_grid = make_grid(images[:64], nrow=8)
writer.add_image('Batch/inputs', img_grid, epoch)

# Prediction visualization
pred_grid = make_grid(predictions[:16], nrow=4)
writer.add_image('Predictions', pred_grid, epoch)
```

**TensorFlow:**
```python
import tensorflow as tf

with file_writer.as_default():
    # Images are encoded as PNG
    tf.summary.image('Training samples', images, step=epoch, max_outputs=25)
```

### 5. Logging Histograms
**PyTorch:**

```python
# Track weight and gradient distributions
for name, param in model.named_parameters():
    writer.add_histogram(name, param, epoch)
    if param.grad is not None:
        writer.add_histogram(f'{name}.grad', param.grad, epoch)

# Track activations
writer.add_histogram('Activations/relu1', activations, epoch)
```

**TensorFlow:**
```python
with file_writer.as_default():
    tf.summary.histogram('weights/layer1', layer1.kernel, step=epoch)
    tf.summary.histogram('activations/relu1', activations, step=epoch)
```

### 6. Logging Model Graph
**PyTorch:**

```python
import torch

model = MyModel()
dummy_input = torch.randn(1, 3, 224, 224)
writer.add_graph(model, dummy_input)
writer.close()
```

**TensorFlow (automatic with Keras):**
```python
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='logs',
    write_graph=True
)
model.fit(x, y, callbacks=[tensorboard_callback])
```

## Advanced Features
### Embedding Projector
Visualize high-dimensional data (embeddings, features) in 2D/3D.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Get embeddings (e.g., word embeddings, image features)
embeddings = model.get_embeddings(data)  # Shape: (N, embedding_dim)

# Metadata (a label for each point)
metadata = ['class_1', 'class_2', 'class_1', ...]

# Images (optional, for image embeddings)
label_images = torch.stack([img1, img2, img3, ...])

# Log to TensorBoard
writer.add_embedding(
    embeddings,
    metadata=metadata,
    label_img=label_images,
    global_step=epoch
)
```
**In TensorBoard:**
- Navigate to the "Projector" tab
- Choose PCA, t-SNE, or UMAP visualization
- Search, filter, and explore clusters
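On disk, the projector's data format is plain TSV: one tab-separated row of floats per point, plus a parallel metadata file with one label per row (which is what `add_embedding` writes into the run directory). A minimal pure-Python sketch of that layout, useful for inspecting or hand-building projector data; the function name and paths are illustrative:

```python
import os
import tempfile

def write_projector_files(embeddings, labels, out_dir):
    """Write embeddings and labels in the TSV layout the projector reads.

    embeddings: list of equal-length float lists; labels: one string per row.
    """
    os.makedirs(out_dir, exist_ok=True)
    # One tab-separated row per embedding vector
    with open(os.path.join(out_dir, 'tensors.tsv'), 'w') as f:
        for vec in embeddings:
            f.write('\t'.join(str(x) for x in vec) + '\n')
    # One label per row, aligned with tensors.tsv
    with open(os.path.join(out_dir, 'metadata.tsv'), 'w') as f:
        for label in labels:
            f.write(label + '\n')

# Usage
out = os.path.join(tempfile.mkdtemp(), 'projector')
write_projector_files([[0.1, 0.2], [0.3, 0.4]], ['cat', 'dog'], out)
```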
### Hyperparameter Tuning
```python
from torch.utils.tensorboard import SummaryWriter

# Try different hyperparameters
for lr in [0.001, 0.01, 0.1]:
    for batch_size in [16, 32, 64]:
        # Create a unique run directory
        writer = SummaryWriter(f'runs/lr{lr}_bs{batch_size}')

        # Train and log
        for epoch in range(10):
            loss = train(lr, batch_size)
            writer.add_scalar('Loss/train', loss, epoch)

        # Log hyperparameters together with the final metrics
        writer.add_hparams(
            {'lr': lr, 'batch_size': batch_size},
            {'hparam/accuracy': final_acc, 'hparam/loss': final_loss}
        )

        writer.close()
```

Compare runs in TensorBoard's "HParams" tab.
### Text Logging
```python
# PyTorch: log text (e.g., model predictions, summaries)
writer.add_text('Predictions', f'Epoch {epoch}: {predictions}', epoch)
writer.add_text('Config', str(config), 0)

# Log Markdown tables
markdown_table = """
| Metric | Value |
|---|---|
| Accuracy | 0.95 |
| F1 Score | 0.93 |
"""
writer.add_text('Results', markdown_table, epoch)
```
### PR Curves

Log precision-recall curves for classification.

```python
from torch.utils.tensorboard import SummaryWriter

# Get predictions and labels
predictions = model(test_data)  # Shape: (N, num_classes)
labels = test_labels            # Shape: (N,)

# Log a PR curve for each class
for i in range(num_classes):
    writer.add_pr_curve(
        f'PR_curve/class_{i}',
        labels == i,
        predictions[:, i],
        global_step=epoch
    )
```
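A PR curve is built by sweeping a score threshold and computing precision and recall at each setting. A minimal pure-Python sketch of that computation (TensorBoard bins the scores rather than looping over thresholds, so this is illustrative only):

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall for binary labels at one score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Sweep a few thresholds over toy predictions
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
curve = [precision_recall_at(labels, scores, t) for t in (0.0, 0.5, 1.0)]
# At threshold 0.0 everything is predicted positive, so recall is 1.0
```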
## Integration Examples
### PyTorch Training Loop
```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid

# Setup
writer = SummaryWriter('runs/resnet_experiment')
model = ResNet50()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Log model graph
dummy_input = torch.randn(1, 3, 224, 224)
writer.add_graph(model, dummy_input)

# Training loop
for epoch in range(50):
    model.train()
    train_loss = 0.0
    train_correct = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        pred = output.argmax(dim=1)
        train_correct += pred.eq(target).sum().item()

        # Log batch metrics (every 100 batches)
        if batch_idx % 100 == 0:
            global_step = epoch * len(train_loader) + batch_idx
            writer.add_scalar('Loss/train_batch', loss.item(), global_step)

    # Epoch metrics
    train_loss /= len(train_loader)
    train_acc = train_correct / len(train_loader.dataset)

    # Validation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            val_loss += criterion(output, target).item()
            pred = output.argmax(dim=1)
            val_correct += pred.eq(target).sum().item()

    val_loss /= len(val_loader)
    val_acc = val_correct / len(val_loader.dataset)

    # Log epoch metrics
    writer.add_scalars('Loss', {'train': train_loss, 'val': val_loss}, epoch)
    writer.add_scalars('Accuracy', {'train': train_acc, 'val': val_acc}, epoch)

    # Log learning rate
    writer.add_scalar('Learning_rate', optimizer.param_groups[0]['lr'], epoch)

    # Log histograms (every 5 epochs)
    if epoch % 5 == 0:
        for name, param in model.named_parameters():
            writer.add_histogram(name, param, epoch)

    # Log sample inputs (every 10 epochs)
    if epoch % 10 == 0:
        sample_images = data[:8]
        writer.add_image('Sample_inputs', make_grid(sample_images), epoch)

writer.close()
```
### TensorFlow/Keras Training
```python
import tensorflow as tf

# Define model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# TensorBoard callback
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='logs/fit',
    histogram_freq=1,          # Log histograms every epoch
    write_graph=True,          # Visualize model graph
    write_images=True,         # Visualize weights as images
    update_freq='epoch',       # Log metrics every epoch
    profile_batch='500,520',   # Profile batches 500-520
    embeddings_freq=1          # Log embeddings every epoch
)

# Train
model.fit(
    x_train, y_train,
    epochs=10,
    validation_data=(x_val, y_val),
    callbacks=[tensorboard_callback]
)
```
## Comparing Experiments

### Multiple Runs

```bash
# Run experiments with different configs
python train.py --lr 0.001 --logdir runs/exp1
python train.py --lr 0.01 --logdir runs/exp2
python train.py --lr 0.1 --logdir runs/exp3

# View all runs together
tensorboard --logdir=runs
```
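The `train.py` above is a stand-in for your own training script; one way to wire up its `--lr` and `--logdir` flags is with `argparse` (a sketch — the flag names come from the commands above, the defaults are assumptions):

```python
import argparse

def parse_args(argv=None):
    """Parse the CLI flags used in the experiment commands above."""
    parser = argparse.ArgumentParser(description='Train and log to TensorBoard')
    parser.add_argument('--lr', type=float, default=0.001,
                        help='learning rate')
    parser.add_argument('--logdir', default='runs/exp1',
                        help='TensorBoard log directory')
    return parser.parse_args(argv)

# Usage: the same flags as `python train.py --lr 0.01 --logdir runs/exp2`
args = parse_args(['--lr', '0.01', '--logdir', 'runs/exp2'])
```

Passing `args.logdir` straight to `SummaryWriter` keeps run naming under the caller's control, which is what makes the multi-run comparison above work.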
**In TensorBoard:**
- All runs appear in the same dashboard
- Toggle runs on and off for comparison
- Filter run names with a regex
- Overlay charts to compare metrics

### Organizing Experiments
Use a hierarchical layout:

```
runs/
├── baseline/
│   ├── run_1/
│   └── run_2/
├── improved/
│   ├── run_1/
│   └── run_2/
└── final/
    └── run_1/
```

```python
# Log with hierarchy
writer = SummaryWriter('runs/baseline/run_1')
```
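A small helper that composes run directories from the hierarchy plus the key hyperparameters keeps names consistent across experiments. A minimal sketch; the naming scheme is illustrative, not prescribed:

```python
import os

def run_dir(group, lr, batch_size, root='runs'):
    """Build a hierarchical run directory, e.g. runs/baseline/lr0.001_bs32."""
    return os.path.join(root, group, f'lr{lr}_bs{batch_size}')

# Usage: pass the result to SummaryWriter(run_dir('baseline', 0.001, 32))
path = run_dir('baseline', 0.001, 32)
```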
## Best Practices
### 1. Use Descriptive Run Names
```python
# ✅ Good: descriptive names
from datetime import datetime

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
writer = SummaryWriter(f'runs/resnet50_lr0.001_bs32_{timestamp}')

# ❌ Bad: auto-generated names
writer = SummaryWriter()  # Creates runs/Jan01_12-34-56_hostname
```
### 2. Group Related Metrics
```python
# ✅ Good: grouped metrics share a dashboard section
writer.add_scalar('Loss/train', train_loss, step)
writer.add_scalar('Loss/val', val_loss, step)
writer.add_scalar('Accuracy/train', train_acc, step)
writer.add_scalar('Accuracy/val', val_acc, step)

# ❌ Bad: flat namespace
writer.add_scalar('train_loss', train_loss, step)
writer.add_scalar('val_loss', val_loss, step)
```
### 3. Log Regularly but Not Too Often
```python
# ✅ Good: always log epoch metrics, log batch metrics occasionally
for epoch in range(100):
    for batch_idx, (data, target) in enumerate(train_loader):
        loss = train_step(data, target)

        # Log every 100 batches
        if batch_idx % 100 == 0:
            writer.add_scalar('Loss/batch', loss, global_step)

    # Always log epoch metrics
    writer.add_scalar('Loss/epoch', epoch_loss, epoch)

# ❌ Bad: logging every batch creates huge log files
for batch in train_loader:
    writer.add_scalar('Loss', loss, step)  # Too frequent
```
### 4. Close Writer When Done
```python
# ✅ Good: use a context manager; the writer closes automatically
with SummaryWriter('runs/exp1') as writer:
    for epoch in range(10):
        writer.add_scalar('Loss', loss, epoch)

# Or close manually
writer = SummaryWriter('runs/exp1')
# ... logging ...
writer.close()
```
### 5. Use Separate Writers for Train/Val
```python
# ✅ Good: separate log directories let TensorBoard overlay the curves
train_writer = SummaryWriter('runs/exp1/train')
val_writer = SummaryWriter('runs/exp1/val')

train_writer.add_scalar('loss', train_loss, epoch)
val_writer.add_scalar('loss', val_loss, epoch)
```
## Performance Profiling
### TensorFlow Profiler
```python
# Enable profiling
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='logs',
    profile_batch='10,20'  # Profile batches 10-20
)
model.fit(x, y, callbacks=[tensorboard_callback])
```

View the results in TensorBoard's "Profile" tab, which shows GPU utilization, kernel stats, memory usage, and bottlenecks.
### PyTorch Profiler
```python
import torch.profiler as profiler

with profiler.profile(
    activities=[
        profiler.ProfilerActivity.CPU,
        profiler.ProfilerActivity.CUDA
    ],
    on_trace_ready=profiler.tensorboard_trace_handler('./runs/profiler'),
    record_shapes=True,
    with_stack=True
) as prof:
    for batch in train_loader:
        loss = train_step(batch)
        prof.step()
```

View the results in TensorBoard's "Profile" tab.
## Resources
- Documentation: https://www.tensorflow.org/tensorboard
- PyTorch Integration: https://pytorch.org/docs/stable/tensorboard.html
- GitHub: https://github.com/tensorflow/tensorboard (27k+ stars)
- TensorBoard.dev: https://tensorboard.dev (share experiments publicly)
## See Also
- `references/visualization.md` - Comprehensive visualization guide
- `references/profiling.md` - Performance profiling patterns
- `references/integrations.md` - Framework-specific integration examples