experiment-tracking-swanlab
# SwanLab: Open-Source Experiment Tracking
## When to Use This Skill
Use SwanLab when you need to:
- Track ML experiments with metrics, configs, tags, and descriptions
- Visualize training with scalar charts and logged media
- Compare runs across seeds, checkpoints, and hyperparameters
- Work locally or self-hosted instead of depending on managed SaaS
- Integrate with PyTorch, Transformers, PyTorch Lightning, or Fastai
Deployment: Cloud, local, or self-hosted | Media: images, audio, text, GIFs, point clouds, molecules | Integrations: PyTorch, Transformers, PyTorch Lightning, Fastai
## Installation
```bash
# Install SwanLab plus the media dependencies used in this skill
pip install "swanlab>=0.7.11" "pillow>=9.0.0" "soundfile>=0.12.0"

# Add local dashboard support for mode="local" and swanlab watch
pip install "swanlab[dashboard]>=0.7.11"

# Optional framework integrations
pip install transformers pytorch-lightning fastai

# Login for cloud or self-hosted usage
swanlab login
```

`pillow` and `soundfile` are the media dependencies used by the Image and Audio examples in this skill. `swanlab[dashboard]` adds the local dashboard dependency required by `mode="local"` and `swanlab watch`.
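Before a run, you can confirm the pinned packages are actually present by querying installed versions through the standard library. This is a minimal sketch using only `importlib.metadata` (it inspects installed distributions without importing them):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(pkg: str) -> Optional[str]:
    """Return the installed distribution version, or None if not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


for pkg in ("swanlab", "pillow", "soundfile"):
    print(pkg, installed_version(pkg) or "missing")
```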
## Quick Start
### Basic Experiment Tracking
```python
import swanlab

run = swanlab.init(
    project="my-project",
    experiment_name="baseline",
    config={
        "learning_rate": 1e-3,
        "epochs": 10,
        "batch_size": 32,
        "model": "resnet18",
    },
)

for epoch in range(run.config.epochs):
    train_loss = train_epoch()  # your own training step
    val_loss = validate()       # your own evaluation step
    swanlab.log(
        {
            "train/loss": train_loss,
            "val/loss": val_loss,
            "epoch": epoch,
        }
    )

run.finish()
```

### With PyTorch
```python
import torch
import torch.nn as nn
import torch.optim as optim

import swanlab

run = swanlab.init(
    project="pytorch-demo",
    experiment_name="mnist-mlp",
    config={
        "learning_rate": 1e-3,
        "batch_size": 64,
        "epochs": 10,
        "hidden_size": 128,
    },
)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, run.config.hidden_size),
    nn.ReLU(),
    nn.Linear(run.config.hidden_size, 10),
)
optimizer = optim.Adam(model.parameters(), lr=run.config.learning_rate)
criterion = nn.CrossEntropyLoss()

for epoch in range(run.config.epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):  # train_loader: your DataLoader
        optimizer.zero_grad()
        logits = model(data)
        loss = criterion(logits, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            swanlab.log(
                {
                    "train/loss": loss.item(),
                    "train/epoch": epoch,
                    "train/batch": batch_idx,
                }
            )

run.finish()
```

## Core Concepts
### 1. Projects and Experiments
- Project: Collection of related experiments
- Experiment: Single execution of a training or evaluation workflow

```python
import swanlab

run = swanlab.init(
    project="image-classification",
    experiment_name="resnet18-seed42",
    description="Baseline run on ImageNet subset",
    tags=["baseline", "resnet18"],
    config={
        "model": "resnet18",
        "seed": 42,
        "batch_size": 64,
        "learning_rate": 3e-4,
    },
)

print(run.id)
print(run.config.learning_rate)
```

### 2. Configuration Tracking
```python
config = {
    "model": "resnet18",
    "seed": 42,
    "batch_size": 64,
    "learning_rate": 3e-4,
    "epochs": 20,
}
run = swanlab.init(project="my-project", config=config)

# Values are read back through run.config
learning_rate = run.config.learning_rate
batch_size = run.config.batch_size
```

### 3. Metric Logging
```python
# Log scalars
swanlab.log({"loss": 0.42, "accuracy": 0.91})

# Log multiple metrics in one call
swanlab.log(
    {
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        "lr": current_lr,
        "epoch": epoch,
    }
)

# Log with a custom step
swanlab.log({"loss": loss}, step=global_step)
```
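When passing an explicit `step`, keep it increasing for a given metric — experiment trackers typically ignore or warn on steps that move backwards. A common convention is to fold the epoch and batch index into one counter; a sketch (in practice `steps_per_epoch` would be `len(train_loader)`):

```python
def to_global_step(epoch: int, batch_idx: int, steps_per_epoch: int) -> int:
    """Fold (epoch, batch) into a single monotonically increasing counter."""
    return epoch * steps_per_epoch + batch_idx


# With 500 batches per epoch, batch 20 of epoch 3 lands at step 1520:
print(to_global_step(epoch=3, batch_idx=20, steps_per_epoch=500))  # 1520
```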
### 4. Media and Chart Logging
```python
import numpy as np

import swanlab

# Image
image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
swanlab.log({"examples/image": swanlab.Image(image, caption="Augmented sample")})

# Audio
wave = np.sin(np.linspace(0, 8 * np.pi, 16000)).astype("float32")
swanlab.log({"examples/audio": swanlab.Audio(wave, sample_rate=16000)})

# Text
swanlab.log({"examples/text": swanlab.Text("Training notes for this run.")})

# GIF video
swanlab.log({"examples/video": swanlab.Video("predictions.gif", caption="Validation rollout")})

# Point cloud
points = np.random.rand(128, 3).astype("float32")
swanlab.log({"examples/point_cloud": swanlab.Object3D(points, caption="Point cloud sample")})

# Molecule
swanlab.log({"examples/molecule": swanlab.Molecule.from_smiles("CCO", caption="Ethanol")})
```

Custom chart with `swanlab.echarts`:

```python
line = swanlab.echarts.Line()
line.add_xaxis(["epoch-1", "epoch-2", "epoch-3"])
line.add_yaxis("train/loss", [0.92, 0.61, 0.44])
line.set_global_opts(
    title_opts=swanlab.echarts.options.TitleOpts(title="Training Loss")
)
swanlab.log({"charts/loss_curve": line})
```

See [references/visualization.md](references/visualization.md) for more chart and media patterns.
### 5. Local and Self-Hosted Workflows
```python
import os

import swanlab

# Self-hosted or cloud login
swanlab.login(
    api_key=os.environ["SWANLAB_API_KEY"],
    host="http://your-server:5092",
)

# Local-only logging
run = swanlab.init(
    project="offline-demo",
    mode="local",
    logdir="./swanlog",
)
swanlab.log({"loss": 0.35, "epoch": 1})
run.finish()
```

```bash
# View local logs
swanlab watch -l ./swanlog

# Sync local logs later
swanlab sync ./swanlog
```
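A convenient pattern is to choose the mode from the environment so the same script runs connected or fully offline. This sketch uses a `SWANLAB_OFFLINE` variable of our own invention (not an official SwanLab setting) and only builds the keyword arguments:

```python
import os


def swanlab_init_kwargs(offline: bool) -> dict:
    """Extra kwargs for swanlab.init: local mode when offline, defaults otherwise."""
    if offline:
        return {"mode": "local", "logdir": "./swanlog"}
    return {}  # default mode talks to the logged-in cloud/self-hosted backend


kwargs = swanlab_init_kwargs(os.environ.get("SWANLAB_OFFLINE") == "1")
# later: run = swanlab.init(project="offline-demo", **kwargs)
```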
## Integration Examples
### HuggingFace Transformers
```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    evaluation_strategy="epoch",
    logging_steps=50,
    report_to="swanlab",
    run_name="bert-finetune",
)

trainer = Trainer(
    model=model,  # model and datasets defined elsewhere
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```

See references/integrations.md for callback-based setups and additional framework patterns.
### PyTorch Lightning
```python
import pytorch_lightning as pl

from swanlab.integration.pytorch_lightning import SwanLabLogger

swanlab_logger = SwanLabLogger(
    project="lightning-demo",
    experiment_name="mnist-classifier",
    config={"batch_size": 64, "max_epochs": 10},
)

trainer = pl.Trainer(
    logger=swanlab_logger,
    max_epochs=10,
    accelerator="auto",
)
trainer.fit(model, train_loader, val_loader)  # model and loaders defined elsewhere
```

### Fastai
```python
from fastai.vision.all import accuracy, resnet34, vision_learner

from swanlab.integration.fastai import SwanLabCallback

learn = vision_learner(dls, resnet34, metrics=accuracy)  # dls: your DataLoaders
learn.fit(
    5,
    cbs=[
        SwanLabCallback(
            project="fastai-demo",
            experiment_name="pets-classification",
            config={"arch": "resnet34", "epochs": 5},
        )
    ],
)
```

See references/integrations.md for fuller framework examples.
## Best Practices
### 1. Use Stable Metric Names
```python
# Good: grouped metric namespaces
swanlab.log({
    "train/loss": train_loss,
    "train/accuracy": train_acc,
    "val/loss": val_loss,
    "val/accuracy": val_acc,
})

# Avoid mixing flat and grouped names for the same metric family
```
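One way to keep namespaces consistent is a small helper (illustrative, not part of the SwanLab API) that prefixes every metric with its split before the dict reaches `swanlab.log`:

```python
def namespaced(split: str, metrics: dict) -> dict:
    """Prefix each metric name with its split, e.g. 'loss' -> 'train/loss'."""
    return {f"{split}/{name}": value for name, value in metrics.items()}


payload = {
    **namespaced("train", {"loss": 0.41, "accuracy": 0.88}),
    **namespaced("val", {"loss": 0.52, "accuracy": 0.85}),
}
print(sorted(payload))  # ['train/accuracy', 'train/loss', 'val/accuracy', 'val/loss']
```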
### 2. Initialize Early and Capture Config Once
```python
run = swanlab.init(
    project="image-classification",
    experiment_name="resnet18-baseline",
    config={
        "model": "resnet18",
        "learning_rate": 3e-4,
        "batch_size": 64,
        "seed": 42,
    },
)
```

### 3. Save Checkpoints Locally
```python
import torch

import swanlab

checkpoint_path = "checkpoints/best.pth"
torch.save(model.state_dict(), checkpoint_path)

swanlab.log(
    {
        "best/val_accuracy": best_val_accuracy,
        "artifacts/checkpoint_path": swanlab.Text(checkpoint_path),
    }
)
```

### 4. Use Local Mode for Offline-First Workflows
```python
run = swanlab.init(project="offline-demo", mode="local", logdir="./swanlog")

# ... training code ...

run.finish()

# Inspect later with: swanlab watch -l ./swanlog
```

### 5. Keep Advanced Patterns in References
- Use references/visualization.md for advanced chart and media patterns
- Use references/integrations.md for callback-based and framework-specific integration details
## Resources
### See Also
- references/integrations.md - Framework-specific examples
- references/visualization.md - Charts and media logging patterns