experiment-tracking-swanlab


SwanLab: Open-Source Experiment Tracking

When to Use This Skill

Use SwanLab when you need to:
  • Track ML experiments with metrics, configs, tags, and descriptions
  • Visualize training with scalar charts and logged media
  • Compare runs across seeds, checkpoints, and hyperparameters
  • Work locally or self-hosted instead of depending on managed SaaS
  • Integrate with PyTorch, Transformers, PyTorch Lightning, or Fastai
Deployment: Cloud, local, or self-hosted | Media: images, audio, text, GIFs, point clouds, molecules | Integrations: PyTorch, Transformers, PyTorch Lightning, Fastai

Installation

```bash
# Install SwanLab plus the media dependencies used in this skill
pip install "swanlab>=0.7.11" "pillow>=9.0.0" "soundfile>=0.12.0"

# Add local dashboard support for mode="local" and swanlab watch
pip install "swanlab[dashboard]>=0.7.11"

# Optional framework integrations
pip install transformers pytorch-lightning fastai

# Login for cloud or self-hosted usage
swanlab login
```

`pillow` and `soundfile` are the media dependencies used by the Image and Audio examples in this skill. `swanlab[dashboard]` adds the local dashboard dependency required by `mode="local"` and `swanlab watch`.

Quick Start

Basic Experiment Tracking

```python
import swanlab

run = swanlab.init(
    project="my-project",
    experiment_name="baseline",
    config={
        "learning_rate": 1e-3,
        "epochs": 10,
        "batch_size": 32,
        "model": "resnet18",
    },
)

for epoch in range(run.config.epochs):
    train_loss = train_epoch()
    val_loss = validate()

    swanlab.log(
        {
            "train/loss": train_loss,
            "val/loss": val_loss,
            "epoch": epoch,
        }
    )

run.finish()
```

With PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim
import swanlab

run = swanlab.init(
    project="pytorch-demo",
    experiment_name="mnist-mlp",
    config={
        "learning_rate": 1e-3,
        "batch_size": 64,
        "epochs": 10,
        "hidden_size": 128,
    },
)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, run.config.hidden_size),
    nn.ReLU(),
    nn.Linear(run.config.hidden_size, 10),
)
optimizer = optim.Adam(model.parameters(), lr=run.config.learning_rate)
criterion = nn.CrossEntropyLoss()

for epoch in range(run.config.epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        logits = model(data)
        loss = criterion(logits, target)
        loss.backward()
        optimizer.step()

        if batch_idx % 100 == 0:
            swanlab.log(
                {
                    "train/loss": loss.item(),
                    "train/epoch": epoch,
                    "train/batch": batch_idx,
                }
            )

run.finish()
```

Core Concepts

1. Projects and Experiments

  • Project: a collection of related experiments
  • Experiment: a single execution of a training or evaluation workflow
```python
import swanlab

run = swanlab.init(
    project="image-classification",
    experiment_name="resnet18-seed42",
    description="Baseline run on ImageNet subset",
    tags=["baseline", "resnet18"],
    config={
        "model": "resnet18",
        "seed": 42,
        "batch_size": 64,
        "learning_rate": 3e-4,
    },
)

print(run.id)
print(run.config.learning_rate)
```

2. Configuration Tracking

```python
import swanlab

config = {
    "model": "resnet18",
    "seed": 42,
    "batch_size": 64,
    "learning_rate": 3e-4,
    "epochs": 20,
}

run = swanlab.init(project="my-project", config=config)

learning_rate = run.config.learning_rate
batch_size = run.config.batch_size
```
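
Config dicts need not be hand-written. A common pattern, sketched below with plain `argparse` (nothing here is SwanLab-specific), is to parse CLI flags and pass `vars(args)` as the `config` argument to `swanlab.init`:

```python
import argparse

# Build the run config from CLI flags; vars(args) yields the plain dict
# that swanlab.init(project=..., config=...) accepts.
parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet18")
parser.add_argument("--learning_rate", type=float, default=3e-4)
parser.add_argument("--batch_size", type=int, default=64)
parser.add_argument("--epochs", type=int, default=20)
args = parser.parse_args([])  # empty argv here for illustration; omit the list in a real script

config = vars(args)
print(config["learning_rate"])  # → 0.0003
```

This keeps the tracked config and the actual hyperparameters in one place, so a run's dashboard always reflects exactly what the script received.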

3. Metric Logging

```python
# Log scalars
swanlab.log({"loss": 0.42, "accuracy": 0.91})

# Log multiple metrics
swanlab.log(
    {
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        "lr": current_lr,
        "epoch": epoch,
    }
)

# Log with custom step
swanlab.log({"loss": loss}, step=global_step)
```
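
When logging with a custom step, the step value must increase consistently. One way, shown as a plain-Python sketch (`to_global_step` is a hypothetical helper, not part of SwanLab), is to flatten epoch and batch indices into a single monotonic counter:

```python
def to_global_step(epoch, batch_idx, batches_per_epoch):
    """Flatten (epoch, batch) into one strictly increasing step counter."""
    return epoch * batches_per_epoch + batch_idx

# Steps never repeat or go backwards across epoch boundaries, which keeps
# charts drawn from swanlab.log({...}, step=...) well-ordered.
steps = [to_global_step(e, b, 100) for e in range(2) for b in (0, 50, 99)]
print(steps)  # → [0, 50, 99, 100, 150, 199]
```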

4. Media and Chart Logging

```python
import numpy as np
import swanlab

# Image
image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
swanlab.log({"examples/image": swanlab.Image(image, caption="Augmented sample")})

# Audio
wave = np.sin(np.linspace(0, 8 * np.pi, 16000)).astype("float32")
swanlab.log({"examples/audio": swanlab.Audio(wave, sample_rate=16000)})

# Text
swanlab.log({"examples/text": swanlab.Text("Training notes for this run.")})

# GIF video
swanlab.log({"examples/video": swanlab.Video("predictions.gif", caption="Validation rollout")})

# Point cloud
points = np.random.rand(128, 3).astype("float32")
swanlab.log({"examples/point_cloud": swanlab.Object3D(points, caption="Point cloud sample")})

# Molecule
swanlab.log({"examples/molecule": swanlab.Molecule.from_smiles("CCO", caption="Ethanol")})

# Custom chart with swanlab.echarts
line = swanlab.echarts.Line()
line.add_xaxis(["epoch-1", "epoch-2", "epoch-3"])
line.add_yaxis("train/loss", [0.92, 0.61, 0.44])
line.set_global_opts(
    title_opts=swanlab.echarts.options.TitleOpts(title="Training Loss")
)
swanlab.log({"charts/loss_curve": line})
```

See [references/visualization.md](references/visualization.md) for more chart and media patterns.

5. Local and Self-Hosted Workflows

```python
import os
import swanlab

# Self-hosted or cloud login
swanlab.login(
    api_key=os.environ["SWANLAB_API_KEY"],
    host="http://your-server:5092",
)

# Local-only logging
run = swanlab.init(
    project="offline-demo",
    mode="local",
    logdir="./swanlog",
)
swanlab.log({"loss": 0.35, "epoch": 1})
run.finish()
```

```bash
# View local logs
swanlab watch -l ./swanlog

# Sync local logs later
swanlab sync ./swanlog
```
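
One way to keep a single training script usable in both setups is to choose the mode from an environment variable. This is a sketch: `SWANLAB_OFFLINE` is a hypothetical variable name, and it assumes `"cloud"` and `"local"` are the mode strings accepted by `swanlab.init`:

```python
import os

def pick_mode(env=None):
    """Return "local" when offline logging is requested, else "cloud"."""
    env = os.environ if env is None else env
    return "local" if env.get("SWANLAB_OFFLINE") == "1" else "cloud"

print(pick_mode({"SWANLAB_OFFLINE": "1"}))  # → local
print(pick_mode({}))                        # → cloud

# Then, in the training script:
# run = swanlab.init(project="offline-demo", mode=pick_mode(), logdir="./swanlog")
```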

Integration Examples

HuggingFace Transformers

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,
    evaluation_strategy="epoch",
    logging_steps=50,
    report_to="swanlab",
    run_name="bert-finetune",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
```

See references/integrations.md for callback-based setups and additional framework patterns.

PyTorch Lightning

```python
import pytorch_lightning as pl
from swanlab.integration.pytorch_lightning import SwanLabLogger

swanlab_logger = SwanLabLogger(
    project="lightning-demo",
    experiment_name="mnist-classifier",
    config={"batch_size": 64, "max_epochs": 10},
)

trainer = pl.Trainer(
    logger=swanlab_logger,
    max_epochs=10,
    accelerator="auto",
)

trainer.fit(model, train_loader, val_loader)
```

Fastai

```python
from fastai.vision.all import accuracy, resnet34, vision_learner
from swanlab.integration.fastai import SwanLabCallback

learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fit(
    5,
    cbs=[
        SwanLabCallback(
            project="fastai-demo",
            experiment_name="pets-classification",
            config={"arch": "resnet34", "epochs": 5},
        )
    ],
)
```

See references/integrations.md for fuller framework examples.

Best Practices

1. Use Stable Metric Names

```python
# Good: grouped metric namespaces
swanlab.log(
    {
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
    }
)

# Avoid mixing flat and grouped names for the same metric family,
# e.g. logging "loss" in one run and "train/loss" in another.
```
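
A small helper can enforce the grouped-namespace convention so prefixes are never typed by hand. This is a plain-Python sketch; `namespaced` is a hypothetical helper, not part of the SwanLab API:

```python
def namespaced(prefix, metrics):
    """Prefix every key with 'prefix/' so related metrics group into one chart section."""
    return {f"{prefix}/{key}": value for key, value in metrics.items()}

logged = {
    **namespaced("train", {"loss": 0.42, "accuracy": 0.91}),
    **namespaced("val", {"loss": 0.55, "accuracy": 0.88}),
}
print(sorted(logged))  # → ['train/accuracy', 'train/loss', 'val/accuracy', 'val/loss']
# swanlab.log(logged)
```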

2. Initialize Early and Capture Config Once

```python
import swanlab

run = swanlab.init(
    project="image-classification",
    experiment_name="resnet18-baseline",
    config={
        "model": "resnet18",
        "learning_rate": 3e-4,
        "batch_size": 64,
        "seed": 42,
    },
)
```

3. Save Checkpoints Locally

```python
import torch
import swanlab

checkpoint_path = "checkpoints/best.pth"
torch.save(model.state_dict(), checkpoint_path)

swanlab.log(
    {
        "best/val_accuracy": best_val_accuracy,
        "artifacts/checkpoint_path": swanlab.Text(checkpoint_path),
    }
)
```
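
The decision of when to overwrite the best checkpoint can be isolated into a tiny tracker. This is a plain-Python sketch of the pattern above; `BestTracker` is a hypothetical helper, not part of SwanLab:

```python
class BestTracker:
    """Track the best validation accuracy seen so far."""

    def __init__(self):
        self.best = float("-inf")

    def update(self, value):
        """Return True if value improves on the best so far, recording it."""
        if value > self.best:
            self.best = value
            return True
        return False

# Only epochs that improve on the running best would trigger a save.
tracker = BestTracker()
saved_epochs = [e for e, acc in enumerate([0.71, 0.69, 0.80, 0.78]) if tracker.update(acc)]
print(saved_epochs)  # → [0, 2]
```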

4. Use Local Mode for Offline-First Workflows

```python
import swanlab

run = swanlab.init(project="offline-demo", mode="local", logdir="./swanlog")

# ... training code ...

run.finish()

# Inspect later with: swanlab watch -l ./swanlog
```

5. Keep Advanced Patterns in References

  • Use references/visualization.md for advanced chart and media patterns
  • Use references/integrations.md for callback-based and framework-specific integration details

Resources

See Also

  • references/integrations.md - Framework-specific examples
  • references/visualization.md - Charts and media logging patterns