Experiment Tracking

Track ML experiments, metrics, and models.

Comparison

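| Platform | Best For | Self-hosted | Visualization |
|----------|----------|-------------|---------------|
| MLflow | Open-source, model registry | Yes | Basic |
| W&B | Collaboration, sweeps | Limited | Excellent |
| Neptune | Team collaboration | No | Good |
| ClearML | Full MLOps | Yes | Good |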

MLflow

Open-source platform from Databricks.
Core components:
  • Tracking: Log parameters, metrics, artifacts
  • Projects: Reproducible runs (MLproject file)
  • Models: Package and deploy models
  • Registry: Model versioning and staging
Strengths: Self-hosted, open-source, model registry, framework integrations
Limitations: Basic visualization, fewer collaboration features
Key concept: Autologging for major frameworks - automatic metric capture with one line.
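
A minimal sketch of the Tracking component and of autologging, assuming MLflow is installed and a default local tracking store is acceptable. The experiment name "demo-experiment" and the helper names (flatten_params, log_run, enable_autologging) are hypothetical, chosen only for illustration. The flattening step is a convenience so that each nested config value is logged as its own parameter.

```python
from typing import Any, Dict, List


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys so each value is its own param."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def log_run(config: Dict[str, Any], losses: List[float]) -> None:
    """Manually log one run's parameters and per-epoch loss."""
    import mlflow  # imported lazily so flatten_params works without MLflow

    mlflow.set_experiment("demo-experiment")  # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_params(flatten_params(config))
        for epoch, loss in enumerate(losses):
            # step lets the UI plot the metric as a per-epoch curve
            mlflow.log_metric("loss", loss, step=epoch)


def enable_autologging() -> None:
    """The one-line alternative: capture metrics for supported frameworks."""
    import mlflow

    mlflow.autolog()
```

Calling enable_autologging() before training is the "one line" route; log_run shows what the manual equivalent might look like when a framework is not covered by autologging.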

Weights & Biases (W&B)

Cloud-first experiment tracking with excellent visualization.
Core features:
  • Experiment tracking: Metrics, hyperparameters, system stats
  • Sweeps: Hyperparameter search (grid, random, Bayesian)
  • Artifacts: Dataset and model versioning
  • Reports: Shareable documentation
Strengths: Beautiful visualizations, team collaboration, hyperparameter sweeps
Limitations: Cloud-dependent, limited self-hosting
Key concept: wandb.init() + wandb.log() - simple API, powerful features.
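
A rough sketch of the wandb.init() + wandb.log() pattern and of a Bayesian sweep, assuming the wandb package is installed and the user is logged in. The project name "demo-project", the toy_loss stand-in for real training, and the parameter ranges are all hypothetical.

```python
from typing import Any, Dict

# Sweep definition in the dictionary form W&B accepts; values are illustrative.
SWEEP_CONFIG: Dict[str, Any] = {
    "method": "bayes",  # the other search strategies are "grid" and "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def toy_loss(learning_rate: float, epoch: int) -> float:
    """Stand-in for a real validation loss; lowest when learning_rate is 1e-3."""
    return 1.0 / (1 + epoch) + abs(learning_rate - 1e-3)


def train() -> None:
    """One tracked run: init, log a metric per epoch, finish."""
    import wandb  # imported lazily so the rest of the file loads without wandb

    # Defaults for a standalone run; a sweep agent is expected to override them.
    run = wandb.init(project="demo-project", config={"learning_rate": 1e-3})
    for epoch in range(3):
        loss = toy_loss(run.config.learning_rate, epoch)
        wandb.log({"epoch": epoch, "val_loss": loss})
    wandb.finish()


def launch_sweep(count: int = 5) -> None:
    """Register the sweep, then run `count` trials in this process."""
    import wandb

    sweep_id = wandb.sweep(SWEEP_CONFIG, project="demo-project")
    wandb.agent(sweep_id, function=train, count=count)
```

Calling train() on its own gives a single tracked run; launch_sweep() reuses the same function so each trial is logged the same way.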

What to Track

| Category | Examples |
|----------|----------|
| Hyperparameters | Learning rate, batch size, architecture |
| Metrics | Loss, accuracy, F1, per-epoch values |
| Artifacts | Model checkpoints, configs, datasets |
| System | GPU usage, memory, runtime |
| Code | Git commit, diff, requirements |
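
As a tool-agnostic illustration of the categories above, the following standard-library sketch assembles one run record. The function names and the record layout are assumptions made for this example, not any platform's schema; GPU and memory usage are left out because they need extra dependencies.

```python
import platform
import subprocess
import sys
import time
from typing import Any, Dict, List, Optional


def git_commit() -> Optional[str]:
    """Return the current commit hash, or None if git or a repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def build_run_record(
    hyperparameters: Dict[str, Any],
    metrics: Dict[str, List[float]],
    artifact_paths: List[str],
    started_at: float,
) -> Dict[str, Any]:
    """Group everything worth tracking for one run under the five categories."""
    return {
        "hyperparameters": dict(hyperparameters),
        "metrics": {name: list(values) for name, values in metrics.items()},
        "artifacts": list(artifact_paths),
        "system": {
            "platform": platform.platform(),
            "python": sys.version.split()[0],
            # started_at must come from time.perf_counter()
            "runtime_seconds": time.perf_counter() - started_at,
        },
        "code": {"git_commit": git_commit()},
    }
```

A record like this can be serialized to JSON next to the checkpoints, or passed field by field to whichever tracking platform is in use.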

Model Registry Concepts

| Stage | Purpose |
|-------|---------|
| None | Just logged, not registered |
| Staging | Testing, validation |
| Production | Serving live traffic |
| Archived | Deprecated, kept for reference |
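
The stage lifecycle can be illustrated with a small in-memory model. This is a toy sketch, not a real registry API: the ToyRegistry class and its methods are hypothetical, and the rule that promoting a version to Production archives the previous Production version is an assumption made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry: version number -> stage."""

    versions: Dict[int, Stage] = field(default_factory=dict)

    def log_model(self) -> int:
        """Add a new version; freshly logged models start in the None stage."""
        version = max(self.versions, default=0) + 1
        self.versions[version] = Stage.NONE
        return version

    def transition(self, version: int, stage: Stage) -> None:
        """Move a version to a stage, keeping at most one Production version."""
        if version not in self.versions:
            raise KeyError(f"unknown model version: {version}")
        if stage is Stage.PRODUCTION:
            # Assumption: the version being replaced is archived, not deleted.
            for other, current in self.versions.items():
                if current is Stage.PRODUCTION and other != version:
                    self.versions[other] = Stage.ARCHIVED
        self.versions[version] = stage
```

In MLflow itself, the comparable operation is exposed on the client as transition_model_version_stage, which takes an archive_existing_versions flag; the toy class above only mirrors that idea.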

阶段用途
None(未注册)仅已记录,未纳入注册中心
Staging(预发布)测试、验证阶段
Production(生产环境)用于线上服务
Archived(已归档)已弃用,仅留作参考

Decision Guide

| Scenario | Recommendation |
|----------|----------------|
| Self-hosted requirement | MLflow |
| Team collaboration | W&B |
| Model registry focus | MLflow |
| Hyperparameter sweeps | W&B |
| Beautiful dashboards | W&B |
| Full MLOps pipeline | MLflow + deployment tools |
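
When several scenarios apply at once, the guide can be treated as a lookup table and tallied. The sketch below encodes the rows above verbatim; the recommend function and the idea of counting one vote per matching scenario are illustrative assumptions, not part of the guide.

```python
from collections import Counter
from typing import Iterable

# Scenario -> recommendation, copied from the decision guide (keys lowercased).
RECOMMENDATIONS = {
    "self-hosted requirement": "MLflow",
    "team collaboration": "W&B",
    "model registry focus": "MLflow",
    "hyperparameter sweeps": "W&B",
    "beautiful dashboards": "W&B",
    "full mlops pipeline": "MLflow + deployment tools",
}


def recommend(scenarios: Iterable[str]) -> Counter:
    """Count how many of the given scenarios point at each recommendation."""
    votes: Counter = Counter()
    for scenario in scenarios:
        key = scenario.strip().lower()
        if key not in RECOMMENDATIONS:
            raise KeyError(f"scenario not in the decision guide: {scenario!r}")
        votes[RECOMMENDATIONS[key]] += 1
    return votes
```

A tie in the tally suggests that the requirements pull in different directions and need to be prioritized by hand.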

Resources
