# MLflow: ML Lifecycle Management Platform
## When to Use This Skill
Use MLflow when you need to:
- Track ML experiments with parameters, metrics, and artifacts
- Manage model registry with versioning and stage transitions
- Deploy models to various platforms (local, cloud, serving)
- Reproduce experiments with project configurations
- Compare model versions and performance metrics
- Collaborate on ML projects with team workflows
- Integrate with any ML framework (framework-agnostic)
Users: 20,000+ organizations | GitHub Stars: 23k+ | License: Apache 2.0
## Installation
```bash
# Install MLflow
pip install mlflow

# Install with extras
pip install mlflow[extras]  # Includes SQLAlchemy, boto3, etc.

# Start the MLflow UI
mlflow ui
# Access at http://localhost:5000
```
## Quick Start
### Basic Tracking
```python
import mlflow

# Start a run
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_param("batch_size", 32)

    # Your training code
    model = train_model()

    # Log metrics
    mlflow.log_metric("train_loss", 0.15)
    mlflow.log_metric("val_accuracy", 0.92)

    # Log model
    mlflow.sklearn.log_model(model, "model")
```

### Autologging (Automatic Tracking)
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier

# Enable autologging
mlflow.autolog()

# Train (metrics, parameters, and the model are logged automatically)
model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)
```
## Core Concepts
### 1. Experiments and Runs
- **Experiment**: logical container for related runs
- **Run**: single execution of ML code (parameters, metrics, artifacts)
```python
import mlflow

# Create/set the experiment
mlflow.set_experiment("my-experiment")

# Start a run
with mlflow.start_run(run_name="baseline-model"):
    # Log params
    mlflow.log_param("model", "ResNet50")
    mlflow.log_param("epochs", 10)

    # Train
    model = train()

    # Log metrics
    mlflow.log_metric("accuracy", 0.95)

    # Log model
    mlflow.pytorch.log_model(model, "model")

    # The run ID is generated automatically (active_run() is only
    # valid inside the run context)
    print(f"Run ID: {mlflow.active_run().info.run_id}")
```
### 2. Logging Parameters
```python
with mlflow.start_run():
    # Single parameter
    mlflow.log_param("learning_rate", 0.001)

    # Multiple parameters
    mlflow.log_params({
        "batch_size": 32,
        "epochs": 50,
        "optimizer": "Adam",
        "dropout": 0.2
    })

    # Nested parameters (as a dict)
    config = {
        "model": {
            "architecture": "ResNet50",
            "pretrained": True
        },
        "training": {
            "lr": 0.001,
            "weight_decay": 1e-4
        }
    }

    # Log as JSON strings or individual params
    for key, value in config.items():
        mlflow.log_param(key, str(value))
```
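Logging each top-level value as a stringified dict makes individual fields hard to search on. A small flattening helper (hypothetical, not part of MLflow) turns a nested config into dotted keys that can be passed to `mlflow.log_params` as one flat dict:

```python
def flatten_config(config, parent_key="", sep="."):
    """Flatten a nested dict into dotted keys, e.g. {"model": {"lr": 0.001}}
    becomes {"model.lr": 0.001}."""
    items = {}
    for key, value in config.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten_config(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

flat = flatten_config({
    "model": {"architecture": "ResNet50", "pretrained": True},
    "training": {"lr": 0.001, "weight_decay": 1e-4},
})
# flat == {"model.architecture": "ResNet50", "model.pretrained": True,
#          "training.lr": 0.001, "training.weight_decay": 0.0001}
# mlflow.log_params(flat)  # each field becomes its own searchable param
```

Flat keys let filter strings like `params."model.architecture" = 'ResNet50'` target individual fields instead of a serialized blob.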
### 3. Logging Metrics
```python
with mlflow.start_run():
    # Training loop
    for epoch in range(NUM_EPOCHS):
        train_loss = train_epoch()
        val_loss = validate()

        # Log metrics at each step
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)

        # Log multiple metrics at once
        mlflow.log_metrics({
            "train_accuracy": train_acc,
            "val_accuracy": val_acc
        }, step=epoch)

    # Log final metrics (no step)
    mlflow.log_metric("final_accuracy", final_acc)
```

### 4. Logging Artifacts
```python
import os

import matplotlib.pyplot as plt

with mlflow.start_run():
    # Log a file
    model.save('model.pkl')
    mlflow.log_artifact('model.pkl')

    # Log a directory
    os.makedirs('plots', exist_ok=True)
    plt.savefig('plots/loss_curve.png')
    mlflow.log_artifacts('plots')

    # Log text
    with open('config.txt', 'w') as f:
        f.write(str(config))
    mlflow.log_artifact('config.txt')

    # Log a dict as JSON
    mlflow.log_dict({'config': config}, 'config.json')
```

### 5. Logging Models
```python
# PyTorch
import mlflow.pytorch

with mlflow.start_run():
    model = train_pytorch_model()
    mlflow.pytorch.log_model(model, "model")

# Scikit-learn
import mlflow.sklearn

with mlflow.start_run():
    model = train_sklearn_model()
    mlflow.sklearn.log_model(model, "model")

# Keras/TensorFlow
import mlflow.keras

with mlflow.start_run():
    model = train_keras_model()
    mlflow.keras.log_model(model, "model")

# HuggingFace Transformers
import mlflow.transformers

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={
            "model": model,
            "tokenizer": tokenizer
        },
        artifact_path="model"
    )
```
## Autologging
Automatically log metrics, parameters, and models for popular frameworks.
### Enable Autologging
```python
import mlflow

# Enable for all supported frameworks
mlflow.autolog()

# Or enable for a specific framework
mlflow.sklearn.autolog()
mlflow.pytorch.autolog()
mlflow.keras.autolog()
mlflow.xgboost.autolog()
```
### Autologging with Scikit-learn
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Enable autologging
mlflow.sklearn.autolog()

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train: params, metrics (accuracy, f1_score, ...), the model,
# and training duration are logged automatically
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
```

### Autologging with PyTorch Lightning
```python
import mlflow
import pytorch_lightning as pl

# Enable autologging
mlflow.pytorch.autolog()

# Train: hyperparameters, training metrics, and the best
# model checkpoint are logged automatically
with mlflow.start_run():
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model, datamodule=dm)
```

## Model Registry
Manage model lifecycle with versioning and stage transitions.
### Register Model
```python
import mlflow

# Log and register the model in one step
with mlflow.start_run():
    model = train_model()
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="my-classifier"  # Register immediately
    )

# Or register an existing run's model later
run_id = "abc123"
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "my-classifier")
```
### Model Stages
Transition models between stages: None → Staging → Production → Archived
```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote to staging
client.transition_model_version_stage(
    name="my-classifier",
    version=3,
    stage="Staging"
)

# Promote to production
client.transition_model_version_stage(
    name="my-classifier",
    version=3,
    stage="Production",
    archive_existing_versions=True  # Archive old production versions
)

# Archive a model
client.transition_model_version_stage(
    name="my-classifier",
    version=2,
    stage="Archived"
)
```
### Load Model from Registry
```python
import mlflow.pyfunc

# Load the latest production model
model = mlflow.pyfunc.load_model("models:/my-classifier/Production")

# Load a specific version
model = mlflow.pyfunc.load_model("models:/my-classifier/3")

# Load from staging
model = mlflow.pyfunc.load_model("models:/my-classifier/Staging")

# Use the model
predictions = model.predict(X_test)
```
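The `models:/` and `runs:/` URIs above follow a fixed pattern: model name plus a version number or stage, or run ID plus artifact path. Two tiny helpers (hypothetical, for illustration only) make the format explicit:

```python
def model_uri(name, version_or_stage):
    """Build a registry URI: a version number or stage name after the model name."""
    return f"models:/{name}/{version_or_stage}"

def run_uri(run_id, artifact_path="model"):
    """Build a run-relative URI for a logged model artifact."""
    return f"runs:/{run_id}/{artifact_path}"

uri = model_uri("my-classifier", "Production")
# "models:/my-classifier/Production"
```

Centralizing URI construction like this avoids typos in stage names scattered across deployment scripts.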
### Model Versioning
```python
client = MlflowClient()

# List all versions
versions = client.search_model_versions("name='my-classifier'")
for v in versions:
    print(f"Version {v.version}: {v.current_stage}")

# Get the latest version by stage
latest_prod = client.get_latest_versions("my-classifier", stages=["Production"])
latest_staging = client.get_latest_versions("my-classifier", stages=["Staging"])

# Get model version details
version_info = client.get_model_version(name="my-classifier", version="3")
print(f"Run ID: {version_info.run_id}")
print(f"Stage: {version_info.current_stage}")
print(f"Tags: {version_info.tags}")
```
### Model Annotations
```python
client = MlflowClient()

# Add a description
client.update_model_version(
    name="my-classifier",
    version="3",
    description="ResNet50 classifier trained on 1M images with 95% accuracy"
)

# Add tags
client.set_model_version_tag(
    name="my-classifier",
    version="3",
    key="validation_status",
    value="approved"
)
client.set_model_version_tag(
    name="my-classifier",
    version="3",
    key="deployed_date",
    value="2025-01-15"
)
```
## Searching Runs
Find runs programmatically.
```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Search all runs in an experiment
experiment_id = client.get_experiment_by_name("my-experiment").experiment_id
runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10
)
for run in runs:
    print(f"Run ID: {run.info.run_id}")
    print(f"Accuracy: {run.data.metrics['accuracy']}")
    print(f"Params: {run.data.params}")

# Search with complex filters
runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="""
        metrics.accuracy > 0.9 AND
        params.model = 'ResNet50' AND
        tags.dataset = 'ImageNet'
    """,
    order_by=["metrics.f1_score DESC"]
)
```
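Filter strings use a SQL-like syntax with `metrics.`, `params.`, and `tags.` prefixes: numeric comparisons for metrics, quoted exact matches for params and tags. A small builder (a hypothetical helper, not an MLflow API) that assembles one from plain dicts shows the pattern:

```python
def build_filter(metrics=None, params=None, tags=None):
    """Build an MLflow-style filter_string from metric thresholds (>)
    and exact param/tag matches (=)."""
    clauses = []
    for key, value in (metrics or {}).items():
        clauses.append(f"metrics.{key} > {value}")
    for key, value in (params or {}).items():
        clauses.append(f"params.{key} = '{value}'")
    for key, value in (tags or {}).items():
        clauses.append(f"tags.{key} = '{value}'")
    return " AND ".join(clauses)

f = build_filter(metrics={"accuracy": 0.9},
                 params={"model": "ResNet50"},
                 tags={"dataset": "ImageNet"})
# "metrics.accuracy > 0.9 AND params.model = 'ResNet50' AND tags.dataset = 'ImageNet'"
```

The result can be passed directly as `filter_string` to `client.search_runs`; extending it to other operators (`>=`, `!=`, `LIKE`) is straightforward.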
## Integration Examples

### PyTorch
```python
import mlflow
import torch
import torch.nn as nn

# Enable autologging
mlflow.pytorch.autolog()

with mlflow.start_run():
    # Log config
    config = {
        "lr": 0.001,
        "epochs": 10,
        "batch_size": 32
    }
    mlflow.log_params(config)

    # Train
    model = create_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])

    for epoch in range(config["epochs"]):
        train_loss = train_epoch(model, optimizer, train_loader)
        val_loss, val_acc = validate(model, val_loader)

        # Log metrics
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc
        }, step=epoch)

    # Log model
    mlflow.pytorch.log_model(model, "model")
```

### HuggingFace Transformers
```python
import mlflow
from transformers import Trainer, TrainingArguments

# Enable autologging
mlflow.transformers.autolog()

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True
)

# Start an MLflow run
with mlflow.start_run():
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset
    )

    # Train (automatically logged)
    trainer.train()

    # Log the final model to the registry
    mlflow.transformers.log_model(
        transformers_model={
            "model": trainer.model,
            "tokenizer": tokenizer
        },
        artifact_path="model",
        registered_model_name="hf-classifier"
    )
```

### XGBoost
```python
import mlflow
import xgboost as xgb

# Enable autologging
mlflow.xgboost.autolog()

with mlflow.start_run():
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)

    params = {
        'max_depth': 6,
        'learning_rate': 0.1,
        'objective': 'binary:logistic',
        'eval_metric': ['logloss', 'auc']
    }

    # Train (model and metrics logged automatically)
    model = xgb.train(
        params,
        dtrain,
        num_boost_round=100,
        evals=[(dtrain, 'train'), (dval, 'val')],
        early_stopping_rounds=10
    )
```

## Best Practices
### 1. Organize with Experiments
```python
# ✅ Good: Separate experiments for different tasks
mlflow.set_experiment("sentiment-analysis")
mlflow.set_experiment("image-classification")
mlflow.set_experiment("recommendation-system")

# ❌ Bad: Everything in one experiment
mlflow.set_experiment("all-models")
```
### 2. Use Descriptive Run Names
```python
# ✅ Good: Descriptive names
with mlflow.start_run(run_name="resnet50-imagenet-lr0.001-bs32"):
    train()

# ❌ Bad: No name (auto-generated UUID)
with mlflow.start_run():
    train()
```
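One way to keep such names consistent is to derive them from the hyperparameters themselves. A sketch of a hypothetical helper (not an MLflow API):

```python
def make_run_name(prefix, params):
    """Compose a descriptive run name like 'resnet50-imagenet-lr0.001-bs32'
    from a prefix and a dict of short key/value pairs."""
    parts = [prefix] + [f"{k}{v}" for k, v in params.items()]
    return "-".join(parts)

name = make_run_name("resnet50-imagenet", {"lr": 0.001, "bs": 32})
# "resnet50-imagenet-lr0.001-bs32"
# with mlflow.start_run(run_name=name): ...
```

Deriving the name from the same dict passed to `mlflow.log_params` guarantees the name never drifts from the actual configuration.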
### 3. Log Comprehensive Metadata
```python
with mlflow.start_run():
    # Log hyperparameters
    mlflow.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 50
    })

    # Log system info as tags
    mlflow.set_tags({
        "dataset": "ImageNet",
        "framework": "PyTorch 2.0",
        "gpu": "A100",
        "git_commit": get_git_commit()
    })

    # Log data info
    mlflow.log_param("train_samples", len(train_dataset))
    mlflow.log_param("val_samples", len(val_dataset))
```
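The snippet above calls `get_git_commit()`, which is not an MLflow function. One possible stdlib implementation (a sketch; it falls back to `"unknown"` outside a git checkout or when git is missing):

```python
import subprocess

def get_git_commit():
    """Return the current HEAD commit hash, or 'unknown' if unavailable."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            stderr=subprocess.DEVNULL,
        ).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
```

Tagging every run with the commit makes results reproducible: the exact code state can be checked out later from the run's tags.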
### 4. Track Model Lineage
```python
# Link runs to understand lineage
with mlflow.start_run(run_name="preprocessing"):
    data = preprocess()
    mlflow.log_artifact("data.csv")
    preprocessing_run_id = mlflow.active_run().info.run_id

with mlflow.start_run(run_name="training"):
    # Reference the upstream run
    mlflow.set_tag("preprocessing_run_id", preprocessing_run_id)
    model = train(data)
```

### 5. Use Model Registry for Deployment
```python
# ✅ Good: Load through the registry for production
model_uri = "models:/my-classifier/Production"
model = mlflow.pyfunc.load_model(model_uri)

# ❌ Bad: Hard-code run IDs
model_uri = "runs:/abc123/model"
model = mlflow.pyfunc.load_model(model_uri)
```
## Deployment
### Serve Model Locally
```bash
# Serve a registered model
mlflow models serve -m "models:/my-classifier/Production" -p 5001

# Serve a model from a run
mlflow models serve -m "runs:/<RUN_ID>/model" -p 5001

# Test the endpoint
curl http://127.0.0.1:5001/invocations -H 'Content-Type: application/json' -d '{
  "inputs": [[1.0, 2.0, 3.0, 4.0]]
}'
```
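The same request can be made from Python with only the standard library. A minimal sketch, assuming a scoring server is already running on port 5001 as started above:

```python
import json
from urllib import request

def score(inputs, url="http://127.0.0.1:5001/invocations"):
    """POST a batch of inputs to an MLflow scoring server and return the response."""
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    req = request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# predictions = score([[1.0, 2.0, 3.0, 4.0]])  # requires a running server
```

The `{"inputs": ...}` payload shape mirrors the `curl` example; for column-oriented data the server also accepts a `dataframe_split` payload.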
### Deploy to Cloud
```bash
# Deploy to AWS SageMaker
mlflow sagemaker deploy -m "models:/my-classifier/Production" --region-name us-west-2

# Deploy to Azure ML
mlflow azureml deploy -m "models:/my-classifier/Production"
```
## Configuration
### Tracking Server
```bash
# Start a tracking server with a database backend store
mlflow server \
  --backend-store-uri postgresql://user:password@localhost/mlflow \
  --default-artifact-root s3://my-bucket/mlflow \
  --host 0.0.0.0 \
  --port 5000
```
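Credentials embedded in `--backend-store-uri` must be URL-encoded if they contain characters like `@`, `:`, or `/`. A quick stdlib check (illustrative; this is standard URI escaping, not an MLflow-specific rule):

```python
from urllib.parse import quote_plus

user = "mlflow"
password = "p@ss:word/1"  # characters that would break a raw URI
host = "localhost"
db = "mlflow"

uri = f"postgresql://{user}:{quote_plus(password)}@{host}/{db}"
# postgresql://mlflow:p%40ss%3Aword%2F1@localhost/mlflow
```

Building the URI this way in a launch script avoids hard-to-debug connection failures when a rotated password happens to contain reserved characters.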
### Client Configuration
```python
import mlflow

# Set the tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
```

```bash
# Or use an environment variable
export MLFLOW_TRACKING_URI=http://localhost:5000
```
## Resources
- Documentation: https://mlflow.org/docs/latest
- GitHub: https://github.com/mlflow/mlflow (23k+ stars)
- Examples: https://github.com/mlflow/mlflow/tree/master/examples
- Community: https://mlflow.org/community
## See Also
- `references/tracking.md` - Comprehensive tracking guide
- `references/model-registry.md` - Model lifecycle management
- `references/deployment.md` - Production deployment patterns