ML Pipeline Workflow
Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.
Overview
This skill provides comprehensive guidance for building production ML pipelines that handle the full lifecycle: data ingestion → preparation → training → validation → deployment → monitoring.
When to Use This Skill
- Building new ML pipelines from scratch
- Designing workflow orchestration for ML systems
- Implementing data → model → deployment automation
- Setting up reproducible training workflows
- Creating DAG-based ML orchestration
- Integrating ML components into production systems
What This Skill Provides
Core Capabilities
- Pipeline Architecture
  - End-to-end workflow design
  - DAG orchestration patterns (Airflow, Dagster, Kubeflow)
  - Component dependencies and data flow
  - Error handling and retry strategies
- Data Preparation
  - Data validation and quality checks
  - Feature engineering pipelines
  - Data versioning and lineage
  - Train/validation/test splitting strategies
- Model Training
  - Training job orchestration
  - Hyperparameter management
  - Experiment tracking integration
  - Distributed training patterns
- Model Validation
  - Validation frameworks and metrics
  - A/B testing infrastructure
  - Performance regression detection
  - Model comparison workflows
- Deployment Automation
  - Model serving patterns
  - Canary deployments
  - Blue-green deployment strategies
  - Rollback mechanisms
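The DAG orchestration and dependency patterns above can be sketched without committing to a particular orchestrator. Below is a minimal in-process scheduler using the standard library's topological sort; the stage names mirror this skill's workflow, while the runner map is a hypothetical stand-in for real tasks:

```python
from graphlib import TopologicalSorter

# Stage -> set of upstream stages (mirrors the lifecycle in this skill).
PIPELINE = {
    "data_ingestion": set(),
    "data_validation": {"data_ingestion"},
    "feature_engineering": {"data_validation"},
    "model_training": {"feature_engineering"},
    "model_validation": {"model_training"},
    "model_deployment": {"model_validation"},
}

def run_pipeline(dag, runners):
    """Execute stages in dependency order; an exception halts downstream stages."""
    for stage in TopologicalSorter(dag).static_order():
        print(f"running {stage}")
        runners[stage]()

# Placeholder runners; in a real pipeline each would do actual work.
runners = {name: (lambda n=name: None) for name in PIPELINE}
run_pipeline(PIPELINE, runners)
```

Real orchestrators add scheduling, retries, and distributed execution on top of exactly this dependency-resolution core.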
Reference Documentation
See the references/ directory for detailed guides:
- data-preparation.md - Data cleaning, validation, and feature engineering
- model-training.md - Training workflows and best practices
- model-validation.md - Validation strategies and metrics
- model-deployment.md - Deployment patterns and serving architectures
Assets and Templates
The assets/ directory contains:
- pipeline-dag.yaml.template - DAG template for workflow orchestration
- training-config.yaml - Training configuration template
- validation-checklist.md - Pre-deployment validation checklist
Usage Patterns
Basic Pipeline Setup
```python
# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment",
]

# 2. Configure dependencies
# See assets/pipeline-dag.yaml.template for full example
```
Production Workflow
- Data Preparation Phase
  - Ingest raw data from sources
  - Run data quality checks
  - Apply feature transformations
  - Version processed datasets
- Training Phase
  - Load versioned training data
  - Execute training jobs
  - Track experiments and metrics
  - Save trained models
- Validation Phase
  - Run validation test suite
  - Compare against baseline
  - Generate performance reports
  - Approve for deployment
- Deployment Phase
  - Package model artifacts
  - Deploy to serving infrastructure
  - Configure monitoring
  - Validate production traffic
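The phases above can be sketched end to end. The functions below are purely illustrative stand-ins for real components: the content-hash versioning scheme and the majority-label "model" are assumptions chosen to keep the sketch self-contained, not recommendations:

```python
import hashlib
import json

def prepare_data(raw_records):
    """Data preparation phase: quality check, then version the clean dataset."""
    clean = [r for r in raw_records if r.get("label") is not None]  # drop unlabeled rows
    # Content-addressed dataset version for reproducibility (illustrative scheme).
    version = hashlib.sha256(json.dumps(clean, sort_keys=True).encode()).hexdigest()[:12]
    return clean, version

def train(dataset):
    """Training phase stub: a trivial 'model' that predicts the majority label."""
    labels = [r["label"] for r in dataset]
    return max(set(labels), key=labels.count)

def validate(model, baseline_accuracy, holdout):
    """Validation phase: compare against a baseline before approving deployment."""
    accuracy = sum(r["label"] == model for r in holdout) / len(holdout)
    return accuracy >= baseline_accuracy

raw = [{"label": 1}, {"label": 1}, {"label": 0}, {"label": None}]
dataset, version = prepare_data(raw)
model = train(dataset)
approved = validate(model, baseline_accuracy=0.5, holdout=dataset)
```

The key structural point is that each phase consumes the versioned output of the previous one, so any run can be traced back to the exact data that produced it.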
Best Practices
Pipeline Design
- Modularity: Each stage should be independently testable
- Idempotency: Re-running stages should be safe
- Observability: Log metrics at every stage
- Versioning: Track data, code, and model versions
- Failure Handling: Implement retry logic and alerting
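The idempotency and failure-handling guidance above can be combined in a single stage wrapper. This is a minimal sketch; the marker-file scheme and backoff policy are illustrative assumptions, not a prescribed design:

```python
import time
from pathlib import Path

def run_stage(name, fn, marker_dir=Path("/tmp/pipeline-markers"), retries=3, delay=1.0):
    """Run a pipeline stage idempotently: skip if its success marker exists,
    retry transient failures, and write the marker only on success."""
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / f"{name}.done"
    if marker.exists():
        return "skipped"  # safe to re-run the whole pipeline
    for attempt in range(1, retries + 1):
        try:
            fn()
            marker.touch()
            return "ok"
        except Exception:
            if attempt == retries:
                raise  # surface the failure for alerting after the final attempt
            time.sleep(delay * attempt)  # linear backoff between retries
```

Because the marker is written only after `fn()` succeeds, a crash mid-stage leaves the stage eligible for re-execution on the next run.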
Data Management
- Use data validation libraries (Great Expectations, TFX)
- Version datasets with DVC or similar tools
- Document feature engineering transformations
- Maintain data lineage tracking
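To show the kind of check a validation library performs, here is a hand-rolled sketch; it is not the Great Expectations or TFX API, and the schema format is a made-up convention for this example only:

```python
def validate_records(records, schema):
    """Minimal data quality check: required fields present, types correct,
    numeric values inside expected ranges. Returns a list of issue strings."""
    issues = []
    for i, row in enumerate(records):
        for field, (ftype, lo, hi) in schema.items():
            if field not in row or row[field] is None:
                issues.append(f"row {i}: missing {field}")
                continue
            value = row[field]
            if not isinstance(value, ftype):
                issues.append(f"row {i}: {field} has type {type(value).__name__}")
            elif lo is not None and not (lo <= value <= hi):
                issues.append(f"row {i}: {field}={value} out of range [{lo}, {hi}]")
    return issues

# Hypothetical schema: field -> (type, min, max); None bounds skip the range check.
SCHEMA = {"age": (int, 0, 120), "name": (str, None, None)}
rows = [{"age": 34, "name": "a"}, {"age": 200, "name": "b"}, {"name": "c"}]
issues = validate_records(rows, SCHEMA)
```

Running such checks at every stage boundary turns silent data corruption into an explicit pipeline failure.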
Model Operations
- Separate training and serving infrastructure
- Use model registries (MLflow, Weights & Biases)
- Implement gradual rollouts for new models
- Monitor model performance drift
- Maintain rollback capabilities
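One common way to monitor drift is the Population Stability Index, which compares a production feature distribution against its training-time baseline. A minimal sketch follows; the binning scheme and the usual 0.1/0.25 thresholds are conventional rules of thumb, not universal constants:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between two numeric samples. Rule of thumb: < 0.1 stable,
    0.1-0.25 moderate drift, > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = max(min(int((x - lo) / width), bins - 1), 0)  # clamp to bin range
            counts[idx] += 1
        # Small epsilon avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]
    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]       # training-time distribution
production = [i / 100 for i in range(100)]     # identical -> PSI near 0
shifted = [i / 100 + 0.5 for i in range(100)]  # shifted -> large PSI
```

A scheduled job computing PSI per feature is a simple first line of defense before investing in a full monitoring platform.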
Deployment Strategies
- Start with shadow deployments
- Use canary releases for validation
- Implement A/B testing infrastructure
- Set up automated rollback triggers
- Monitor latency and throughput
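An automated rollback trigger for a canary release can be as simple as a sliding-window error-rate check. The window size and thresholds below are illustrative assumptions, to be tuned per service:

```python
from collections import deque

class RollbackTrigger:
    """Sliding-window error-rate monitor for a canary release."""

    def __init__(self, window=100, max_error_rate=0.05, min_samples=20):
        self.outcomes = deque(maxlen=window)  # True = request succeeded
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, ok: bool):
        self.outcomes.append(ok)

    def should_rollback(self) -> bool:
        if len(self.outcomes) < self.min_samples:
            return False  # not enough canary traffic to judge yet
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        return error_rate > self.max_error_rate

trigger = RollbackTrigger(window=50, max_error_rate=0.1, min_samples=10)
for _ in range(40):
    trigger.record(True)
healthy = trigger.should_rollback()   # healthy canary
for _ in range(10):
    trigger.record(False)
degraded = trigger.should_rollback()  # error rate spiked
```

The `min_samples` guard matters: without it, a single early failure on a low-traffic canary would trigger a spurious rollback.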
Integration Points
Orchestration Tools
- Apache Airflow: DAG-based workflow orchestration
- Dagster: Asset-based pipeline orchestration
- Kubeflow Pipelines: Kubernetes-native ML workflows
- Prefect: Modern dataflow automation
Experiment Tracking
- MLflow for experiment tracking and model registry
- Weights & Biases for visualization and collaboration
- TensorBoard for training metrics
Deployment Platforms
- AWS SageMaker for managed ML infrastructure
- Google Vertex AI for GCP deployments
- Azure ML for Azure cloud
- Kubernetes + KServe for cloud-agnostic serving
Progressive Disclosure
Start with the basics and gradually add complexity:
- Level 1: Simple linear pipeline (data → train → deploy)
- Level 2: Add validation and monitoring stages
- Level 3: Implement hyperparameter tuning
- Level 4: Add A/B testing and gradual rollouts
- Level 5: Multi-model pipelines with ensemble strategies
Common Patterns
Batch Training Pipeline
```yaml
# See assets/pipeline-dag.yaml.template
stages:
  - name: data_preparation
    dependencies: []
  - name: model_training
    dependencies: [data_preparation]
  - name: model_evaluation
    dependencies: [model_training]
  - name: model_deployment
    dependencies: [model_evaluation]
```
Real-time Feature Pipeline
```python
# Stream processing for real-time features
# Combined with batch training
# See references/data-preparation.md
```
Continuous Training
```python
# Automated retraining on schedule
# Triggered by data drift detection
# See references/model-training.md
```
Troubleshooting
Common Issues
- Pipeline failures: Check dependencies and data availability
- Training instability: Review hyperparameters and data quality
- Deployment issues: Validate model artifacts and serving config
- Performance degradation: Monitor data drift and model metrics
Debugging Steps
- Check pipeline logs for each stage
- Validate input/output data at boundaries
- Test components in isolation
- Review experiment tracking metrics
- Inspect model artifacts and metadata
Next Steps
After setting up your pipeline:
- Explore hyperparameter-tuning skill for optimization
- Learn experiment-tracking-setup for MLflow/W&B
- Review model-deployment-patterns for serving strategies
- Implement monitoring with observability tools
Related Skills
- experiment-tracking-setup: MLflow and Weights & Biases integration
- hyperparameter-tuning: Automated hyperparameter optimization
- model-deployment-patterns: Advanced deployment strategies