azure-ai-ml-py
Azure Machine Learning SDK v2 for Python
Client library for managing Azure ML resources: workspaces, jobs, models, data, and compute.
Installation
```bash
pip install azure-ai-ml azure-identity
```
Environment Variables
```bash
AZURE_SUBSCRIPTION_ID=<your-subscription-id>
AZURE_RESOURCE_GROUP=<your-resource-group>
AZURE_ML_WORKSPACE_NAME=<your-workspace-name>
```
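These three values identify the workspace, so it can help to check that they are set before building a client. The following is a minimal sketch; the helper name `missing_workspace_settings` is an assumption for illustration and is not part of the SDK.

```python
import os

# The three variable names listed in this document.
REQUIRED_SETTINGS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_ML_WORKSPACE_NAME",
)


def missing_workspace_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
print(missing_workspace_settings({"AZURE_SUBSCRIPTION_ID": "0000"}))
```

Calling the helper with no argument checks `os.environ`; an empty list means all three variables are present.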
Authentication
```python
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Build a client scoped to one workspace, reading identifiers from the environment.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    workspace_name=os.environ["AZURE_ML_WORKSPACE_NAME"],
)
```
From Config File
```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Uses config.json in current directory or parent
ml_client = MLClient.from_config(
    credential=DefaultAzureCredential()
)
```
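For reference, a workspace `config.json` holds the same three identifiers under the keys `subscription_id`, `resource_group`, and `workspace_name`. The sketch below writes such a file and illustrates the "current directory or parent" lookup. Both helper names are assumptions, and `find_config` is only an illustration of an upward search, not the SDK's actual implementation.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json containing the three workspace identifiers."""
    config_path = Path(directory) / "config.json"
    config_path.write_text(
        json.dumps(
            {
                "subscription_id": subscription_id,
                "resource_group": resource_group,
                "workspace_name": workspace_name,
            },
            indent=2,
        )
    )
    return config_path


def find_config(start_dir):
    """Illustration only: look for config.json in start_dir, then in each parent."""
    start = Path(start_dir).resolve()
    for folder in (start, *start.parents):
        candidate = folder / "config.json"
        if candidate.is_file():
            return candidate
    return None


# Demo in a throwaway directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    write_workspace_config(
        tmp, "<your-subscription-id>", "<your-resource-group>", "<your-workspace-name>"
    )
    nested = Path(tmp) / "project" / "notebooks"
    nested.mkdir(parents=True)
    print(find_config(nested).name)
```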
Workspace Management
Create Workspace
```python
from azure.ai.ml.entities import Workspace

ws = Workspace(
    name="my-workspace",
    location="eastus",
    display_name="My Workspace",
    description="ML workspace for experiments",
    tags={"purpose": "demo"},
)
# begin_create returns a poller; .result() blocks until provisioning finishes.
ml_client.workspaces.begin_create(ws).result()
```
List Workspaces
```python
for ws in ml_client.workspaces.list():
    print(f"{ws.name}: {ws.location}")
```
Data Assets
Register Data
```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register a file
my_data = Data(
    name="my-dataset",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/train.csv",
    type=AssetTypes.URI_FILE,
    description="Training data",
)
ml_client.data.create_or_update(my_data)
```
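The `path` value above follows the pattern `azureml://datastores/<datastore>/paths/<relative path>`. A small sketch of composing such a URI is shown here; the helper name `datastore_uri` is hypothetical and not an SDK function.

```python
def datastore_uri(datastore_name, relative_path):
    """Build an azureml:// URI in the same format as the `path` value above."""
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


# Reproduces the path used when registering the file asset.
print(datastore_uri("workspaceblobstore", "data/train.csv"))
```

Keeping the trailing slash on the relative path (for example `data/`) yields the folder form used for `URI_FOLDER` assets.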
Register Folder
```python
# Reuses the Data and AssetTypes imports from the file example.
my_data = Data(
    name="my-folder-dataset",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/",
    type=AssetTypes.URI_FOLDER,
)
ml_client.data.create_or_update(my_data)
```
Model Registry
Register Model
```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    name="my-model",
    version="1",
    path="./model/",  # local folder containing the trained model files
    type=AssetTypes.CUSTOM_MODEL,
    description="My trained model",
)
ml_client.models.create_or_update(model)
```
List Models
```python
# Passing a name lists every registered version of that model.
for model in ml_client.models.list(name="my-model"):
    print(f"{model.name} v{model.version}")
```
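Because each registration carries an explicit version string, the listing can also be used to pick the next version number. This is a minimal sketch, assuming integer-like version strings such as "1" and "2"; `next_version` is a hypothetical helper, and the stand-in objects below only mimic the `name` and `version` attributes printed above.

```python
from types import SimpleNamespace


def next_version(models):
    """Return the next integer version as a string, or "1" if none exist."""
    versions = [int(m.version) for m in models if str(m.version).isdigit()]
    return str(max(versions, default=0) + 1)


# Stand-ins for the items yielded by ml_client.models.list(name="my-model").
existing = [
    SimpleNamespace(name="my-model", version="1"),
    SimpleNamespace(name="my-model", version="2"),
]
print(next_version(existing))  # prints 3
```

With the real client, the call would be `next_version(ml_client.models.list(name="my-model"))`. Non-numeric version labels are ignored by this sketch.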
Compute
Create Compute Cluster
```python
from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="cpu-cluster",
    type="amlcompute",
    size="Standard_DS3_v2",
    min_instances=0,  # scale to zero when idle
    max_instances=4,
    idle_time_before_scale_down=120,  # seconds
)
ml_client.compute.begin_create_or_update(cluster).result()
```
List Compute
```python
for compute in ml_client.compute.list():
    print(f"{compute.name}: {compute.type}")
```
Jobs
Command Job
```python
from azure.ai.ml import command, Input

job = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --lr ${{inputs.learning_rate}}",
    inputs={
        # The input type is uri_folder, so reference the folder asset registered above.
        "data": Input(type="uri_folder", path="azureml:my-folder-dataset:1"),
        "learning_rate": 0.01,
    },
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
    display_name="training-job",
)
returned_job = ml_client.jobs.create_or_update(job)
print(f"Job URL: {returned_job.studio_url}")
```
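The command string passes the two inputs to `train.py` as `--data` and `--lr`. The script itself is not shown in this document, so the following is only a sketch of what its argument handling might look like; the function names and the placeholder training step are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags that the job's command string passes in."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--data", required=True, help="Path where the input data is available")
    parser.add_argument("--lr", type=float, default=0.01, help="Learning rate")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Real training code would read files under args.data here.
    print(f"data={args.data} lr={args.lr}")
    return args


# Example invocation with explicit arguments; in a real train.py you would
# call main() with no arguments so that sys.argv is used.
main(["--data", "./data", "--lr", "0.05"])
```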
Monitor Job
```python
# Streams the job's logs to stdout until the job finishes.
ml_client.jobs.stream(returned_job.name)
```
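When log output is not needed, an alternative is to poll the job status until it reaches a terminal state. This is a sketch under stated assumptions: `wait_for_job` is a hypothetical helper, it relies on `jobs.get(name)` returning an object with a `status` attribute, and it treats "Completed", "Failed", and "Canceled" as the terminal statuses. The stand-in client exists only so the example runs without a workspace.

```python
import time
from types import SimpleNamespace

# Statuses assumed to mean the job will not change state again.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(client, job_name, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll client.jobs.get(job_name) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.jobs.get(job_name).status
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_name} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Stand-in client for demonstration; with the real SDK you would pass ml_client.
_statuses = iter(["Queued", "Running", "Completed"])
fake_client = SimpleNamespace(
    jobs=SimpleNamespace(get=lambda name: SimpleNamespace(status=next(_statuses)))
)
print(wait_for_job(fake_client, "training-job", poll_seconds=0))
```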
Pipelines
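```python
from azure.ai.ml import Input, dsl, load_component

# Assumption: the two components are defined in YAML files at these
# hypothetical paths; the original snippet does not show where they come from.
prep_component = load_component(source="./components/prep.yml")
train_component = load_component(source="./components/train.yml")


@dsl.pipeline(
    compute="cpu-cluster",
    description="Training pipeline",
)
def training_pipeline(data_input):
    # Step 1: prepare the raw data.
    prep_step = prep_component(data=data_input)
    # Step 2: train on the prepared output.
    train_step = train_component(
        data=prep_step.outputs.output_data,
        learning_rate=0.01,
    )
    return {"model": train_step.outputs.model}


# The input type is uri_folder, so reference the folder asset registered above.
pipeline = training_pipeline(
    data_input=Input(type="uri_folder", path="azureml:my-folder-dataset:1")
)
pipeline_job = ml_client.jobs.create_or_update(pipeline)
```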
Environments
Create Custom Environment
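```python
from azure.ai.ml.entities import Environment

env = Environment(
    name="my-env",
    version="1",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",  # base Docker image
    conda_file="./environment.yml",  # conda dependencies layered on the image
)
ml_client.environments.create_or_update(env)
```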
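The `conda_file` argument points at a conda specification that this document does not show. As a rough sketch, the file could be generated as below; the package list, Python version, and helper name `write_conda_file` are all assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical conda specification; adjust packages and versions to your project.
CONDA_SPEC = """\
name: my-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn
      - pandas
"""


def write_conda_file(path):
    """Write the conda specification to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(CONDA_SPEC)
    return target


# Demo in a throwaway directory; in a project you would write ./environment.yml.
with tempfile.TemporaryDirectory() as tmp:
    written = write_conda_file(Path(tmp) / "environment.yml")
    print(written.name)
```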
Datastores
List Datastores
```python
for ds in ml_client.datastores.list():
    print(f"{ds.name}: {ds.type}")
```
Get Default Datastore
```python
default_ds = ml_client.datastores.get_default()
print(f"Default: {default_ds.name}")
```
MLClient Operations
| Property | Operations |
|---|---|
| `workspaces` | create, get, list, delete |
| `jobs` | create_or_update, get, list, stream, cancel |
| `models` | create_or_update, get, list, archive |
| `data` | create_or_update, get, list |
| `compute` | begin_create_or_update, get, list, delete |
| `environments` | create_or_update, get, list |
| `datastores` | create_or_update, get, list, get_default |
| `components` | create_or_update, get, list |
Best Practices
- Use versioning for data, models, and environments
- Configure idle scale-down to reduce compute costs
- Use environments for reproducible training
- Stream job logs to monitor progress
- Register models after successful training jobs
- Use pipelines for multi-step workflows
- Tag resources for organization and cost tracking
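To illustrate registering a model only after a successful training job, the sketch below gates registration on the job status and builds a path that points at the job's output artifacts. The `azureml://jobs/<job-name>/outputs/artifacts/paths/<path>` pattern and the `model/` subfolder are assumptions about where the job saved its model, and both helper names are hypothetical.

```python
def should_register(job_status):
    """Register only when the training job finished successfully."""
    return job_status == "Completed"


def job_artifact_model_path(job_name, relative_path="model/"):
    """Build a path to a job's output artifacts (assumed URI pattern)."""
    return f"azureml://jobs/{job_name}/outputs/artifacts/paths/{relative_path}"


# With the real SDK, the status would come from ml_client.jobs.get(name).status
# and the resulting string would be passed as Model(path=...).
status = "Completed"
if should_register(status):
    print(job_artifact_model_path("training-job-123"))
```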