azure-ai-ml-py

Azure Machine Learning SDK v2 for Python

Client library for managing Azure ML resources: workspaces, jobs, models, data, and compute.

Installation

bash

pip install azure-ai-ml

Environment Variables

bash

AZURE_SUBSCRIPTION_ID=<your-subscription-id>
AZURE_RESOURCE_GROUP=<your-resource-group>
AZURE_ML_WORKSPACE_NAME=<your-workspace-name>
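
A missing variable otherwise only shows up as a bare `KeyError` when the client is built. The following is a minimal sketch of an up-front check, assuming the three variable names listed above; `load_workspace_settings` is a hypothetical helper, not part of the SDK.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_ML_WORKSPACE_NAME",
)


def load_workspace_settings(env=None):
    """Return the three workspace settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_workspace_settings()` with no argument reads from `os.environ`; passing a plain dict makes the check easy to exercise in tests.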

Authentication

python

import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Read the workspace coordinates from the environment variables defined above.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    workspace_name=os.environ["AZURE_ML_WORKSPACE_NAME"],
)

From Config File

python

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Uses config.json in the current directory or a parent directory
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
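
For reference, the config file is a small JSON document that identifies the workspace. The sketch below writes one with the keys `subscription_id`, `resource_group`, and `workspace_name`; the helper name `write_workspace_config` and the placeholder values are illustrative assumptions, and a temporary directory is used so that nothing in the working tree is overwritten.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three keys that identify a workspace."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Placeholder values; replace them with real identifiers.
config_path = write_workspace_config(
    tempfile.mkdtemp(),
    "<your-subscription-id>",
    "<your-resource-group>",
    "<your-workspace-name>",
)
print(config_path.read_text(encoding="utf-8"))
```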

Workspace Management

Create Workspace

python

from azure.ai.ml.entities import Workspace

ws = Workspace(
    name="my-workspace",
    location="eastus",
    display_name="My Workspace",
    description="ML workspace for experiments",
    tags={"purpose": "demo"}
)

ml_client.workspaces.begin_create(ws).result()

List Workspaces

python

for ws in ml_client.workspaces.list():
    print(f"{ws.name}: {ws.location}")

Data Assets

Register Data

python

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register a file
my_data = Data(
    name="my-dataset",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/train.csv",
    type=AssetTypes.URI_FILE,
    description="Training data",
)

ml_client.data.create_or_update(my_data)

Register Folder

python

my_data = Data(
    name="my-folder-dataset",
    version="1",
    path="azureml://datastores/workspaceblobstore/paths/data/",
    type=AssetTypes.URI_FOLDER
)

ml_client.data.create_or_update(my_data)
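
Both registrations use the same path shape, `azureml://datastores/<datastore>/paths/<relative path>`. A small sketch of building such paths consistently follows; `datastore_uri` is a hypothetical helper, not an SDK function.

```python
def datastore_uri(datastore, relative_path):
    """Build an azureml:// datastore URI in the form used by the examples above."""
    if not datastore:
        raise ValueError("datastore name must not be empty")
    # Strip a leading slash so the result never contains 'paths//'.
    return f"azureml://datastores/{datastore}/paths/{relative_path.lstrip('/')}"


print(datastore_uri("workspaceblobstore", "data/train.csv"))
print(datastore_uri("workspaceblobstore", "/data/"))
```

The returned string can be passed as the `path` argument of `Data`.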

Model Registry

Register Model

python

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    name="my-model",
    version="1",
    path="./model/",
    type=AssetTypes.CUSTOM_MODEL,
    description="My trained model"
)

ml_client.models.create_or_update(model)

List Models

python

for model in ml_client.models.list(name="my-model"):
    print(f"{model.name} v{model.version}")
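
Versions are strings in these examples (`version="1"`), so comparing them as text would rank "2" above "10". The sketch below picks the highest version numerically, assuming integer-like version strings; `latest_version` is a hypothetical helper, and non-numeric versions are skipped.

```python
def latest_version(versions):
    """Return the highest version from an iterable of integer-like version strings."""
    numeric = [v for v in versions if str(v).isdigit()]
    if not numeric:
        raise ValueError("no integer-like versions found")
    # key=int compares by numeric value while returning the original string.
    return max(numeric, key=int)


print(latest_version(["1", "2", "10"]))
```

With a live client, the input could be `[m.version for m in ml_client.models.list(name="my-model")]`.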

Compute

Create Compute Cluster

python

from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="cpu-cluster",
    type="amlcompute",
    size="Standard_DS3_v2",
    min_instances=0,  # scale to zero nodes when idle
    max_instances=4,
    idle_time_before_scale_down=120  # seconds of idle time before scaling down
)

ml_client.compute.begin_create_or_update(cluster).result()

List Compute

python

for compute in ml_client.compute.list():
    print(f"{compute.name}: {compute.type}")

Jobs

Command Job

python

from azure.ai.ml import command, Input

job = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --lr ${{inputs.learning_rate}}",
    inputs={
        "data": Input(type="uri_folder", path="azureml:my-folder-dataset:1"),
        "learning_rate": 0.01
    },
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
    display_name="training-job"
)

returned_job = ml_client.jobs.create_or_update(job)
print(f"Job URL: {returned_job.studio_url}")
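
The command string refers to inputs through `${{inputs.<name>}}` placeholders, so a typo in either the placeholder or the `inputs` dictionary only surfaces after submission. As a rough pre-submission check, the sketch below compares the two; `check_command_inputs` and its regular expression are assumptions made for illustration, not part of the SDK.

```python
import re

# Matches ${{inputs.<name>}} placeholders as written in the command string above.
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def check_command_inputs(command_text, inputs):
    """Return (missing, unused): placeholders without an input, and inputs never referenced."""
    referenced = set(PLACEHOLDER.findall(command_text))
    missing = sorted(referenced - set(inputs))
    unused = sorted(set(inputs) - referenced)
    return missing, unused


cmd = "python train.py --data ${{inputs.data}} --lr ${{inputs.learning_rate}}"
print(check_command_inputs(cmd, {"data": "<data-input>", "learning_rate": 0.01}))
print(check_command_inputs(cmd, {"data": "<data-input>"}))
```

An empty `missing` list means every placeholder has a matching key; a non-empty `unused` list usually points at an input the script never receives.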

Monitor Job

python

ml_client.jobs.stream(returned_job.name)
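
Streaming blocks while it prints logs. When only the final state matters, polling the job status is an alternative. The sketch below takes any callable that returns the current status, so it can be tried without a workspace; `wait_for_job` is a hypothetical helper, and the set of terminal status strings is an assumption to verify against your SDK version.

```python
import time

# Assumed terminal status strings; check them against your SDK version.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still in status {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in status source for demonstration; with a live client this could be
# lambda: ml_client.jobs.get(returned_job.name).status
fake_statuses = iter(["Starting", "Running", "Completed"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0))
```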

Pipelines

python

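from azure.ai.ml import dsl, Input

# prep_component and train_component are not defined in this snippet. They are
# assumed to be components created elsewhere, for example loaded from component
# YAML files with azure.ai.ml.load_component.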

@dsl.pipeline(
    compute="cpu-cluster",
    description="Training pipeline"
)
def training_pipeline(data_input):
    prep_step = prep_component(data=data_input)
    train_step = train_component(
        data=prep_step.outputs.output_data,
        learning_rate=0.01
    )
    return {"model": train_step.outputs.model}

pipeline = training_pipeline(
    data_input=Input(type="uri_folder", path="azureml:my-folder-dataset:1")
)

pipeline_job = ml_client.jobs.create_or_update(pipeline)

Environments

Create Custom Environment

python

from azure.ai.ml.entities import Environment

env = Environment(
    name="my-env",
    version="1",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./environment.yml"
)

ml_client.environments.create_or_update(env)
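
The `conda_file` argument points at a conda specification that is layered on top of the base image. The sketch below writes a minimal `environment.yml` to a temporary directory to show the expected shape; the package list and the `write_conda_file` helper are hypothetical and should be replaced with what the training code needs.

```python
import tempfile
from pathlib import Path

# Hypothetical package list; pin the versions your training code actually needs.
CONDA_SPEC = """\
name: my-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn
"""


def write_conda_file(directory):
    """Write the conda specification to environment.yml and return its path."""
    path = Path(directory) / "environment.yml"
    path.write_text(CONDA_SPEC, encoding="utf-8")
    return path


conda_path = write_conda_file(tempfile.mkdtemp())
print(conda_path.read_text(encoding="utf-8"))
```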

Datastores

List Datastores

python

for ds in ml_client.datastores.list():
    print(f"{ds.name}: {ds.type}")

Get Default Datastore

python

default_ds = ml_client.datastores.get_default()
print(f"Default: {default_ds.name}")

MLClient Operations

Property	Operations
`workspaces`	begin_create, get, list, begin_delete
`jobs`	create_or_update, get, list, stream, cancel
`models`	create_or_update, get, list, archive
`data`	create_or_update, get, list
`compute`	begin_create_or_update, get, list, begin_delete
`environments`	create_or_update, get, list
`datastores`	create_or_update, get, list, get_default
`components`	create_or_update, get, list

Best Practices

Use versioning for data, models, and environments
Configure idle scale-down to reduce compute costs
Use environments for reproducible training
Stream job logs to monitor progress
Register models after successful training jobs
Use pipelines for multi-step workflows
Tag resources for organization and cost tracking
