
Azure Container Apps GPU Support - 2025 Features

Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).

Overview

Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.

Key 2025 Features (Build Announcements)

1. Serverless GPU (GA)

  • Automatic scaling: Scale GPU workloads based on demand
  • Scale-to-zero: Pay only when GPU is actively used
  • Per-second billing: Granular cost control
  • Optimized cold start: Fast initialization for AI models
  • Reduced operational overhead: No infrastructure management
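To make the per-second billing bullet concrete, here is a back-of-the-envelope sketch; the hourly rate is purely illustrative (check the Azure pricing page for real numbers):

```shell
# Illustrative only: estimate the cost of a burst of GPU activity under
# per-second billing. RATE_PER_HOUR is a made-up example rate, not a quote.
RATE_PER_HOUR=3.60          # hypothetical $/hour for one GPU
ACTIVE_SECONDS=900          # 15 minutes of actual processing
COST=$(awk -v r="$RATE_PER_HOUR" -v s="$ACTIVE_SECONDS" \
  'BEGIN { printf "%.2f", r / 3600 * s }')
echo "$COST"                # 0.90 -- versus 3.60 for a full always-on hour
```

With scale-to-zero, the idle 45 minutes of that hour cost nothing.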

2. Dedicated GPU (GA)

  • Consistent performance: Dedicated GPU resources
  • Simplified AI deployment: Easy model hosting
  • Long-running workloads: Ideal for training and continuous inference
  • Multiple GPU types: NVIDIA A100, T4, and more

3. Dynamic Sessions with GPU (Early Access)

  • Sandboxed execution: Run untrusted AI-generated code
  • Hyper-V isolation: Enhanced security
  • GPU-powered Python interpreter: Handle compute-intensive AI workloads
  • Scale at runtime: Dynamic resource allocation

4. Foundry Models Integration

  • Deploy AI models directly: During container app creation
  • Ready-to-use models: Pre-configured inference endpoints
  • Azure AI Foundry: Seamless integration

5. Workflow with Durable Task Scheduler (Preview)

  • Long-running workflows: Reliable orchestration
  • State management: Automatic persistence
  • Event-driven: Trigger workflows from events

6. Native Azure Functions Support

  • Functions runtime: Run Azure Functions in Container Apps
  • Consistent development: Same code, serverless execution
  • Event triggers: All Functions triggers supported

7. Dapr Integration (GA)

  • Service discovery: Built-in DNS-based discovery
  • State management: Distributed state stores
  • Pub/sub messaging: Reliable messaging patterns
  • Service invocation: Resilient service-to-service calls
  • Observability: Integrated tracing and metrics

Creating Container Apps with GPU

Basic Container App with Serverless GPU

```bash
# Create Container Apps environment
az containerapp env create \
  --name myenv \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key>

# Create Container App with GPU
az containerapp create \
  --name myapp-gpu \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --ingress external \
  --target-port 8080
```

Production-Ready Container App with GPU

```bash
# Production-ready app: registry auth via managed identity, HTTP scaling,
# a Dapr sidecar, and a system-assigned identity. (Comment lines cannot
# appear inside a line-continued command, so options are listed in order:
# container config, resources, scaling, networking, secrets, Dapr, identity.)
az containerapp create \
  --name myapp-gpu-prod \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --registry-server myregistry.azurecr.io \
  --registry-identity system \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 20 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10 \
  --ingress external \
  --target-port 8080 \
  --transport http2 \
  --env-vars "AZURE_CLIENT_ID=secretref:client-id" \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --system-assigned
```

Container Apps Environment Configuration

Environment with Zone Redundancy

```bash
az containerapp env create \
  --name myenv-prod \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key> \
  --zone-redundant true \
  --enable-workload-profiles true
```

Workload Profiles (Dedicated GPU)

```bash
# Create environment with workload profiles
az containerapp env create \
  --name myenv-gpu \
  --resource-group MyRG \
  --location eastus \
  --enable-workload-profiles true

# Add GPU workload profile
az containerapp env workload-profile add \
  --name myenv-gpu \
  --resource-group MyRG \
  --workload-profile-name gpu-profile \
  --workload-profile-type GPU-A100 \
  --min-nodes 0 \
  --max-nodes 10

# Create container app with GPU profile
az containerapp create \
  --name myapp-dedicated-gpu \
  --resource-group MyRG \
  --environment myenv-gpu \
  --workload-profile-name gpu-profile \
  --image myregistry.azurecr.io/training-job:latest \
  --cpu 8 \
  --memory 16Gi \
  --min-replicas 1 \
  --max-replicas 5
```

GPU Scaling Rules

Custom Prometheus Scaling

```bash
az containerapp create \
  --name myapp-gpu-prometheus \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name gpu-utilization \
  --scale-rule-type custom \
  --scale-rule-custom-type prometheus \
  --scale-rule-metadata \
    serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
    metricName=gpu_utilization \
    threshold=80 \
    query="avg(nvidia_gpu_utilization{app='myapp'})"
```

Queue-Based Scaling (Azure Service Bus)

```bash
az containerapp create \
  --name myapp-queue-processor \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/batch-processor:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-t4 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 50 \
  --scale-rule-name queue-scaling \
  --scale-rule-type azure-servicebus \
  --scale-rule-metadata \
    queueName=ai-jobs \
    namespace=myservicebus \
    messageCount=5 \
  --scale-rule-auth connection=servicebus-connection
```

Dapr Integration

Enable Dapr on Container App

```bash
az containerapp create \
  --name myapp-dapr \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --dapr-http-max-request-size 4 \
  --dapr-http-read-buffer-size 4 \
  --dapr-log-level info \
  --dapr-enable-api-logging true
```

Dapr State Store (Azure Cosmos DB)

```yaml
# component.yaml -- Dapr component for the state store
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: "https://mycosmosdb.documents.azure.com:443/"
    - name: masterKey
      secretRef: cosmosdb-key
    - name: database
      value: "mydb"
    - name: collection
      value: "state"
```

```bash
# Create the component
az containerapp env dapr-component set \
  --name myenv \
  --resource-group MyRG \
  --dapr-component-name statestore \
  --yaml component.yaml
```

Dapr Pub/Sub (Azure Service Bus)

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.azure.servicebus.topics
  version: v1
  metadata:
    - name: connectionString
      secretRef: servicebus-connection
    - name: consumerID
      value: "myapp"
```

Service-to-Service Invocation

```python
# Python example using the Dapr SDK
from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )
```

AI Model Deployment Patterns

OpenAI-Compatible Endpoint

```dockerfile
# Dockerfile for vLLM model serving
FROM vllm/vllm-openai:latest

# Exec-form CMD does not expand environment variables, so the model
# settings are passed as literal arguments to the image's entrypoint.
CMD ["--model", "meta-llama/Llama-3.1-8B-Instruct", \
     "--gpu-memory-utilization", "0.9", \
     "--max-model-len", "4096", \
     "--port", "8080"]
```

```bash
# Deploy vLLM model
az containerapp create \
  --name llama-inference \
  --resource-group MyRG \
  --environment myenv \
  --image vllm/vllm-openai:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 1 \
  --max-replicas 5 \
  --target-port 8080 \
  --ingress external \
  --env-vars \
    MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
    GPU_MEMORY_UTILIZATION="0.9" \
    HF_TOKEN=secretref:huggingface-token
```
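Once deployed, clients can call the standard OpenAI-compatible routes that vLLM exposes. A sketch of the request shape — the FQDN below is a placeholder for the app's real ingress domain:

```shell
# Placeholder FQDN -- look up the real one with:
#   az containerapp show -g MyRG -n llama-inference \
#     --query properties.configuration.ingress.fqdn -o tsv
APP_URL="https://llama-inference.<env-default-domain>"
PAYLOAD='{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'
# Uncomment once APP_URL points at the deployed app:
# curl -s "$APP_URL/v1/chat/completions" \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```

With min-replicas set to 0 instead of 1, expect the first request after idle to include cold-start latency.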

Stable Diffusion Image Generation

```bash
az containerapp create \
  --name stable-diffusion \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/stable-diffusion:latest \
  --cpu 4 \
  --memory 16Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --target-port 7860 \
  --ingress external \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 1
```

Batch Processing Job

```bash
az containerapp job create \
  --name batch-training-job \
  --resource-group MyRG \
  --environment myenv \
  --trigger-type Manual \
  --image myregistry.azurecr.io/training:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 2 \
  --parallelism 1 \
  --replica-timeout 7200 \
  --replica-retry-limit 3 \
  --env-vars \
    DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
    MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
    EPOCHS="100"

# Execute job
az containerapp job start \
  --name batch-training-job \
  --resource-group MyRG
```

Monitoring and Observability

Application Insights Integration

```bash
az containerapp create \
  --name myapp-monitored \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --env-vars \
    APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection
```

Query Logs

```bash
# Stream logs
az containerapp logs show \
  --name myapp-gpu \
  --resource-group MyRG \
  --follow

# Query with Log Analytics
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"
```

Metrics and Alerts

```bash
# Create metric alert for GPU usage
az monitor metrics alert create \
  --name high-gpu-usage \
  --resource-group MyRG \
  --scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --condition "avg Requests > 100" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action <action-group-id>
```

Security Best Practices

Managed Identity

```bash
# Create with system-assigned identity
az containerapp create \
  --name myapp-identity \
  --resource-group MyRG \
  --environment myenv \
  --system-assigned \
  --image myregistry.azurecr.io/myapp:latest

# Get identity principal ID
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)

# Assign role to access Key Vault
az role assignment create \
  --assignee $IDENTITY_ID \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault

# Use user-assigned identity
az identity create --name myapp-identity --resource-group MyRG
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)

az containerapp create \
  --name myapp-user-identity \
  --resource-group MyRG \
  --environment myenv \
  --user-assigned $IDENTITY_RESOURCE_ID \
  --image myregistry.azurecr.io/myapp:latest
```

Secret Management

```bash
# Add secrets
az containerapp secret set \
  --name myapp-gpu \
  --resource-group MyRG \
  --secrets \
    huggingface-token="<token>" \
    api-key="<key>"

# Reference secrets in environment variables
az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --set-env-vars \
    HF_TOKEN=secretref:huggingface-token \
    API_KEY=secretref:api-key
```

Cost Optimization

Scale-to-Zero Configuration

```bash
az containerapp create \
  --name myapp-scale-zero \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10
```

Cost savings: you pay only while requests are being processed, and GPU usage is billed per second when replicas are active.

Right-Sizing Resources

```bash
# Start with minimal resources
--cpu 2 --memory 4Gi --gpu-count 1

# Monitor and adjust based on actual usage
az monitor metrics list \
  --resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --metric "CpuPercentage,MemoryPercentage"
```

Use Spot/Preemptible GPUs (Future Feature)

When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.

Troubleshooting

Check Revision Status

```bash
az containerapp revision list \
  --name myapp-gpu \
  --resource-group MyRG \
  --output table
```

View Revision Details

```bash
az containerapp revision show \
  --name myapp-gpu \
  --revision <revision-name> \
  --resource-group MyRG
```

Restart Container App

```bash
az containerapp revision restart \
  --name myapp-gpu \
  --resource-group MyRG \
  --revision <revision-name>
```

GPU Not Available

If GPU is not provisioning:
  1. Check region availability: Not all regions support GPU
  2. Verify quota: Request quota increase if needed
  3. Check workload profile: Ensure GPU workload profile is created
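For step 1, the CLI can list which workload profile types a region offers. A sketch (assumes the Azure CLI with the `containerapp` extension; the guard keeps the snippet safe to paste on machines without `az`):

```shell
LOCATION=eastus
if command -v az >/dev/null 2>&1; then
  # GPU profile types appear in the output only for regions that support them
  az containerapp env workload-profile list-supported \
    --location "$LOCATION" --output table
else
  echo "Azure CLI not installed; skipping region check"
fi
```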

Best Practices

✓ Use scale-to-zero for intermittent workloads
✓ Implement health probes (liveness and readiness)
✓ Use managed identities for authentication
✓ Store secrets in Azure Key Vault
✓ Enable Dapr for microservices patterns
✓ Configure appropriate scaling rules
✓ Monitor GPU utilization and adjust resources
✓ Use Container Apps jobs for batch processing
✓ Implement retry logic for transient failures
✓ Use Application Insights for observability
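For the health-probe item, Container Apps accepts probes in the app's YAML definition (applied with `az containerapp update --yaml app.yaml`). A minimal sketch — the `/healthz` and `/ready` paths are assumptions about your app, not fixed endpoints:

```yaml
# Fragment of a container app YAML definition (probe paths are illustrative)
properties:
  template:
    containers:
      - name: myapp
        image: myregistry.azurecr.io/myapp:latest
        probes:
          - type: Liveness
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          - type: Readiness
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 10
```

For GPU inference apps, a generous liveness delay avoids restarts while large models are still loading.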

Azure Container Apps with GPU support provides a powerful serverless platform for AI/ML workloads.