
Azure Container Apps GPU Support - 2025 Features

Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).

Overview

Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.

Key 2025 Features (Build Announcements)

1. Serverless GPU (GA)

  • Automatic scaling: Scale GPU workloads based on demand
  • Scale-to-zero: Pay only when GPU is actively used
  • Per-second billing: Granular cost control
  • Optimized cold start: Fast initialization for AI models
  • Reduced operational overhead: No infrastructure management
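To make the per-second billing bullet concrete, here is a back-of-the-envelope sketch; the hourly rate is purely illustrative (check the Azure pricing page for real numbers):

```shell
# Illustrative only: estimate the cost of a burst of GPU activity under
# per-second billing. RATE_PER_HOUR is a made-up example rate, not a quote.
RATE_PER_HOUR=3.60          # hypothetical $/hour for one GPU
ACTIVE_SECONDS=900          # 15 minutes of actual processing
COST=$(awk -v r="$RATE_PER_HOUR" -v s="$ACTIVE_SECONDS" \
  'BEGIN { printf "%.2f", r / 3600 * s }')
echo "$COST"                # 0.90 -- versus 3.60 for a full always-on hour
```

With scale-to-zero, the idle 45 minutes of that hour cost nothing.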

2. Dedicated GPU (GA)

  • Consistent performance: Dedicated GPU resources
  • Simplified AI deployment: Easy model hosting
  • Long-running workloads: Ideal for training and continuous inference
  • Multiple GPU types: NVIDIA A100, T4, and more

3. Dynamic Sessions with GPU (Early Access)

  • Sandboxed execution: Run untrusted AI-generated code
  • Hyper-V isolation: Enhanced security
  • GPU-powered Python interpreter: Handle compute-intensive AI workloads
  • Scale at runtime: Dynamic resource allocation

4. Foundry Models Integration

  • Deploy AI models directly: During container app creation
  • Ready-to-use models: Pre-configured inference endpoints
  • Azure AI Foundry: Seamless integration

5. Workflow with Durable Task Scheduler (Preview)

  • Long-running workflows: Reliable orchestration
  • State management: Automatic persistence
  • Event-driven: Trigger workflows from events

6. Native Azure Functions Support

  • Functions runtime: Run Azure Functions in Container Apps
  • Consistent development: Same code, serverless execution
  • Event triggers: All Functions triggers supported

7. Dapr Integration (GA)

  • Service discovery: Built-in DNS-based discovery
  • State management: Distributed state stores
  • Pub/sub messaging: Reliable messaging patterns
  • Service invocation: Resilient service-to-service calls
  • Observability: Integrated tracing and metrics

Creating Container Apps with GPU

Basic Container App with Serverless GPU

```bash
# Create Container Apps environment
az containerapp env create \
  --name myenv \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key>

# Create Container App with GPU
az containerapp create \
  --name myapp-gpu \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --ingress external \
  --target-port 8080
```

Production-Ready Container App with GPU

```bash
# Production-ready app: registry auth via managed identity, HTTP scaling,
# a Dapr sidecar, and a system-assigned identity. (Comment lines cannot
# appear inside a line-continued command, so options are listed in order:
# container config, resources, scaling, networking, secrets, Dapr, identity.)
az containerapp create \
  --name myapp-gpu-prod \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --registry-server myregistry.azurecr.io \
  --registry-identity system \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 20 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10 \
  --ingress external \
  --target-port 8080 \
  --transport http2 \
  --env-vars "AZURE_CLIENT_ID=secretref:client-id" \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --system-assigned
```

Container Apps Environment Configuration

Environment with Zone Redundancy

```bash
az containerapp env create \
  --name myenv-prod \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key> \
  --zone-redundant true \
  --enable-workload-profiles true
```

Workload Profiles (Dedicated GPU)

```bash
# Create environment with workload profiles
az containerapp env create \
  --name myenv-gpu \
  --resource-group MyRG \
  --location eastus \
  --enable-workload-profiles true

# Add GPU workload profile
az containerapp env workload-profile add \
  --name myenv-gpu \
  --resource-group MyRG \
  --workload-profile-name gpu-profile \
  --workload-profile-type GPU-A100 \
  --min-nodes 0 \
  --max-nodes 10

# Create container app with GPU profile
az containerapp create \
  --name myapp-dedicated-gpu \
  --resource-group MyRG \
  --environment myenv-gpu \
  --workload-profile-name gpu-profile \
  --image myregistry.azurecr.io/training-job:latest \
  --cpu 8 \
  --memory 16Gi \
  --min-replicas 1 \
  --max-replicas 5
```

GPU Scaling Rules

Custom Prometheus Scaling

```bash
az containerapp create \
  --name myapp-gpu-prometheus \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name gpu-utilization \
  --scale-rule-type custom \
  --scale-rule-custom-type prometheus \
  --scale-rule-metadata \
    serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
    metricName=gpu_utilization \
    threshold=80 \
    query="avg(nvidia_gpu_utilization{app='myapp'})"
```

Queue-Based Scaling (Azure Service Bus)

```bash
az containerapp create \
  --name myapp-queue-processor \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/batch-processor:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-t4 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 50 \
  --scale-rule-name queue-scaling \
  --scale-rule-type azure-servicebus \
  --scale-rule-metadata \
    queueName=ai-jobs \
    namespace=myservicebus \
    messageCount=5 \
  --scale-rule-auth connection=servicebus-connection
```

Dapr Integration

Enable Dapr on Container App

```bash
az containerapp create \
  --name myapp-dapr \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --dapr-http-max-request-size 4 \
  --dapr-http-read-buffer-size 4 \
  --dapr-log-level info \
  --dapr-enable-api-logging true
```

Dapr State Store (Azure Cosmos DB)

```yaml
# component.yaml -- Dapr component for the state store
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: "https://mycosmosdb.documents.azure.com:443/"
    - name: masterKey
      secretRef: cosmosdb-key
    - name: database
      value: "mydb"
    - name: collection
      value: "state"
```

```bash
# Create the component
az containerapp env dapr-component set \
  --name myenv \
  --resource-group MyRG \
  --dapr-component-name statestore \
  --yaml component.yaml
```

Dapr Pub/Sub (Azure Service Bus)

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.azure.servicebus.topics
  version: v1
  metadata:
    - name: connectionString
      secretRef: servicebus-connection
    - name: consumerID
      value: "myapp"
```

Service-to-Service Invocation

```python
# Python example using the Dapr SDK
from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )
```

AI Model Deployment Patterns

OpenAI-Compatible Endpoint

```dockerfile
# Dockerfile for vLLM model serving
FROM vllm/vllm-openai:latest

# Exec-form CMD does not expand environment variables, so the model
# settings are passed as literal arguments to the image's entrypoint.
CMD ["--model", "meta-llama/Llama-3.1-8B-Instruct", \
     "--gpu-memory-utilization", "0.9", \
     "--max-model-len", "4096", \
     "--port", "8080"]
```

```bash
# Deploy vLLM model
az containerapp create \
  --name llama-inference \
  --resource-group MyRG \
  --environment myenv \
  --image vllm/vllm-openai:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 1 \
  --max-replicas 5 \
  --target-port 8080 \
  --ingress external \
  --env-vars \
    MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
    GPU_MEMORY_UTILIZATION="0.9" \
    HF_TOKEN=secretref:huggingface-token
```
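Once deployed, clients can call the standard OpenAI-compatible routes that vLLM exposes. A sketch of the request shape — the FQDN below is a placeholder for the app's real ingress domain:

```shell
# Placeholder FQDN -- look up the real one with:
#   az containerapp show -g MyRG -n llama-inference \
#     --query properties.configuration.ingress.fqdn -o tsv
APP_URL="https://llama-inference.<env-default-domain>"
PAYLOAD='{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'
# Uncomment once APP_URL points at the deployed app:
# curl -s "$APP_URL/v1/chat/completions" \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```

With min-replicas set to 0 instead of 1, expect the first request after idle to include cold-start latency.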

Stable Diffusion Image Generation

```bash
az containerapp create \
  --name stable-diffusion \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/stable-diffusion:latest \
  --cpu 4 \
  --memory 16Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --target-port 7860 \
  --ingress external \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 1
```

Batch Processing Job

```bash
az containerapp job create \
  --name batch-training-job \
  --resource-group MyRG \
  --environment myenv \
  --trigger-type Manual \
  --image myregistry.azurecr.io/training:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 2 \
  --parallelism 1 \
  --replica-timeout 7200 \
  --replica-retry-limit 3 \
  --env-vars \
    DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
    MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
    EPOCHS="100"

# Execute job
az containerapp job start \
  --name batch-training-job \
  --resource-group MyRG
```

Monitoring and Observability

Application Insights Integration

```bash
az containerapp create \
  --name myapp-monitored \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --env-vars \
    APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection
```

Query Logs

```bash
# Stream logs
az containerapp logs show \
  --name myapp-gpu \
  --resource-group MyRG \
  --follow

# Query with Log Analytics
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"
```

Metrics and Alerts

```bash
# Create metric alert for GPU usage
az monitor metrics alert create \
  --name high-gpu-usage \
  --resource-group MyRG \
  --scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --condition "avg Requests > 100" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action <action-group-id>
```

Security Best Practices

Managed Identity

```bash
# Create with system-assigned identity
az containerapp create \
  --name myapp-identity \
  --resource-group MyRG \
  --environment myenv \
  --system-assigned \
  --image myregistry.azurecr.io/myapp:latest

# Get identity principal ID
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)

# Assign role to access Key Vault
az role assignment create \
  --assignee $IDENTITY_ID \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault

# Use user-assigned identity
az identity create --name myapp-identity --resource-group MyRG
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)

az containerapp create \
  --name myapp-user-identity \
  --resource-group MyRG \
  --environment myenv \
  --user-assigned $IDENTITY_RESOURCE_ID \
  --image myregistry.azurecr.io/myapp:latest
```

Secret Management

```bash
# Add secrets
az containerapp secret set \
  --name myapp-gpu \
  --resource-group MyRG \
  --secrets \
    huggingface-token="<token>" \
    api-key="<key>"

# Reference secrets in environment variables
az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --set-env-vars \
    HF_TOKEN=secretref:huggingface-token \
    API_KEY=secretref:api-key
```

Cost Optimization

Scale-to-Zero Configuration

```bash
az containerapp create \
  --name myapp-scale-zero \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10
```

Cost savings: you pay only while requests are being processed, and GPU usage is billed per second when replicas are active.

Right-Sizing Resources

```bash
# Start with minimal resources
--cpu 2 --memory 4Gi --gpu-count 1

# Monitor and adjust based on actual usage
az monitor metrics list \
  --resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --metric "CpuPercentage,MemoryPercentage"
```

Use Spot/Preemptible GPUs (Future Feature)

When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.

Troubleshooting

Check Revision Status

```bash
az containerapp revision list \
  --name myapp-gpu \
  --resource-group MyRG \
  --output table
```

View Revision Details

```bash
az containerapp revision show \
  --name myapp-gpu \
  --revision <revision-name> \
  --resource-group MyRG
```

Restart Container App

```bash
az containerapp revision restart \
  --name myapp-gpu \
  --resource-group MyRG \
  --revision <revision-name>
```

GPU Not Available

If GPU is not provisioning:
  1. Check region availability: Not all regions support GPU
  2. Verify quota: Request quota increase if needed
  3. Check workload profile: Ensure GPU workload profile is created
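For step 1, the CLI can list which workload profile types a region offers. A sketch (assumes the Azure CLI with the `containerapp` extension; the guard keeps the snippet safe to paste on machines without `az`):

```shell
LOCATION=eastus
if command -v az >/dev/null 2>&1; then
  # GPU profile types appear in the output only for regions that support them
  az containerapp env workload-profile list-supported \
    --location "$LOCATION" --output table
else
  echo "Azure CLI not installed; skipping region check"
fi
```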

Best Practices

✓ Use scale-to-zero for intermittent workloads
✓ Implement health probes (liveness and readiness)
✓ Use managed identities for authentication
✓ Store secrets in Azure Key Vault
✓ Enable Dapr for microservices patterns
✓ Configure appropriate scaling rules
✓ Monitor GPU utilization and adjust resources
✓ Use Container Apps jobs for batch processing
✓ Implement retry logic for transient failures
✓ Use Application Insights for observability
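For the health-probe item, Container Apps accepts probes in the app's YAML definition (applied with `az containerapp update --yaml app.yaml`). A minimal sketch — the `/healthz` and `/ready` paths are assumptions about your app, not fixed endpoints:

```yaml
# Fragment of a container app YAML definition (probe paths are illustrative)
properties:
  template:
    containers:
      - name: myapp
        image: myregistry.azurecr.io/myapp:latest
        probes:
          - type: Liveness
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          - type: Readiness
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 10
```

For GPU inference apps, a generous liveness delay avoids restarts while large models are still loading.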

Azure Container Apps with GPU support provides a powerful serverless platform for AI/ML workloads.