
Azure OpenAI Service - 2025 Models and Features


Complete knowledge base for Azure OpenAI Service covering the latest 2025 models (GPT-5, GPT-4.1, and the reasoning model family) and Azure AI Foundry integration.

Overview


Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.
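Every SDK call shown later in this guide resolves to a REST request against the resource endpoint. A minimal sketch of the URL shape (the resource name, deployment name, and API version below are placeholders, not required values):

```python
def chat_completions_url(resource: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI chat-completions endpoint URL for a deployment."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

# Example with placeholder values matching the deployments used in this guide
print(chat_completions_url("myopenai", "gpt-5", "2025-02-01-preview"))
```

The SDK assembles this URL for you from `azure_endpoint` and the `model` (deployment) argument; it is shown here only to make the mapping explicit.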

Latest Models (2025)


GPT-5 Series (GA August 2025)


**Registration Required Models:**
- gpt-5-pro: Highest capability, complex reasoning
- gpt-5: Balanced performance and cost
- gpt-5-codex: Optimized for code generation

**No Registration Required:**
- gpt-5-mini: Faster, more affordable
- gpt-5-nano: Ultra-fast for simple tasks
- gpt-5-chat: Optimized for conversational use

GPT-4.1 Series


- gpt-4.1: 1 million token context window
- gpt-4.1-mini: Efficient version with 1M context
- gpt-4.1-nano: Fastest variant

**Key Improvements:**
- 1,000,000 token context (vs 128K in GPT-4 Turbo)
- Better instruction following
- Reduced hallucinations
- Improved multilingual support
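A quick way to sanity-check whether a document will fit the 1M-token window before sending it is a character-count heuristic (roughly 4 characters per token for English text; this is an approximation, not a tokenizer):

```python
def fits_context(text: str, context_tokens: int = 1_000_000,
                 chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check: estimate token count from character count.

    The 4-chars-per-token ratio is a common English-text approximation;
    use a real tokenizer (see the Token Management section) for billing.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens

print(fits_context("a" * 4_000_000))  # True: ~1M tokens, at the limit
print(fits_context("a" * 5_000_000))  # False: ~1.25M tokens, too large
```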

Reasoning Models


**o4-mini**: Lightweight reasoning model
- Faster inference
- Lower cost
- Suitable for structured reasoning tasks

**o3**: Advanced reasoning model
- Complex problem solving
- Mathematical reasoning
- Scientific analysis

**o1**: Original reasoning model
- General-purpose reasoning
- Step-by-step explanations

**o1-mini**: Efficient reasoning
- Balanced cost and performance

Image Generation


**GPT-image-1** (2025-04-15)
- DALL-E 3 successor
- Higher quality images
- Better prompt understanding
- Improved safety filters

Video Generation


**Sora** (2025-05-02)
- Text-to-video generation
- Realistic and imaginative scenes
- Up to 60 seconds of video
- Multiple camera angles and styles

Audio Models


**gpt-4o-transcribe**: Speech-to-text powered by GPT-4o
- High accuracy transcription
- Multiple languages
- Speaker diarization

**gpt-4o-mini-transcribe**: Faster, more affordable transcription
- Good accuracy
- Lower latency
- Cost-effective

Deploying Azure OpenAI


Create Azure OpenAI Resource


```bash
# Create OpenAI account
az cognitiveservices account create \
  --name myopenai \
  --resource-group MyRG \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --custom-domain myopenai \
  --public-network-access Disabled \
  --identity-type SystemAssigned

# Get endpoint and key
az cognitiveservices account show \
  --name myopenai \
  --resource-group MyRG \
  --query "properties.endpoint" \
  --output tsv

az cognitiveservices account keys list \
  --name myopenai \
  --resource-group MyRG \
  --query "key1" \
  --output tsv
```

Deploy GPT-5 Model


```bash
# Deploy gpt-5
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --model-name gpt-5 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100 \
  --scale-type Standard

# Deploy gpt-5-pro (requires registration)
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5-pro \
  --model-name gpt-5-pro \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50
```

Deploy Reasoning Models


```bash
# Deploy o3 reasoning model
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o3-reasoning \
  --model-name o3 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50

# Deploy o4-mini
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o4-mini \
  --model-name o4-mini \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

Deploy GPT-4.1 with 1M Context


```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-4-1 \
  --model-name gpt-4.1 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

Deploy Image Generation Model


```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name image-gen \
  --model-name gpt-image-1 \
  --model-version 2025-04-15 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10
```

Deploy Sora Video Generation


```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name sora \
  --model-name sora \
  --model-version 2025-05-02 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 5
```

Using Azure OpenAI Models


Python SDK (GPT-5)


```python
from openai import AzureOpenAI
import os

# Initialize client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-02-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# GPT-5 completion
response = client.chat.completions.create(
    model="gpt-5",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=1000,
    temperature=0.7,
    top_p=0.95
)

print(response.choices[0].message.content)
```

Python SDK (o3 Reasoning Model)


```python
# o3 reasoning with chain-of-thought
response = client.chat.completions.create(
    model="o3-reasoning",
    messages=[
        {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
        {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
    ],
    max_tokens=2000,
    temperature=0.2  # Lower temperature for reasoning tasks
)

print(response.choices[0].message.content)
```

Python SDK (GPT-4.1 with 1M Context)


```python
# Read a large document
with open('large_document.txt', 'r') as f:
    document = f.read()

# GPT-4.1 can handle up to 1M tokens
response = client.chat.completions.create(
    model="gpt-4-1",
    messages=[
        {"role": "system", "content": "You are a document analysis expert."},
        {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
    ],
    max_tokens=4000
)

print(response.choices[0].message.content)
```

Image Generation (GPT-image-1)


```python
# Generate image with DALL-E 3 successor
response = client.images.generate(
    model="image-gen",
    prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
    size="1024x1024",
    quality="hd",
    n=1
)

image_url = response.data[0].url
print(f"Generated image: {image_url}")
```

Video Generation (Sora)


```python
# Generate video with Sora
response = client.videos.generate(
    model="sora",
    prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
    duration=10,  # seconds
    resolution="1080p",
    fps=30
)

video_url = response.data[0].url
print(f"Generated video: {video_url}")
```

Audio Transcription


```python
# Transcribe audio file
audio_file = open("meeting_recording.mp3", "rb")

response = client.audio.transcriptions.create(
    model="gpt-4o-transcribe",
    file=audio_file,
    language="en",
    response_format="verbose_json"
)

print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")

# Speaker diarization
for segment in response.segments:
    print(f"[{segment.start}s - {segment.end}s] {segment.text}")
```

Azure AI Foundry Integration


Model Router (Automatic Model Selection)


```python
from azure.ai.foundry import ModelRouter
import os

# Initialize model router
router = ModelRouter(
    endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    credential=os.getenv("AZURE_OPENAI_API_KEY")
)

# Automatically select optimal model
response = router.complete(
    prompt="Analyze this complex scientific paper...",
    optimization_goals=["quality", "cost"],
    available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
)

print(f"Selected model: {response.model_used}")
print(f"Response: {response.content}")
print(f"Cost: ${response.cost}")
```

**Benefits:**
- Automatic model selection based on prompt complexity
- Balance quality vs cost
- Reduce costs by up to 40% while maintaining quality

Agentic Retrieval (Azure AI Search Integration)


```python
import json
import os

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Initialize search client
search_client = SearchClient(
    endpoint=os.getenv("SEARCH_ENDPOINT"),
    index_name="documents",
    credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)

# Agentic retrieval with Azure OpenAI
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You have access to a document search system."},
        {"role": "user", "content": "What are the company's revenue projections for Q3?"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search company documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }],
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        if tool_call.function.name == "search_documents":
            query = json.loads(tool_call.function.arguments)["query"]
            results = search_client.search(query)
            # Feed results back to model for final answer
```

**Improvements:**
- 40% better on complex, multi-part questions
- Automatic query decomposition
- Relevance ranking
- Citation generation

Foundry Observability (Preview)


```python
from azure.ai.foundry import FoundryObservability
import os

# Enable observability
observability = FoundryObservability(
    workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
    enable_tracing=True,
    enable_metrics=True
)

# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )
    trace.log_tool_call("search_kb", {"query": "refund policy"})
    trace.log_reasoning_step("Retrieved refund policy document")
    trace.log_token_usage(response.usage.total_tokens)
```

View in Azure AI Foundry portal:
- End-to-end trace logs
- Reasoning steps and tool calls
- Performance metrics
- Cost analysis

Capacity and Quota Management


Check Quota


```bash
# List deployments with usage
az cognitiveservices account deployment list \
  --resource-group MyRG \
  --name myopenai \
  --output table

# Check usage metrics
az monitor metrics list \
  --resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --metric "TokenTransaction" \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-31T23:59:59Z \
  --interval PT1H \
  --aggregation Total
```

Update Capacity


```bash
# Scale up deployment capacity
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 200

# Scale down during off-peak
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 50
```
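When choosing a `--sku-capacity` value, one capacity unit on Standard deployments typically corresponds to roughly 1,000 tokens per minute of throughput; the exact ratio varies by model and deployment type, so treat the default below as an assumption and confirm against your quota page. A small sizing helper:

```python
import math

def capacity_for_tpm(target_tpm: int, tpm_per_unit: int = 1000) -> int:
    """Convert a target tokens-per-minute throughput into --sku-capacity units.

    Assumes 1 capacity unit ~= 1,000 TPM (an approximation; the real ratio
    depends on the model and deployment type).
    """
    return math.ceil(target_tpm / tpm_per_unit)

print(capacity_for_tpm(200_000))  # 200, matching the scale-up example above
print(capacity_for_tpm(1_500))   # 2 (rounds up to whole units)
```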

Request Quota Increase


  1. Navigate to Azure Portal → Azure OpenAI resource
  2. Go to "Quotas" blade
  3. Select model and region
  4. Click "Request quota increase"
  5. Provide justification and target capacity

Security and Networking


Private Endpoint


```bash
# Create private endpoint
az network private-endpoint create \
  --name openai-private-endpoint \
  --resource-group MyRG \
  --vnet-name MyVNet \
  --subnet PrivateEndpointSubnet \
  --private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --group-id account \
  --connection-name openai-connection

# Create private DNS zone
az network private-dns zone create \
  --resource-group MyRG \
  --name privatelink.openai.azure.com

# Link to VNet
az network private-dns link vnet create \
  --resource-group MyRG \
  --zone-name privatelink.openai.azure.com \
  --name openai-dns-link \
  --virtual-network MyVNet \
  --registration-enabled false

# Create DNS zone group
az network private-endpoint dns-zone-group create \
  --resource-group MyRG \
  --endpoint-name openai-private-endpoint \
  --name default \
  --private-dns-zone privatelink.openai.azure.com \
  --zone-name privatelink.openai.azure.com
```

Managed Identity Access


```bash
# Enable system-assigned identity
az cognitiveservices account identity assign \
  --name myopenai \
  --resource-group MyRG

# Grant role to managed identity
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)

az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG
```

Content Filtering


```bash
# Configure content filtering
az cognitiveservices account update \
  --name myopenai \
  --resource-group MyRG \
  --set properties.customContentFilter='{
    "hate": {"severity": "medium", "enabled": true},
    "violence": {"severity": "medium", "enabled": true},
    "sexual": {"severity": "medium", "enabled": true},
    "selfHarm": {"severity": "high", "enabled": true}
  }'
```

Cost Optimization


Model Selection Strategy


**Use GPT-5-mini or GPT-5-nano for:**
- Simple questions
- Classification tasks
- Content moderation
- Summarization

**Use GPT-5 or GPT-4.1 for:**
- Complex reasoning
- Long-form content generation
- Document analysis
- Code generation

**Use reasoning models (o3, o4-mini) for:**
- Mathematical problems
- Scientific analysis
- Step-by-step reasoning
- Logic puzzles
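The strategy above can be encoded as a simple routing table. The task categories and deployment names below are illustrative (they mirror the hypothetical deployments used elsewhere in this guide), not a fixed taxonomy:

```python
# Map task categories to deployment names (assumed names from this guide)
TASK_TO_DEPLOYMENT = {
    "classification": "gpt-5-mini",
    "moderation": "gpt-5-nano",
    "summarization": "gpt-5-mini",
    "code_generation": "gpt-5",
    "document_analysis": "gpt-4-1",
    "math": "o3-reasoning",
    "logic": "o4-mini",
}

def pick_deployment(task: str, default: str = "gpt-5") -> str:
    """Route a task category to a deployment, falling back to gpt-5."""
    return TASK_TO_DEPLOYMENT.get(task, default)

print(pick_deployment("math"))        # o3-reasoning
print(pick_deployment("smalltalk"))   # gpt-5 (fallback)
```

The returned name is what you would pass as the `model` argument in the SDK calls shown earlier.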

Implement Caching


```python
# Use semantic cache to reduce duplicate requests
from azure.ai.cache import SemanticCache

cache = SemanticCache(
    similarity_threshold=0.95,
    ttl_seconds=3600
)

# Check cache before API call
cached_response = cache.get(user_query)
if cached_response:
    return cached_response

response = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)
cache.set(user_query, response)
```
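If a semantic-cache package is not available in your environment, an exact-match TTL cache built from the standard library captures much of the benefit for repeated identical queries. A minimal sketch (not a drop-in replacement; it matches exact strings, not semantically similar ones):

```python
import time

class TTLCache:
    """Exact-match response cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # evict stale entry
            return None
        return value

    def set(self, key, value):
        """Store a value with an expiry timestamp."""
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)
cache.set("What is Azure OpenAI?", "cached answer")
print(cache.get("What is Azure OpenAI?"))  # cached answer
```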

Token Management


```python
import tiktoken

# Count tokens before API call
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode(prompt))

if tokens > 100000:
    print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

# Use shorter max_tokens when appropriate
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    max_tokens=500  # Limit output tokens
)
```

Monitoring and Alerts


Set Up Cost Alerts


```bash
# Create budget alert
az consumption budget create \
  --budget-name openai-monthly-budget \
  --resource-group MyRG \
  --amount 1000 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31 \
  --notifications '{
    "actual_GreaterThan_80_Percent": {
      "enabled": true,
      "operator": "GreaterThan",
      "threshold": 80,
      "contactEmails": ["billing@example.com"]
    }
  }'
```

Application Insights Integration


```python
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging
import os

# Configure logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))

# Log API calls
logger.info("OpenAI API call", extra={
    "custom_dimensions": {
        "model": "gpt-5",
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response.usage.total_tokens),
        "latency_ms": response.response_ms
    }
})
```
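The logging example above calls a `calculate_cost` helper without defining it. A minimal sketch, with illustrative per-1K-token rates that are assumptions rather than the official Azure OpenAI price sheet (look up current pricing for your region and model before using this for billing):

```python
# Assumed prices per 1K tokens -- placeholders, NOT official rates
PRICE_PER_1K_TOKENS = {
    "gpt-5": 0.01,
    "gpt-5-mini": 0.002,
}

def calculate_cost(total_tokens: int, model: str = "gpt-5") -> float:
    """Blended cost estimate in dollars from a total token count."""
    return round(total_tokens / 1000 * PRICE_PER_1K_TOKENS[model], 6)

print(calculate_cost(1500))               # 0.015
print(calculate_cost(1000, "gpt-5-mini")) # 0.002
```

Real pricing distinguishes input and output tokens; a production helper would take both counts separately.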

Best Practices


✓ Use Model Router for automatic cost optimization
✓ Implement caching to reduce duplicate requests
✓ Monitor token usage and set budgets
✓ Use private endpoints for production workloads
✓ Enable managed identity instead of API keys
✓ Configure content filtering for safety
✓ Right-size capacity based on actual demand
✓ Use Foundry Observability for monitoring
✓ Implement retry logic with exponential backoff
✓ Choose appropriate models for task complexity
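Retry with exponential backoff is recommended above but not shown elsewhere in this guide; a minimal generic sketch (in production, retry only on throttling or transient errors such as HTTP 429/5xx rather than every exception):

```python
import random
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Invoke a callable, retrying with exponential backoff plus jitter.

    Delays grow as base_delay * 2**attempt; the last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Demo with a callable that fails twice before succeeding
state = {"n": 0}
def flaky():
    state["n"] += 1
    if state["n"] < 3:
        raise RuntimeError("throttled")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```

Wrap the `client.chat.completions.create(...)` calls shown earlier in a lambda or partial to apply this.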


Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!