# Azure OpenAI Service - 2025 Models and Features

Complete knowledge base for Azure OpenAI Service with the latest 2025 models, including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration.
## Overview

Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.
## Latest Models (2025)

### GPT-5 Series (GA August 2025)
**Registration Required Models:**
- `gpt-5-pro`: Highest capability, complex reasoning
- `gpt-5`: Balanced performance and cost
- `gpt-5-codex`: Optimized for code generation

**No Registration Required:**
- `gpt-5-mini`: Faster, more affordable
- `gpt-5-nano`: Ultra-fast for simple tasks
- `gpt-5-chat`: Optimized for conversational use
### GPT-4.1 Series

- `gpt-4.1`: 1 million token context window
- `gpt-4.1-mini`: Efficient version with 1M context
- `gpt-4.1-nano`: Fastest variant

**Key Improvements:**
- 1,000,000 token context (vs 128K in GPT-4 Turbo)
- Better instruction following
- Reduced hallucinations
- Improved multilingual support
### Reasoning Models

**o4-mini**: Lightweight reasoning model
- Faster inference
- Lower cost
- Suitable for structured reasoning tasks

**o3**: Advanced reasoning model
- Complex problem solving
- Mathematical reasoning
- Scientific analysis

**o1**: Original reasoning model
- General-purpose reasoning
- Step-by-step explanations

**o1-mini**: Efficient reasoning
- Balanced cost and performance
### Image Generation

**GPT-image-1** (2025-04-15)
- DALL-E 3 successor
- Higher quality images
- Better prompt understanding
- Improved safety filters
### Video Generation

**Sora** (2025-05-02)
- Text-to-video generation
- Realistic and imaginative scenes
- Up to 60 seconds of video
- Multiple camera angles and styles
### Audio Models

**gpt-4o-transcribe**: Speech-to-text powered by GPT-4o
- High-accuracy transcription
- Multiple languages
- Speaker diarization

**gpt-4o-mini-transcribe**: Faster, more affordable transcription
- Good accuracy
- Lower latency
- Cost-effective
## Deploying Azure OpenAI

### Create Azure OpenAI Resource
```bash
# Create OpenAI account
az cognitiveservices account create \
  --name myopenai \
  --resource-group MyRG \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --custom-domain myopenai \
  --public-network-access Disabled \
  --identity-type SystemAssigned

# Get endpoint and key
az cognitiveservices account show \
  --name myopenai \
  --resource-group MyRG \
  --query "properties.endpoint" \
  --output tsv

az cognitiveservices account keys list \
  --name myopenai \
  --resource-group MyRG \
  --query "key1" \
  --output tsv
```
### Deploy GPT-5 Model

```bash
# Deploy gpt-5
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --model-name gpt-5 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100

# Deploy gpt-5-pro (requires registration)
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5-pro \
  --model-name gpt-5-pro \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50
```
### Deploy Reasoning Models

```bash
# Deploy o3 reasoning model
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o3-reasoning \
  --model-name o3 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50

# Deploy o4-mini
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o4-mini \
  --model-name o4-mini \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```
### Deploy GPT-4.1 with 1M Context

```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-4-1 \
  --model-name gpt-4.1 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

### Deploy Image Generation Model
```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name image-gen \
  --model-name gpt-image-1 \
  --model-version 2025-04-15 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10
```

### Deploy Sora Video Generation
```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name sora \
  --model-name sora \
  --model-version 2025-05-02 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 5
```

## Using Azure OpenAI Models
### Python SDK (GPT-5)

```python
from openai import AzureOpenAI
import os

# Initialize client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-02-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# GPT-5 completion
response = client.chat.completions.create(
    model="gpt-5",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=1000,
    temperature=0.7,
    top_p=0.95
)
print(response.choices[0].message.content)
```
### Python SDK (o3 Reasoning Model)

```python
# o3 reasoning with chain-of-thought
response = client.chat.completions.create(
    model="o3-reasoning",
    messages=[
        {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
        {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
    ],
    max_tokens=2000,
    temperature=0.2  # Lower temperature for reasoning tasks
)
print(response.choices[0].message.content)
```
### Python SDK (GPT-4.1 with 1M Context)

```python
# Read a large document
with open('large_document.txt', 'r') as f:
    document = f.read()

# GPT-4.1 can handle up to 1M tokens
response = client.chat.completions.create(
    model="gpt-4-1",
    messages=[
        {"role": "system", "content": "You are a document analysis expert."},
        {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
    ],
    max_tokens=4000
)
print(response.choices[0].message.content)
```
### Image Generation (GPT-image-1)

```python
# Generate image with DALL-E 3 successor
response = client.images.generate(
    model="image-gen",
    prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
    size="1024x1024",
    quality="hd",
    n=1
)
image_url = response.data[0].url
print(f"Generated image: {image_url}")
```
### Video Generation (Sora)

```python
# Generate video with Sora
response = client.videos.generate(
    model="sora",
    prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
    duration=10,  # seconds
    resolution="1080p",
    fps=30
)
video_url = response.data[0].url
print(f"Generated video: {video_url}")
```
### Audio Transcription

```python
# Transcribe audio file
with open("meeting_recording.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        language="en",
        response_format="verbose_json"
    )

print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")

# Speaker diarization
for segment in response.segments:
    print(f"[{segment.start}s - {segment.end}s] {segment.text}")
```
## Azure AI Foundry Integration

### Model Router (Automatic Model Selection)

```python
from azure.ai.foundry import ModelRouter

# Initialize model router
router = ModelRouter(
    endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    credential=os.getenv("AZURE_OPENAI_API_KEY")
)

# Automatically select optimal model
response = router.complete(
    prompt="Analyze this complex scientific paper...",
    optimization_goals=["quality", "cost"],
    available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
)
print(f"Selected model: {response.model_used}")
print(f"Response: {response.content}")
print(f"Cost: ${response.cost}")
```

**Benefits:**
- Automatic model selection based on prompt complexity
- Balance quality vs cost
- Reduce costs by up to 40% while maintaining quality

### Agentic Retrieval (Azure AI Search Integration)
```python
import json

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Initialize search client
search_client = SearchClient(
    endpoint=os.getenv("SEARCH_ENDPOINT"),
    index_name="documents",
    credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)

# Agentic retrieval with Azure OpenAI
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You have access to a document search system."},
        {"role": "user", "content": "What are the company's revenue projections for Q3?"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search company documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }],
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        if tool_call.function.name == "search_documents":
            query = json.loads(tool_call.function.arguments)["query"]
            results = search_client.search(query)
            # Feed results back to model for final answer
```

**Improvements:**
- 40% better on complex, multi-part questions
- Automatic query decomposition
- Relevance ranking
- Citation generation

### Foundry Observability (Preview)
```python
from azure.ai.foundry import FoundryObservability

# Enable observability
observability = FoundryObservability(
    workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
    enable_tracing=True,
    enable_metrics=True
)

# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )
    trace.log_tool_call("search_kb", {"query": "refund policy"})
    trace.log_reasoning_step("Retrieved refund policy document")
    trace.log_token_usage(response.usage.total_tokens)
```

View in the Azure AI Foundry portal:
- End-to-end trace logs
- Reasoning steps and tool calls
- Performance metrics
- Cost analysis

## Capacity and Quota Management
### Check Quota

```bash
# List deployments with usage
az cognitiveservices account deployment list \
  --resource-group MyRG \
  --name myopenai \
  --output table

# Check usage metrics
az monitor metrics list \
  --resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --metric "TokenTransaction" \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-31T23:59:59Z \
  --interval PT1H \
  --aggregation Total
```
### Update Capacity

```bash
# Scale up deployment capacity
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 200

# Scale down during off-peak
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 50
```
### Request Quota Increase

1. Navigate to Azure Portal → Azure OpenAI resource
2. Go to the "Quotas" blade
3. Select model and region
4. Click "Request quota increase"
5. Provide justification and target capacity
## Security and Networking

### Private Endpoint
```bash
# Create private endpoint
az network private-endpoint create \
  --name openai-private-endpoint \
  --resource-group MyRG \
  --vnet-name MyVNet \
  --subnet PrivateEndpointSubnet \
  --private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --group-id account \
  --connection-name openai-connection

# Create private DNS zone
az network private-dns zone create \
  --resource-group MyRG \
  --name privatelink.openai.azure.com

# Link to VNet
az network private-dns link vnet create \
  --resource-group MyRG \
  --zone-name privatelink.openai.azure.com \
  --name openai-dns-link \
  --virtual-network MyVNet \
  --registration-enabled false

# Create DNS zone group
az network private-endpoint dns-zone-group create \
  --resource-group MyRG \
  --endpoint-name openai-private-endpoint \
  --name default \
  --private-dns-zone privatelink.openai.azure.com \
  --zone-name privatelink.openai.azure.com
```
### Managed Identity Access

```bash
# Enable system-assigned identity
az cognitiveservices account identity assign \
  --name myopenai \
  --resource-group MyRG

# Grant role to managed identity
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG
```
### Content Filtering

```bash
# Configure content filtering
az cognitiveservices account update \
  --name myopenai \
  --resource-group MyRG \
  --set properties.customContentFilter='{ "hate": {"severity": "medium", "enabled": true}, "violence": {"severity": "medium", "enabled": true}, "sexual": {"severity": "medium", "enabled": true}, "selfHarm": {"severity": "high", "enabled": true} }'
```
## Cost Optimization

### Model Selection Strategy

**Use GPT-5-mini or GPT-5-nano for:**
- Simple questions
- Classification tasks
- Content moderation
- Summarization

**Use GPT-5 or GPT-4.1 for:**
- Complex reasoning
- Long-form content generation
- Document analysis
- Code generation

**Use reasoning models (o3, o4-mini) for:**
- Mathematical problems
- Scientific analysis
- Step-by-step reasoning
- Logic puzzles
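This strategy can be encoded as a small routing table. A sketch, where the task categories are illustrative and the deployment names are the ones assumed throughout this guide:

```python
# Map task categories to deployment names; extend or rename the
# categories and deployments to match your own environment.
ROUTING_TABLE = {
    "simple_question": "gpt-5-nano",
    "classification": "gpt-5-nano",
    "moderation": "gpt-5-mini",
    "summarization": "gpt-5-mini",
    "long_form": "gpt-5",
    "code_generation": "gpt-5",
    "document_analysis": "gpt-4-1",
    "math": "o3-reasoning",
    "logic": "o4-mini",
}

def pick_deployment(task_type: str, default: str = "gpt-5") -> str:
    """Return the cheapest suitable deployment for a task type."""
    return ROUTING_TABLE.get(task_type, default)

print(pick_deployment("classification"))  # gpt-5-nano
print(pick_deployment("research"))        # gpt-5 (fallback)
```

The returned name is then passed as the `model` argument of `client.chat.completions.create`, which on Azure identifies the deployment.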
### Implement Caching

```python
# Use semantic cache to reduce duplicate requests
from azure.ai.cache import SemanticCache

cache = SemanticCache(
    similarity_threshold=0.95,
    ttl_seconds=3600
)

# Check cache before API call
cached_response = cache.get(user_query)
if cached_response:
    return cached_response

response = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)
cache.set(user_query, response)
```
### Token Management

```python
import tiktoken

# Count tokens before API call
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode(prompt))
if tokens > 100000:
    print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

# Use shorter max_tokens when appropriate
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    max_tokens=500  # Limit output tokens
)
```
## Monitoring and Alerts

### Set Up Cost Alerts

```bash
# Create budget alert
az consumption budget create \
  --budget-name openai-monthly-budget \
  --resource-group MyRG \
  --amount 1000 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31 \
  --notifications '{ "actual_GreaterThan_80_Percent": { "enabled": true, "operator": "GreaterThan", "threshold": 80, "contactEmails": ["billing@example.com"] } }'
```
### Application Insights Integration

```python
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging

# Configure logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))

# Log API calls
logger.info("OpenAI API call", extra={
    "custom_dimensions": {
        "model": "gpt-5",
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response.usage.total_tokens),
        "latency_ms": response.response_ms
    }
})
```
## Best Practices
✓ Use Model Router for automatic cost optimization
✓ Implement caching to reduce duplicate requests
✓ Monitor token usage and set budgets
✓ Use private endpoints for production workloads
✓ Enable managed identity instead of API keys
✓ Configure content filtering for safety
✓ Right-size capacity based on actual demand
✓ Use Foundry Observability for monitoring
✓ Implement retry logic with exponential backoff
✓ Choose appropriate models for task complexity
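The retry recommendation above can be sketched as a small wrapper. The helper name and parameters are illustrative; in practice you would narrow `retry_on` to rate-limit and transient-error exceptions from the SDK rather than retrying everything:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Invoke `call`, retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Delays of 1s, 2s, 4s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Example usage (hypothetical client call):
# result = with_backoff(lambda: client.chat.completions.create(
#     model="gpt-5", messages=messages))
```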
## References

- Azure OpenAI Documentation
- What's New in Azure OpenAI
- GPT-5 Announcement
- Azure AI Foundry
- Model Pricing

Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!