
Azure OpenAI Service - 2025 Models and Features


Complete knowledge base for Azure OpenAI Service covering the latest 2025 models (GPT-5, GPT-4.1, and the reasoning model family) and Azure AI Foundry integration.

Overview


Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.
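Every SDK call shown later in this guide resolves to a REST request against the resource endpoint. A minimal sketch of the URL shape (the resource name, deployment name, and API version below are placeholders, not required values):

```python
def chat_completions_url(resource: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI chat-completions endpoint URL for a deployment."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

# Example with placeholder values matching the deployments used in this guide
print(chat_completions_url("myopenai", "gpt-5", "2025-02-01-preview"))
```

The SDK assembles this URL for you from `azure_endpoint` and the `model` (deployment) argument; it is shown here only to make the mapping explicit.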

Latest Models (2025)


GPT-5 Series (GA August 2025)


**Registration Required Models:**
- gpt-5-pro: Highest capability, complex reasoning
- gpt-5: Balanced performance and cost
- gpt-5-codex: Optimized for code generation

**No Registration Required:**
- gpt-5-mini: Faster, more affordable
- gpt-5-nano: Ultra-fast for simple tasks
- gpt-5-chat: Optimized for conversational use

GPT-4.1 Series


- gpt-4.1: 1 million token context window
- gpt-4.1-mini: Efficient version with 1M context
- gpt-4.1-nano: Fastest variant

**Key Improvements:**
- 1,000,000 token context (vs 128K in GPT-4 Turbo)
- Better instruction following
- Reduced hallucinations
- Improved multilingual support
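A quick way to sanity-check whether a document will fit the 1M-token window before sending it is a character-count heuristic (roughly 4 characters per token for English text; this is an approximation, not a tokenizer):

```python
def fits_context(text: str, context_tokens: int = 1_000_000,
                 chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check: estimate token count from character count.

    The 4-chars-per-token ratio is a common English-text approximation;
    use a real tokenizer (see the Token Management section) for billing.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens

print(fits_context("a" * 4_000_000))  # True: ~1M tokens, at the limit
print(fits_context("a" * 5_000_000))  # False: ~1.25M tokens, too large
```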

Reasoning Models


**o4-mini**: Lightweight reasoning model
- Faster inference
- Lower cost
- Suitable for structured reasoning tasks

**o3**: Advanced reasoning model
- Complex problem solving
- Mathematical reasoning
- Scientific analysis

**o1**: Original reasoning model
- General-purpose reasoning
- Step-by-step explanations

**o1-mini**: Efficient reasoning
- Balanced cost and performance

Image Generation


**GPT-image-1** (2025-04-15)
- DALL-E 3 successor
- Higher quality images
- Better prompt understanding
- Improved safety filters

Video Generation


**Sora** (2025-05-02)
- Text-to-video generation
- Realistic and imaginative scenes
- Up to 60 seconds of video
- Multiple camera angles and styles

Audio Models


**gpt-4o-transcribe**: Speech-to-text powered by GPT-4o
- High accuracy transcription
- Multiple languages
- Speaker diarization

**gpt-4o-mini-transcribe**: Faster, more affordable transcription
- Good accuracy
- Lower latency
- Cost-effective

Deploying Azure OpenAI


Create Azure OpenAI Resource


```bash
# Create OpenAI account
az cognitiveservices account create \
  --name myopenai \
  --resource-group MyRG \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --custom-domain myopenai \
  --public-network-access Disabled \
  --identity-type SystemAssigned

# Get endpoint and key
az cognitiveservices account show \
  --name myopenai \
  --resource-group MyRG \
  --query "properties.endpoint" \
  --output tsv

az cognitiveservices account keys list \
  --name myopenai \
  --resource-group MyRG \
  --query "key1" \
  --output tsv
```

Deploy GPT-5 Model


```bash
# Deploy gpt-5
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --model-name gpt-5 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100 \
  --scale-type Standard

# Deploy gpt-5-pro (requires registration)
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5-pro \
  --model-name gpt-5-pro \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50
```

Deploy Reasoning Models


```bash
# Deploy o3 reasoning model
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o3-reasoning \
  --model-name o3 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50

# Deploy o4-mini
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o4-mini \
  --model-name o4-mini \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

Deploy GPT-4.1 with 1M Context


```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-4-1 \
  --model-name gpt-4.1 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

Deploy Image Generation Model


```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name image-gen \
  --model-name gpt-image-1 \
  --model-version 2025-04-15 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10
```

Deploy Sora Video Generation


```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name sora \
  --model-name sora \
  --model-version 2025-05-02 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 5
```

Using Azure OpenAI Models


Python SDK (GPT-5)


```python
from openai import AzureOpenAI
import os

# Initialize client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-02-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# GPT-5 completion
response = client.chat.completions.create(
    model="gpt-5",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=1000,
    temperature=0.7,
    top_p=0.95
)

print(response.choices[0].message.content)
```

Python SDK (o3 Reasoning Model)


```python
# o3 reasoning with chain-of-thought
response = client.chat.completions.create(
    model="o3-reasoning",
    messages=[
        {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
        {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
    ],
    max_tokens=2000,
    temperature=0.2  # Lower temperature for reasoning tasks
)

print(response.choices[0].message.content)
```

Python SDK (GPT-4.1 with 1M Context)


```python
# Read a large document
with open('large_document.txt', 'r') as f:
    document = f.read()

# GPT-4.1 can handle up to 1M tokens
response = client.chat.completions.create(
    model="gpt-4-1",
    messages=[
        {"role": "system", "content": "You are a document analysis expert."},
        {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
    ],
    max_tokens=4000
)

print(response.choices[0].message.content)
```

Image Generation (GPT-image-1)


```python
# Generate image with DALL-E 3 successor
response = client.images.generate(
    model="image-gen",
    prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
    size="1024x1024",
    quality="hd",
    n=1
)

image_url = response.data[0].url
print(f"Generated image: {image_url}")
```

Video Generation (Sora)


```python
# Generate video with Sora
response = client.videos.generate(
    model="sora",
    prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
    duration=10,  # seconds
    resolution="1080p",
    fps=30
)

video_url = response.data[0].url
print(f"Generated video: {video_url}")
```

Audio Transcription


```python
# Transcribe audio file
audio_file = open("meeting_recording.mp3", "rb")

response = client.audio.transcriptions.create(
    model="gpt-4o-transcribe",
    file=audio_file,
    language="en",
    response_format="verbose_json"
)

print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")

# Speaker diarization
for segment in response.segments:
    print(f"[{segment.start}s - {segment.end}s] {segment.text}")
```

Azure AI Foundry Integration


Model Router (Automatic Model Selection)


```python
from azure.ai.foundry import ModelRouter
import os

# Initialize model router
router = ModelRouter(
    endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    credential=os.getenv("AZURE_OPENAI_API_KEY")
)

# Automatically select optimal model
response = router.complete(
    prompt="Analyze this complex scientific paper...",
    optimization_goals=["quality", "cost"],
    available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
)

print(f"Selected model: {response.model_used}")
print(f"Response: {response.content}")
print(f"Cost: ${response.cost}")
```

**Benefits:**
- Automatic model selection based on prompt complexity
- Balance quality vs cost
- Reduce costs by up to 40% while maintaining quality

Agentic Retrieval (Azure AI Search Integration)


```python
import json
import os

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Initialize search client
search_client = SearchClient(
    endpoint=os.getenv("SEARCH_ENDPOINT"),
    index_name="documents",
    credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)

# Agentic retrieval with Azure OpenAI
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You have access to a document search system."},
        {"role": "user", "content": "What are the company's revenue projections for Q3?"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search company documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }],
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        if tool_call.function.name == "search_documents":
            query = json.loads(tool_call.function.arguments)["query"]
            results = search_client.search(query)
            # Feed results back to model for final answer
```

**Improvements:**
- 40% better on complex, multi-part questions
- Automatic query decomposition
- Relevance ranking
- Citation generation

Foundry Observability (Preview)


```python
from azure.ai.foundry import FoundryObservability
import os

# Enable observability
observability = FoundryObservability(
    workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
    enable_tracing=True,
    enable_metrics=True
)

# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )
    trace.log_tool_call("search_kb", {"query": "refund policy"})
    trace.log_reasoning_step("Retrieved refund policy document")
    trace.log_token_usage(response.usage.total_tokens)
```

View in Azure AI Foundry portal:
- End-to-end trace logs
- Reasoning steps and tool calls
- Performance metrics
- Cost analysis

Capacity and Quota Management


Check Quota


```bash
# List deployments with usage
az cognitiveservices account deployment list \
  --resource-group MyRG \
  --name myopenai \
  --output table

# Check usage metrics
az monitor metrics list \
  --resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --metric "TokenTransaction" \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-31T23:59:59Z \
  --interval PT1H \
  --aggregation Total
```

Update Capacity


```bash
# Scale up deployment capacity
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 200

# Scale down during off-peak
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 50
```
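When choosing a `--sku-capacity` value, one capacity unit on Standard deployments typically corresponds to roughly 1,000 tokens per minute of throughput; the exact ratio varies by model and deployment type, so treat the default below as an assumption and confirm against your quota page. A small sizing helper:

```python
import math

def capacity_for_tpm(target_tpm: int, tpm_per_unit: int = 1000) -> int:
    """Convert a target tokens-per-minute throughput into --sku-capacity units.

    Assumes 1 capacity unit ~= 1,000 TPM (an approximation; the real ratio
    depends on the model and deployment type).
    """
    return math.ceil(target_tpm / tpm_per_unit)

print(capacity_for_tpm(200_000))  # 200, matching the scale-up example above
print(capacity_for_tpm(1_500))   # 2 (rounds up to whole units)
```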

Request Quota Increase


  1. Navigate to Azure Portal → Azure OpenAI resource
  2. Go to "Quotas" blade
  3. Select model and region
  4. Click "Request quota increase"
  5. Provide justification and target capacity

Security and Networking


Private Endpoint


```bash
# Create private endpoint
az network private-endpoint create \
  --name openai-private-endpoint \
  --resource-group MyRG \
  --vnet-name MyVNet \
  --subnet PrivateEndpointSubnet \
  --private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --group-id account \
  --connection-name openai-connection

# Create private DNS zone
az network private-dns zone create \
  --resource-group MyRG \
  --name privatelink.openai.azure.com

# Link to VNet
az network private-dns link vnet create \
  --resource-group MyRG \
  --zone-name privatelink.openai.azure.com \
  --name openai-dns-link \
  --virtual-network MyVNet \
  --registration-enabled false

# Create DNS zone group
az network private-endpoint dns-zone-group create \
  --resource-group MyRG \
  --endpoint-name openai-private-endpoint \
  --name default \
  --private-dns-zone privatelink.openai.azure.com \
  --zone-name privatelink.openai.azure.com
```

Managed Identity Access


```bash
# Enable system-assigned identity
az cognitiveservices account identity assign \
  --name myopenai \
  --resource-group MyRG

# Grant role to managed identity
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)

az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG
```

Content Filtering


```bash
# Configure content filtering
az cognitiveservices account update \
  --name myopenai \
  --resource-group MyRG \
  --set properties.customContentFilter='{
    "hate": {"severity": "medium", "enabled": true},
    "violence": {"severity": "medium", "enabled": true},
    "sexual": {"severity": "medium", "enabled": true},
    "selfHarm": {"severity": "high", "enabled": true}
  }'
```

Cost Optimization


Model Selection Strategy


**Use GPT-5-mini or GPT-5-nano for:**
- Simple questions
- Classification tasks
- Content moderation
- Summarization

**Use GPT-5 or GPT-4.1 for:**
- Complex reasoning
- Long-form content generation
- Document analysis
- Code generation

**Use reasoning models (o3, o4-mini) for:**
- Mathematical problems
- Scientific analysis
- Step-by-step reasoning
- Logic puzzles
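The strategy above can be encoded as a simple routing table. The task categories and deployment names below are illustrative (they mirror the hypothetical deployments used elsewhere in this guide), not a fixed taxonomy:

```python
# Map task categories to deployment names (assumed names from this guide)
TASK_TO_DEPLOYMENT = {
    "classification": "gpt-5-mini",
    "moderation": "gpt-5-nano",
    "summarization": "gpt-5-mini",
    "code_generation": "gpt-5",
    "document_analysis": "gpt-4-1",
    "math": "o3-reasoning",
    "logic": "o4-mini",
}

def pick_deployment(task: str, default: str = "gpt-5") -> str:
    """Route a task category to a deployment, falling back to gpt-5."""
    return TASK_TO_DEPLOYMENT.get(task, default)

print(pick_deployment("math"))        # o3-reasoning
print(pick_deployment("smalltalk"))   # gpt-5 (fallback)
```

The returned name is what you would pass as the `model` argument in the SDK calls shown earlier.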

Implement Caching


```python
# Use semantic cache to reduce duplicate requests
from azure.ai.cache import SemanticCache

cache = SemanticCache(
    similarity_threshold=0.95,
    ttl_seconds=3600
)

# Check cache before API call
cached_response = cache.get(user_query)
if cached_response:
    return cached_response

response = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)
cache.set(user_query, response)
```
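If a semantic-cache package is not available in your environment, an exact-match TTL cache built from the standard library captures much of the benefit for repeated identical queries. A minimal sketch (not a drop-in replacement; it matches exact strings, not semantically similar ones):

```python
import time

class TTLCache:
    """Exact-match response cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # evict stale entry
            return None
        return value

    def set(self, key, value):
        """Store a value with an expiry timestamp."""
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)
cache.set("What is Azure OpenAI?", "cached answer")
print(cache.get("What is Azure OpenAI?"))  # cached answer
```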

Token Management


```python
import tiktoken

# Count tokens before API call
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode(prompt))

if tokens > 100000:
    print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

# Use shorter max_tokens when appropriate
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    max_tokens=500  # Limit output tokens
)
```

Monitoring and Alerts


Set Up Cost Alerts


```bash
# Create budget alert
az consumption budget create \
  --budget-name openai-monthly-budget \
  --resource-group MyRG \
  --amount 1000 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31 \
  --notifications '{
    "actual_GreaterThan_80_Percent": {
      "enabled": true,
      "operator": "GreaterThan",
      "threshold": 80,
      "contactEmails": ["billing@example.com"]
    }
  }'
```

Application Insights Integration


```python
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging
import os

# Configure logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))

# Log API calls
logger.info("OpenAI API call", extra={
    "custom_dimensions": {
        "model": "gpt-5",
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response.usage.total_tokens),
        "latency_ms": response.response_ms
    }
})
```
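The logging example above calls a `calculate_cost` helper without defining it. A minimal sketch, with illustrative per-1K-token rates that are assumptions rather than the official Azure OpenAI price sheet (look up current pricing for your region and model before using this for billing):

```python
# Assumed prices per 1K tokens -- placeholders, NOT official rates
PRICE_PER_1K_TOKENS = {
    "gpt-5": 0.01,
    "gpt-5-mini": 0.002,
}

def calculate_cost(total_tokens: int, model: str = "gpt-5") -> float:
    """Blended cost estimate in dollars from a total token count."""
    return round(total_tokens / 1000 * PRICE_PER_1K_TOKENS[model], 6)

print(calculate_cost(1500))               # 0.015
print(calculate_cost(1000, "gpt-5-mini")) # 0.002
```

Real pricing distinguishes input and output tokens; a production helper would take both counts separately.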

Best Practices


✓ Use Model Router for automatic cost optimization
✓ Implement caching to reduce duplicate requests
✓ Monitor token usage and set budgets
✓ Use private endpoints for production workloads
✓ Enable managed identity instead of API keys
✓ Configure content filtering for safety
✓ Right-size capacity based on actual demand
✓ Use Foundry Observability for monitoring
✓ Implement retry logic with exponential backoff
✓ Choose appropriate models for task complexity
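Retry with exponential backoff is recommended above but not shown elsewhere in this guide; a minimal generic sketch (in production, retry only on throttling or transient errors such as HTTP 429/5xx rather than every exception):

```python
import random
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Invoke a callable, retrying with exponential backoff plus jitter.

    Delays grow as base_delay * 2**attempt; the last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Demo with a callable that fails twice before succeeding
state = {"n": 0}
def flaky():
    state["n"] += 1
    if state["n"] < 3:
        raise RuntimeError("throttled")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```

Wrap the `client.chat.completions.create(...)` calls shown earlier in a lambda or partial to apply this.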


Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!