azure-ai-voicelive-dotnet
# Azure.AI.VoiceLive (.NET)
Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.
## Installation
```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio  # For audio capture/playback
```

Current Versions: Stable v1.0.0, Preview v1.1.0-beta.1
## Environment Variables
```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural

# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication
### Microsoft Entra ID (Recommended)
```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

Required Role (assign in Azure Portal → Access control): `Cognitive Services User`

### API Key
```csharp
using Azure;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy
```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

## Core Workflow
### 1. Start Session and Configure
```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());
var model = "gpt-4o-mini-realtime-preview";

// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```

### 2. Process Events
```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or another audio library
            break;

        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;

        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see the Function Calling section)
            break;

        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;

        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

### 3. Send User Message
```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

### 4. Function Calling
```csharp
using System.Text.Json;

// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
    {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state or country"
            }
        },
        "required": ["location"]
    }
    """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in the event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";

        // Call an external service here; a canned result keeps the example simple
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";

        // Send the result back and ask the model to continue the response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```

## Voice Options
| Voice Type | Class | Example |
|---|---|---|
| Azure Standard | `AzureStandardVoice` | `new AzureStandardVoice("en-US-AvaNeural")` |
| Azure HD | | |
| Azure Custom | | Custom voice with endpoint ID |
## Supported Models

| Model | Description |
|---|---|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| | Cost-effective multimodal |
## Key Types Reference

| Type | Purpose |
|---|---|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |
## Best Practices

- **Always set both modalities** — include `Text` and `Audio` for voice assistants
- **Use `AzureSemanticVadTurnDetection`** — provides natural conversation flow
- **Configure appropriate silence duration** — 500 ms is typical to avoid premature cutoffs
- **Use a `using` statement** — ensures proper session disposal
- **Handle all event types** — check for errors, audio, text, and function calls
- **Use `DefaultAzureCredential`** — never hardcode API keys
## Error Handling

```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error; safe to ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```

## Audio Configuration
- Input Format: `InputAudioFormat.Pcm16` (16-bit PCM)
- Output Format: `OutputAudioFormat.Pcm16`
- Sample Rate: 24 kHz recommended
- Channels: Mono
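The event loop above leaves audio capture and playback as comments. The sketch below shows one way to wire both directions with NAudio under the formats listed here; it assumes `session` is an open `VoiceLiveSession` and that `SendAudioAsync` accepts a PCM16 byte buffer (check the SDK reference for the exact overloads).

```csharp
using NAudio.Wave;

// 24 kHz, 16-bit PCM, mono — matches InputAudioFormat.Pcm16 / OutputAudioFormat.Pcm16
var format = new WaveFormat(24000, 16, 1);

// Capture: forward each microphone buffer to the session.
// The async lambda is fire-and-forget here; production code should handle faults.
var waveIn = new WaveInEvent { WaveFormat = format };
waveIn.DataAvailable += async (_, e) =>
{
    // Only the first e.BytesRecorded bytes of e.Buffer are valid for this callback
    await session.SendAudioAsync(e.Buffer.AsMemory(0, e.BytesRecorded).ToArray());
};
waveIn.StartRecording();

// Playback: queue SessionUpdateResponseAudioDelta payloads as they arrive
var playbackBuffer = new BufferedWaveProvider(format);
var waveOut = new WaveOutEvent();
waveOut.Init(playbackBuffer);
waveOut.Play();

// Inside the event loop, for each audio delta:
//   playbackBuffer.AddSamples(audioData, 0, audioData.Length);
```

`BufferedWaveProvider` absorbs the bursty delta arrivals so playback stays smooth; it drops or throws on overflow, so long sessions may want to tune its `BufferDuration`.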
## Related SDKs

| SDK | Purpose | Install |
|---|---|---|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |