azure-ai-voicelive-dotnet


Azure.AI.VoiceLive (.NET)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.

Installation

```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio                    # For audio capture/playback
```

Current versions: Stable v1.0.0, Preview v1.1.0-beta.1

Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
```

Optional: API key if not using Entra ID

```bash
AZURE_VOICELIVE_API_KEY=<your-api-key>
```
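Whether to use Entra ID or the API key can be decided from the presence of this variable. A minimal stdlib-only sketch; the helper name `AuthMode` is illustrative, not part of the SDK:

```csharp
using System;

// Illustrative helper (not an SDK API): pick the auth path based on
// whether AZURE_VOICELIVE_API_KEY is set in the environment.
string AuthMode() =>
    string.IsNullOrEmpty(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_API_KEY"))
        ? "entra"    // fall back to DefaultAzureCredential
        : "apikey";  // use AzureKeyCredential

Console.WriteLine($"Auth mode: {AuthMode()}");
```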

Authentication

Microsoft Entra ID (Recommended)

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

Required role: `Cognitive Services User` (assign in Azure Portal → Access control)

API Key

```csharp
using Azure;            // AzureKeyCredential
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

Core Workflow

1. Start Session and Configure

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());

var model = "gpt-4o-mini-realtime-preview";

// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```

2. Process Events

```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or another audio library
            break;

        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;

        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see Function Calling section)
            break;

        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;

        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

3. Send User Message

```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

4. Function Calling

```csharp
using System.Text.Json;

// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
        {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state or country"
                }
            },
            "required": ["location"]
        }
        """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";

        // Call external service
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";

        // Send result back to the model, then ask it to respond
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```
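For argument payloads with more structure than a flat string dictionary, parsing with `JsonDocument` avoids committing to a fixed type. A stdlib-only sketch; the helper name `GetStringArg` is illustrative, not part of the SDK:

```csharp
using System;
using System.Text.Json;

// Illustrative helper: extract a required string argument from the JSON
// arguments payload that arrives with a function-call event. Returns ""
// when the property is absent.
string GetStringArg(string argumentsJson, string name)
{
    using JsonDocument doc = JsonDocument.Parse(argumentsJson);
    return doc.RootElement.TryGetProperty(name, out JsonElement value)
        ? value.GetString() ?? ""
        : "";
}

Console.WriteLine(GetStringArg("{\"location\":\"Seattle, WA\"}", "location"));
```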

Voice Options

| Voice Type | Class | Example |
|---|---|---|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | `AzureStandardVoice` | `"en-US-Ava:DragonHDLatestNeural"` |
| Azure Custom | `AzureCustomVoice` | Custom voice with endpoint ID |

Supported Models

| Model | Description |
|---|---|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| `phi4-mm-realtime` | Cost-effective multimodal |

Key Types Reference

| Type | Purpose |
|---|---|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |

Best Practices

1. Always set both modalities — include `Text` and `Audio` for voice assistants
2. Use `AzureSemanticVadTurnDetection` — provides natural conversation flow
3. Configure an appropriate silence duration — 500 ms is typical and avoids premature cutoffs
4. Use a `using` statement — ensures the session is disposed properly
5. Handle all event types — check for errors, audio, text, and function calls
6. Use `DefaultAzureCredential` — never hardcode API keys

Error Handling

```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error, can ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```
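The check above can be factored into a predicate so the event loop stays readable. A stdlib-only sketch; the helper name `IsBenignError` is illustrative, and only the one message text shown in this document is treated as benign:

```csharp
using System;

// Illustrative helper: classify known-benign VoiceLive error messages so
// the event loop can skip logging them.
bool IsBenignError(string message) =>
    message.Contains("Cancellation failed: no active response");

Console.WriteLine(IsBenignError("Cancellation failed: no active response")); // True
```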

Audio Configuration

- Input format: `InputAudioFormat.Pcm16` (16-bit PCM)
- Output format: `OutputAudioFormat.Pcm16`
- Sample rate: 24 kHz recommended
- Channels: mono
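If an audio library delivers float samples, they must be packed into 16-bit little-endian PCM before sending audio to the session. A stdlib-only sketch, assuming mono float input in the range -1..1; the helper name `ToPcm16` is illustrative:

```csharp
using System;

// Illustrative helper: pack mono float samples (-1..1) into 16-bit
// little-endian PCM, the layout 16-bit PCM audio uses on the wire.
byte[] ToPcm16(float[] samples)
{
    byte[] buffer = new byte[samples.Length * 2];
    for (int i = 0; i < samples.Length; i++)
    {
        short s = (short)Math.Clamp(samples[i] * short.MaxValue, short.MinValue, short.MaxValue);
        buffer[i * 2] = (byte)(s & 0xFF);            // low byte first (little-endian)
        buffer[i * 2 + 1] = (byte)((s >> 8) & 0xFF); // then high byte
    }
    return buffer;
}

// At 24 kHz, mono, 16-bit, one second of audio is 24000 * 2 * 1 bytes.
Console.WriteLine(ToPcm16(new float[24000]).Length); // 48000
```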

Related SDKs

| SDK | Purpose | Install |
|---|---|---|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |

Reference Links
