Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant .NET", "bidirectional audio", "speech-to-speech".
Add the skill:

```bash
npx skill4agent add sickn33/antigravity-awesome-skills azure-ai-voicelive-dotnet
```

Install the NuGet packages:

```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio # For audio capture/playback
```

Set the environment variables:

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

Authenticate with Microsoft Entra ID through `DefaultAzureCredential`:

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

The identity used by `DefaultAzureCredential` needs the `Cognitive Services User` role on the resource.

Alternatively, authenticate with an API key:

```csharp
using Azure;              // AzureKeyCredential
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

Client hierarchy:

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

Start and configure a session:

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_VOICELIVE_ENDPOINT is not set."));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());
var model = Environment.GetEnvironmentVariable("AZURE_VOICELIVE_MODEL") ?? "gpt-4o-mini-realtime-preview";
var voice = Environment.GetEnvironmentVariable("AZURE_VOICELIVE_VOICE") ?? "en-US-AvaNeural";

// Start session (the using declaration disposes the WebSocket session at the end of the scope)
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice(voice),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),   // audio kept from before detected speech
        SilenceDuration = TimeSpan.FromMilliseconds(500)  // silence that ends the user's turn
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```

Process server events:

```csharp
// Iterate over server events as they arrive
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            // PCM16 audio chunk: play it via NAudio or another audio library
            byte[] audioData = audioDelta.Delta.ToArray();
            break;
        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;
        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle the function call (see the function-calling example below)
            break;
        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;
        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

Send a text message and request a response:

```csharp
// Add the user's message to the conversation, then ask the model to respond
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

Function calling:

```csharp
using System.Text.Json;

// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
        {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state or country"
            }
          },
          "required": ["location"]
        }
        """)
};

// Add to session options before calling ConfigureSessionAsync
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        // Arguments is a JSON object matching Parameters, e.g. {"location":"Seattle, WA"}
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters != null && parameters.TryGetValue("location", out string? value)
            ? value
            : "";

        // Call external service (stubbed here)
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";

        // Return the result to the model, then request a follow-up response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```

Voice options:

| Voice Type | Class | Example |
|---|---|---|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | | |
| Azure Custom | | Custom voice with endpoint ID |

Supported models:

| Model | Description |
|---|---|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| | Cost-effective multimodal |
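
The quick start reads the endpoint from `AZURE_VOICELIVE_ENDPOINT`, and the environment block also defines `AZURE_VOICELIVE_MODEL`, `AZURE_VOICELIVE_VOICE` and an optional `AZURE_VOICELIVE_API_KEY`, so the models above can be switched without code changes. A minimal sketch in plain .NET (no SDK calls) that gathers those values in one place: `LoadVoiceLiveSettings` and `NonEmpty` are hypothetical helpers, the fallbacks (`gpt-4o-realtime-preview`, `en-US-AvaNeural`) are assumptions taken from the environment block, and the variable lookup is injected so the logic can be exercised without touching the real environment.

```csharp
using System;
using System.Collections.Generic;

// Example lookup backed by a dictionary instead of the real environment.
var sample = new Dictionary<string, string>
{
    ["AZURE_VOICELIVE_ENDPOINT"] = "https://my-resource.services.ai.azure.com/",
};
var settings = LoadVoiceLiveSettings(name => sample.TryGetValue(name, out var v) ? v : null);
Console.WriteLine($"{settings.Endpoint} | {settings.Model} | {settings.Voice}");

// In an application: LoadVoiceLiveSettings(Environment.GetEnvironmentVariable)

// Hypothetical helper: resolves the AZURE_VOICELIVE_* variables, applying defaults.
static (Uri Endpoint, string Model, string Voice, string? ApiKey) LoadVoiceLiveSettings(
    Func<string, string?> getEnv)
{
    // The endpoint is required; everything else is optional.
    string endpointText = NonEmpty(getEnv("AZURE_VOICELIVE_ENDPOINT"))
        ?? throw new InvalidOperationException("AZURE_VOICELIVE_ENDPOINT is not set.");
    string model = NonEmpty(getEnv("AZURE_VOICELIVE_MODEL")) ?? "gpt-4o-realtime-preview";
    string voice = NonEmpty(getEnv("AZURE_VOICELIVE_VOICE")) ?? "en-US-AvaNeural";
    // A null key means: authenticate with Entra ID (DefaultAzureCredential) instead of AzureKeyCredential.
    string? apiKey = NonEmpty(getEnv("AZURE_VOICELIVE_API_KEY"));
    return (new Uri(endpointText), model, voice, apiKey);
}

// Treats empty or whitespace-only values the same as unset variables.
static string? NonEmpty(string? value) => string.IsNullOrWhiteSpace(value) ? null : value;
```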
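
Key types:

| Type | Purpose |
|---|---|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |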
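
`FunctionCallOutputItem` sends a tool result back to the model, but the call's arguments arrive as a JSON string. The function-calling handler deserializes them into `Dictionary<string, string>`, which throws a `JsonException` as soon as the model includes a non-string value such as a number. A more tolerant alternative, sketched with `System.Text.Json.JsonDocument`, reads one string property and returns `null` when it is missing, has another type, or the payload is not valid JSON. `TryGetStringArgument` is a hypothetical helper, not part of the Voice Live SDK, and the sample payload is made up.

```csharp
using System;
using System.Text.Json;

// Illustrative arguments payload; "days" is a number, which would make
// Deserialize<Dictionary<string, string>> throw.
string argumentsJson = "{\"location\":\"Seattle, WA\",\"days\":3}";
string location = TryGetStringArgument(argumentsJson, "location") ?? "";
Console.WriteLine(location);

// Hypothetical helper: returns the named string property of a JSON object,
// or null if the JSON is malformed, is not an object, or lacks a string value for it.
static string? TryGetStringArgument(string json, string name)
{
    try
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty(name, out JsonElement value) &&
            value.ValueKind == JsonValueKind.String)
        {
            return value.GetString();
        }
    }
    catch (JsonException)
    {
        // Malformed JSON is reported the same way as a missing argument.
    }
    return null;
}
```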
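
Best practices:

- Enable both the `Text` and `Audio` modalities for voice assistants.
- Use `AzureSemanticVadTurnDetection` for turn detection.
- Declare the session with a `using` statement so the WebSocket connection is disposed.

Error handling:

```csharp
// Inside the event loop: ignore a known benign error, report everything else
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign: a cancellation was requested while no response was in progress
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```

Audio configuration:

- Input format: `InputAudioFormat.Pcm16`
- Output format: `OutputAudioFormat.Pcm16`

Related SDKs:

| SDK | Purpose | Install |
|---|---|---|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| | Speech-to-text, text-to-speech | |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |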
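
The session is configured with `InputAudioFormat.Pcm16` and `OutputAudioFormat.Pcm16`, so audio captured or played with a library such as NAudio has to be converted to and from 16-bit PCM. A minimal sketch, assuming `Pcm16` means signed 16-bit little-endian mono samples at 24 kHz (a common default for real-time speech models; confirm the rate your session actually uses): it converts floating-point capture samples to PCM16 bytes and relates byte counts to durations, which helps when sizing chunks for `SendAudioAsync()` or timing playback of audio deltas. `FloatToPcm16`, `Pcm16Duration`, `Pcm16BytesFor` and the 24 kHz default are all assumptions for illustration.

```csharp
using System;
using System.Buffers.Binary;

// Example: 20 ms of silence at the assumed 24 kHz rate is 480 samples.
byte[] chunk = FloatToPcm16(new float[480]);
Console.WriteLine($"{chunk.Length} bytes = {Pcm16Duration(chunk.Length).TotalMilliseconds} ms");

// Converts float samples in [-1, 1] to 16-bit little-endian PCM (2 bytes per sample).
static byte[] FloatToPcm16(ReadOnlySpan<float> samples)
{
    byte[] pcm = new byte[samples.Length * 2];
    for (int i = 0; i < samples.Length; i++)
    {
        // Clamp out-of-range input, then scale to the Int16 range.
        float clamped = Math.Clamp(samples[i], -1f, 1f);
        short value = (short)MathF.Round(clamped * short.MaxValue);
        BinaryPrimitives.WriteInt16LittleEndian(pcm.AsSpan(i * 2), value);
    }
    return pcm;
}

// Playback time of a mono PCM16 buffer (sampleRate defaults to the assumed 24 kHz).
static TimeSpan Pcm16Duration(int byteCount, int sampleRate = 24_000) =>
    TimeSpan.FromTicks((long)byteCount * TimeSpan.TicksPerSecond / (sampleRate * 2L));

// Byte count of a mono PCM16 chunk lasting the given duration (whole samples only).
static int Pcm16BytesFor(TimeSpan duration, int sampleRate = 24_000) =>
    (int)(duration.Ticks * sampleRate / TimeSpan.TicksPerSecond) * 2;
```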