azure-ai-transcription-py


Azure AI Transcription SDK for Python

Client library for Azure AI Transcription (speech-to-text) with real-time and batch transcription.

Installation

bash
pip install azure-ai-transcription

Environment Variables

bash
TRANSCRIPTION_ENDPOINT=https://<resource>.cognitiveservices.azure.com
TRANSCRIPTION_KEY=<your-key>
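Before constructing a client, it can help to fail early when either variable is missing. The following is a minimal sketch using only the standard library; the helper name `load_transcription_settings` is hypothetical and not part of the SDK.

```python
import os


def load_transcription_settings(env=None):
    """Return (endpoint, key) from the environment, raising if either is missing.

    Hypothetical helper for illustration; not part of azure-ai-transcription.
    """
    env = os.environ if env is None else env
    missing = [
        name
        for name in ("TRANSCRIPTION_ENDPOINT", "TRANSCRIPTION_KEY")
        if not env.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so later URL joins do not produce "//".
    return env["TRANSCRIPTION_ENDPOINT"].rstrip("/"), env["TRANSCRIPTION_KEY"]
```

Calling `load_transcription_settings()` with no argument reads from `os.environ`; passing a dict makes the helper easy to exercise in tests.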

Authentication

Use subscription key authentication (DefaultAzureCredential is not supported for this client):
python
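import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.transcription import TranscriptionClient

# Azure SDK clients generally expect the key wrapped in AzureKeyCredential
# rather than passed as a bare string.
client = TranscriptionClient(
    endpoint=os.environ["TRANSCRIPTION_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["TRANSCRIPTION_KEY"])
)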

Transcription (Batch)

python
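# Start a batch job for audio reachable by URL, with speaker diarization enabled.
job = client.begin_transcription(
    name="meeting-transcription",
    locale="en-US",
    content_urls=["https://<storage>/audio.wav"],
    diarization_enabled=True
)
# Wait for the job to finish, then inspect its final status.
result = job.result()
print(result.status)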

Transcription (Real-time)

python
stream = client.begin_stream_transcription(locale="en-US")
stream.send_audio_file("audio.wav")
for event in stream:
    print(event.text)
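When audio is produced faster than it can be sent or processed, an unbounded buffer grows without limit. One common way to apply backpressure is a bounded queue between producer and consumer. The sketch below uses only the standard library; `stream_with_backpressure` and its `handle_chunk` callback are hypothetical names standing in for whatever actually sends audio, not SDK functions.

```python
import queue
import threading


def stream_with_backpressure(chunks, handle_chunk, max_pending=8):
    """Feed chunks to handle_chunk through a bounded queue.

    handle_chunk is a hypothetical callback for illustration; it is not
    part of azure-ai-transcription.
    """
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel marking end of stream
    errors = []

    def consumer():
        while True:
            item = pending.get()
            if item is done:
                break
            if errors:
                continue  # keep draining so the producer never blocks forever
            try:
                handle_chunk(item)
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=consumer)
    worker.start()
    try:
        for chunk in chunks:
            # put() blocks while the queue is full, slowing the producer to
            # the consumer's pace instead of buffering without limit.
            pending.put(chunk)
    finally:
        pending.put(done)
        worker.join()
    if errors:
        raise errors[0]
```

A smaller `max_pending` bounds memory more tightly at the cost of stalling the producer more often.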

Best Practices

  1. Enable diarization when multiple speakers are present
  2. Use batch transcription for long files stored in blob storage
  3. Capture timestamps for subtitle generation
  4. Specify the locale (for example, en-US) to improve recognition accuracy
  5. Handle streaming backpressure for real-time transcription
  6. Close transcription sessions when complete
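To illustrate the subtitle use case from item 3, captured timestamps can be turned into SubRip (SRT) text. This is a minimal sketch that assumes each segment is a `(start_ms, end_ms, text)` tuple; that shape is an assumption for illustration, so map the fields of your actual transcription result onto it.

```python
def format_srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(int(ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start_ms, end_ms, text) tuples.

    The tuple shape is hypothetical; it is not the SDK's result type.
    """
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start_ms)} --> {format_srt_time(end_ms)}\n"
            f"{text}\n"
        )
    # Each cue ends with a newline; joining with "\n" adds the blank line
    # that separates cues in the SRT format.
    return "\n".join(blocks)
```

For example, `to_srt([(0, 1500, "Hello")])` yields a single cue spanning `00:00:00,000 --> 00:00:01,500`.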