alicloud-ai-audio-asr-realtime

Category: provider

Model Studio Qwen ASR Realtime

Validation

bash
mkdir -p output/alicloud-ai-audio-asr-realtime
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr-realtime/scripts/prepare_realtime_asr_request.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr-realtime/validate.txt
Pass criteria: the command exits 0 and output/alicloud-ai-audio-asr-realtime/validate.txt is generated.

Output And Evidence

  • Save session payloads and response samples under output/alicloud-ai-audio-asr-realtime/.
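
One way to keep that evidence consistent is a small helper that writes each payload or response sample as a JSON file. This is only a sketch: the function name `save_evidence` and the one-file-per-sample layout are assumptions for illustration, not something the skill prescribes.

```python
import json
from pathlib import Path

# Output directory named in this skill's documentation.
OUTPUT_DIR = Path("output/alicloud-ai-audio-asr-realtime")


def save_evidence(name: str, payload: dict, base_dir: Path = OUTPUT_DIR) -> Path:
    """Write one payload or response sample as pretty-printed JSON.

    `save_evidence` is a hypothetical helper, not part of the skill's scripts.
    """
    base_dir.mkdir(parents=True, exist_ok=True)
    path = base_dir / f"{name}.json"
    # ensure_ascii=False keeps non-ASCII transcripts readable in the file.
    path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

For example, `save_evidence("response-001", {"text": "hello", "is_final": True})` would create `response-001.json` in the output directory.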

Critical model names

Use one of these exact model strings:
  • qwen3-asr-flash-realtime
  • qwen3-asr-flash-realtime-2026-02-10
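
Because the strings must match exactly, a client can reject typos before opening a session. The check below is a minimal sketch; `resolve_model` is a hypothetical helper name, and falling back to `qwen3-asr-flash-realtime` when no model is given mirrors the default listed for the `model` request field.

```python
from typing import Optional

# The two exact model strings listed above.
ALLOWED_MODELS = (
    "qwen3-asr-flash-realtime",
    "qwen3-asr-flash-realtime-2026-02-10",
)
DEFAULT_MODEL = ALLOWED_MODELS[0]


def resolve_model(model: Optional[str] = None) -> str:
    """Return a valid model string, or raise if the name is not an exact match."""
    if model is None:
        return DEFAULT_MODEL
    if model not in ALLOWED_MODELS:
        raise ValueError(
            f"Unknown model {model!r}; expected one of {', '.join(ALLOWED_MODELS)}"
        )
    return model
```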

Use cases

  • Realtime subtitles and captions
  • Voice-agent duplex input
  • Streaming speech-to-text in browser or terminal clients

Prerequisites

  • Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
  • Realtime sessions generally require WebSocket or streaming session handling in the client.
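
The key lookup order described above could be sketched as follows. Two things here are assumptions rather than documented behavior: that the credentials file is INI-style with a section header such as `[default]`, and the helper name `load_api_key`. The environment variable is checked first, then the file.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CREDENTIALS = Path.home() / ".alibabacloud" / "credentials"


def load_api_key(
    credentials_path: Path = DEFAULT_CREDENTIALS,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the API key from the environment, else from the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key

    path = Path(credentials_path).expanduser()
    if not path.is_file():
        return None

    # Assumption: the file is INI-formatted; interpolation is disabled so a
    # literal "%" inside a key is not misread as a substitution.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None

    # Look in the special DEFAULT section first, then every named section.
    candidates = [parser.defaults()] + [parser[name] for name in parser.sections()]
    for section in candidates:
        value = section.get("dashscope_api_key")
        if value:
            return value
    return None
```

A caller would typically fail fast when this returns `None` instead of opening a session without credentials.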

Normalized interface (asr.realtime)

Request

  • model (string, optional): default qwen3-asr-flash-realtime
  • language_hints (array<string>, optional)
  • format (string, optional): e.g. pcm, wav
  • sample_rate (int, optional): e.g. 16000
  • chunk_ms (int, optional): frame size in milliseconds
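
The request fields above can be modeled as a small typed structure. The sketch below is illustrative only: the class name `RealtimeAsrRequest` and the choice to omit unset optional fields from the serialized payload are assumptions, and the actual template written by prepare_realtime_asr_request.py may differ.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class RealtimeAsrRequest:
    """Normalized asr.realtime request; every field is optional."""

    model: str = "qwen3-asr-flash-realtime"
    language_hints: Optional[List[str]] = None
    format: Optional[str] = None        # e.g. "pcm", "wav"
    sample_rate: Optional[int] = None   # e.g. 16000
    chunk_ms: Optional[int] = None      # frame size in milliseconds

    def to_dict(self) -> dict:
        # Assumption: unset optional fields are left out of the payload.
        return {k: v for k, v in asdict(self).items() if v is not None}


request = RealtimeAsrRequest(
    language_hints=["en"], format="pcm", sample_rate=16000, chunk_ms=100
)
payload = request.to_dict()
```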

Response

  • text (string): recognized transcript fragment
  • is_final (bool): finalization marker
  • usage (object, optional)
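
A client usually folds a stream of these fragments into committed text plus a pending partial. The sketch below assumes that each non-final fragment replaces the previous partial (rather than being a delta) and that final fragments are joined with a separator; neither behavior is stated in this document, and `assemble_transcript` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def assemble_transcript(
    events: Iterable[dict], separator: str = " "
) -> Tuple[str, str]:
    """Fold response events into (committed_text, pending_partial)."""
    finals: List[str] = []
    partial = ""
    for event in events:
        text = event.get("text", "")
        if event.get("is_final"):
            # A final fragment commits the segment and clears the partial.
            finals.append(text)
            partial = ""
        else:
            # Assumption: a newer partial supersedes the older one.
            partial = text
    return separator.join(finals), partial


committed, pending = assemble_transcript(
    [
        {"text": "hel", "is_final": False},
        {"text": "hello", "is_final": True},
        {"text": "wor", "is_final": False},
    ]
)
```

For languages written without spaces, pass `separator=""`.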

Quick start

Generate a request template:
bash
python skills/ai/audio/alicloud-ai-audio-asr-realtime/scripts/prepare_realtime_asr_request.py \
  --output output/alicloud-ai-audio-asr-realtime/request.json

Operational guidance

  • Prefer 16kHz mono PCM unless your client stack requires another format.
  • Keep chunks small enough for responsive partial results.
  • If you only have recorded files, use skills/ai/audio/alicloud-ai-audio-asr/ instead.
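
To make the chunk-size guidance concrete, the arithmetic for raw PCM can be sketched as below. The 16-bit sample width and the 100 ms default are assumptions chosen for illustration; this document does not specify a recommended chunk_ms. At 16 kHz mono, 16-bit, a 100 ms chunk is 16000 × 0.1 × 2 = 3200 bytes.

```python
from typing import Iterator


def chunk_size_bytes(
    sample_rate: int = 16000,
    chunk_ms: int = 100,
    channels: int = 1,
    sample_width: int = 2,  # bytes per sample; 2 assumes 16-bit PCM
) -> int:
    """Bytes of raw PCM covered by one chunk of `chunk_ms` milliseconds."""
    frames = sample_rate * chunk_ms // 1000
    size = frames * channels * sample_width
    if size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    return size


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`; the last one may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


size = chunk_size_bytes()  # 3200 bytes for 16 kHz mono 16-bit, 100 ms
chunks = list(iter_chunks(b"\x00" * 8000, size))
```

Smaller chunk_ms values shorten the delay before partial results can appear, at the cost of sending more messages per second.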

References

  • references/sources.md