alicloud-ai-audio-asr

Original：🇺🇸 English

Translated

1 scriptsChecked / no sensitive code detected

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

8installs

Sourcecinience/alicloud-skills

Added on2026-02-28

NPX Install

npx skill4agent add cinience/alicloud-skills alicloud-ai-audio-asr

SKILL.md Content

View Translation Comparison →

Category: provider

Model Studio Qwen ASR (Non-Realtime)

Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.

Critical model names

Use one of these exact model strings:

```
qwen3-asr-flash
```
```
qwen-audio-asr
```
```
qwen3-asr-flash-filetrans
```

Selection guidance:

Use
```
qwen3-asr-flash
```
or
```
qwen-audio-asr
```
for short/normal recordings (sync).
Use
```
qwen3-asr-flash-filetrans
```
for long-file transcription (async task workflow).

Prerequisites

Install SDK dependencies (script uses Python stdlib only):

bash

python3 -m venv .venv
. .venv/bin/activate

Set

DASHSCOPE_API_KEY

in environment, or add

dashscope_api_key

to

~/.alibabacloud/credentials

.

Normalized interface (asr.transcribe)

Request

```
audio
```
(string, required): public URL or local file path.
```
model
```
(string, optional): default
```
qwen3-asr-flash
```
.
```
language_hints
```
(array<string>, optional): e.g.
```
zh
```
,
```
en
```
.
```
sample_rate
```
(number, optional)
```
vocabulary_id
```
(string, optional)
```
disfluency_removal_enabled
```
(bool, optional)
```
timestamp_granularities
```
(array<string>, optional): e.g.
```
sentence
```
.
```
async
```
(bool, optional): default false for sync models, true for
```
qwen3-asr-flash-filetrans
```
.

Response

```
text
```
(string): normalized transcript text.
```
task_id
```
(string, optional): present for async submission.
```
status
```
(string):
```
SUCCEEDED
```
or submission status.
```
raw
```
(object): original API response.

Quick start (official HTTP API)

Sync transcription (OpenAI-compatible protocol):

bash

curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
            }
          }
        ]
      }
    ],
    "stream": false,
    "asr_options": {
      "enable_itn": false
    }
  }'

Async long-file transcription (DashScope protocol):

bash

curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'X-DashScope-Async: enable' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    }
  }'

Poll task result:

bash

curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Local helper script

Use the bundled script for URL/local-file input and optional async polling:

bash

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash \
  --language-hints zh,en \
  --print-response

Long-file mode:

bash

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash-filetrans \
  --async \
  --wait

Operational guidance

For local files, use
```
input_audio.data
```
(data URI) when direct URL is unavailable.
Keep
```
language_hints
```
minimal to reduce recognition ambiguity.
For async tasks, use 5-20s polling interval with max retry guard.
Save normalized outputs under
```
output/ai-audio-asr/transcripts/
```
.

Output location

Default output:
```
output/ai-audio-asr/transcripts/
```
Override base dir with
```
OUTPUT_DIR
```
.

References

```
references/api_reference.md
```
```
references/sources.md
```

Realtime synthesis is provided by

skills/ai/audio/alicloud-ai-audio-tts-realtime/

.

alicloud-ai-audio-asr

NPX Install

Tags

SKILL.md Content

Model Studio Qwen ASR (Non-Realtime)

Critical model names

Prerequisites

Normalized interface (asr.transcribe)

Request

Response

Quick start (official HTTP API)

Local helper script

Operational guidance

Output location

References