Automatic Speech Recognition (ASR) based on Volcano Engine BigModel ASR, with two modes: Express Edition (≤2h and ≤100MB, synchronous, fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files, and audio URLs. Use this skill whenever you receive a voice message or an audio attachment (.ogg/.mp3/.wav).
Install:

`npx skill4agent add bytedance/agentkit-samples byted-voice-to-text`

Trigger on messages with `message_type: audio` or an audio attachment. Run `inspect_audio.py` first to read the audio metadata, then choose `asr_flash.py` or `asr_standard.py` from the table below. If ffmpeg/ffprobe is missing, run `ensure_ffmpeg.py --execute`.

| Condition | Script |
|---|---|
| Duration ≤ 2h and size ≤ 100MB | `asr_flash.py` (Express Edition) |
| 2h < duration ≤ 5h | `asr_standard.py` (Standard Edition) |
| Duration > 5h | Not supported directly: slice the audio first, then process the slices one by one with Express Edition |
| Duration unavailable and size ≤ 100MB | `asr_flash.py` |
| Duration unavailable and size > 100MB | `asr_standard.py` |
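
The routing table above can be expressed as a small decision function. This is a minimal sketch, not code from the skill: `choose_script` and `slice_bounds` are hypothetical helpers, 100MB is assumed to mean 100 × 1024 × 1024 bytes, and audio of ≤ 2h that is larger than 100MB (a case the table does not list) is assumed to go to Standard Edition.

```python
from typing import List, Optional, Tuple

# Limits from the routing table (assumption: 100MB = 100 * 1024 * 1024 bytes).
FLASH_MAX_SECONDS = 2 * 3600
FLASH_MAX_BYTES = 100 * 1024 * 1024
STANDARD_MAX_SECONDS = 5 * 3600


def choose_script(duration_s: Optional[float], size_bytes: int) -> str:
    """Pick the recognition script for one audio file; duration_s is None when unknown."""
    if duration_s is None:
        # Duration unavailable: route on size alone.
        return "asr_flash.py" if size_bytes <= FLASH_MAX_BYTES else "asr_standard.py"
    if duration_s > STANDARD_MAX_SECONDS:
        raise ValueError("Audio duration exceeds 5 hours; slice it first")
    if duration_s <= FLASH_MAX_SECONDS and size_bytes <= FLASH_MAX_BYTES:
        return "asr_flash.py"
    # 2h < duration <= 5h, or (assumption) <= 2h but larger than 100MB.
    return "asr_standard.py"


def slice_bounds(duration_s: float, chunk_s: float = FLASH_MAX_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

Each window from `slice_bounds` must still respect the 100MB Express Edition limit, so a high-bitrate recording may need shorter slices than the 2h default. The windows could then be cut with ffmpeg, for example with its `-ss` and `-t` options.
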
Audio URLs can be passed directly to the Standard Edition: `asr_standard.py --url "<URL>"`.

| Environment Variable | Purpose | Required |
|---|---|---|
| `MODEL_SPEECH_API_KEY` | API Key (new console solution) | Yes |
| | App ID (used with old version authentication) | No |
| | Express Edition endpoint (has default value) | No |
| | Express Edition resource ID (default | No |
| | Standard Edition submission endpoint (has default value) | No |
| | Standard Edition query endpoint (has default value) | No |
| | Standard Edition resource ID (default | No |
| | Feishu tenant_access_token (only for | No |
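
A script that needs the API key can fail fast when it is missing. The sketch below assumes, based on the `PermissionError: MODEL_SPEECH_API_KEY ...` message listed under common errors, that `MODEL_SPEECH_API_KEY` is the name of the required API Key variable; `load_api_key` is a hypothetical helper, not the skill's implementation.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, raising PermissionError if it is missing or blank.

    Assumption: the key lives in MODEL_SPEECH_API_KEY (name taken from the
    troubleshooting message, not confirmed by the environment-variable table).
    """
    source = os.environ if env is None else env
    key = source.get("MODEL_SPEECH_API_KEY", "").strip()
    if not key:
        raise PermissionError("MODEL_SPEECH_API_KEY is not set")
    return key
```

Passing `env` explicitly keeps the check testable without touching the real process environment.
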

| Script | Purpose | Corresponding Mode |
|---|---|---|
| `inspect_audio.py` | Audio metadata detection (duration, sampling rate, channels, etc.) | Pre-check |
| `ensure_ffmpeg.py` | Detect ffmpeg/ffprobe and install them automatically if missing | Pre-check |
| `asr_flash.py` | Express Edition recognition (≤2h and ≤100MB, synchronous) | Express/Flash |
| `asr_standard.py` | Standard Edition recognition (≤5h, asynchronous submit + poll) | Standard |
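
The pre-check can be approximated with ffprobe's JSON output. This is a sketch of what a metadata probe such as `inspect_audio.py` might do, not its actual implementation: `probe` and `parse_ffprobe` are hypothetical names, and the parser reads only the `format.duration`, `format.size`, `sample_rate` and `channels` fields that `ffprobe -show_format -show_streams -of json` emits.

```python
import json
import subprocess
from typing import Any, Dict, Optional


def _num(value: Any) -> Optional[float]:
    """Convert ffprobe's string fields to float; None if missing or 'N/A'."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_ffprobe(raw: str) -> Dict[str, Any]:
    """Extract the fields the routing table needs from ffprobe JSON output."""
    data = json.loads(raw)
    fmt = data.get("format", {})
    # First audio stream, or an empty dict if there is none.
    audio = next((s for s in data.get("streams", []) if s.get("codec_type") == "audio"), {})
    size = _num(fmt.get("size"))
    rate = _num(audio.get("sample_rate"))
    return {
        "duration_s": _num(fmt.get("duration")),  # None maps to the "Duration unavailable" rows
        "size_bytes": int(size) if size is not None else None,
        "sample_rate": int(rate) if rate is not None else None,
        "channels": audio.get("channels"),
    }


def probe(path_or_url: str) -> Dict[str, Any]:
    """Run ffprobe (must be on PATH) on a file or URL and parse its output."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path_or_url],
        check=True, capture_output=True, text=True,
    )
    return parse_ffprobe(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the parser be checked against saved ffprobe output without ffprobe installed.
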
```bash
# Pre-check: detect audio metadata
python3 <SKILL_DIR>/scripts/inspect_audio.py "<AUDIO_INPUT>"
# Install ffmpeg automatically when it is missing
python3 <SKILL_DIR>/scripts/ensure_ffmpeg.py --execute
# Express Edition (short audio, ≤2h and ≤100MB)
python3 <SKILL_DIR>/scripts/asr_flash.py --file "<AUDIO_FILE>"
# Standard Edition (long audio or URL)
python3 <SKILL_DIR>/scripts/asr_standard.py --url "<AUDIO_URL>"
python3 <SKILL_DIR>/scripts/asr_standard.py --file "<LONG_AUDIO_FILE>"
# Standard Edition: submit only, without polling
python3 <SKILL_DIR>/scripts/asr_standard.py --url "<URL>" --no-poll
# Standard Edition: query an existing task
python3 <SKILL_DIR>/scripts/asr_standard.py --query-task-id <ID> --query-logid <LOGID>
```

`asr_flash.py` parameters:

| Parameter | Required | Description |
|---|---|---|
| `--file` | Choose one of three | Local audio file path |
| | Choose one of three | URL address of the audio file |
| | Choose one of three | `file_key` of the Feishu voice message |
| | No | Feishu tenant_access_token |
| | No | App ID |
| | No | API Key |
| | No | Language code |
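
The "choose one of three" rule for the Express Edition inputs maps naturally onto an argparse mutually exclusive group. In this sketch only `--file` comes from the commands shown on this page; `--url` and `--file-key` are hypothetical flag names standing in for the URL and Feishu `file_key` inputs, and `build_parser` is not the script's real code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser that requires exactly one audio source."""
    parser = argparse.ArgumentParser(prog="asr_flash.py")
    # required=True: argparse exits with an error if none of the three is given;
    # the group also rejects passing more than one of them.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", help="Local audio file path")
    source.add_argument("--url", help="URL of the audio file (hypothetical flag name)")
    source.add_argument("--file-key", help="file_key of a Feishu voice message (hypothetical flag name)")
    return parser
```

With this design argparse itself enforces the rule, so the script body can assume exactly one of `args.file`, `args.url` or `args.file_key` is set.
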

`asr_standard.py` parameters:

| Parameter | Required | Description |
|---|---|---|
| `--url` | Choose one of two | URL address of the audio file |
| `--file` | Choose one of two | Local audio file path |
| | No | App ID |
| | No | API Key |
| | No | Language code |
| `--no-poll` | No | Only submit the task, do not poll for the result |
| | No | Polling interval in seconds (default 3) |
| | No | Maximum polling time in seconds (default 10800) |
| `--query-task-id` | No | ID of an existing task to query |
| `--query-logid` | No | X-Tt-Logid to pass in when querying |
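
Standard Edition is asynchronous: a task is submitted and then polled until it finishes, every 3 seconds by default for at most 10800 seconds (3 hours). A minimal polling loop with those defaults might look like this. `query` is a hypothetical callable standing in for a call to the query endpoint that returns `None` while the task is still running; `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    query: Callable[[], Optional[str]],
    interval_s: float = 3,
    max_wait_s: float = 10800,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call query() every interval_s seconds until it returns text or max_wait_s would be exceeded."""
    deadline = clock() + max_wait_s
    while True:
        result = query()
        if result is not None:
            return result
        # Stop before sleeping past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"task not finished after {max_wait_s} s")
        sleep(interval_s)
```

When the loop times out, the task ID and log ID could still be used later with `--query-task-id` and `--query-logid` instead of resubmitting the audio.
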
Typical flow for a Feishu voice message:

Receive audio message → audio file is downloaded to `/root/.openclaw/media/inbound/` → run `asr_flash.py --file` → return the text → treat the text as the user's message.

```bash
# Feishu voice file (the most common case; the Feishu plugin has already downloaded the file)
python scripts/asr_flash.py --file "/root/.openclaw/media/inbound/xxxxx.ogg"
```

Common errors:

- `PermissionError: MODEL_SPEECH_API_KEY ...`: check that the API Key environment variable is set.
- `ASR request failed`
- `Audio duration exceeds 5 hours`: slice the audio first and process the slices with Express Edition.
- `Audio file does not exist/is empty`
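
The receive, transcribe, reply flow above could be wired into a message handler roughly as follows. This sketch assumes a hypothetical message dict with `message_type` and `file_path` keys and assumes that `asr_flash.py` prints the transcript to stdout; neither detail is confirmed by this page. The runner is injectable so the function can be tested without calling the API.

```python
import subprocess
from typing import Callable, Mapping, Optional, Sequence

AUDIO_EXTENSIONS = (".ogg", ".mp3", ".wav")


def is_audio_message(message: Mapping[str, str]) -> bool:
    """True for messages tagged as audio or carrying an .ogg/.mp3/.wav file."""
    path = message.get("file_path", "")
    return message.get("message_type") == "audio" or path.lower().endswith(AUDIO_EXTENSIONS)


def run_script(argv: Sequence[str]) -> str:
    """Default runner: execute the script and return its stdout (assumed to be the transcript)."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout


def transcribe_message(
    message: Mapping[str, str],
    skill_dir: str,
    run: Callable[[Sequence[str]], str] = run_script,
) -> Optional[str]:
    """Return the transcript to treat as the user's message, or None for non-audio messages."""
    if not is_audio_message(message):
        return None
    argv = ["python3", f"{skill_dir}/scripts/asr_flash.py", "--file", message["file_path"]]
    return run(argv).strip()
```

The returned string would then be handed to the agent exactly as if the user had typed it.
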