Transcribes audio to text with timestamps and optional speaker identification. Use when you need to convert speech to text, create subtitles, transcribe meetings, or process voice recordings.
Install:

```bash
npx skill4agent add agntswrm/agent-media audio-transcribe
```

Usage:

```bash
agent-media audio transcribe --in <path> [options]
```

| Option | Required | Description |
|---|---|---|
| `--in` | Yes | Input audio file path or URL (supports mp3, wav, m4a, ogg) |
| `--diarize` | No | Enable speaker identification |
| `--language` | No | Language code (auto-detected if not provided) |
| `--speakers` | No | Number of speakers hint for diarization |
| | No | Output path, filename or directory (default: ./) |
| `--provider` | No | Provider to use (local, fal, replicate, runpod) |
Example output:

```json
{
  "ok": true,
  "media_type": "audio",
  "action": "transcribe",
  "provider": "fal",
  "output_path": "transcription_123_abc.json",
  "transcription": {
    "text": "Full transcription text...",
    "language": "en",
    "segments": [
      { "start": 0.0, "end": 2.5, "text": "Hello.", "speaker": "SPEAKER_0" },
      { "start": 2.5, "end": 5.0, "text": "Hi there.", "speaker": "SPEAKER_1" }
    ]
  }
}
```

Examples:

```bash
# Basic transcription
agent-media audio transcribe --in interview.mp3

# Identify speakers
agent-media audio transcribe --in meeting.wav --diarize

# Speaker identification with a language code and a speaker-count hint
agent-media audio transcribe --in podcast.mp3 --diarize --language en --speakers 3

# Use a specific provider
agent-media audio transcribe --in audio.wav --provider replicate
```
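Each segment in the JSON output carries `start`, `end`, `text` and, with `--diarize`, a `speaker` label, which is enough to build subtitles. The sketch below assumes `start` and `end` are in seconds (as the sample values suggest) and reads one tab-separated row per segment from stdin, printing SRT cues. The function name `tsv_to_srt` is hypothetical; it is not part of agent-media.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn tab-separated segment rows into SRT subtitle cues.
# Input columns: start (seconds), end (seconds), text, optional speaker label.
tsv_to_srt() {
  awk -F '\t' '
    # Format a time in seconds as the SRT timestamp HH:MM:SS,mmm.
    function ts(s,    ms) {
      ms = int(s * 1000 + 0.5)
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000), int(ms % 3600000 / 60000), int(ms % 60000 / 1000), ms % 1000)
    }
    NF >= 3 {
      n++
      # Prefix the cue text with the speaker label when one is present.
      label = ($4 != "") ? ("[" $4 "] ") : ""
      # Cue number, time range, text, then a blank separator line.
      printf "%d\n%s --> %s\n%s%s\n\n", n, ts($1), ts($2), label, $3
    }
  '
}
```

Usage, assuming the command prints the JSON shown above on stdout and `jq` is installed:

`agent-media audio transcribe --in interview.mp3 --diarize | jq -r '.transcription.segments[] | [.start, .end, .text, (.speaker // "")] | @tsv' | tsv_to_srt > interview.srt`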
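To transcribe a video, extract its audio track first:

```bash
# Step 1: Extract audio from video
agent-media audio extract --in video.mp4 --format mp3
# Step 2: Transcribe the extracted audio
agent-media audio transcribe --in extracted_xxx.mp3
```

Providers:

`local` runs on your own machine. If a `mutex lock failed` message appears, check whether the JSON result still reports `"ok": true`.

```bash
agent-media audio transcribe --in audio.mp3 --provider local
```

- `fal`: requires `FAL_API_KEY`; models `wizper` and `whisper`
- `replicate`: requires `REPLICATE_API_TOKEN`; model `whisper-diarization`
- `runpod`: requires `RUNPOD_API_KEY`; model `pruna/whisper-v3-large`

```bash
agent-media audio transcribe --in audio.mp3 --provider runpod
```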
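The two-step video workflow passes a placeholder file name (`extracted_xxx.mp3`) to the second step. A sketch that chains the steps instead, assuming (not confirmed by this page) that `agent-media audio extract` also prints a JSON result on stdout with an `output_path` field, like the transcribe result. `video_to_transcript` is a hypothetical name, and the `output_path` lookup is a plain `sed` text match; `jq -r .output_path` is sturdier where `jq` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract audio from a video, then transcribe it.
# ASSUMPTION: the extract step prints JSON containing "output_path" on stdout.
video_to_transcript() {
  local video="$1" extract_json audio
  # Step 1: extract the audio track as mp3.
  extract_json="$(agent-media audio extract --in "$video" --format mp3)" || return 1
  # Pull "output_path" out of the JSON with a text match (not a real JSON parse).
  audio="$(printf '%s\n' "$extract_json" \
    | sed -n 's/.*"output_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1)"
  if [ -z "$audio" ]; then
    echo "video_to_transcript: no output_path in extract result" >&2
    return 1
  fi
  # Step 2: transcribe the extracted file; remaining arguments pass through.
  shift
  agent-media audio transcribe --in "$audio" "$@"
}
```

Usage: `video_to_transcript video.mp4 --diarize --language en`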
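Because the JSON result carries an `ok` field, a script can test that field instead of inspecting other console messages such as `mutex lock failed`. A minimal sketch, assuming the JSON result is written to stdout; `transcribe_succeeded` is a hypothetical name, and the check is a text match on the `"ok": true` pair rather than a full JSON parse.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the agent-media JSON result on stdin reports "ok": true.
# Matches both pretty-printed ("ok": true) and compact ("ok":true) output.
transcribe_succeeded() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}
```

Usage: `agent-media audio transcribe --in audio.mp3 --provider local | transcribe_succeeded && echo "transcription ok"`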
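Each hosted provider is tied to one credential variable. A hypothetical helper can choose the `--provider` value from whichever credential is exported; the preference order and the fallback to `local` are arbitrary choices for this sketch, not documented agent-media behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a provider name based on which credential is set.
# Order (fal, replicate, runpod) and the local fallback are illustrative only.
pick_provider() {
  if [ -n "${FAL_API_KEY:-}" ]; then
    echo fal
  elif [ -n "${REPLICATE_API_TOKEN:-}" ]; then
    echo replicate
  elif [ -n "${RUNPOD_API_KEY:-}" ]; then
    echo runpod
  else
    echo local
  fi
}
```

Usage: `agent-media audio transcribe --in audio.mp3 --provider "$(pick_provider)"`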