Loading...
Loading...
Found 30 Skills
Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.
Transcribe audio files to text using local speech recognition. Triggers on: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字".
Use local FunASR service to transcribe audio or video files into timestamped Markdown files, supporting common formats such as mp4, mov, mp3, wav, m4a, etc. This skill should be used when users need speech-to-text conversion, meeting minutes, video subtitles, or podcast transcription.
Minimal non-realtime ASR smoke test for Model Studio Qwen ASR.
Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.
Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.
Minimal realtime ASR smoke test for Model Studio Qwen ASR Realtime.
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Vox single-entry voice orchestration skill. Used to complete environment guarding, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution and task troubleshooting through natural language. It is used when users only describe the target without providing specific commands.
Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.
Fast ASR CLI tool for transcribing audio/video files. Use when user wants to transcribe audio/video, generate subtitles (VTT), convert speech to text with timestamps (JSON), or optimize transcription for low memory.
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch processing (up to 100 videos), translation to 100+ languages, and multiple output formats. Use when working with YouTube videos, subtitles, captions, or video-to-text conversion.