Found 36 Skills
ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription.
Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, and MiniMax, with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.
Watermark-free Douyin video download and transcript extraction tool. Retrieve watermark-free video download links from Douyin share links, download videos, extract voice transcripts from videos and automatically save them to files. Applicable scenarios include obtaining Douyin video information, downloading watermark-free videos, and batch extracting video transcripts. Triggered when users need to process Douyin video links or extract video content.
Transcribe audio files using the Groq API (Whisper models). Use when the user needs to transcribe audio to text.
Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.
Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.
Transcribe audio files to text using OpenAI Whisper.
Transcribe audio and video files using the Deepgram API. This skill should be used when the user requests transcription of audio files (mp3, wav, m4a, aac) or video files (mp4, mov, avi, etc.). Handles large video files by extracting audio first to reduce upload size and processing time.
Generate speech (TTS), transcribe audio (STT), and clone voices using Google's GenAI and Cloud Speech SDKs. Supports Gemini-TTS, Chirp 3, and Instant Custom Voice.
Architecting real-time Voice AI agents.
Install and configure Deepgram SDK/CLI authentication. Use when setting up a new Deepgram integration, configuring API keys, or initializing Deepgram in your project. Trigger with phrases like "install deepgram", "setup deepgram", "deepgram auth", "configure deepgram API key".
Fast ASR CLI tool for transcribing audio/video files. Use when the user wants to transcribe audio/video, generate subtitles (VTT), convert speech to text with timestamps (JSON), or optimize transcription for low memory.