Loading...
Loading...
Found 125 Skills
Use this skill whenever the user wants to transcribe audio to text, convert speech to text, or get a transcript from an audio or video file. Triggers include: any mention of 'transcribe', 'transcription', 'speech to text', 'STT', 'convert audio to text', 'what does this audio say', 'get transcript', 'subtitle generation', or requests to extract spoken words from a file. Also use when the user wants speaker identification from audio, timestamps for captions, or multilingual transcription.
Convert files between 140+ formats using the ConversionTools MCP server. Use when the user needs to convert documents (Word, PDF, Excel, PowerPoint), data formats (JSON, CSV, XML, YAML, Parquet), images (PNG, JPG, WebP, AVIF, HEIC, JXL, SVG), audio (MP3, WAV, FLAC), video (MOV, MKV, AVI to MP4), e-books (EPUB, MOBI, AZW), OCR text extraction, AI-powered data extraction, AI text-to-speech (TTS), AI speech-to-text transcription (STT), subtitle conversion (SRT, VTT, ASS), or website screenshots.
Voice: outbound/inbound, TwiML, conferencing, recording, transcription, IVR Gather, SIP, BYOC
Complete subtitle and caption system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) Burning subtitles (hardcoding SRT/ASS/VTT), (2) Adding soft subtitle tracks, (3) Extracting subtitles from video, (4) Subtitle format conversion, (5) Styled captions (font, color, outline, shadow), (6) Subtitle positioning and alignment, (7) CEA-608/708 closed captions, (8) Text overlays with drawtext, (9) Whisper AI automatic transcription (FFmpeg 8.0+ with VAD, multi-language, GPU), (10) Batch subtitle processing. Provides: Format reference tables, styling parameter guide, position alignment charts, Whisper model comparison, VAD configuration, dynamic text examples, accessibility best practices. Ensures: Professional captions with proper styling and accessibility compliance.
Use when user asks YouTube video extraction, get, fetch, transcripts, subtitles, or captions. Writes video details and transcription into structured markdown file.
Clean and reconstruct raw auto-generated captions (Zoom, YouTube, Teams, Google Meet, Otter.ai, etc.) into readable, coherent transcripts. Use when the user provides raw caption files (.txt, .vtt, .srt), meeting transcripts with timestamps and speaker tags, or asks to clean up/refine a transcript. Handles: timestamp removal, speaker tag normalization, filler word removal, broken sentence reconstruction, transcription error correction, paragraph formation. Preserves every piece of substantive content while removing noise. Trigger phrases: 'clean this transcript', 'refine captions', 'fix this transcript', 'process Zoom captions', 'clean up meeting notes'.
Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.
Comprehensive automation for Letterly transcriptions. This skill exports the latest CSV from Letterly, processes "magic" notes into Obsidian markdown with custom metadata, semantically links them using a vector database, and moves them to the final Transcriptions directory. Use when the user asks to "process new letterly transcriptions", "sync letterly", or "import magic notes from letterly".
OpenAI Whisper API: audio transcription, translation, structured output, large file handling
Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.
Download videos from social media URLs (X/Twitter, YouTube, Instagram, TikTok, etc.) using yt-dlp. Use when saving a video locally, extracting content for transcription, or archiving video references.
Video & Podcast Digest — send a video/podcast link, get full transcript + structured summary. Supports YouTube, Bilibili, X/Twitter video, Xiaoyuzhou, Apple Podcasts, and direct audio/video links. Uses yt-dlp for subtitles and Groq Whisper for transcription.