Loading...
Loading...
Found 12 Skills
Transcribe video files directly into timed transcripts and subtitle-ready artifacts using hosted Whisper video-to-text. Use this when the input is a video and the goal is speech extraction, caption generation, or edit-prep timing.
Transcription of oral broadcast videos and recognition of speech errors. Generate review drafts and deletion task lists. Trigger words: edit oral broadcast, process video, recognize speech errors
Transcribe audio and video files using the Deepgram API. This skill should be used when the user requests transcription of audio files (mp3, wav, m4a, aac) or video files (mp4, mov, avi, etc.). Handles large video files by extracting audio first to reduce upload size and processing time.
Extract full transcripts from video content for analysis, summarization, note-taking, or research. Use when the user wants a written version of video content, asks to "transcribe this", "get the text from this video", "convert video to text", or shares a video URL for content extraction.
Video processing toolkit. Use when user wants to: - Download videos from YouTube or other sites - Remove silence from videos - Trim, cut, or extract segments from videos - Extract audio from video files - Enhance or denoise audio - Replace audio track in a video - Change video playback speed - Concatenate multiple videos - Generate transcripts/captions (VTT) - Generate video descriptions, timestamps, or context cards - Upload videos to YouTube or Bunny.net CDN - Post social updates to X (Twitter) or LinkedIn - Get video metadata (duration, resolution, codec)
Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates visual transcript with periodic visual descriptions of the video clip. Use when all files have audio transcripts present (transcript) but don't yet have visual transcripts created (visual_transcript).
Spoken video transcription and slip-of-the-tongue recognition. Generate review drafts and deletion task checklists. Trigger phrases: edit spoken video, process video, recognize slip-of-the-tongue
Video & Podcast Digest — send a video/podcast link, get full transcript + structured summary. Supports YouTube, Bilibili, X/Twitter video, Xiaoyuzhou, Apple Podcasts, and direct audio/video links. Uses yt-dlp for subtitles and Groq Whisper for transcription.
Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch processing (up to 100 videos), translation to 100+ languages, and multiple output formats. Use when working with YouTube videos, subtitles, captions, or video-to-text conversion.
Video Transcript Extraction Expert (based on Doubao Video Understanding Model). Supports links from Bilibili, Douyin, Xiaohongshu, YouTube, or local video files. Runs entirely in the background (headless) on the user's computer, no pop-ups, no requirement to log in to video platforms. Outputs strict verbatim transcripts with "semantic segmentation + paragraph-level timestamps" (retains colloquial words, internet memes, pauses). Long videos are automatically segmented to avoid being summarized by the model. Trigger scenarios: - User says "generate transcript", "extract transcript", "convert to text", "video to text" - User says "dictate video", "extract video copy", "video subtitles" - User uses the /video-transcript command - User pastes a video link (Bilibili/Douyin/Xiaohongshu/YouTube) with the intention of getting a text version