Loading...
Loading...
Found 36 Skills
Transform thousands of wedding photos and hours of footage into an immersive 3D Gaussian Splatting experience with theatre mode replay, face-clustered guest roster, and AI-curated best photos per person. Expert in 3DGS pipelines, face clustering, aesthetic scoring, and adaptive design matching the couple's wedding theme (disco, rustic, modern, LGBTQ+ celebrations). Activate on "wedding photos", "wedding video", "3D wedding", "Gaussian Splatting wedding", "wedding memory", "wedding immortalize", "face clustering wedding", "best wedding photos". NOT for general photo editing (use native-app-designer), non-wedding 3DGS (use drone-inspection-specialist), or event planning (not a wedding planner).
Background knowledge for droid-control workflows -- not invoked directly. Video assembly via Remotion — title cards, layout, transitions, effects, and showcase polish.
Download TikTok video samples for selected music or sounds, extract local audio references, and preserve manifests for reproducible music research archives.
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Generate FFmpeg commands from natural language video editing requests - cut, trim, convert, compress, change aspect ratio, extract audio, and more.
Extract frames from videos at specific timestamps or intervals, find best frames, and generate thumbnail grids for previews.
Remotion Best Practices - Create Videos with React
Video editing tool that requires no ffmpeg installation. All video processing is executed in the cloud - no local ffmpeg installation needed. If both input and output are URLs or Alibaba Cloud OSS, this skill is the preferred choice. Can generate Timeline configuration based on editing requirements and material information, submit Alibaba Cloud editing tasks, wait for task completion, and output the final video URL. Use when the user wants to edit videos, mentions video editing, clipping, 剪辑,视频制作,视频拼接,视频合成,or needs to process media files into videos.
Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.
Video Transcript Extraction Expert (based on Doubao Video Understanding Model). Supports links from Bilibili, Douyin, Xiaohongshu, YouTube, or local video files. Runs entirely in the background (headless) on the user's computer, no pop-ups, no requirement to log in to video platforms. Outputs strict verbatim transcripts with "semantic segmentation + paragraph-level timestamps" (retains colloquial words, internet memes, pauses). Long videos are automatically segmented to avoid being summarized by the model. Trigger scenarios: - User says "generate transcript", "extract transcript", "convert to text", "video to text" - User says "dictate video", "extract video copy", "video subtitles" - User uses the /video-transcript command - User pastes a video link (Bilibili/Douyin/Xiaohongshu/YouTube) with the intention of getting a text version
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
Media processing (10 man pages).