Loading...
Loading...
Found 70 Skills
Analyze local or downloaded social video files with the official Gemini API, especially for TikTok/Reels benchmark breakdowns, script decomposition, and structured JSON outputs. Use this when you need video-level analysis beyond metadata, including uploading video files, prompting Gemini 3.1 Pro Preview, and linking results back to source metadata.
Turn a long video into N viral-ready short clips with a single managed API call. Wraps muapi.ai's `/ai-clipping` endpoint, which handles transcription, highlight ranking through a virality framework (hook / emotional peak / opinion bomb / revelation / conflict / quotable / story peak / practical value), overlap dedupe, and vertical face-tracking auto-crop server-side. No local Whisper, no local LLM, no GPU.
Extract transcript or subtitles from a local video file. Use this skill whenever the user asks to transcribe a video, extract speech-to-text, get subtitles, or wants a text version of what's said in a video. Also trigger on "提取字幕", "视频转文字", "语音转文字", "transcribe", "extract audio text", or when the user references getting a script/transcript from any video file (mp4, mkv, mov, avi, webm). This skill is for LOCAL video files — for YouTube or other online URLs, use the download-video skill first to get the file, then transcribe it.
Download and analyze social videos using frames + transcript for AI agent understanding at 50× lower cost than multimodal APIs
Extracts first and/or last frames of every shot from a video using adaptive scene detection. Use this skill when the user says "extract frames", "get shot frames", "pull frames", "shot breakdown", "scene detect", "first frame of each shot", "last frame of each shot", "extract shots from video", or wants to extract key frames at shot cut points from a video file.
Use when transforming video style with DashScope video-style-transform model. Use when converting videos to artistic styles such as Japanese manga, American comics, 3D cartoon, Chinese ink painting, paper art, or simple illustration via the video-synthesis async API.
Create and debug Cloudinary transformation URLs from natural language instructions. Use when building Cloudinary delivery URLs, applying image/video transformations, optimizing media, or debugging transformation syntax errors.
Python video composition with moviepy 2.x — overlaying deterministic text on AI-generated video (LTX-2, SadTalker), compositing clips, single-file build.py video projects. Use when adding labels/captions/lower-thirds to LTX-2 or SadTalker outputs, building short ad-style spots in pure Python without Remotion, or doing programmatic video composition. Triggers include text overlay on video, label LTX-2 clip, caption SadTalker output, lower third, build.py video, moviepy, Python video composition, sub-30s ad spot.
Turn video moments into AI-generated hand-drawn storyboards with Gemini-powered frame analysis and social media content generation
Process images, audio, and video files stored in Alibaba Cloud OSS. Supports 14+ image operations (resize, crop, rotate, watermark, blur, format conversion, etc.), image-intelligent features via IMM (blind watermark, face/body/car detection, QR recognition, labeling, scoring), and audio/video processing (transcoding, screenshot, animation, sprite sheet, concatenation, metadata extraction, HLS streaming). Results can be returned as signed URL, downloaded locally, or saved as new OSS object. Also supports plain file upload/download. Use when the user needs to process or transform media files in OSS, such as generating thumbnails, transcoding video, extracting audio, adding watermarks, detecting faces, compressing images, or converting formats. Triggers on media processing requests in English or Chinese (resize, crop, thumbnail, transcode, video convert, audio convert, watermark, face detection, 缩略图, 裁剪, 压缩, 转码, 视频转换, 音频处理, 水印, 盲水印, 人脸检测, 截帧, 拼接).