Found 25 Skills
[QwenCloud] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qwencloud-vision). DO NOT TRIGGER when: user wants to generate/create images (use qwencloud-image-generation), generate videos (use qwencloud-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.
Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.
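The frame-extraction workflow above can be sketched as a small helper that picks evenly spaced timestamps and builds the corresponding `ffmpeg` commands. This is a minimal sketch, assuming `ffmpeg` is on PATH; the function names, file names, and quality settings are illustrative, not part of the skill itself.

```python
import shlex

def sample_timestamps(duration_s: float, n_frames: int) -> list[float]:
    """Evenly spaced timestamps, avoiding the exact start/end of the clip."""
    step = duration_s / (n_frames + 1)
    return [round(step * (i + 1), 3) for i in range(n_frames)]

def ffmpeg_frame_cmd(video: str, ts: float, out_png: str) -> list[str]:
    # -ss before -i seeks by keyframe first: fast, and accurate enough for
    # contact sheets; place -ss after -i for slower frame-exact seeking.
    return ["ffmpeg", "-ss", str(ts), "-i", video,
            "-frames:v", "1", "-q:v", "2", out_png]

# Example: 5 frames sampled from a hypothetical 60-second clip.
cmds = [ffmpeg_frame_cmd("input.mp4", t, f"frame_{i:03d}.png")
        for i, t in enumerate(sample_timestamps(60.0, 5))]
print(shlex.join(cmds[0]))
```

Sampling mid-interval (rather than including t=0) avoids black lead-in frames and end-of-file seek failures.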
Alibaba Cloud Bailian Video Analysis Skill. Use for intelligent video comprehension and analysis via the Bailian (QuanMiaoLightApp) API. **Required API Product**: QuanMiaoLightApp (version 2024-08-01). **Required API Actions**: SubmitVideoAnalysisTask, GetVideoAnalysisTask. **DO NOT use**: videorecog, Mts, or any other product for video analysis. Triggers: "analyze video", "understand video", "analyze the local video /temp/xxx.mp4", "analyze the local video https://xxx.com/temp/xxx.mp4", "what is this video about", "summarize this video", "split video into shots", "video comprehension", "extract video insights", "transcribe video", "extract video captions", "generate video title", "generate video outline", "video mindmap".
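The two required actions form a submit-then-poll pair, since video analysis runs as an asynchronous task. The sketch below shows that control flow only; the `client` wrapper, request field names (`videoUrl`, `taskId`), and status strings are assumptions for illustration, not the official QuanMiaoLightApp schema.

```python
import time

def analyze_video(client, video_url: str, poll_s: float = 5.0) -> dict:
    """Submit an analysis task, then poll until it reaches a terminal state.

    `client` is a hypothetical wrapper around SubmitVideoAnalysisTask /
    GetVideoAnalysisTask; field and status names are illustrative.
    """
    task = client.submit_video_analysis_task({"videoUrl": video_url})
    while True:
        result = client.get_video_analysis_task({"taskId": task["taskId"]})
        if result["status"] in ("SUCCESS", "FAILED"):
            return result
        time.sleep(poll_s)

# Stub client so the control flow can be exercised without credentials.
class _StubClient:
    def submit_video_analysis_task(self, req):
        return {"taskId": "t-1"}
    def get_video_analysis_task(self, req):
        return {"status": "SUCCESS", "summary": "demo"}

result = analyze_video(_StubClient(), "https://example.com/video.mp4", poll_s=0)
print(result["status"])
```

A real caller would also bound the polling loop with a timeout and surface the task's error payload on `FAILED`.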
Extract useful frames from local video files based on task intent, such as persona research, shot breakdown, product visibility, UI walkthroughs, visual-style review, or CTA/compliance checks. Use this when the goal is not generic video analysis, but selecting the right still frames and contact sheets for a specific downstream need.
Open Source Computer Vision Library (OpenCV) for real-time image processing, video analysis, object detection, face recognition, and camera calibration. Use when working with images, videos, cameras, edge detection, contours, feature detection, image transformations, object tracking, optical flow, or any computer vision task.
Watch and understand video files by converting them to viewable image storyboards for enjoyment, analysis, species identification, behavior tracking, and comprehension
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
Watch and analyze YouTube videos using Gemini's video understanding API. Pass any YouTube URL to get summaries, timestamps, Q&A, or detailed analysis of video content — audio and visual.
Analyze videos, screen recordings, and screenshots to generate structured, actionable notes for coding agents. Supports Loom, YouTube, and local files. Extracts visual context, on-screen text, and audio narration. Use when someone shares a video and you need to understand what it shows.
Watch a bug video, screen recording, or screenshot, understand what's broken, find the relevant code, implement a fix, and raise a PR. Supports Loom, YouTube, and local files. Uses Gemini Flash for video understanding.
Watch a tutorial, demo, or walkthrough video and generate a Claude Code skill from it. Extracts the workflow, commands, tools, and patterns demonstrated and produces a SKILL.md with implementation. Supports Loom, YouTube, and local files.
Analyze videos using Google's Gemini API - describe content, answer questions, transcribe audio with visual descriptions, reference timestamps, clip videos, and process YouTube URLs. Supports 9 video formats, multiple models (Gemini 2.5/2.0), and context windows up to 2M tokens (6 hours of video).
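For the Gemini-backed skills above, a video request pairs a file/URL part with a text prompt. The sketch below builds that payload as plain data in the shape of the public REST `generateContent` schema; the helper name, placeholder URL, and prompt are illustrative, and a real call would still need an endpoint, model name, and API key.

```python
import json

def build_video_request(video_url: str, question: str) -> dict:
    """Hypothetical request builder: one fileData part (the video or
    YouTube URL) followed by one text part (the question)."""
    return {
        "contents": [{
            "parts": [
                {"fileData": {"fileUri": video_url}},
                {"text": question},
            ]
        }]
    }

req = build_video_request(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    "Summarize this video with timestamps.",
)
print(json.dumps(req, indent=2))
```

Keeping the video part before the text part matches the common convention of presenting media context first, then the instruction.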