Search Results: video-understanding

Found 11 Skills

AI & Machine Learning | freestylefly/canghe-skill...

volcengine-video-understanding

Volcengine Video Understanding - Analyze video content using the Volcano Ark Video Understanding API. Uploading videos via the Files API is recommended and supports large files up to 512 MB. Applicable to video content analysis, object recognition, action understanding, and similar tasks. This skill is activated when users need to analyze videos, understand video content, or extract video information.

🇨🇳 Chinese | Translated

1 script | Checked

AI & Machine Learning | nvidia/skills

video-understanding

Call the VSS agent to run video understanding on a video and answer a text question. Use when the user asks about video content, or about visual details that cannot be answered from conversation history, search hits, or metadata alone.

🇺🇸 English | Translated

AI & Machine Learning | aia-11-hn-mib/mib-mockint...

gemini-video-understanding

Analyze videos using Google's Gemini API - describe content, answer questions, transcribe audio with visual descriptions, reference timestamps, clip videos, and process YouTube URLs. Supports 9 video formats, multiple models (Gemini 2.5/2.0), and context windows up to 2M tokens (6 hours of video).

🇺🇸 English | Translated

AI & Machine Learning | nvidia/skills

vss-ask-video

Use to ask the VSS agent's video_understanding tool a fresh visual question about a recorded clip. Not for prior tool output, search hits, or metadata-answerable questions.

🇺🇸 English | Translated

AI & Machine Learning | ninehills/skills

video-reader

Read, watch, and listen to video/audio files. Use Gemini for native video understanding, or extract key frames + Whisper transcription as fallback. Use when a user sends a video or audio file and asks about its content, what's in it, what someone said, etc.

🇺🇸 English | Translated

AI & Machine Learning | mikeygonz/skills

watch-youtube

Watch and analyze YouTube videos using Gemini's video understanding API. Pass any YouTube URL to get summaries, timestamps, Q&A, or detailed analysis of video content — audio and visual.

🇺🇸 English | Translated

1 script | Checked

AI & Machine Learning | qianwen-ai/qianwen-ai

qianwen-vision

[QianWen] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qianwen-vision). DO NOT TRIGGER when: user wants to generate/create images (use qianwen-image-generation), generate videos (use qianwen-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.

🇺🇸 English | Translated

6 scripts | Checked

AI & Machine Learning | nvidia/skills

report

Produce video analysis reports by discovering the deployed VSS agent, querying POST /generate for a timestamped captioned summary of the clip, then formatting the agent reply as the standard Video Analysis Report markdown.

🇺🇸 English | Translated

AI & Machine Learning | xsir0/xsir-skills

google-gemini-media

Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".

🇺🇸 English | Translated

19 scripts | Checked

AI & Machine Learning | jrusso1020/video-understa...

video-understand

Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.

🇺🇸 English | Translated

3 scripts | Attention

AI & Machine Learning | jordanrendric/claude-vide...

video-perception

Use when the user mentions a video file (.mp4, .mov, .avi, .mkv, .webm) or a YouTube URL, asks to watch/analyze/review a video, or references video content in conversation.

🇺🇸 English | Translated