video-reader
Video Reader Skill
Primary Method: Gemini Native Video Understanding
🚨 MANDATORY: Use `alma video analyze` for ALL video tasks. DO NOT use ffmpeg frame extraction unless `alma video analyze` explicitly fails. Frame extraction is a LAST RESORT, not a default.

`alma video analyze`: always use this. Gemini can understand video natively (visual + audio).
```bash
# Analyze a video with Gemini (uploads to Gemini Files API)
alma video analyze "/path/to/video.mp4" "Describe what's happening in this video"

# Custom prompts
alma video analyze "/path/to/video.mp4" "What language are they speaking? Summarize what they said"
alma video analyze "/path/to/video.mp4" "Is this video funny? Why?"
alma video analyze "/path/to/video.mp4" "Transcribe all spoken words in this video"
```
This uses Gemini's native multimodal video input — no frame extraction needed. Works with mp4, mov, webm, avi, mkv, m4v, 3gp. Max file size: 2GB.
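The format list and size limit above can be checked locally before uploading. The following is only a sketch: `check_video_input` is a hypothetical helper, not part of the `alma` CLI, and it assumes that "2GB" means 2 GiB and that the file extension reflects the container format.

```bash
# Hypothetical helper, not part of the alma CLI.
# Returns 0 if the file looks acceptable for `alma video analyze`,
# 1 if it is missing, 2 if the extension is unsupported, 3 if it is too large.
check_video_input() {
  local path="$1"
  local max_bytes=$((2 * 1024 * 1024 * 1024))  # assumption: "2GB" means 2 GiB

  [ -f "$path" ] || { echo "not a file: $path" >&2; return 1; }

  # Lowercase the extension and compare it against the documented list.
  local ext="${path##*.}"
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp4|mov|webm|avi|mkv|m4v|3gp) ;;
    *) echo "unsupported extension: .$ext" >&2; return 2 ;;
  esac

  # wc -c works on both GNU and BSD userlands.
  local size
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -gt "$max_bytes" ]; then
    echo "file too large: $size bytes" >&2
    return 3
  fi
  return 0
}
```

A possible use is `check_video_input "$VIDEO_PATH" && alma video analyze "$VIDEO_PATH" "Describe what's happening in this video"`. A failed pre-check of this kind is a reason to convert or trim the file, not a reason to skip Gemini.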
**When to use Gemini:**
- Any video understanding task
- "What's in this video", "What did they say", "Summarize this"
- Best quality results: sees motion, hears audio, understands context

Fallback Method: Frame Extraction + Whisper
Use this if Gemini fails (no Google provider, API error, unsupported format):
```bash
# Get video info
ffprobe -v error -show_entries format=duration:stream=codec_name,width,height -of json "$VIDEO_PATH"

# Extract key frames (at most 1 per second, max 12 frames)
OUTDIR=/tmp/alma-frames-$(date +%s)
mkdir -p "$OUTDIR"
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH")
# Spread 12 frames across the video, capped at 1 fps. awk keeps enough precision
# for long videos and falls back to 1 fps if the duration is missing or zero.
FPS_RATE=$(awk -v d="$DURATION" 'BEGIN { r = (d + 0 > 0) ? 12 / d : 1; if (r > 1) r = 1; printf "%.4f", r }')
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vf "fps=$FPS_RATE,scale=720:-1" -frames:v 12 "$OUTDIR/frame_%02d.jpg"
ls "$OUTDIR"
```
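The sampling rate in the extraction step is 12 frames divided by the video duration, capped at 1 frame per second. The sketch below isolates that arithmetic; `frame_rate_for` is a hypothetical helper name used only for illustration. A rate kept to two decimals truncates to 0 once the video exceeds 1200 seconds (12 / 1201 ≈ 0.0099), so the sketch keeps four decimals.

```bash
# Hypothetical helper illustrating the sampling arithmetic.
# Prints the value for ffmpeg's fps filter, given a duration in seconds
# and an optional frame budget (default 12).
frame_rate_for() {
  awk -v d="$1" -v n="${2:-12}" 'BEGIN {
    r = (d + 0 > 0) ? n / d : 1   # unknown or zero duration: default to 1 fps
    if (r > 1) r = 1              # never sample faster than 1 frame per second
    printf "%.4f\n", r
  }'
}

frame_rate_for 8      # 1.0000 (short clip: 1 fps, 8 frames)
frame_rate_for 60     # 0.2000 (one frame every 5 s)
frame_rate_for 3600   # 0.0033 (roughly one frame every 5 min)
```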
Audio Transcription (Whisper)
```bash
# Extract audio (16 kHz mono WAV)
AUDIO_PATH="/tmp/alma-audio-$(date +%s).wav"
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$AUDIO_PATH"

# Transcribe the file that was just written
whisper "$AUDIO_PATH" --model turbo --output_format txt --output_dir /tmp/alma-whisper
cat /tmp/alma-whisper/*.txt
```
Thumbnail Grid (quick overview)
```bash
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "fps=1/$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | awk '{printf "%.1f", $1/9}'),scale=320:-1,tile=3x3" \
  -frames:v 1 "$OUTDIR/grid.jpg"
```

Decision Flow
1. ALWAYS: Use `alma video analyze` (Gemini native). This is the ONLY correct first choice.
2. ONLY if `alma video analyze` returns an error: Fall back to frame extraction + Whisper.
3. Audio only ("what did they say"): Can use Whisper directly.
4. Always clean up: `rm -rf "$OUTDIR" /tmp/alma-whisper /tmp/alma-audio*.wav`

⚠️ NEVER skip step 1. Frame extraction loses motion, audio, and context. Gemini native video understanding is dramatically better. If you find yourself writing `ffmpeg` or `ffprobe` for video analysis WITHOUT first trying `alma video analyze`, you are doing it wrong.
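The decision flow can be sketched as a small wrapper. This is an illustration under assumptions, not the skill's own implementation: `analyze_with_fallback` and `fallback_extract` are hypothetical names, and the sketch assumes `alma video analyze` exits with a non-zero status when it fails, which the source does not confirm.

```bash
# Sketch of the decision flow. analyze_with_fallback and fallback_extract are
# hypothetical names; only `alma video analyze` comes from the skill itself.
# Assumption: `alma video analyze` exits non-zero when it fails.

fallback_extract() {
  # Placeholder for the frame extraction + Whisper steps described earlier.
  echo "fallback: frame extraction + Whisper for $1"
}

analyze_with_fallback() {
  local video="$1" prompt="$2"
  # Step 1: always try Gemini native video understanding first.
  if alma video analyze "$video" "$prompt"; then
    return 0
  fi
  # Step 2: only after an explicit failure, use the last-resort path.
  echo "alma video analyze failed; falling back" >&2
  fallback_extract "$video"
}
```

Called as `analyze_with_fallback "$VIDEO_PATH" "Summarize this video"`, the wrapper never reaches the ffmpeg path unless the Gemini call has already been tried and has failed. The cleanup command from step 4 still needs to run afterwards.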