video-reader

Video Reader Skill

Primary Method: Gemini Native Video Understanding

🚨 MANDATORY: Use `alma video analyze` for ALL video tasks. DO NOT use ffmpeg frame extraction unless `alma video analyze` explicitly fails. Frame extraction is a LAST RESORT, not a default.

Always use this: Gemini can understand video natively (visual + audio).

```bash
# Analyze a video with Gemini (uploads to Gemini Files API)
alma video analyze "/path/to/video.mp4" "Describe what's happening in this video"

# Custom prompts
alma video analyze "/path/to/video.mp4" "What language are they speaking? Summarize what they said"
alma video analyze "/path/to/video.mp4" "Is this video funny? Why?"
alma video analyze "/path/to/video.mp4" "Transcribe all spoken words in this video"
```

This uses Gemini's native multimodal video input, so no frame extraction is needed. Supported formats: mp4, mov, webm, avi, mkv, m4v, 3gp. Maximum file size: 2GB.
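The format and size limits can be checked locally before uploading. The following is a minimal sketch, not part of the skill itself: `is_supported_video` is a hypothetical helper name, and it assumes "2GB" means 2 GiB (2 × 1024³ bytes).

```bash
#!/usr/bin/env bash
# is_supported_video PATH   (hypothetical helper, not part of alma)
# Returns 0 when PATH exists, has a listed extension, and is within the size limit.
is_supported_video() {
  local path="$1" ext size
  local max_bytes=$((2 * 1024 * 1024 * 1024))   # assumption: 2GB is read as 2 GiB

  [ -f "$path" ] || return 1

  # Compare the extension case-insensitively against the supported list.
  ext="${path##*.}"
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp4|mov|webm|avi|mkv|m4v|3gp) ;;
    *) return 1 ;;
  esac

  # Arithmetic expansion strips any padding that wc adds on some systems.
  size=$(( $(wc -c < "$path") ))
  [ "$size" -le "$max_bytes" ]
}
```

A caller might run `is_supported_video "$VIDEO_PATH" || echo "unsupported file" >&2` before invoking `alma video analyze`. An unsupported file is one of the cases where the fallback method applies.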

**When to use Gemini:**
- Any video understanding task
- "What's in this video", "What did they say", "Summarize this"
- Best quality results — sees motion, hears audio, understands context

Fallback Method: Frame Extraction + Whisper

Use this if Gemini fails (no Google provider, API error, unsupported format):

```bash
# Get video info
ffprobe -v error -show_entries format=duration:stream=codec_name,width,height -of json "$VIDEO_PATH"
```

```bash
# Extract key frames (at most 1 per second, max 12 frames)
OUTDIR=/tmp/alma-frames-$(date +%s)
mkdir -p "$OUTDIR"
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | cut -d. -f1)
# Spread 12 frames across the video; scale=4 keeps the rate from rounding to 0 on long videos
FPS_RATE=$(echo "scale=4; 12 / ${DURATION:-0}" | bc 2>/dev/null || echo "1")
# Clips shorter than 12 seconds, or a failed calculation, use 1 frame per second
if [ -z "$FPS_RATE" ] || (( $(echo "$FPS_RATE > 1" | bc -l 2>/dev/null || echo 0) )); then FPS_RATE=1; fi
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vf "fps=$FPS_RATE,scale=720:-1" -frames:v 12 "$OUTDIR/frame_%02d.jpg"
ls "$OUTDIR"
```
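The sampling-rate arithmetic in the extraction step (12 frames spread over the duration, capped at 1 frame per second) can be isolated into a small function. This is a sketch under assumptions: `frame_rate` is a hypothetical name, and it uses `awk` instead of `bc` so that a zero or missing duration is handled in one place.

```bash
#!/usr/bin/env bash
# frame_rate DURATION_SECONDS [MAX_FRAMES]   (hypothetical helper)
# Prints an fps value that spreads MAX_FRAMES (default 12) across the clip,
# never exceeding 1 frame per second.
frame_rate() {
  awk -v d="$1" -v n="${2:-12}" 'BEGIN {
    if (d <= 0) { print 1; exit }   # unknown or zero duration: fall back to 1 fps
    r = n / d
    if (r > 1) r = 1                # short clips: cap at 1 frame per second
    printf "%.4f\n", r
  }'
}
```

For example, `frame_rate 60` prints `0.2000` (one frame every 5 seconds), while `frame_rate 5` prints `1.0000`. The result can be passed to the `fps=` filter in place of `$FPS_RATE`.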

Audio Transcription (Whisper)

```bash
# Extract audio (16 kHz mono WAV); keep the path in a variable so both steps use the same file
AUDIO="/tmp/alma-audio-$(date +%s).wav"
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$AUDIO"

# Transcribe
whisper "$AUDIO" --model turbo --output_format txt --output_dir /tmp/alma-whisper
cat /tmp/alma-whisper/*.txt
```

Thumbnail Grid (quick overview)

```bash
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "fps=1/$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | awk '{printf "%.1f", $1/9}'),scale=320:-1,tile=3x3" \
  -frames:v 1 "$OUTDIR/grid.jpg"
```

Decision Flow

1. ALWAYS: Use `alma video analyze` (Gemini native). This is the ONLY correct first choice.
2. ONLY if `alma video analyze` returns an error: Fall back to frame extraction + Whisper.
3. Audio only ("what did they say"): Can use Whisper directly.
4. Always clean up: `rm -rf "$OUTDIR" /tmp/alma-whisper /tmp/alma-audio*.wav`

⚠️ NEVER skip step 1. Frame extraction loses motion, audio, and context. Gemini native video understanding is dramatically better. If you find yourself writing `ffmpeg` or `ffprobe` for video analysis WITHOUT first trying `alma video analyze`, you are doing it wrong.
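The first two steps of the decision flow can be sketched as a wrapper function. This is illustrative only: `analyze_video` and `fallback_analyze` are hypothetical names, and the sketch assumes `alma video analyze` signals failure with a non-zero exit status, which the source does not state explicitly.

```bash
#!/usr/bin/env bash
# Sketch of the decision flow: try Gemini first, fall back only on error.

fallback_analyze() {
  # Placeholder for the frame extraction + Whisper commands from the fallback section.
  echo "fallback: frame extraction + Whisper for $1"
}

analyze_video() {
  local video="$1" prompt="$2"

  # Step 1: always try the native Gemini path first.
  if alma video analyze "$video" "$prompt"; then
    return 0
  fi

  # Step 2: reached only when the command above exits non-zero (assumed failure signal).
  echo "alma video analyze failed; using fallback" >&2
  fallback_analyze "$video"
}
```

Usage would look like `analyze_video "/path/to/video.mp4" "Summarize this"`. Keeping the fallback behind the failed call makes it hard to skip step 1 by accident.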