Video Reader Skill

Primary Method: Gemini Native Video Understanding

🚨 MANDATORY: Use `alma video analyze` for ALL video tasks. DO NOT use ffmpeg frame extraction unless `alma video analyze` explicitly fails. Frame extraction is a LAST RESORT, not a default.

Always use this: Gemini can understand video natively (visual + audio).

```bash
# Analyze a video with Gemini (uploads to Gemini Files API)
alma video analyze "/path/to/video.mp4" "Describe what's happening in this video"

# Custom prompts
alma video analyze "/path/to/video.mp4" "What language are they speaking? Summarize what they said"
alma video analyze "/path/to/video.mp4" "Is this video funny? Why?"
alma video analyze "/path/to/video.mp4" "Transcribe all spoken words in this video"
```


This uses Gemini's native multimodal video input — no frame extraction needed. Works with mp4, mov, webm, avi, mkv, m4v, 3gp. Max file size: 2GB.

**When to use Gemini:**
- Any video understanding task
- "What's in this video", "What did they say", "Summarize this"
- Best quality results — sees motion, hears audio, understands context
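The format and size limits stated above lend themselves to a quick pre-flight check before uploading. The following is a minimal sketch and not part of `alma`: `check_video` is a hypothetical helper name, the extension list is copied from the supported formats listed earlier, and treating "2GB" as 2 GiB is an assumption.

```bash
# Hypothetical helper (not an alma command): returns 0 if the file looks
# acceptable for upload, non-zero otherwise.
check_video() {
  local path="$1"
  local max_bytes=$((2 * 1024 * 1024 * 1024))  # assumption: "2GB" means 2 GiB

  [ -f "$path" ] || { echo "not a file: $path" >&2; return 1; }

  # Lowercase the extension and compare it with the supported formats.
  local ext="${path##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp4|mov|webm|avi|mkv|m4v|3gp) ;;
    *) echo "unsupported extension: .$ext" >&2; return 2 ;;
  esac

  # wc -c works on both GNU and BSD userlands.
  local size
  size="$(wc -c < "$path" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    echo "file too large: $size bytes" >&2
    return 3
  fi
}
```

One way to use it is `check_video "$VIDEO_PATH" && alma video analyze "$VIDEO_PATH" "Summarize this"`. A failed check is not a reason to skip Gemini for good. It only explains up front why the upload would likely be rejected.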


Fallback Method: Frame Extraction + Whisper

Use this if Gemini fails (no Google provider, API error, unsupported format):

```bash
# Get video info
ffprobe -v error -show_entries format=duration:stream=codec_name,width,height -of json "$VIDEO_PATH"

# Extract key frames (at most 1 per second, max 12 frames)
OUTDIR=/tmp/alma-frames-$(date +%s)
mkdir -p "$OUTDIR"
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | cut -d. -f1)
# Guard against an empty, non-numeric, or sub-second duration before dividing
[ "${DURATION:-0}" -gt 0 ] 2>/dev/null || DURATION=1
# scale=4 keeps the rate above zero for long videos
FPS_RATE=$(echo "scale=4; 12 / $DURATION" | bc 2>/dev/null || echo "1")
if (( $(echo "$FPS_RATE > 1" | bc -l 2>/dev/null || echo 0) )); then FPS_RATE=1; fi
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vf "fps=$FPS_RATE,scale=720:-1" -frames:v 12 "$OUTDIR/frame_%02d.jpg"
ls "$OUTDIR"
```
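The sampling-rate arithmetic in the frame extraction step (12 frames spread over the duration, capped at 1 frame per second) can be isolated into a small function, which makes it easier to check by hand. This is a sketch under assumptions: `fps_rate` is a hypothetical name, and it uses `awk` rather than `bc` so that a zero or unparsable duration falls back to 1 fps without a division error.

```bash
# Hypothetical helper: prints a value for ffmpeg's fps filter so that roughly
# max_frames frames are sampled across the video, never more than 1 per second.
# Usage: fps_rate <duration_seconds> [max_frames]
fps_rate() {
  awk -v d="$1" -v n="${2:-12}" 'BEGIN {
    if (d + 0 <= 0) { print "1"; exit }   # unknown or zero duration: use 1 fps
    r = n / d
    if (r > 1) r = 1                      # cap at 1 frame per second
    printf "%.4f\n", r
  }'
}

fps_rate 60     # prints 0.2000 (one frame every 5 seconds)
fps_rate 5      # prints 1.0000 (capped)
fps_rate 3600   # prints 0.0033
```

Four decimal places are kept because two would round the rate to zero for videos longer than about 40 minutes.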

Audio Transcription (Whisper)

```bash
# Extract audio (16 kHz mono PCM WAV)
AUDIO="/tmp/alma-audio-$(date +%s).wav"
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$AUDIO"

# Transcribe (reads the same file that was just written)
whisper "$AUDIO" --model turbo --output_format txt --output_dir /tmp/alma-whisper
cat /tmp/alma-whisper/*.txt
```

Thumbnail Grid (quick overview)

```bash
# Samples 9 evenly spaced frames into one 3x3 grid image; $OUTDIR must already exist
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "fps=1/$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | awk '{printf "%.1f", $1/9}'),scale=320:-1,tile=3x3" \
  -frames:v 1 "$OUTDIR/grid.jpg"
```

Decision Flow

1. ALWAYS: Use `alma video analyze` (Gemini native). This is the ONLY correct first choice.
2. ONLY if `alma video analyze` returns an error: Fall back to frame extraction + Whisper.
3. Audio only ("what did they say"): Can use Whisper directly.
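The first two steps of this flow can be sketched as a wrapper function. It rests on two assumptions not stated in this document: that `alma video analyze` exits with a non-zero status when it fails, and that the fallback steps have been collected into a function, here given the hypothetical name `fallback_extract`.

```bash
# Sketch of the decision flow. analyze_video and fallback_extract are
# hypothetical names; fallback_extract stands in for the frame extraction
# + Whisper steps of the fallback method.
analyze_video() {
  local video="$1" prompt="$2"

  # Step 1: always try Gemini native understanding first.
  if alma video analyze "$video" "$prompt"; then
    return 0
  fi

  # Step 2: reached only when alma video analyze reported an error
  # (assumption: failure is signalled by a non-zero exit status).
  echo "alma video analyze failed; falling back to frames + Whisper" >&2
  fallback_extract "$video"
}
```

Keeping the fallback behind the failed call, rather than behind a flag, makes it impossible to reach frame extraction without having tried step 1.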

Always clean up:

```bash
rm -rf "$OUTDIR" /tmp/alma-whisper /tmp/alma-audio*.wav
```
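To make the cleanup hard to forget, it can be registered with a shell `trap` so that it runs when the script exits, including after an error. This is a sketch rather than a required pattern: the `cleanup` function name is illustrative, and using `mktemp -d` in place of the timestamp-based directory name is a substitution made here, not something this skill prescribes.

```bash
# Sketch: create the frame directory, then guarantee removal on exit.
OUTDIR="$(mktemp -d /tmp/alma-frames-XXXXXX)"

cleanup() {
  # Same paths as the manual cleanup command; -f ignores missing files.
  rm -rf "$OUTDIR" /tmp/alma-whisper /tmp/alma-audio*.wav
}
trap cleanup EXIT

# ... frame extraction and transcription steps go here ...
```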

⚠️ NEVER skip step 1. Frame extraction loses motion, audio, and context. Gemini native video understanding is dramatically better. If you find yourself writing `ffmpeg` or `ffprobe` for video analysis WITHOUT first trying `alma video analyze`, you are doing it wrong.
