video-reader
Video Reader Skill
Extract key frames to "see" videos, and extract + transcribe audio to "hear" them.
Quick Start
bash
# Get video info (duration, resolution, codec)
ffprobe -v error -show_entries format=duration:stream=codec_name,width,height -of json "$VIDEO_PATH"
# Extract key frames (at most 1 per second, max 12 frames)
OUTDIR=/tmp/alma-frames-$(date +%s)
mkdir -p "$OUTDIR"
# Duration in whole seconds
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | cut -d. -f1)
# Fall back to 1 if the duration is empty, non-numeric, or zero (avoids dividing by zero)
case "$DURATION" in ''|*[!0-9]*|0) DURATION=1 ;; esac
# scale=4 keeps the rate above zero for videos longer than about 20 minutes
FPS_RATE=$(echo "scale=4; 12 / $DURATION" | bc 2>/dev/null || echo "1")
# Cap at 1 fps for short videos
if (( $(echo "$FPS_RATE > 1" | bc -l 2>/dev/null || echo 0) )); then FPS_RATE=1; fi
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vf "fps=$FPS_RATE,scale=720:-1" -frames:v 12 "$OUTDIR/frame_%02d.jpg"
ls "$OUTDIR"
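The rate calculation above depends on `bc`. As a minimal sketch, the same capped rate can be computed with `awk` alone. The function name `fps_rate` and its optional max-frames argument are hypothetical and not part of the original skill.

```bash
# Hypothetical helper (not part of the original skill): print the frame rate
# that spreads at most MAX frames (default 12) over DURATION seconds,
# capped at 1 fps. Uses awk in place of bc.
fps_rate() {
  awk -v d="$1" -v n="${2:-12}" 'BEGIN {
    if (d <= 0) d = 1          # guard against a zero duration
    r = n / d
    if (r > 1) r = 1           # never sample faster than 1 frame per second
    printf "%.4f\n", r
  }'
}

fps_rate 6      # short clip: capped at 1 fps
fps_rate 120    # two-minute clip: 12 frames over 120 s
```

The result can be assigned with `FPS_RATE=$(fps_rate "$DURATION")` and passed to the `fps=` filter as before.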
How to Use
- Get video info first to know the duration
- For short videos (<15s): extract 1 frame per second
- For medium videos (15-60s): extract ~8-12 frames evenly spread
- For long videos (>60s): extract 12 frames at key intervals
- Look at the extracted frames (they're image files) to describe the video content
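The duration rules above can be sketched as a small helper that turns a duration into a frame count. The name `frames_for_duration` is hypothetical, and the linear scaling used for medium videos is an assumption; the guidance only says "~8-12 frames".

```bash
# Hypothetical helper: map a duration in whole seconds to a frame count,
# following the short / medium / long guidance.
frames_for_duration() {
  secs=$1
  if [ "$secs" -lt 15 ]; then
    # Short: one frame per second, but at least one frame
    if [ "$secs" -lt 1 ]; then echo 1; else echo "$secs"; fi
  elif [ "$secs" -le 60 ]; then
    # Medium: scale from 8 frames at 15 s to 12 frames at 60 s (an assumption)
    echo $(( 8 + (secs - 15) * 4 / 45 ))
  else
    # Long: fixed budget of 12 frames
    echo 12
  fi
}

frames_for_duration 5     # short video
frames_for_duration 30    # medium video
frames_for_duration 300   # long video
```

The returned count can replace the hard-coded `12` or `8` in the extraction commands.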
Frame Extraction Patterns
bash
# Even spread: N frames across the entire video (N=8 here)
# The step is floored at 1 so clips with fewer than N frames never produce mod(n,0)
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "select='not(mod(n,$(ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of csv=p=0 "$VIDEO_PATH" | awk -v n=8 '{s=int($1/n); printf "%d", (s<1)?1:s}')))'" \
  -vsync vfr -frames:v 8 "$OUTDIR/frame_%02d.jpg"
# Specific timestamp (5.0 s in this example)
ffmpeg -hide_banner -loglevel error -ss 5.0 -i "$VIDEO_PATH" -frames:v 1 "$OUTDIR/at_5s.jpg"
# Thumbnail grid (single image overview): 9 frames tiled 3x3
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "fps=1/$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | awk '{printf "%.1f", $1/9}'),scale=320:-1,tile=3x3" \
  -frames:v 1 "$OUTDIR/grid.jpg"
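The even-spread pattern counts frames with `-count_frames`, which reads the whole stream first. As an alternative sketch, the timestamps can be computed from the duration and fed to the specific-timestamp pattern. Both function names below are hypothetical, and `VIDEO_PATH` and `OUTDIR` are assumed to be set as in the earlier commands.

```bash
# Hypothetical helper: print N timestamps (in seconds) at the midpoints of
# N equal slices of a DURATION-second video.
even_timestamps() {
  awk -v d="$1" -v n="$2" 'BEGIN {
    for (i = 0; i < n; i++) printf "%.2f\n", (i + 0.5) * d / n
  }'
}

# Hypothetical helper: grab one frame per timestamp by seeking with -ss.
# Assumes ffmpeg is on PATH and that VIDEO_PATH and OUTDIR are already set.
extract_even_frames() {
  i=1
  for ts in $(even_timestamps "$1" "$2"); do
    ffmpeg -hide_banner -loglevel error -ss "$ts" -i "$VIDEO_PATH" \
      -frames:v 1 "$OUTDIR/frame_$(printf '%02d' "$i").jpg"
    i=$((i + 1))
  done
}

even_timestamps 10 4   # midpoints of four 2.5 s slices
```

Calling `extract_even_frames "$DURATION" 8` would then write eight frames. Sampling midpoints rather than slice boundaries avoids landing on the first frame, which is often black.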
Tips
- Always use -hide_banner -loglevel error to keep output clean
- Scale down to 720px width (scale=720:-1) to save tokens when sending to AI
- Clean up frames after analysis: rm -rf "$OUTDIR"
- The extracted frames are regular image files; include their paths in your reply and they'll be auto-sent to Telegram
Audio: "Hearing" Videos
bash
# Extract audio from video (16 kHz, mono, 16-bit PCM WAV)
AUDIO="/tmp/alma-audio-$(date +%s).wav"
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$AUDIO"
# Transcribe with Whisper (auto-detect language)
whisper "$AUDIO" --model turbo --output_format txt --output_dir /tmp/alma-whisper
# Transcribe with a specific language
whisper "$AUDIO" --model turbo --language zh --output_format txt --output_dir /tmp/alma-whisper
# Read transcription
cat /tmp/alma-whisper/*.txt
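Some videos carry no audio track, in which case the extraction step fails. A minimal sketch of a check that could run first, assuming `ffprobe` is on `PATH`; the function name `has_audio` is hypothetical.

```bash
# Hypothetical helper: succeed only if the file has at least one audio stream.
# ffprobe prints one "audio" line per audio stream and nothing if there are none.
has_audio() {
  [ -n "$(ffprobe -v error -select_streams a -show_entries stream=codec_type -of csv=p=0 "$1" 2>/dev/null)" ]
}

# VIDEO_PATH is the same variable used in the commands above
if has_audio "${VIDEO_PATH:-}"; then
  echo "audio stream found: extract and transcribe"
else
  echo "no audio stream: rely on frames only"
fi
```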
When to See vs Hear
- "这个视频里有啥" ("What's in this video?") → Extract frames (see) and transcribe audio (hear) for the full picture
- "他说了什么" ("What did he say?") → Transcribe audio only
- "这个视频好看吗" ("Does this video look good?") → Extract frames to see the visuals
- "好听" ("Sounds nice") → The user is commenting on the audio; transcribe it to understand what they heard
- Music/street performance → Describe what you see in the frames and note the audio content
- When in doubt, do BOTH: extract a few frames AND transcribe the audio
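The mapping above can be sketched as a keyword heuristic. The function name `choose_mode` and the keyword patterns are illustrative assumptions, not an exhaustive rule set; anything unmatched falls through to doing both.

```bash
# Hypothetical heuristic: pick "see", "hear", or "both" from the user's message.
choose_mode() {
  case "$1" in
    *说*|*好听*) echo hear ;;   # speech or sound: transcribe audio
    *好看*)      echo see ;;    # visuals: extract frames
    *)           echo both ;;   # unsure: frames and transcript
  esac
}

choose_mode "他说了什么"
choose_mode "这个视频好看吗"
choose_mode "这个视频里有啥"
```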
Cleanup
Always clean up temp files after analysis:
bash
rm -rf "$OUTDIR" /tmp/alma-whisper /tmp/alma-audio*.wav
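Directory names built from `date +%s` can collide when two runs start in the same second, and cleanup is easy to forget. As one possible sketch, a wrapper can create a unique directory with `mktemp -d` and remove it once the work is done. The names `with_temp_dir` and `demo` are hypothetical.

```bash
# Hypothetical wrapper: create a private temp dir, run a command with the dir
# as its last argument, then delete the dir and return the command's status.
with_temp_dir() {
  dir=$(mktemp -d "${TMPDIR:-/tmp}/alma-frames-XXXXXX") || return 1
  "$@" "$dir"
  status=$?
  rm -rf "$dir"
  return "$status"
}

# Stand-in for the real frame extraction: writes one file and lists the dir.
demo() { touch "$1/frame_01.jpg"; ls "$1"; }

with_temp_dir demo
```

The frames must be read before the wrapped command returns, since the directory is gone afterwards.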