extract-frames

Extract Frames

Arguments

Parse the user's request for:

- video path (required): path to the video file
- `--last`: extract the last frame of each shot instead of the first
- `--first --last` or `--all`: extract both first and last frames
- `--shots N,N,N`: only process specific shot numbers (1-indexed)
- `--threshold X`: override the auto-detected threshold (skips user confirmation)

Default: extract the first frame of every shot.

Workflow

Phase 0: Setup

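Validate that the video file exists.
Get video metadata via ffprobe. `-select_streams v:0` limits the stream entries to the first video stream, so an audio stream's `r_frame_rate` of `0/0` cannot be mistaken for the video's:

```bash
ffprobe -v error -select_streams v:0 -show_entries format=duration -show_entries stream=r_frame_rate,width,height,codec_name -of default "$INPUT"
```

Parse fps (needed for the last-frame calculation) and duration. `r_frame_rate` is a fraction such as `30000/1001` and must be divided out.
Create the output directory `{video_basename}_frames/` next to the video. If it already exists, check for `scores.txt`; if present, skip Phase 1.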
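Here is a minimal sketch of the Phase 0 bookkeeping in POSIX shell. It assumes `{video_basename}` means the file name without its last extension, and that ffprobe was run with the `-of default` writer shown above. `output_dir_for` and `parse_probe` are hypothetical helper names, not part of ffprobe.

```bash
# output_dir_for VIDEO: print "{video_basename}_frames" in the same directory
# as VIDEO (assumption: basename = file name minus its last extension).
output_dir_for() {
  _dir=$(dirname "$1")
  _base=$(basename "$1")
  printf '%s/%s_frames\n' "$_dir" "${_base%.*}"
}

# parse_probe: read ffprobe "-of default" output on stdin and print
# "FPS DURATION". r_frame_rate is a fraction such as 30000/1001.
parse_probe() {
  awk -F= '
    $1 == "r_frame_rate" && fps == 0 {
      n = split($2, f, "/")
      if (n == 2 && f[2] + 0 > 0) fps = f[1] / f[2]   # skip 0/0 entries
    }
    $1 == "duration" { dur = $2 + 0 }
    END { printf "%.6f %.6f\n", fps, dur }
  '
}
```

For example, `OUTPUT_DIR=$(output_dir_for "$INPUT")` followed by `mkdir -p "$OUTPUT_DIR"` creates the directory. Piping the ffprobe command into `parse_probe` prints `29.970030 12.500000` for a 29.97 fps, 12.5 s clip.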

Phase 1: Score Dump

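Dump per-frame scene scores for the entire video:

```bash
ffmpeg -i "$INPUT" -vf "select='gte(scene,0)',metadata=print:file=$OUTPUT_DIR/scores.txt" -fps_mode vfr -f null - 2>&1
```

This is the expensive step, because every frame is decoded and compared with its predecessor. `-fps_mode` requires FFmpeg 5.1 or later; older builds use `-vsync vfr` instead. The path is embedded in the filtergraph, so an `$OUTPUT_DIR` containing `:`, `,` or `'` needs escaping. The output file `scores.txt` contains blocks like:

```
frame:0    pts:0       pts_time:0.000000
lavfi.scene_score=0.000000
```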
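Later steps only need the `pts_time` and `lavfi.scene_score` values from each block. Below is a minimal sketch that flattens every two-line block into one `TIME SCORE` line, which is convenient for sorting or eyeballing candidates. `flatten_scores` is a hypothetical helper, and its parsing mirrors the awk patterns used in Phase 2.

```bash
# flatten_scores FILE: turn each scores.txt block into "PTS_TIME SCENE_SCORE".
flatten_scores() {
  awk '
    /pts_time:/ { split($0, a, "pts_time:"); ts = a[2] + 0 }
    /lavfi\.scene_score=/ { split($0, b, "="); printf "%.6f %.6f\n", ts, b[2] + 0 }
  ' "$1"
}
```

For example, `flatten_scores "$OUTPUT_DIR/scores.txt" | sort -k2 -g -r | head` lists the highest-scoring frames first (`-g` is a GNU/BSD `sort` extension).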

导出整个视频的逐帧场景分数：

bash

ffmpeg -i "$INPUT" -vf "select='gte(scene,0)',metadata=print:file=$OUTPUT_DIR/scores.txt" -fps_mode vfr -f null - 2>&1

这是耗时较长的步骤。输出文件

scores.txt

包含如下格式的内容块：

frame:0    pts:0       pts_time:0.000000
lavfi.scene_score=0.000000

Phase 2: Cut Detection

Parse `scores.txt` with this awk script to get the score distribution plus all candidate frames (score > 0.05):

```bash
awk '
/pts_time/ { split($0, a, "pts_time:"); ts=a[2]+0 }
/scene_score/ { split($0, a, "="); score=a[2]+0;
  if (score < 0.01) b1++;
  else if (score < 0.05) b2++;
  else if (score < 0.10) b3++;
  else if (score < 0.20) b4++;
  else b5++;
  total++;
  if (score > max) max=score;
  if (score > 0.05) printf "  ts=%.3fs  score=%.6f\n", ts, score;
}
END {
  if (total == 0) { print "No scene scores found in scores.txt"; exit 1 }
  print "\n--- Distribution ---";
  printf "< 0.01:      %d (%.1f%%)\n", b1, b1/total*100;
  printf "0.01 - 0.05: %d (%.1f%%)\n", b2, b2/total*100;
  printf "0.05 - 0.10: %d (%.1f%%)\n", b3, b3/total*100;
  printf "0.10 - 0.20: %d (%.1f%%)\n", b4, b4/total*100;
  printf "0.20+:       %d (%.1f%%)\n", b5, b5/total*100;
  printf "Max score:   %.6f\n", max;
}' "$OUTPUT_DIR/scores.txt"
```

Step 2a: Startup artifact filter. Discard any frames in the first 0.5s with `score > 0.05`. This is a fixed preliminary value, because the final user-confirmed threshold is not known yet. Fade-ins from black or codec initialization commonly produce `score=1.0` spikes at t=0.03-0.08s that are not real cuts. After discarding these, recompute the max score from the remaining frames.
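Here is a minimal sketch of Step 2a that reads `scores.txt` directly and prints the max score after the startup filter. `post_startup_max` is a hypothetical helper. The 0.5s window and 0.05 cutoff are the document's values.

```bash
# post_startup_max FILE: print the max scene score in scores.txt after the
# Step 2a startup filter (drop frames with pts_time < 0.5 and score > 0.05).
post_startup_max() {
  awk '
    /pts_time:/ { split($0, a, "pts_time:"); ts = a[2] + 0 }
    /lavfi\.scene_score=/ {
      split($0, b, "="); score = b[2] + 0
      if (ts < 0.5 && score > 0.05) next   # startup artifact, not a cut
      if (score > max) max = score
    }
    END { printf "%.6f\n", max }
  ' "$1"
}
```

Step 2b then reduces to a numeric comparison. For example, `awk -v m="$MAX" 'BEGIN { exit !(m < 0.05) }'` succeeds for a single-shot video.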

Step 2b: Branch on max score. If the max score (after the startup filter) is < 0.05, the video has no cuts:

- Report: "No shot boundaries detected — single continuous shot."
- Extract only the frame at t=0, or the frame at t=1.0s if t=0 is a black frame (check the PNG file size; <50KB indicates black).
- Skip threshold confirmation.
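The black-frame check in Step 2b is a file-size heuristic. Below is a minimal sketch that assumes 50KB means 50 × 1024 bytes and that the t=0 frame has already been written to a PNG. `single_shot_ts` is a hypothetical helper.

```bash
# single_shot_ts PNG: print the timestamp to keep for a single-shot video.
# A PNG under 50KB (assumed 51200 bytes) is treated as black, so fall back
# to t=1.0s; otherwise keep t=0.
single_shot_ts() {
  _size=$(wc -c < "$1" | tr -d ' ')   # BSD wc pads the count with spaces
  if [ "$_size" -lt 51200 ]; then
    echo 1.0
  else
    echo 0
  fi
}
```

If it prints `1.0`, re-run the Phase 3 first-frame command with `-ss 1.0`.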

Step 2c: Gap analysis and threshold. If the max score is >= 0.05, use the raw candidates (all frames with score > 0.05, after the startup filter) to find the noise ceiling and the lowest-scoring candidate above it (the lowest real cut). Place the proposed threshold at the midpoint of the gap between them. Present the following to the user via AskUserQuestion:

- Score distribution summary
- Gap analysis (noise ceiling → lowest cut, gap width)
- Proposed threshold and resulting shot count
- Options: Accept proposed (Recommended), Lower threshold, Higher threshold, Custom value

If `--threshold` was provided, skip confirmation and set `$THRESHOLD_VALUE` to it directly. Otherwise, set `$THRESHOLD_VALUE` to the user-confirmed value before proceeding.
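Step 2c does not spell out how the gap is located. Below is a sketch of one reasonable reading, offered as an assumption. It sorts the candidate scores, adds 0.05 (the candidate cutoff) as a floor, and takes the widest gap between neighboring values. The value below that gap is the noise ceiling and the value above it is the lowest cut. `propose_threshold` is a hypothetical helper, and the widest-gap rule is a swapped-in heuristic, not necessarily the author's method.

```bash
# propose_threshold: read candidate scores (one per line, score > 0.05,
# startup artifacts already removed) and print
# "NOISE_CEILING LOWEST_CUT GAP THRESHOLD" using the widest gap.
propose_threshold() {
  { echo 0.05; cat; } | LC_ALL=C sort -n | awk '
    { s[NR] = $1 + 0 }
    END {
      if (NR < 2) exit 1            # no candidates above the 0.05 floor
      best = -1
      for (i = 2; i <= NR; i++) {
        gap = s[i] - s[i - 1]
        if (gap > best) { best = gap; lo = s[i - 1]; hi = s[i] }
      }
      printf "%.6f %.6f %.6f %.6f\n", lo, hi, best, (lo + hi) / 2
    }'
}
```

The 0.05 floor means that when every candidate is a real cut, the widest gap is usually the one between 0.05 and the lowest cut, so no real cut gets classified as noise. The resulting shot count for the proposed value is the number of lines the Step 2d script prints at that threshold, plus one for shot 1.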

Step 2d: Run-based dedup. Using `$THRESHOLD_VALUE` from Step 2c, identify cut points. A "run" is a sequence of consecutive frames that all score above the threshold. The principle is that an aftershock immediately follows its parent cut (consecutive frames both above threshold), while a real cut always rises from the noise floor (preceded by at least one below-threshold frame). The script also re-applies the Step 2a startup filter, so a fade-in spike is not reported as a cut.

```bash
awk -v THRESHOLD="$THRESHOLD_VALUE" '
/pts_time/ { split($0, a, "pts_time:"); ts=a[2]+0 }
/scene_score/ { split($0, a, "="); score=a[2]+0;
  if (ts < 0.5 && score > 0.05) score = 0;  # Step 2a: startup artifacts count as noise
  if (score > THRESHOLD) {
    if (!in_run) { run_best_ts = ts; run_best_score = score; in_run = 1; }
    else if (score > run_best_score) { run_best_ts = ts; run_best_score = score; }
  } else {
    if (in_run) { printf "%.3f %.6f\n", run_best_ts, run_best_score; in_run = 0; }
  }
}
END { if (in_run) printf "%.3f %.6f\n", run_best_ts, run_best_score }
' "$OUTPUT_DIR/scores.txt"
```

This keeps the peak frame of each run and discards aftershocks within the same run. It correctly handles:

- Standard cuts: isolated spikes → each kept (run length 1)
- Cuts with aftershocks: 2-3 consecutive high frames → peak kept, echoes discarded
- Rapid montages: each cut separated by noise frames → all kept, even at 0.12s intervals
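Step 2c asks for the resulting shot count for each threshold option. Below is a minimal sketch that counts runs the same way as the Step 2d script, including the startup filter, and adds one for shot 1 at t=0. `count_shots` is a hypothetical helper.

```bash
# count_shots THRESHOLD FILE: number of shots scores.txt would yield at
# THRESHOLD: one per run of above-threshold frames, plus shot 1 at t=0.
count_shots() {
  awk -v thr="$1" '
    /pts_time:/ { split($0, a, "pts_time:"); ts = a[2] + 0 }
    /lavfi\.scene_score=/ {
      split($0, b, "="); score = b[2] + 0
      if (ts < 0.5 && score > 0.05) score = 0   # Step 2a startup filter
      if (score > thr) { if (!in_run) { runs++; in_run = 1 } } else { in_run = 0 }
    }
    END { print runs + 1 }
  ' "$2"
}
```

Running it for the proposed, lower, and higher thresholds gives the shot counts to show next to each AskUserQuestion option.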

Phase 3: Frame Extraction

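For each detected cut point (plus t=0 for shot 1):

First frame (default), where `$NUM` is the zero-padded shot number and `$TIME` is the timestamp formatted to two decimals:

```bash
ffmpeg -y -ss "$TIMESTAMP" -i "$INPUT" -frames:v 1 -update 1 "$OUTPUT_DIR/shot_${NUM}_${TIME}s.png"
```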
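To drive extraction, the Step 2d output (one `TIME SCORE` line per cut) can be turned into numbered shots, with shot 1 pinned at t=0. Here is a minimal sketch. `shot_list` is a hypothetical helper, and its `NN TIME` output has the same shape as the label/timestamp pairs read by the pipe-based loop under Shell Portability.

```bash
# shot_list: read "TIME SCORE" lines (one per detected cut) on stdin and
# print "NN TIME" per shot; shot 01 always starts at t=0.
shot_list() {
  awk 'BEGIN { print "01 0.000" } { printf "%02d %.3f\n", NR + 1, $1 }'
}
```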

Last frame (`--last`):

- For shots 1 through N-1: `last_time = next_shot_timestamp - (1/fps)`
- For the final shot: `last_time = duration - (2/fps)`. Use 2 frames back, not 1, because seeking to `duration - 1/fps` can produce empty files near the end of some videos.

```bash
ffmpeg -y -ss "$LAST_TIME" -i "$INPUT" -frames:v 1 -update 1 "$OUTPUT_DIR/shot_${NUM}_last_${TIME}s.png"
```
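The two last-frame rules above can be computed for every shot at once. Below is a minimal sketch that assumes fps and duration come from Phase 0 and that shot start times are passed in ascending order. `last_frame_times` is a hypothetical helper.

```bash
# last_frame_times FPS DURATION START...: print "NN LAST_TIME" per shot.
# Shots 1..N-1 end one frame before the next shot starts; the final shot
# ends two frames before DURATION.
last_frame_times() {
  _fps=$1; _dur=$2
  shift 2
  printf '%s\n' "$@" | awk -v fps="$_fps" -v dur="$_dur" '
    { start[NR] = $1 + 0 }
    END {
      for (i = 1; i <= NR; i++) {
        if (i < NR) { t = start[i + 1] - 1 / fps } else { t = dur - 2 / fps }
        printf "%02d %.3f\n", i, t
      }
    }'
}
```

At 30 fps, `last_frame_times 30 5.0 0.000 1.633 3.800` gives 1.600, 3.767 and 4.933. Formatted with `%.2f`, the first becomes `shot_01_last_1.60s.png`.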

If a last-frame extraction produces an empty file (0 bytes), back off by another frame and retry.
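Here is a minimal sketch of that retry. It assumes `$INPUT` is set and caps the loop at five attempts, which is an arbitrary, hypothetical limit. `extract_frame` and `extract_last_frame` are hypothetical helpers. `-nostdin` keeps ffmpeg from consuming the terminal or a surrounding `while read` loop's input, and `-loglevel error` only quiets the output.

```bash
# extract_frame TIME OUT: grab one frame at TIME into OUT with ffmpeg.
extract_frame() {
  ffmpeg -nostdin -y -loglevel error -ss "$1" -i "$INPUT" -frames:v 1 -update 1 "$2"
}

# extract_last_frame FPS TIME OUT [MAX_TRIES]: if OUT comes back missing or
# empty (0 bytes), step back one frame (1/FPS) and retry. Prints the
# timestamp that produced a non-empty file; returns 1 if all tries fail.
extract_last_frame() {
  _fps=$1; _t=$2; _out=$3; _tries=${4:-5}
  while [ "$_tries" -gt 0 ]; do
    extract_frame "$_t" "$_out" || :   # judge success by file size, not exit code
    if [ -s "$_out" ]; then
      echo "$_t"
      return 0
    fi
    _t=$(awk -v t="$_t" -v fps="$_fps" 'BEGIN { printf "%.3f", t - 1 / fps }')
    _tries=$((_tries - 1))
  done
  return 1
}
```

The printed timestamp is the one that actually worked, so it is the value to format into the `_last_${TIME}s.png` file name.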

Both (`--first --last` or `--all`): extract both frames per shot.

Filtered (`--shots 3,5,7`): only extract the specified shot numbers.
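Below is a minimal sketch of the `--shots` filter. It assumes the shot list is held as `NN TIME` lines (zero-padded shot number, then start time). `filter_shots` is a hypothetical helper. Shot numbers are compared numerically, so `03` matches `3`.

```bash
# filter_shots LIST: keep only "NN TIME" lines whose shot number is in the
# comma-separated LIST (1-indexed, as in --shots 3,5,7).
filter_shots() {
  awk -v list="$1" '
    BEGIN { n = split(list, ids, ","); for (i = 1; i <= n; i++) want[ids[i] + 0] = 1 }
    ($1 + 0) in want
  '
}
```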

Output

- Directory: `{video_basename}_frames/` next to the input video
- First frames: `shot_01_0.00s.png`, `shot_02_1.63s.png`, ...
- Last frames: `shot_01_last_1.60s.png`, `shot_02_last_3.77s.png`, ...
- `scores.txt`: always retained for re-runs at different thresholds

Report to the user: total shots detected, the shot list with timestamps, and the output directory path.

Re-run Behavior

If `scores.txt` already exists in the output directory, skip Phase 1 entirely and go straight to the Phase 2 analysis. This makes threshold iteration nearly instant: the user can re-run with `--threshold 0.15` without waiting for the score dump again.
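The re-run check can be a one-line test. Here is a minimal sketch. `need_score_dump` is a hypothetical helper. It uses `-s` (exists and is non-empty) rather than `-e`, so an empty `scores.txt` left by a failed run triggers a fresh dump. A dump interrupted partway through would still be reused, though.

```bash
# need_score_dump DIR: succeed (exit 0) when Phase 1 must run, i.e. when
# DIR/scores.txt is missing or empty; fail when it can be reused.
need_score_dump() {
  [ ! -s "$1/scores.txt" ]
}
```

Run the Phase 1 ffmpeg command only when `need_score_dump "$OUTPUT_DIR"` succeeds.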

Shell Portability

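Use pipe-based loops for frame extraction instead of array syntax. By default, zsh does not word-split unquoted variables and indexes arrays from 1, so `for` loops over arrays or space-separated strings behave differently than in bash. Pass `-nostdin` to ffmpeg inside a `while read` loop; otherwise ffmpeg reads from stdin and swallows the remaining timestamps.

```bash
echo "0.000 2.480 4.280" | tr ' ' '\n' | awk '{printf "%02d %s\n", NR, $1}' | while read -r LABEL TS; do
  TS_FMT=$(printf "%.2f" "$TS")
  ffmpeg -nostdin -y -ss "$TS" -i "$INPUT" -frames:v 1 -update 1 "$OUTPUT_DIR/shot_${LABEL}_${TS_FMT}s.png" 2>/dev/null
done
```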

Edge Cases

- Single continuous shots: handled by Step 2b. Common in fashion videos with rack-focus reveals, slow wardrobe progression, or single-take lifestyle shots.
- Startup artifacts: fade-ins from black produce `score=1.0` at t=0.03-0.08s. Step 2a discards frames with score > 0.05 in the first 0.5s. If t=0 itself is a black frame (PNG < 50KB), extract at t=1.0s instead.
- Aftershock spikes: consecutive above-threshold frames (a cut plus its echo). Run-based dedup in Step 2d keeps only the peak of each run, so no temporal window is needed.
- Rapid montages: videos where shots are 2-4 frames long (0.08-0.16s at 25 fps). Each cut rises from the noise floor, with 1-2 noise frames between spikes. Run-based dedup preserves every cut because no two spikes are frame-adjacent. Report montage segments to the user: "Detected rapid montage from Xs-Ys with N shots."
- Dissolves/fades: the score ramps up gradually over multiple frames and forms a single run. Run-based dedup takes the peak frame as the cut point.
- Empty file on seek: if ffmpeg produces a 0-byte PNG (common near the end of a video), back off by one frame interval and retry.
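The montage report needs a working definition of "rapid montage". Below is a minimal sketch that uses two hypothetical parameters. Consecutive cuts less than MAX_GAP seconds apart belong to the same montage, and at least two such short shots are required before reporting. `montage_segments` is a hypothetical helper. X and Y are the first and last cut times of the stretch, and N counts the short shots between them.

```bash
# montage_segments MAX_GAP: read ascending cut times (one per line) and print
# the montage report for each stretch where consecutive cuts are closer than
# MAX_GAP seconds. Requires at least 2 short shots (hypothetical cutoff).
montage_segments() {
  awk -v max_gap="$1" '
    function flush() {
      if (count - 1 >= 2) {
        printf "Detected rapid montage from %.2fs-%.2fs with %d shots.\n", seg_start, prev, count - 1
      }
    }
    NR > 1 && $1 - prev < max_gap { count++; prev = $1; next }   # still inside a stretch
    { flush(); seg_start = $1; prev = $1; count = 1 }             # start a new stretch
    END { flush() }
  '
}
```

With cuts at 1.000, 5.000, 5.120, 5.240, 5.360 and 9.000 and `MAX_GAP=0.2`, it reports one montage from 5.00s to 5.36s containing three 0.12s shots. The gap threshold should scale with fps, since 2-4 frames span less time at higher frame rates.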
