youtube-chapter-clipper

YouTube Chapter Clipper

Overview

Generate chapter clips from a YouTube video by downloading MP4 + English subtitles, segmenting content, cutting clips, and producing per-chapter English SRTs. Chapter length is user-selectable (1-2, 2-3, or 3-4 minutes).

Workflow

1) Use the automation script to reduce tokens

Prefer `scripts/smart_edit.py` for end-to-end runs (download, chaptering, clip cut, subtitle slicing).
The script uses heuristic chaptering to avoid AI token usage.
Create and use a local venv (no external packages required):
- `python3 -m venv .venv`
- `source .venv/bin/activate`
- `python scripts/smart_edit.py --help`
- Speed-focused default: `--mode fast` (approximate cuts, faster encode, optional downscale).
- Use `--mode accurate` when you need precise boundaries.

2) Confirm inputs and environment

Ask for the YouTube URL and whether English subtitles are available (manual preferred; auto-generated as fallback).
Check that `yt-dlp` and `ffmpeg` are installed. If either is missing, install it before proceeding.
Use the command templates in `references/commands.md`.
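
The tool check above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the logic in `scripts/smart_edit.py`; the helper name `missing_tools` and the injectable `which` parameter are assumptions made for this example.

```python
import shutil

REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH.

    `which` is injectable (hypothetical parameter) so the check can be
    exercised without the real binaries being installed.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` returns an empty list when both tools are available; any names it returns should be installed before continuing.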

3) Download source video and subtitles

Check the current directory for existing source files before downloading:
- If `<id>.mp4` and `<id>.en.vtt` already exist, skip the yt-dlp download.
Download the best available MP4 at up to 1080p and the English VTT. Save both in the current directory with ID-based names:
- `<id>.mp4`
- `<id>.en.vtt` (or `<id>.en.auto.vtt` if manual subtitles are absent)
Also capture video metadata (id, title, duration, uploader) for reporting.
- The script handles this when `--url` is provided.
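
As a rough sketch of this step, the snippet below checks for existing source files and assembles a yt-dlp argument list. The function names are hypothetical, and the format selector is one plausible way to express "MP4 up to 1080p"; the authoritative templates are in `references/commands.md`. Treating `<id>.en.auto.vtt` as an acceptable existing subtitle file is also an assumption. Note that yt-dlp itself writes subtitles as `<id>.en.vtt`, so the `.auto` suffix would have to be applied by a separate rename.

```python
from pathlib import Path


def find_sources(directory, video_id):
    """Return (mp4, vtt) paths if both already exist, else None.

    Manual subtitles (<id>.en.vtt) are preferred over the auto-generated
    fallback (<id>.en.auto.vtt); accepting the fallback is an assumption.
    """
    base = Path(directory)
    mp4 = base / f"{video_id}.mp4"
    if not mp4.is_file():
        return None
    for suffix in (".en.vtt", ".en.auto.vtt"):
        vtt = base / f"{video_id}{suffix}"
        if vtt.is_file():
            return mp4, vtt
    return None


def build_download_command(url, auto_subs=False):
    """Build a yt-dlp argument list for MP4 (up to 1080p) plus English VTT."""
    return [
        "yt-dlp",
        # Prefer separate MP4 video + M4A audio, then fall back to muxed files.
        "-f", "bv*[height<=1080][ext=mp4]+ba[ext=m4a]/b[height<=1080][ext=mp4]/b[height<=1080]",
        "--merge-output-format", "mp4",
        "--write-auto-subs" if auto_subs else "--write-subs",
        "--sub-langs", "en",
        "--sub-format", "vtt",
        "-o", "%(id)s.%(ext)s",  # ID-based names in the current directory
        url,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` only when `find_sources` returns `None`.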

4) Prepare output directory

Create output directory using the original video title:
- Replace spaces with underscores.
- Remove/replace filesystem-unsafe characters.
Place all chapter clips and subtitle files into this directory.
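
One possible way to sanitize a title is shown below. The exact set of characters to strip is an assumption (it covers characters rejected by common Windows, macOS, and Linux filesystems), and `safe_name` is a hypothetical helper name.

```python
import re


def safe_name(title, fallback="untitled"):
    """Turn a video or chapter title into a filesystem-safe name."""
    name = re.sub(r"\s+", "_", title.strip())            # whitespace runs -> underscore
    name = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "", name)    # drop unsafe characters
    name = re.sub(r"_+", "_", name).strip("._")          # collapse repeats, trim edges
    return name or fallback
```

For example, `safe_name("Intro: What is AI?")` yields `Intro_What_is_AI`, which can then be used with `Path(name).mkdir(exist_ok=True)`.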

5) Generate fine-grained chapters (user-selected length)

Ask the user to choose a chapter length preset: 1-2, 2-3, or 3-4 minutes.
Perform AI analysis (critical step):
- Read the full subtitle content.
- Understand the semantic flow and topic transitions.
- Identify natural topic switch points.
Draft chapter boundaries based on semantic topic changes and sentence boundaries.
Target the selected range; avoid cutting mid-sentence.
Prefer semantic breaks (new concept, example, recap) over strict timing.

Produce a chapter list with:
- `title`, `start`, `end`, `reason`
- The script uses `--chapter-preset` (or `--min-seconds`/`--target-seconds`/`--max-seconds` for custom).
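
To illustrate the timing side of chaptering, here is a simplified heuristic that cuts at the first sentence-ending cue once the target length is reached and forces a cut at the maximum. It is a stand-in for illustration only: it does not perform the semantic analysis described above and is not necessarily how the script chapters. The preset values (especially the target seconds) and all names are assumptions.

```python
# (min, target, max) seconds per preset; the target values are assumed.
PRESETS = {"1-2": (60, 90, 120), "2-3": (120, 150, 180), "3-4": (180, 210, 240)}


def draft_chapters(cues, preset="2-3"):
    """Group (start, end, text) cues, in seconds, into chapter dicts."""
    min_s, target_s, max_s = PRESETS[preset]
    groups, current = [], []
    for cue in cues:
        current.append(cue)
        length = cue[1] - current[0][0]
        ends_sentence = cue[2].rstrip().endswith((".", "?", "!"))
        if length >= target_s and ends_sentence:
            groups.append((current, "sentence boundary near target length"))
            current = []
        elif length >= max_s:
            groups.append((current, "max length reached"))
            current = []
    if current:
        # Merge a too-short tail into the previous chapter instead of
        # emitting a stub; this may push that chapter past the maximum.
        if groups and current[-1][1] - current[0][0] < min_s:
            groups[-1][0].extend(current)
        else:
            groups.append((current, "end of video"))
    return [
        {"title": f"Chapter {i}", "start": g[0][0], "end": g[-1][1], "reason": why}
        for i, (g, why) in enumerate(groups, start=1)
    ]
```

The placeholder titles (`Chapter 1`, ...) would be replaced by titles derived from the subtitle content.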

6) Cut precise clips (speed vs accuracy)

Use ffmpeg with accurate trimming and stable outputs. Always re-encode:
- Place `-ss` after `-i` for accurate seeking.
- Use `libx264` + `aac`, `-movflags +faststart`, and `-pix_fmt yuv420p` to maximize player compatibility.
- Use a fast preset (e.g., `-preset veryfast`) to avoid long encodes and timeouts.
Run clips serially and avoid external timeouts that kill ffmpeg mid-write.
After each clip, validate with `ffprobe`; retry once if validation fails.
If speed is the priority (listening practice), prefer approximate cuts:
- Put `-ss` before `-i` to avoid decoding from the start every time.
- Use `-preset ultrafast` and a higher CRF (e.g., 28).
- Optionally downscale (e.g., width 1280) to reduce encode time.
Name each clip with an ordered prefix, `<nn>_<chapter_title>.mp4`, using safe filenames:
- Use a 2-digit index starting at 01.
- Replace spaces with underscores.
- Remove filesystem-unsafe characters.
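
The two cutting strategies differ mainly in where `-ss` sits relative to `-i`. The sketch below builds both argument lists and adds an `ffprobe` duration check; the function names, the one-second tolerance, and the choice to always downscale in fast mode are assumptions for illustration, not the script's actual behavior.

```python
import subprocess


def build_clip_command(src, dst, start, end, mode="accurate"):
    """Return an ffmpeg argument list that re-encodes [start, end) seconds of src."""
    seek = ["-ss", f"{start:.3f}"]
    duration = ["-t", f"{end - start:.3f}"]
    if mode == "fast":
        # Input seeking: -ss before -i jumps near the start without decoding from 0.
        cmd = ["ffmpeg", "-y", *seek, "-i", src, *duration]
        speed = ["-preset", "ultrafast", "-crf", "28", "-vf", "scale=1280:-2"]
    else:
        # Output seeking: -ss after -i decodes and discards frames up to the start.
        cmd = ["ffmpeg", "-y", "-i", src, *seek, *duration]
        speed = ["-preset", "veryfast"]
    return cmd + ["-c:v", "libx264", "-c:a", "aac", *speed,
                  "-pix_fmt", "yuv420p", "-movflags", "+faststart", dst]


def clip_is_valid(path, expected_seconds, tolerance=1.0):
    """Use ffprobe to confirm the clip is readable and roughly the expected length."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return False
    try:
        return abs(float(result.stdout.strip()) - expected_seconds) <= tolerance
    except ValueError:
        return False
```

A serial loop would run each command with `subprocess.run(cmd, check=True)` (no timeout), call `clip_is_valid`, and rerun the command once if the check fails.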

7) Extract and convert subtitles per chapter

Extract the VTT segment for each chapter by time range.
Convert each segment to SRT:
- `<nn>_<chapter_title>.en.srt`
- The script deletes per-chapter VTT files unless `--keep-vtt` is set.
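
The slicing and conversion can be sketched as a single pass over the VTT text. This example assumes that cue times should be clamped to the chapter and rebased so each SRT starts at zero, matching the cut clip; that rebasing, and all function names, are assumptions rather than documented script behavior. It strips inline tags but does not deduplicate the rolling cues found in auto-generated subtitles.

```python
import re

CUE_RE = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(ts):
    """Parse 'HH:MM:SS.mmm' or 'MM:SS.mmm' into seconds."""
    parts = ts.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def srt_time(seconds):
    """Format seconds as the SRT timestamp 'HH:MM:SS,mmm'."""
    ms = max(0, round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_to_chapter_srt(vtt_text, start, end):
    """Keep cues overlapping [start, end), clamp them, and rebase to the clip start."""
    entries = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_RE.search(line)
            if not match:
                continue  # header, cue identifier, or note line
            cue_start, cue_end = to_seconds(match.group(1)), to_seconds(match.group(2))
            text = [re.sub(r"<[^>]+>", "", t) for t in lines[i + 1:] if t.strip()]
            if text and cue_end > start and cue_start < end:
                entries.append(
                    (max(cue_start, start) - start, min(cue_end, end) - start, text)
                )
            break
    return "\n".join(
        f"{n}\n{srt_time(a)} --> {srt_time(b)}\n" + "\n".join(text) + "\n"
        for n, (a, b, text) in enumerate(entries, start=1)
    )
```

The returned string can be written directly to `<nn>_<chapter_title>.en.srt` with UTF-8 encoding.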

8) Report outputs

Print output directory path, chapter list, and generated files.

Output Rules

Source files stay in the current directory (`<id>.mp4`, `<id>.en.vtt`).
All chapter clips and subtitle files are placed in the per-video directory named after the sanitized title.
Use consistent time formats (`HH:MM:SS.mmm`).
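
A small helper keeps every reported time in the `HH:MM:SS.mmm` form. The name `format_timestamp` is hypothetical; the sketch assumes times are held as seconds internally.

```python
def format_timestamp(seconds):
    """Format a non-negative number of seconds as HH:MM:SS.mmm."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"
```

For instance, `format_timestamp(75.25)` gives `00:01:15.250`, a form that ffmpeg's `-ss` and `-t` options also accept.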

References

Command templates and copy/paste examples: `references/commands.md`
Automation: `scripts/smart_edit.py`
