podcastcut-edit-raw
Rough Cut
Iterate over every sentence in the transcript → check whether it appears in a deletion marker → delete the whole sentence.
Quick Start

```
User: Rough cut according to this review draft /path/to/podcast_审查稿.md
User: Rough cut, output to /path/to/output
```

Applicable Scenarios
| Scenario | Description |
|---|---|
| ✅ Delete large sections | Small talk, off-topic tangents, private details, rambling |
| ✅ Delete whole sentences | Sentence-level timestamps are precise enough |
| ❌ Delete partial sentences or slips of the tongue | Use the fine cut instead (see Differences from Fine Cut) |

Core Logic

```python
# Start from the transcript and check whether each sentence needs to be deleted.
# `deletion_markers` holds the ~~...~~ texts extracted from the review draft.
sentences_to_delete = []
for sentence in transcript['sentences']:
    text_norm = normalize(sentence['text'])
    if any(text_norm in normalize(marker) for marker in deletion_markers):
        sentences_to_delete.append(sentence)  # delete this sentence
```

**Why start from the transcript?**
- Every sentence in transcript.json already has a precise timestamp
- Each sentence only needs to be checked against the `~~deletion markers~~`
- No complex multi-sentence matching is required
- Existing sentence boundaries are reused in full

---

Workflow

```
1. Load the review draft and extract all ~~deletion markers~~
   ↓
2. Load transcript.json (sentence-level timestamps)
   ↓
3. Iterate over each sentence and check whether it appears in a deletion marker
   ↓
4. Merge consecutive deletions → compute the retained segments
   ↓
5. Generate the FFmpeg filter → run the cut
   ↓
6. Output podcast_v2.mp3/mp4
```

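Step 4 of the workflow can be sketched as follows. This is a minimal illustration, not the actual `rough_cut.py`: it assumes each sentence is a dict with `start` and `end` fields in seconds (the real transcript.json schema is not shown here), and the function name `compute_keep_segments` is hypothetical.

```python
def compute_keep_segments(sentences, deleted_flags, total_duration):
    """Merge consecutive deleted sentences into blocks and return the kept ranges.

    Assumption: each sentence is a dict with 'start' and 'end' in seconds.
    """
    # Merge runs of consecutive deleted sentences into deletion blocks
    blocks = []
    prev_deleted = False
    for sentence, deleted in zip(sentences, deleted_flags):
        if deleted:
            if prev_deleted:
                blocks[-1][1] = sentence["end"]  # extend the current block
            else:
                blocks.append([sentence["start"], sentence["end"]])
        prev_deleted = deleted

    # The kept segments are the gaps between deletion blocks
    keep, cursor = [], 0.0
    for start, end in blocks:
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        keep.append((cursor, total_duration))
    return keep


sentences = [
    {"text": "a", "start": 0.0, "end": 2.0},
    {"text": "b", "start": 2.0, "end": 5.0},
    {"text": "c", "start": 5.0, "end": 7.0},
    {"text": "d", "start": 7.0, "end": 9.0},
]
# Sentences b and c are deleted; they merge into one block from 2.0 to 7.0
segments = compute_keep_segments(sentences, [False, True, True, False], 9.0)
```

Merging before computing the retained segments keeps the number of cuts small, which matters because every retained segment becomes one trim in the FFmpeg filter.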
Matching Logic

```python
def is_sentence_deleted(sentence_text, deletions):
    """Check whether a sentence should be deleted."""
    text_norm = normalize(sentence_text)  # strip spaces and punctuation
    # The sentence appears inside any deletion marker → delete it
    for deletion in deletions:
        if text_norm in normalize(deletion):
            return True
    return False
```

Normalization: both sides are compared after removing spaces and punctuation, so formatting differences do not cause mismatches.

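The `normalize` helper is not shown in this document. One possible sketch, assuming that "spaces and punctuation" means Unicode whitespace, separator, and punctuation categories (which covers both ASCII and full-width Chinese punctuation), could look like this:

```python
import unicodedata


def normalize(text):
    """Drop whitespace and punctuation so formatting differences do not block a match."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # 'P*' = punctuation (ASCII and full-width), 'Z*' = separators such as spaces
        if category.startswith(("P", "Z")) or ch.isspace():
            continue
        kept.append(ch)
    return "".join(kept)


short = normalize("好,")
mixed = normalize("你好 , 世界。")
```

Symbols that Unicode does not classify as punctuation (for example `~` or `+`) are left in place by this sketch; whether the real helper removes them is not stated.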
Reusable Script
rough_cut.py
Runs the entire rough-cut workflow with a single command.

```bash
python scripts/rough_cut.py <working_directory> <input_audio> [output_audio]
```

Example:

```bash
python scripts/rough_cut.py \
  "/Volumes/T9/podcast/v5" \
  "/Volumes/T9/podcast/原始音频.mp3"
```

Input:
- `podcast_审查稿.md` - review draft with deletion markers
- `podcast_transcript.json` - sentence-level timestamps

Output:
- `podcast_删除清单.json` - list of deleted sentences
- `keep_segments.json` - list of retained segments
- `filter.txt` - FFmpeg filter script
- `ffmpeg_cmd.sh` - complete FFmpeg command

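The first input, the review draft, marks deletions with `~~...~~` strikethrough. A minimal sketch of extracting those markers is shown below; the function name `extract_deletions` is hypothetical, and the real script may parse the draft differently.

```python
import re

# Matches Markdown strikethrough spans such as ~~text to delete~~ (may span lines)
DELETION_RE = re.compile(r"~~(.+?)~~", re.DOTALL)


def extract_deletions(review_markdown):
    """Return the text inside every ~~...~~ span of the review draft."""
    return [m.strip() for m in DELETION_RE.findall(review_markdown)]


sample = (
    "Welcome back. ~~Hi, can you hear me? Okay.~~ "
    "Today's topic is audio. ~~Um, anyway.~~"
)
deletions = extract_deletions(sample)
```

Note that the first marker in the sample spans two sentences, which is the case that makes checking each transcript sentence against the markers simpler than the reverse.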
Output Files

```
<working_directory>/
├── podcast_v2.mp3           # Edited audio
├── podcast_删除清单.json     # List of deleted time ranges
├── keep_segments.json       # List of retained segments
├── filter.txt               # FFmpeg filter script
└── ffmpeg_cmd.sh            # Complete FFmpeg command
```

FFmpeg Commands
Audio Only (mp3)

```bash
ffmpeg -y -i input.mp3 \
  -filter_complex_script filter.txt \
  -map "[outa]" \
  -c:a libmp3lame -q:a 2 \
  output_v2.mp3
```

Audio & Video (mp4)

```bash
ffmpeg -y -i input.mp4 \
  -filter_complex_script filter.txt \
  -map "[outv]" -map "[outa]" \
  -c:v libx264 -crf 18 -c:a aac \
  output_v2.mp4
```

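The contents of `filter.txt` are not shown in this document. For the audio-only case, one plausible shape is a chain of `atrim` filters joined by `concat`, ending in the `[outa]` label that the commands above map. The sketch below is an assumption about that format, and `build_audio_filter` is a hypothetical name.

```python
def build_audio_filter(keep_segments):
    """Build a filter graph that trims each kept range and concatenates them as [outa]."""
    parts = []
    labels = []
    for i, (start, end) in enumerate(keep_segments):
        # atrim selects the range; asetpts resets timestamps so segments join cleanly
        parts.append(
            f"[0:a]atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS[a{i}]"
        )
        labels.append(f"[a{i}]")
    # concat joins all trimmed segments into a single audio stream
    parts.append(f"{''.join(labels)}concat=n={len(keep_segments)}:v=0:a=1[outa]")
    return ";\n".join(parts)


filter_text = build_audio_filter([(0.0, 2.0), (7.0, 9.0)])
print(filter_text)
```

For mp4 input, the same idea would presumably need matching `trim`/`setpts` chains for video and a `concat` with `v=1:a=1` producing both `[outv]` and `[outa]`.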
Usage Example

```
User: Rough cut according to the review draft /path/to/podcast_审查稿.md
AI: OK, starting the rough cut...
1. Loaded review draft: 144 deletion markers
2. Loaded transcript: 3376 sentences
3. Matched deletions: 296 sentences
4. After merging: 137 deletion blocks
5. Generated and ran the FFmpeg command
Results:
- Original duration: 2:08:07
- After cut: 1:57:54
- Removed: 10:13
```

Differences from Fine Cut
| Comparison | Rough Cut (this Skill) | Fine Cut (/podcastcut-edit-fine) |
|---|---|---|
| Timestamps | Sentence-level | Character-level |
| Smallest unit | Whole sentence | Single character |
| Use case | Deleting large sections | Deleting slips of the tongue and filler words |
| Input | podcast_transcript.json | podcast_transcript_chars.json |

Feedback Records
2026-02-01
- Single-character sentence matching issue
  - Problem: "好," normalizes to "好" (1 character), which was skipped by the `len < 2` check, so it was never deleted
  - Fix: use exact matching for single characters and containment matching for longer text
    ```python
    if len(text_norm) == 1:
        matched = text_norm == del_norm  # exact match, avoids false deletions
    else:
        matched = text_norm in del_norm  # containment is enough
    ```
  - Reason: with containment matching, a single character such as "好" would match any deletion marker that contains "好", causing false deletions

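Folded into a complete matcher, the fix above might look like the sketch below. It assumes both the sentence and the deletion markers have already been normalized; the empty-string guard is an addition not mentioned in the feedback record, included because an empty string is contained in every string.

```python
def is_sentence_deleted(text_norm, deletions_norm):
    """Match an already-normalized sentence against normalized deletion markers."""
    if not text_norm:
        return False  # assumption: nothing left after normalization, keep the sentence
    for del_norm in deletions_norm:
        if len(text_norm) == 1:
            # Single character: require an exact match to avoid false deletions
            if text_norm == del_norm:
                return True
        elif text_norm in del_norm:
            # Longer text: containment is enough
            return True
    return False


exact_hit = is_sentence_deleted("好", ["好"])
no_false_hit = is_sentence_deleted("好", ["你好吗"])
contained_hit = is_sentence_deleted("你好", ["你好吗"])
```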
2026-01-31
- Switched to the "start from the transcript" matching logic
  - Before: parse the deletion markers in the review draft → try to match them to transcript sentences (prone to failure)
  - Now: iterate over the transcript sentences → check whether each one appears in a deletion marker (simple and reliable)
  - Reason: a deletion marker in the review draft may span several sentences, so matching in the reverse direction is simpler
2026-01-31 (early)
- Created as an independent Skill, split out from `/podcastcut-edit`
  - Rough cut focuses on sentence-level deletion; fine cut is handled by `/podcastcut-edit-fine`