podcastcut-edit-raw

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

粗剪

Rough Cut

遍历 transcript 每个句子 → 检查是否在删除标记中 → 整句删除

Iterate through each sentence in the transcript → Check if it is in the deletion markers → Delete the entire sentence

快速使用

Quick Start

用户: 按这个审查稿粗剪 /path/to/podcast_审查稿.md
用户: 粗剪，输出到 /path/to/output

User: Rough cut according to this review draft /path/to/podcast_review_draft.md
User: Rough cut, output to /path/to/output

适用场景

Applicable Scenarios

场景	说明
✅ 删大段内容	寒暄、跑题、隐私、啰嗦
✅ 删整句	句子级时间戳足够精确
❌ 删半句/口误	用 `/podcastcut-edit-fine` 或 `/podcastcut-transcribe`

Scenario	Description
✅ Delete large sections	Greetings, off-topic content, privacy-related parts, verbose segments
✅ Delete entire sentences	Sentence-level timestamps are sufficiently accurate
❌ Delete half sentences/verbal slips	Use `/podcastcut-edit-fine` or `/podcastcut-transcribe`

核心逻辑

Core Logic

python

undefined

python

undefined

从 transcript 出发，检查每个句子是否需要删除

for sentence in transcript['sentences']: if normalize(sentence['text']) in 审查稿删除标记: 删除这个句子


**为什么从 transcript 出发？**
- transcript.json 的每个句子都有精确时间戳
- 只需检查句子文本是否在 `~~删除标记~~` 中
- 不需要复杂的多句组合匹配
- 100% 利用已有的句子边界

---

for sentence in transcript['sentences']: if normalize(sentence['text']) in 审查稿删除标记: 删除这个句子


**Why start from the transcript?**
- Each sentence in transcript.json has precise timestamps
- Only need to check if the sentence text is in `~~deletion markers~~`
- No need for complex multi-sentence combination matching
- 100% utilization of existing sentence boundaries

---

流程

Workflow

1. 加载审查稿，提取所有 ~~删除标记~~
    ↓
2. 加载 transcript.json（句子级时间戳）
    ↓
3. 遍历每个句子，检查是否在删除标记中
    ↓
4. 合并连续删除 → 计算保留片段
    ↓
5. 生成 FFmpeg filter → 执行剪辑
    ↓
6. 输出 podcast_v2.mp3/mp4

1. Load the review draft and extract all ~~deletion markers~~
    ↓
2. Load transcript.json (sentence-level timestamps)
    ↓
3. Iterate through each sentence and check if it is in the deletion markers
    ↓
4. Merge consecutive deletions → Calculate retained segments
    ↓
5. Generate FFmpeg filter → Execute clipping
    ↓
6. Output podcast_v2.mp3/mp4

匹配逻辑

Matching Logic

python

def is_sentence_deleted(sentence_text, deletions):
    """检查句子是否应该删除"""
    text_norm = normalize(sentence_text)  # 移除空格标点

    # 句子出现在任一删除标记中 → 删除
    for deletion in deletions:
        if text_norm in normalize(deletion):
            return True

    return False

标准化：移除空格、标点后比较，避免格式差异导致不匹配。

python

def is_sentence_deleted(sentence_text, deletions):
    """检查句子是否应该删除"""
    text_norm = normalize(sentence_text)  # 移除空格标点

    # 句子出现在任一删除标记中 → 删除
    for deletion in deletions:
        if text_norm in normalize(deletion):
            return True

    return False

Normalization: Compare after removing spaces and punctuation to avoid mismatches caused by format differences.

可复用脚本

Reusable Script

rough_cut.py

一键完成粗剪全流程。

bash

python scripts/rough_cut.py <工作目录> <输入音频> [输出音频]

示例：

bash

python scripts/rough_cut.py \
  "/Volumes/T9/podcast/v5" \
  "/Volumes/T9/podcast/原始音频.mp3"

输入：

```
podcast_审查稿.md
```
- 带删除标记的审查稿
```
podcast_transcript.json
```
- 句子级时间戳

输出：

```
podcast_删除清单.json
```
- 删除的句子列表
```
keep_segments.json
```
- 保留片段列表
```
filter.txt
```
- FFmpeg filter 脚本
```
ffmpeg_cmd.sh
```
- FFmpeg 完整命令

Complete the entire rough cut process with one click.

bash

python scripts/rough_cut.py <working_directory> <input_audio> [output_audio]

Example:

bash

python scripts/rough_cut.py \
  "/Volumes/T9/podcast/v5" \
  "/Volumes/T9/podcast/original_audio.mp3"

Input:

```
podcast_review_draft.md
```
- Review draft with deletion markers
```
podcast_transcript.json
```
- Sentence-level timestamps

Output:

```
podcast_deletion_list.json
```
- List of deleted sentences
```
keep_segments.json
```
- List of retained segments
```
filter.txt
```
- FFmpeg filter script
```
ffmpeg_cmd.sh
```
- Complete FFmpeg command

输出文件

Output Files

<工作目录>/
├── podcast_v2.mp3           # 剪辑后音频
├── podcast_删除清单.json     # 删除时间段列表
├── keep_segments.json       # 保留片段列表
├── filter.txt               # FFmpeg filter 脚本
└── ffmpeg_cmd.sh            # FFmpeg 完整命令

<working_directory>/
├── podcast_v2.mp3           # Clipped audio
├── podcast_deletion_list.json     # List of deleted time segments
├── keep_segments.json       # List of retained segments
├── filter.txt               # FFmpeg filter script
└── ffmpeg_cmd.sh            # Complete FFmpeg command

FFmpeg 命令

FFmpeg Commands

纯音频（mp3）

Audio Only (mp3)

bash

ffmpeg -y -i input.mp3 \
  -filter_complex_script filter.txt \
  -map "[outa]" \
  -c:a libmp3lame -q:a 2 \
  output_v2.mp3

bash

ffmpeg -y -i input.mp3 \
  -filter_complex_script filter.txt \
  -map "[outa]" \
  -c:a libmp3lame -q:a 2 \
  output_v2.mp3

音视频（mp4）

Audio & Video (mp4)

bash

ffmpeg -y -i input.mp4 \
  -filter_complex_script filter.txt \
  -map "[outv]" -map "[outa]" \
  -c:v libx264 -crf 18 -c:a aac \
  output_v2.mp4

bash

ffmpeg -y -i input.mp4 \
  -filter_complex_script filter.txt \
  -map "[outv]" -map "[outa]" \
  -c:v libx264 -crf 18 -c:a aac \
  output_v2.mp4

使用示例

Usage Example

用户: 按审查稿粗剪 /path/to/podcast_审查稿.md

AI: 好的，开始粗剪...
    1. 加载审查稿: 144 处删除标记
    2. 加载 transcript: 3376 句
    3. 匹配删除: 296 个句子
    4. 合并后: 137 个删除块
    5. 生成 FFmpeg 命令并执行

    结果:
    - 原始时长: 2:08:07
    - 剪辑后: 1:57:54
    - 删除: 10:13

User: Rough cut according to the review draft /path/to/podcast_review_draft.md

AI: Sure, starting rough cut...
    1. Loaded review draft: 144 deletion markers
    2. Loaded transcript: 3376 sentences
    3. Matched deletions: 296 sentences
    4. After merging: 137 deletion blocks
    5. Generated FFmpeg command and executed it

    Results:
    - Original duration: 2:08:07
    - After clipping: 1:57:54
    - Deleted: 10:13

与精剪的区别

Differences from Fine Cut

对比	粗剪 (本 Skill)	精剪 (/podcastcut-edit-fine)
时间戳	句子级	字符级
最小单位	整句	单字
适用	删大段内容	删口误、语气词
输入	podcast_transcript.json	podcast_transcript_chars.json

Comparison	Rough Cut (this Skill)	Fine Cut (/podcastcut-edit-fine)
Timestamp	Sentence-level	Character-level
Minimum Unit	Entire sentence	Single character
Use Case	Delete large sections	Delete verbal slips, filler words
Input	podcast_transcript.json	podcast_transcript_chars.json

反馈记录

Feedback Records

2026-02-01

单字符句子匹配问题
- 问题："好，" 标准化后变成 "好"（1字符），被
```
len < 2
```
  跳过，导致没删掉
- 修复：单字符用精确匹配，多字符用包含匹配
python
```
if len(text_norm) == 1:
    text_norm == del_norm  # 精确匹配，避免误删
else:
    text_norm in del_norm  # 包含即可
```
- 原因：单字符如果用包含匹配，"好" 会匹配任何含 "好" 的删除标记，造成误删

Single-character sentence matching issue
- Problem: "好，" becomes "好" (1 character) after normalization, skipped by
```
len < 2
```
  , resulting in failure to delete
- Fix: Exact match for single characters, contain match for multi-character texts
python
```
if len(text_norm) == 1:
    text_norm == del_norm  # 精确匹配，避免误删
else:
    text_norm in del_norm  # 包含即可
```
- Reason: If single characters use contain match, "好" would match any deletion marker containing "好", causing incorrect deletions

2026-01-31

改用"从 transcript 出发"的匹配逻辑
- 原来：解析审查稿删除标记 → 尝试匹配 transcript 句子（容易失败）
- 现在：遍历 transcript 句子 → 检查是否在删除标记中（简单可靠）
- 原因：审查稿删除标记可能跨多句，反向匹配更简单

Switched to "start from transcript" matching logic
- Original: Parse deletion markers from review draft → Attempt to match transcript sentences (prone to failure)
- Current: Iterate through transcript sentences → Check if they are in deletion markers (simple and reliable)
- Reason: Deletion markers in the review draft may span multiple sentences, reverse matching is simpler

2026-01-31 (早)

2026-01-31 (Morning)

创建独立 Skill：从
```
/podcastcut-edit
```
拆分出来
粗剪聚焦句子级删除，精剪由
```
/podcastcut-edit-fine
```
处理

Created independent Skill: Split from
```
/podcastcut-edit
```
Rough cut focuses on sentence-level deletion, fine cut is handled by
```
/podcastcut-edit-fine
```