videocut-clip-oral

Edit Talking-Head Video (剪口播)

Transcription + slip-of-the-tongue and silence detection → generate a review draft

Quick Usage

User: Help me edit this talking-head video
User: Process this video

Workflow

1. Transcribe with FunASR in 30s segments (character-level timestamps)
    ↓
2. Detect slips of the tongue (check sentence by sentence)
    ↓
3. Detect micro-slips (short segments found by VAD)
    ↓
4. Detect filler words (嗯/哎/诶, i.e. um/ah/eh, etc.)
    ↓
5. Detect silence (≥1s)
    ↓
6. Generate the review draft (timestamp-driven)
    ↓
7. Output the deletion-task TodoList
    ↓
[Wait for user confirmation] → once the user confirms, run `/videocut:剪辑` (the edit command)
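Step 5 can be sketched directly on top of the character-level timestamps from step 1. The snippet below is a minimal illustration, not the skill's actual implementation: it assumes each token is a dict with hypothetical keys `text`, `start`, and `end` (seconds), and treats any gap of at least 1s between consecutive tokens as silence.

```python
def find_silences(tokens, min_gap=1.0):
    """Return (start, end) gaps between consecutive tokens lasting >= min_gap seconds.

    `tokens` is assumed to be sorted by time; the dict keys are hypothetical.
    """
    silences = []
    for prev, nxt in zip(tokens, tokens[1:]):
        gap = nxt["start"] - prev["end"]
        if gap >= min_gap:
            silences.append((prev["end"], nxt["start"]))
    return silences


tokens = [
    {"text": "你", "start": 0.0, "end": 0.5},
    {"text": "好", "start": 2.0, "end": 2.5},   # 1.5s gap before this token
    {"text": "啊", "start": 2.75, "end": 3.0},  # 0.25s gap, below the threshold
]
print(find_silences(tokens))
```

Leading and trailing silence (before the first token or after the last) is not covered here, since it depends on the video duration.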

⚠️ Why 30s segments

FunASR timestamps drift on long videos; transcribing in 30s segments avoids the drift.
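As a rough sketch of what segmenting implies for timestamps: if each 30s segment is transcribed separately, the timestamps are presumably relative to the segment and have to be shifted back onto the full-video timeline. The `Token` type and both helper functions below are hypothetical names for illustration; the actual ASR call is deliberately left out.

```python
from dataclasses import dataclass

SEGMENT_SECONDS = 30.0  # segment length named in the workflow


@dataclass
class Token:
    text: str
    start: float  # seconds
    end: float    # seconds


def segment_bounds(duration, seg=SEGMENT_SECONDS):
    """Split [0, duration) into consecutive windows of at most `seg` seconds."""
    bounds = []
    t = 0.0
    while t < duration:
        bounds.append((t, min(t + seg, duration)))
        t += seg
    return bounds


def to_global(tokens, segment_start):
    """Shift segment-relative timestamps onto the full-video timeline."""
    return [Token(t.text, t.start + segment_start, t.end + segment_start)
            for t in tokens]


print(segment_bounds(70.0))
print(to_global([Token("好", 1.5, 1.75)], 30.0))
```

Keeping every timestamp on the global timeline is what lets later steps reference cuts by time alone.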

Progress TodoList

Create on startup:

- [ ] Read `tips/转录最佳实践.md` (Transcription Best Practices) → transcribe the video
- [ ] Read `tips/口误识别方法论.md` (Slip-of-the-Tongue Recognition Methodology) → detect slips of the tongue
- [ ] Run VAD to detect micro-slips (short segments < 0.5s)
- [ ] Scan for filler words (嗯/哎/诶, i.e. um/ah/eh, etc.)
- [ ] Detect silence (≥1s)
- [ ] Generate the review draft
- [ ] Output the deletion-task checklist
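The VAD item in the checklist amounts to a duration filter once speech segments are available. The sketch below assumes the VAD output is already a list of `(start, end)` pairs in seconds; which VAD model produces them is not specified in this document, and the function name is hypothetical.

```python
def short_vad_segments(segments, max_len=0.5):
    """Keep speech segments shorter than max_len seconds as micro-slip candidates."""
    return [(start, end) for start, end in segments if end - start < max_len]


segments = [(0.0, 2.0), (2.5, 2.75), (3.0, 5.0)]
print(short_vad_segments(segments))
```

The result is only a list of candidates; whether a short burst is really a micro-slip still needs to be checked against the transcript.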

⚠️ Read the methodology before executing

| Stage | Read first | Then execute |
| --- | --- | --- |
| Transcription | `tips/转录最佳实践.md` (Transcription Best Practices) | Call ASR |
| Slip-of-the-tongue detection | `tips/口误识别方法论.md` (Slip-of-the-Tongue Recognition Methodology) | Sentence-by-sentence analysis |

Core: Timestamp-driven

Deletion Task Format

Each item must carry a precise timestamp `(start-end)`:

Slips of the tongue (N instances):
- [ ] 1. `(start-end)` Delete "incorrect text" → Keep "correct text"

Filler words (N instances):
- [ ] 1. `(end of previous character - start of next character)` Delete "嗯" Context: XX【嗯】YY

Silence (N instances):
- [ ] 1. `(start-end)` Silence Xs
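A filler-word entry in this format can be produced mechanically from character-level tokens. The following is an illustrative sketch under assumptions: tokens are dicts with hypothetical keys `text`, `start`, `end`; the filler set holds only the three examples listed earlier; and timestamps are printed with two decimals, a precision this document does not specify.

```python
FILLERS = {"嗯", "哎", "诶"}  # only the examples listed above; extend as needed


def filler_tasks(tokens):
    """Build checklist lines that cut from the previous token's end to the next token's start."""
    lines = []
    for i, tok in enumerate(tokens):
        word = tok["text"]
        if word not in FILLERS:
            continue
        has_prev = i > 0
        has_next = i + 1 < len(tokens)
        # Fall back to the filler's own bounds at the very start or end
        start = tokens[i - 1]["end"] if has_prev else tok["start"]
        end = tokens[i + 1]["start"] if has_next else tok["end"]
        before = tokens[i - 1]["text"] if has_prev else ""
        after = tokens[i + 1]["text"] if has_next else ""
        n = len(lines) + 1
        lines.append(
            f'- [ ] {n}. `({start:.2f}-{end:.2f})` Delete "{word}" '
            f"Context: {before}【{word}】{after}"
        )
    return lines


tokens = [
    {"text": "我", "start": 0.0, "end": 0.25},
    {"text": "嗯", "start": 0.5, "end": 0.75},
    {"text": "们", "start": 1.0, "end": 1.25},
]
print("\n".join(filler_tasks(tokens)))
```

Cutting from the previous character's end to the next character's start removes the pauses around the filler along with the filler itself.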

Slip-of-the-tongue Types

| Type | Example | Deletion strategy |
| --- | --- | --- |
| Repetition | `拉满新拉满` (roughly "max out, new, max out") | Delete only the difference ("新") |
| Replacement | `AI就是AI就会` ("AI is, AI will") | Delete the first complete version ("AI就是") |
| Stutter | `听会会` (the character "会" is repeated) | Delete the first repeated character |
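The stutter type is the easiest one to flag automatically. The sketch below is an assumption-laden illustration, not the methodology from the tips file: tokens are dicts with hypothetical keys `text`, `start`, `end`, and the cut is assumed to run from the start of the first repeated character to the start of the second.

```python
def stutter_deletions(tokens):
    """Return (start, end, text) cuts for the first of each pair of identical adjacent tokens."""
    cuts = []
    for cur, nxt in zip(tokens, tokens[1:]):
        if cur["text"] == nxt["text"]:
            # Remove the first copy and any pause before the second copy
            cuts.append((cur["start"], nxt["start"], cur["text"]))
    return cuts


tokens = [
    {"text": "听", "start": 0.0, "end": 0.25},
    {"text": "会", "start": 0.25, "end": 0.5},
    {"text": "会", "start": 0.75, "end": 1.0},
]
print(stutter_deletions(tokens))
```

Legitimate reduplications such as 谢谢 would also be flagged, so the output should be treated as candidates for the review draft rather than final cuts.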

⚠️ Key Rules

1. Timestamp-driven: the review draft records timestamps directly, so the editing step never has to search the text again.
2. Token-by-token analysis: for slips handled as "delete the earlier part, keep the later part", look up the timestamp of every token.
3. Check the time span: if a slip spans more than 2 seconds, it must contain silence and has to be split.
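The time-span check can be sketched as follows. This is a minimal illustration under assumptions: tokens are dicts with hypothetical keys `start` and `end`, the 2s limit comes from the rule above, and the 1s gap threshold is borrowed from the silence step of the workflow.

```python
def split_long_span(tokens, max_span=2.0, min_gap=1.0):
    """Split a slip's tokens at silences, but only if the whole span exceeds max_span.

    Returns a list of (kind, start, end) with kind in {"slip", "silence"}.
    """
    start, end = tokens[0]["start"], tokens[-1]["end"]
    if end - start <= max_span:
        return [("slip", start, end)]
    parts = []
    piece_start = start
    for prev, nxt in zip(tokens, tokens[1:]):
        if nxt["start"] - prev["end"] >= min_gap:
            parts.append(("slip", piece_start, prev["end"]))
            parts.append(("silence", prev["end"], nxt["start"]))
            piece_start = nxt["start"]
    parts.append(("slip", piece_start, end))
    return parts


tokens = [
    {"text": "A", "start": 0.0, "end": 0.5},
    {"text": "I", "start": 0.5, "end": 1.0},
    {"text": "就", "start": 2.5, "end": 2.75},  # 1.5s of silence before this token
    {"text": "是", "start": 2.75, "end": 3.0},
]
print(split_long_span(tokens))
```

If a span exceeds 2s but no gap reaches the threshold, the function returns it unsplit, which is a signal to inspect the timestamps by hand.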

Output Files

01-xxx-v1_transcript.json  # Transcription result (with timestamps)
01-xxx-v1_审查稿.md        # Slip-of-the-tongue review draft (审查稿)

Display Requirements

After generating the review draft, show it to the user:

1. Write the file `01-xxx-v1_审查稿.md`
2. Read the file and display its contents
3. Wait for the user to confirm which items to delete

Methodology

See `tips/口误识别方法论.md` (Slip-of-the-Tongue Recognition Methodology) for:

- Slip-of-the-tongue detection (sentence-by-sentence check)
- Precise handling of "delete the earlier part, keep the later part"
- FunASR timestamp alignment rules