podcastcut-content
Podcast Content Editing
Generate transcript → Generate review draft (topic-level outline + sentence-level deletion suggestions) → User review → Execute the cut
Quick Start
User: Help me cut the filler out of this podcast
User: content edit
User: Generate a transcript and mark the content to delete
Input
- Audio/video file
- (Optional) List of speaker names, e.g. `["Maia", "响歌歌", "安安"]`
Output
- Review draft: a single file containing the content outline, the full transcript, and deletion marks
- After confirmation: execute the cut
Note: No separate transcript file is output, because the review draft already contains the full transcript.
Using the Two Levels Together
| Level | Location in Review Draft | Granularity | Use Case |
|---|---|---|---|
| Topic-level | Section I (content outline) | 5-30 minutes per block | Quick rough cut: delete entire off-topic/chit-chat segments |
| Sentence-level | Section IV (main text) | Inline marks, sentence by sentence | Fine adjustment with full context |
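A topic-level deletion in Section I covers every sentence inside its time range. The helper below is a hypothetical sketch (`to_seconds` and `topic_sentence_indices` are not part of the skill's scripts). It expands an outline range such as `00:00 - 04:45` into sentence indices from `podcast_transcript.json`. It assumes a sentence belongs to a topic when its start time falls inside the range.

```python
def to_seconds(ts: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' to whole seconds."""
    total = 0
    for part in ts.strip().split(":"):
        total = total * 60 + int(part)
    return total


def topic_sentence_indices(sentences, time_range: str):
    """Indices of sentences whose start time falls in an outline range like '00:00 - 04:45'."""
    start_ts, end_ts = time_range.split("-")
    start, end = to_seconds(start_ts), to_seconds(end_ts)
    return [i for i, s in enumerate(sentences) if start <= s["start"] < end]


sentences = [
    {"text": "行呗,", "start": 0.5, "end": 1.2, "spk": 0},
    {"text": "Hello,大家好。", "start": 290.0, "end": 292.0, "spk": 1},
]
print(topic_sentence_indices(sentences, "00:00 - 04:45"))  # [0]
```

Using the start time means a sentence that straddles a topic boundary stays with the topic it begins in, so it is never split.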
Workflow
1. Transcribe audio (FunASR + sentence-level timestamps + speaker diarization)
↓
2. Silence detection (FFmpeg silencedetect, to find long blank segments)
↓
3. Generate transcript (with speaker labels)
↓
4. AI analysis: identify topic structure + mark suggested deletions
↓
5. Output review draft (content outline + silence segments + deletion suggestions)
↓
【User edits the deletion marks directly in the review draft】
↓
6. Execute the cut → /podcastcut-edit (parses deletion marks from the review draft)
Technical Notes
| Feature | Implementation |
|---|---|
| Transcription | FunASR (must use full model paths; see the script below) |
| Timestamps | Sentence-level (returned automatically in `sentence_info`) |
| Speaker diarization | FunASR's built-in CAM++ model |
| Silence detection | FFmpeg |
⚠️ Must Use Script for Transcription
Do not write your own code; call the existing scripts directly:
```bash
# Transcribe (outputs podcast_transcript.json)
python ~/.claude/skills/podcastcut-content/scripts/transcribe.py <audio file> <output directory>

# Generate transcript (outputs podcast_逐字稿.md)
python ~/.claude/skills/podcastcut-content/scripts/generate_transcript.py \
  <transcript.json> <output.md> '{"0":"响歌歌","1":"麦雅","2":"安安"}'
```
**Why not write your own code?** FunASR needs the full model path plus the VAD, punctuation (Punc), and speaker models (four models in total) to return `sentence_info`. A simplified call (e.g., `model="paraformer-zh"`) makes transcription fail.
> For performance references and common issues, see `tips/转录最佳实践.md`
**Why sentence-level instead of character-level?** Character-level timestamps combined with speaker diarization are unstable on long audio (OOM, alignment errors after segmentation).
**This Skill only deletes whole sentences.** Finer-grained deletions (partial sentences, filler words) are left to `/podcastcut-transcribe`.
---
Transcript Format
```markdown
**Maia** 00:05
Let's start.

**响歌歌** 00:06
Really?

**Maia** 00:08
Great, OK. Hello everyone, welcome to today's 5:15 Podcast, I'm host Maia. Today we're going to talk about something fun.

**响歌歌** 00:20
I'm host 响歌歌. Alright, let's get started.
```
Format Rules
| Element | Format |
|---|---|
| Speaker | Bold name, e.g. `**Maia**` |
| Timestamp | `MM:SS` after the speaker name, e.g. `00:05` |
| Content | Sentences from the same speaker are joined into one paragraph, not one line per sentence |
| Speaker change | Blank line |
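The format rules can be sketched in a few lines. This is an illustrative reimplementation, not the actual `generate_transcript.py`. It assumes the sentence JSON shown under "Sentence-level JSON Format" and an `H:MM:SS` form past one hour (an assumption, since the examples only show `MM:SS`). Sentences are concatenated without a separator because the transcripts are Chinese.

```python
def fmt_ts(seconds: float) -> str:
    """Truncate to whole seconds and format as MM:SS (H:MM:SS past one hour, assumed)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes:02d}:{secs:02d}"


def render_transcript(sentences, names):
    """Join consecutive sentences by the same speaker; blank line between speakers."""
    blocks = []  # each block: [name, start_seconds, text]
    for s in sentences:
        name = names.get(str(s["spk"]), f"spk{s['spk']}")
        if blocks and blocks[-1][0] == name:
            blocks[-1][2] += s["text"]
        else:
            blocks.append([name, s["start"], s["text"]])
    return "\n\n".join(f"**{name}** {fmt_ts(start)}\n{text}" for name, start, text in blocks)


sentences = [
    {"text": "开始了。", "start": 5.2, "end": 6.0, "spk": 0},
    {"text": "是吗?", "start": 6.1, "end": 7.0, "spk": 1},
    {"text": "开心的OK。", "start": 8.0, "end": 9.0, "spk": 0},
    {"text": "Hello,大家好。", "start": 9.0, "end": 10.5, "spk": 0},
]
print(render_transcript(sentences, {"0": "Maia", "1": "响歌歌"}))
```

Grouping by the mapped name rather than the raw ID means that two IDs mapped to the same person are also joined into one paragraph.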
Review Draft Format
The review draft combines the topic-level outline and the sentence-level deletion suggestions, so the whole review happens in one file.
```markdown
# Podcast Review Draft

File: podcast.mp3
Total duration: 2:08:07

## I. Content Outline (Topic-level)

| # | Topic | Time | Duration | AI Suggestion | Reason |
|---|---|---|---|---|---|
| 1 | Opening small talk + technical debugging | 00:00 - 04:45 | 04:45 | 🗑️ Delete | Recording preparation, technical issues |
| 2 | Official opening + guest introduction | 04:45 - 07:00 | 02:15 | ✅ Keep | Official start of the podcast |
| 3 | Chit-chat: Guest background | 05:48 - 07:01 | 01:13 | 🗑️ Delete | Unrelated to the topic |
| 4 | Topic discussion | 07:01 - 40:00 | 32:59 | ✅ Keep | Core content |
| 5 | Recording discussion (mid-episode) | 49:59 - 51:13 | 01:14 | 🗑️ Delete | Discussion of editing |

Statistics: suggested to keep 2:03:21 | suggested to delete 08:12
Operation: `Delete topics 1, 3, 5` or `Keep only topics 2, 4`

## II. Silence Segments

| # | Time | Duration | Location |
|---|---|---|---|
| 1 | 12:34 - 12:48 | 00:14 | Between Topic 2 and Topic 3 |
| 2 | 35:20 - 35:58 | 00:38 | Guest's thinking pause |
| 3 | 1:02:15 - 1:02:45 | 00:30 | Mid-recording disconnection/silence |

Statistics: 3 silence segments, 01:22 in total
Operation: `Delete all silence` or `Delete silence 1, 3` (keep 2, which may be an intentional pause)

## III. Statistics

- Total sentences: 3390
- Suggested deletions: 377
- Silence segments: 3 (01:22)

### By Type

- Opening small talk: 31
- Chit-chat - personal background: 23
- Technical debugging: 15
- Recording discussion: 6
- Privacy - company name: 5
- Privacy - location: 4
- Privacy - school name: 3

## IV. Main Text (Transcript + Deletion Marks)

⚠️ The full transcript must be included, from the first sentence to the last. Nothing may be omitted!

Incorrect: `(The following content is topic discussion, keep...)` ← Not allowed!
Correct: output every sentence, whether or not it is marked for deletion

Full transcript. Content the AI suggests deleting is marked with ~~strikethrough~~, followed by the reason. Sentences from the same speaker are joined into one paragraph, not one line per sentence.

**响歌歌** 00:00
~~Alright, right, you shouldn't hear any noise. Because I remember last time we used this there was noise, just like that psychological counseling episode, um, right, this time it should be better.~~
[Delete: Opening small talk]

**麦雅** 00:23
~~I'll turn on this dog, okay~~
[Delete: Opening small talk]

**安安** 00:27
~~I'll also turn on my self-introduction.~~
[Delete: Opening small talk]

...

**麦雅** 04:50
Hello everyone, welcome to today's 5:15 Podcast, I'm host 麦雅.

**响歌歌** 04:58
I'm host 响歌歌.

**麦雅** 05:02
Today we have a special guest, 安安.

**安安** 05:08
Hello everyone, I'm 安安.

...

**安安** 15:32
~~When I worked at Google before~~ When I worked before, I ran into a similar situation.
[Delete: Privacy - company name]

...

**麦雅** 49:59
~~Should we cut this segment?~~
[Delete: Recording discussion]

**响歌歌** 50:02
~~Hmm, let's check later.~~
[Delete: Recording discussion]

**安安** 50:05
~~I think we can keep it.~~
[Delete: Recording discussion]

...

(Full transcript continues...)
```
Structure Description
| Section | Content | Purpose |
|---|---|---|
| I. Content Outline | Topic-level table | Quickly understand structure, delete entire blocks |
| II. Silence Segments | List of long blank segments | Delete silent paragraphs |
| III. Statistics | Summary of deletion counts by type | Quickly see the scale of deletions |
| IV. Main Text | Full transcript + inline deletion marks | View context, review sentence by sentence |
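The three deletion sources can overlap: topic blocks in Section I, silences in Section II, and sentence marks in Section IV. A total such as "suggested to delete" should therefore be computed on merged ranges, so overlaps are not counted twice. The following is a minimal sketch with hypothetical helper names. This document does not describe how `/podcastcut-edit` actually combines ranges.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) ranges given in seconds."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def total_seconds(intervals):
    """Total duration covered by the ranges, counting overlaps once."""
    return sum(end - start for start, end in merge_intervals(intervals))


deletions = [
    (0, 285),    # topic 1: 00:00 - 04:45
    (10, 15),    # a sentence mark inside topic 1
    (348, 421),  # topic 3: 05:48 - 07:01
    (754, 768),  # silence 1: 12:34 - 12:48
]
print(merge_intervals(deletions))  # [(0, 285), (348, 421), (754, 768)]
print(total_seconds(deletions))    # 372
```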
Topic Identification Rules
| Topic Type | Identification Method |
|---|---|
| Opening small talk | Content before the official opening ("Hello everyone") |
| Official opening | Paragraph starting with "Hello/Hello everyone" |
| Chit-chat | Discussion of personal background irrelevant to the topic |
| Topic discussion | Core content around the podcast topic |
| Recording discussion | Paragraphs discussing editing, content selection |
| Closing | Concluding remarks like "Alright, that's it for today" |
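The "content before the official opening" rule can be located mechanically as a starting point for Claude's review. This is a rough heuristic sketch with an assumed marker list. Small talk may also contain "大家好", so the result is only a candidate boundary that the semantic analysis still has to confirm.

```python
OPENING_MARKERS = ("大家好", "Hello")  # assumed markers, taken from the table above


def find_official_opening(sentences, markers=OPENING_MARKERS):
    """Index of the first sentence containing an opening marker, or None if none is found."""
    for i, s in enumerate(sentences):
        if any(marker in s["text"] for marker in markers):
            return i
    return None


sentences = [
    {"text": "你们应该没有听到噪音吧。", "start": 0.0, "end": 3.0, "spk": 1},
    {"text": "我把这个dog打开。", "start": 23.0, "end": 25.0, "spk": 0},
    {"text": "Hello,大家好,欢迎来到今天的五点一刻。", "start": 290.0, "end": 295.0, "spk": 0},
]
idx = find_official_opening(sentences)
print(idx)  # 2: sentences 0-1 are opening small-talk candidates
```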
Deletion Types
⚠️ Division of Labor Principle
| Skill | Focus | Content Handled | Timestamp Granularity |
|---|---|---|---|
| `/podcastcut-content` | Content semantics | Opening, off-topic, privacy, redundancy, long silence | Sentence-level |
| `/podcastcut-transcribe` | Verbal errors (technical) | Filler words, verbal slips, short pauses, partial-sentence deletion | Character-level |
This Skill works at the content level. Deciding what to delete and what to keep is a semantic judgment, and deletions are whole sentences.
Verbal error detection works at the technical level. It needs finer-grained rules (repeated characters, pause patterns) and character-level timestamps.
Why split the work this way?
- Sentence-level transcription + speaker diarization = accurate speaker attribution
- Character-level transcription + speaker diarization = speakers easily misaligned (OOM on long audio, hard to realign segments after merging)
- Delete large segments first (sentence-level), then refine what remains (character-level)
Content Deletion Types (Handled by This Skill)
| Type | Mark | Example |
|---|---|---|
| Opening small talk | | "Has it started?" "Can you hear me?" |
| Closing chit-chat | | "OK, that's it" "Bye" |
| Recording-related | | "Re-record this segment" "Cut this later" |
| Off-topic content | | Discussion unrelated to the topic |
| Redundant repetition | | Long stretches repeating the same point |
| Privacy - company | | "I work at Google" |
| Privacy - personal name | | "My colleague Zhang San said" |
| Privacy - location | | "I live in xxx" |
| Long silence | Listed separately in Section II of the review draft | Silent segments of 3 seconds or longer |
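Section III's "By Type" counts can be tallied from the inline tags in the review draft. This sketch assumes tags look like `[删除: 类型]` or `[Delete: type]`, as in the review draft example. The regex and function name are illustrative, and silences listed in Section II are not counted here.

```python
import re
from collections import Counter

# Matches inline tags such as "[删除: 片头寒暄]" or "[Delete: Opening small talk]" (assumed format).
TAG_RE = re.compile(r"\[(?:删除|Delete): ([^\]]+)\]")


def count_by_type(draft: str) -> Counter:
    """Count inline deletion tags by type."""
    return Counter(t.strip() for t in TAG_RE.findall(draft))


draft = """**麦雅** 00:23
~~我把这个这个 dog 打开好哦,~~
[删除: 片头寒暄]

**安安** 00:27
~~我也打开一下自我介绍。~~
[删除: 片头寒暄]

**麦雅** 49:59
~~这段要不要剪掉?~~
[删除: 录制讨论]"""
print(count_by_type(draft).most_common())  # [('片头寒暄', 2), ('录制讨论', 1)]
```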
Verbal Error Deletion Types (Handled by /podcastcut-transcribe)
| Type | Description |
|---|---|
| Verbal tics/filler words | "嗯" (um), "就是说" (I mean), "然后" (then), "对对对" (right right right) |
| Verbal slips | Misspoken words followed by a correction |
| Short pauses | Brief pauses within sentences (< 3 seconds) |
Note: Long silences (≥3 seconds) are handled by this Skill. Short pauses are handled by `/podcastcut-transcribe`.
**Why not handle filler words here?** Identifying them needs finer-grained rules (consecutive repetition, pause patterns), which is a technical problem rather than a question of content semantics.
AI Analysis Method
⚠️ Must Use Claude for Semantic Analysis
Keyword matching is not enough! Rule-based methods cannot detect:
- Semantic off-topic drift/chit-chat (no obvious keywords)
- Chit-chat after the guest introduction (where they live, what year they graduated, what their school is like)
- Hidden recording discussions (no keywords like "cut")
Claude must analyze the transcript segment by segment. Quality comes first.
Analysis Workflow
1. Split the transcript into 15-minute segments
2. Send each segment to Claude to identify content suggested for deletion
3. Claude returns: sentence index + deletion type + reason
4. Merge the results from all segments and generate the review draft
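Step 1 can be done by bucketing sentences on their start time. The sketch below uses a hypothetical function name. It keeps each sentence's global index so that Claude's per-segment answers can be merged back in step 4 without renumbering.

```python
def split_by_minutes(sentences, minutes=15):
    """Group (global_index, sentence) pairs into consecutive windows by start time."""
    window = minutes * 60
    buckets = {}
    for idx, s in enumerate(sentences):
        buckets.setdefault(int(s["start"] // window), []).append((idx, s))
    return [buckets[k] for k in sorted(buckets)]  # empty windows are skipped


sentences = [{"text": f"s{i}", "start": t, "end": t + 2, "spk": 0}
             for i, t in enumerate([0, 100, 899, 900, 2000])]
chunks = split_by_minutes(sentences)
print([[idx for idx, _ in chunk] for chunk in chunks])  # [[0, 1, 2], [3], [4]]
```

Fixed boundaries can cut a conversation mid-topic. Overlapping windows would be one way to give Claude more context, but the workflow above does not call for them.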
Claude Analysis Prompt
For each transcript segment, use the following prompt:
```
You are a podcast content review assistant. Analyze the transcript below and identify the sentences that should be deleted.

Deletion Types
- Opening small talk: chit-chat and technical debugging before the official opening ("大家好" / "Hello everyone")
- Recording discussion: talk about editing, recording status, or technical problems, e.g. "Should we cut this segment?"
- Privacy - company name: mentions of specific companies (Google, Meta, ByteDance, etc.)
- Privacy - school name: mentions of specific schools (Stanford, Tsinghua, etc.)
- Privacy - location: mentions of specific places (Palo Alto, Silicon Valley, etc.)
- Privacy - personal name: mentions of specific people (not public figures)
- Off-topic/chit-chat: discussion unrelated to the podcast topic (personal background chit-chat, geography, etc.)
- Redundant repetition: the same point made over and over, long repeated passages

Output Format
For each sentence suggested for deletion, output:
- Sentence timestamp
- Deletion type
- Reason (brief)
Mark only the sentences that should be deleted; skip the rest.

Transcript
{transcript_segment}
```
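To fill `{transcript_segment}`, each sentence can be rendered with both its global index and its timestamp. That way the answer can be mapped back whether Claude reports an index (step 3 of the workflow) or a timestamp (the prompt's output format). This is a sketch with hypothetical names. It uses `str.replace` instead of `str.format`, so any other braces in the prompt are left alone.

```python
def format_segment(chunk, names):
    """Render (global_index, sentence) pairs as '[index] MM:SS speaker: text' lines."""
    lines = []
    for idx, s in chunk:
        minutes, secs = divmod(int(s["start"]), 60)  # minutes are not wrapped into hours
        name = names.get(str(s["spk"]), f"spk{s['spk']}")
        lines.append(f"[{idx}] {minutes:02d}:{secs:02d} {name}: {s['text']}")
    return "\n".join(lines)


def build_prompt(template, chunk, names):
    """Insert the formatted segment into the prompt template."""
    return template.replace("{transcript_segment}", format_segment(chunk, names))


chunk = [
    (0, {"text": "开始了。", "start": 5.0, "end": 6.0, "spk": 0}),
    (1, {"text": "是吗?", "start": 66.4, "end": 67.0, "spk": 1}),
]
print(build_prompt("Transcript\n{transcript_segment}", chunk, {"0": "Maia", "1": "响歌歌"}))
```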
Detailed Deletion Type Explanations
| Type | What to Look For |
|---|---|
| Opening small talk | Everything before the official opening, including technical debugging and chit-chat |
| Recording discussion | "Cut", "Did that get recorded?", "This part is too sensitive", "Cut it later" |
| Privacy information | Company names, school names, locations, personal names |
| Off-topic/chit-chat | Unrelated to the topic: where someone lives, what year they arrived, what their school is like |
| Redundant repetition | The same idea said 3 or more times |
Chit-chat Detection Focus
Chit-chat right after the guest introduction is especially easy to miss. Watch for these signals:
- Place names, school names, or years suddenly appearing
- "Which area are you in?" "What year did you come here?" "What's it like over there?"
- Several consecutive sentences on non-topic content (geography, schools, comparing cities)
Recording Discussion Detection Focus
Not just in the opening! It can appear anywhere in the episode:
- Technical issues: "Can you hear me?", "It cut out", "My headphones died"
- Content concerns: "Too low-key", "I don't want to share that", "Let's not go into details"
- Editing discussion: "Cut it later", "Should we keep this part?"
Silence Detection Method
Use FFmpeg's `silencedetect` filter to detect long blank segments.
Detection Command
```bash
ffmpeg -i video.mp4 -af "silencedetect=noise=-40dB:d=3" -f null - 2>&1 | grep silencedetect
```
| Parameter | Description | Recommended Value |
|---|---|---|
| `noise` | Silence threshold (audio below this level counts as silence) | -40dB |
| `d` | Minimum silence duration (seconds) | 3 (content editing targets long blank segments) |
Output Parsing
[silencedetect @ 0x...] silence_start: 752.341
[silencedetect @ 0x...] silence_end: 766.512 | silence_duration: 14.171
Parse `silence_start` and `silence_end` to build the list of silence segments.
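The parsing step can be sketched as follows. The function names are hypothetical. `detect_silence` runs the same FFmpeg command as above through `subprocess`, and it reads stderr because that is where `silencedetect` logs. A `silence_start` with no matching `silence_end` (silence running to the end of the file) is dropped in this sketch.

```python
import re
import subprocess

START_RE = re.compile(r"silence_start: (-?\d+(?:\.\d+)?)")
END_RE = re.compile(r"silence_end: (-?\d+(?:\.\d+)?) \| silence_duration: (\d+(?:\.\d+)?)")


def parse_silencedetect(log: str):
    """Pair silence_start/silence_end lines into (start, end, duration) tuples in seconds."""
    segments, start = [], None
    for line in log.splitlines():
        if m := START_RE.search(line):
            start = float(m.group(1))
        elif (m := END_RE.search(line)) and start is not None:
            segments.append((start, float(m.group(1)), float(m.group(2))))
            start = None
    return segments


def detect_silence(path, noise="-40dB", min_dur=3):
    """Run ffmpeg silencedetect on `path` and parse its stderr log."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", path,
         "-af", f"silencedetect=noise={noise}:d={min_dur}", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    return parse_silencedetect(proc.stderr)


log = """[silencedetect @ 0x55] silence_start: 752.341
[silencedetect @ 0x55] silence_end: 766.512 | silence_duration: 14.171"""
print(parse_silencedetect(log))  # [(752.341, 766.512, 14.171)]
```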
Threshold Selection
| Scenario | noise | d (minimum duration) |
|---|---|---|
| Content editing (this Skill) | -40dB | 3 seconds |
| Verbal error detection (fine-grained) | -50dB | 0.5 seconds |
**Why 3 seconds?** Pauses shorter than 3 seconds may be natural thinking gaps and are not recommended for deletion.
Output Files
podcast_transcript.json   # Sentence-level timestamps + speakers (used for the cut)
podcast_审查稿.md          # Review draft (full transcript + deletion marks)
⚠️ Only output the review draft; do not output a separate transcript file.
Section IV ("Main Text") of the review draft is the full transcript, so there is no need to output it again.
Sentence-level JSON Format
```json
{
  "file": "podcast.mp3",
  "duration": 3600.5,
  "sentences": [
    {"text": "Hello everyone,", "start": 0.50, "end": 1.20, "spk": 0},
    {"text": "Welcome to today's podcast.", "start": 1.20, "end": 2.80, "spk": 0},
    {"text": "I'm host Xiao Ming.", "start": 2.80, "end": 3.90, "spk": 1},
    ...
  ]
}
```
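Here is a small loader for this format, as a sketch (the function names are hypothetical). It checks that each sentence ends no earlier than it starts and counts sentences per speaker ID. A lopsided split such as the spk1 (60) / spk3 (560) case in the feedback records suggests that one person was given two IDs.

```python
import json
from collections import Counter


def load_sentences(path):
    """Load podcast_transcript.json and return its 'sentences' list after a basic sanity check."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    sentences = data["sentences"]
    for i, s in enumerate(sentences):
        if s["end"] < s["start"]:
            raise ValueError(f"sentence {i} ends before it starts: {s}")
    return sentences


def speaker_counts(sentences):
    """Number of sentences per speaker ID."""
    return Counter(s["spk"] for s in sentences)
```

Typical use: `speaker_counts(load_sentences("podcast_transcript.json")).most_common()`.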
⚠️ Review Draft and Deletion List Must Be Synchronized
Users may edit deletion marks directly in the review draft (adding or removing strikethrough), which makes the deletion list out of date.
Rules:
- The review draft is the final source of truth for the user's review
- Before executing the cut, re-parse the deletion marks from the review draft
- Do not rely on `podcast_删除清单.json`, which may be out of date
Parsing method: Scan the review draft for text marked with `~~strikethrough~~` and match it to the timestamps in transcript.json.
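Below is a minimal sketch of this parsing method under stated assumptions:
- Speaker headers look like `**Name** MM:SS`, as in the transcript format.
- Marked sentences are wrapped whole in `~~...~~`.
- A sentence counts as deleted when its full text appears inside a struck span of the paragraph whose time window contains it.

Only Section IV is handled. Topic-level and silence decisions from Sections I and II would need their own parsing. The function names are hypothetical.

```python
import re

HEADER_RE = re.compile(r"^\*\*(.+?)\*\* (\d+(?::\d{2}){1,2})\s*$")  # e.g. "**麦雅** 49:59"
STRIKE_RE = re.compile(r"~~(.+?)~~")


def to_seconds(ts):
    """'49:59' -> 2999, '1:02:15' -> 3735."""
    total = 0
    for part in ts.split(":"):
        total = total * 60 + int(part)
    return total


def paragraphs(draft):
    """Split the draft into (start_seconds, body) pairs, one per speaker header."""
    paras, start, body = [], None, []
    for line in draft.splitlines():
        m = HEADER_RE.match(line)
        if m:
            if start is not None:
                paras.append((start, "\n".join(body)))
            start, body = to_seconds(m.group(2)), []
        elif start is not None:
            body.append(line)
    if start is not None:
        paras.append((start, "\n".join(body)))
    return paras


def match_deletions(draft, sentences, slack=1.0):
    """Indices of sentences whose full text sits inside a ~~struck~~ span of their paragraph."""
    paras = paragraphs(draft)
    deleted = set()
    for k, (start, body) in enumerate(paras):
        # Header times are whole seconds, so widen the window by `slack` on both sides.
        end = paras[k + 1][0] + slack if k + 1 < len(paras) else float("inf")
        struck = STRIKE_RE.findall(body)
        for i, s in enumerate(sentences):
            text = s["text"].strip()
            if text and start - slack <= s["start"] < end and any(text in span for span in struck):
                deleted.add(i)
    return sorted(deleted)


draft = "**Maia** 00:05\n~~开始了。~~\n[删除: 片头寒暄]\n\n**响歌歌** 00:06\n是吗?~~对对。~~\n"
sentences = [
    {"text": "开始了。", "start": 5.2, "end": 6.0, "spk": 0},
    {"text": "是吗?", "start": 6.1, "end": 6.7, "spk": 1},
    {"text": "对对。", "start": 6.8, "end": 7.5, "spk": 1},
    {"text": "开始了。", "start": 30.0, "end": 31.0, "spk": 0},
]
print(match_deletions(draft, sentences))  # [0, 2]
```

Matching is restricted to each paragraph's time window, so a short repeated sentence (like the second "开始了。" at 30 s) is not deleted just because the same text is struck elsewhere.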
Relationship with Other Skills
/podcastcut-content → Content editing (semantic level) ← this Skill
/podcastcut-edit → Execute the cut
/podcastcut-transcribe → Verbal error detection (technical level, optional)
/podcastcut-subtitle → Generate subtitles
Recommended workflow:
Original video
↓
/podcastcut-content ← Mark large segments (small talk, off-topic, redundancy, privacy)
↓
/podcastcut-edit ← Execute deletions, output v2
↓
【Optional】Verbal errors still need cleanup?
↓ Yes
/podcastcut-transcribe ← Identify verbal errors, filler words, silence
↓
/podcastcut-edit ← Execute deletions, output v3
↓
Done
Why delete content first and handle verbal errors afterwards?
- Once large segments are removed, the video is shorter
- Transcription for verbal error detection is faster, and there is less to review
- Verbal errors inside deleted segments no longer need handling
Speaker Diarization
FunASR has built-in speaker diarization (`spk_model="cam++"`) that outputs speaker IDs automatically.
Workflow
FunASR transcription (with spk_model enabled)
↓
Output sentences with speaker IDs (speaker 0, speaker 1...)
↓
Search for self-introductions to confirm which real name each ID belongs to
↓
Replace IDs with real names when generating the review draft
⚠️ Speaker Mapping Confirmation Method
Do not simply use the order the user gave! Confirm the mapping by searching the transcription results for self-introduction phrases:
```python
import json

with open("podcast_transcript.json", encoding="utf-8") as f:
    sentences = json.load(f)["sentences"]

# Search key phrases to determine the speaker mapping. Phrases stay in Chinese because the
# transcript is Chinese: "我是主播" = "I'm the host"; "我是xxx" is a placeholder for "I'm <name>".
key_phrases = ["我是主播", "我是xxx", "大家好我是"]
for s in sentences:
    if any(phrase in s["text"] for phrase in key_phrases):
        print(f"spk{s['spk']}: {s['text']}")  # Confirm which person this spk ID belongs to
```
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Same person split into multiple IDs | Unstable FunASR speaker recognition | Map the multiple IDs to the same name |
| More IDs than actual speakers | Same as above | Merge the extra IDs based on self-introductions |
| Speaker IDs misaligned after segmented transcription | Each segment's IDs reset independently | Prefer full transcription; avoid segmenting |
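Merging IDs amounts to a many-to-one mapping. This sketch assumes the mapping has the same shape as the JSON argument passed to `generate_transcript.py` (string IDs to names). The fix above implies that repeated names are acceptable, but this document does not show it. The hypothetical `apply_speaker_names` also reports IDs that were left unmapped, so leftover spurious IDs get noticed.

```python
from collections import Counter


def apply_speaker_names(sentences, id_to_name):
    """Attach a 'name' to every sentence; several IDs may map to the same person.

    Returns (named_sentences, unmapped), where unmapped counts sentences per ID with no name.
    """
    named, unmapped = [], Counter()
    for s in sentences:
        key = str(s["spk"])
        if key not in id_to_name:
            unmapped[key] += 1
        named.append({**s, "name": id_to_name.get(key, f"spk{key}")})
    return named, dict(unmapped)


# spk1 and spk3 were both confirmed as 响歌歌 via self-introductions, so both map to one name.
mapping = {"0": "麦雅", "1": "响歌歌", "2": "安安", "3": "响歌歌"}
sentences = [
    {"text": "我是主播麦雅。", "start": 4.0, "end": 5.0, "spk": 0},
    {"text": "我是主播响歌歌。", "start": 8.0, "end": 9.0, "spk": 3},
    {"text": "对。", "start": 9.5, "end": 10.0, "spk": 1},
    {"text": "嗯。", "start": 11.0, "end": 11.5, "spk": 4},
]
named, unmapped = apply_speaker_names(sentences, mapping)
print([s["name"] for s in named])  # ['麦雅', '响歌歌', '响歌歌', 'spk4']
print(unmapped)                    # {'4': 1}
```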
Limitations
| Condition | Diarization Quality |
|---|---|
| 2-10 person conversation | Good |
| Audio < 30s | Degraded |
| More than 10 speakers | Degraded |
| Segmented transcription | Speaker IDs may be inconsistent across segments |
Progress TodoList
Create at startup:
- [ ] Transcribe audio (FunASR, sentence-level + speaker diarization)
- [ ] Silence detection (FFmpeg silencedetect)
- [ ] Generate transcript
- [ ] AI analysis: identify topic structure + mark suggested deletions
- [ ] Output review draft (including silence segments)
- [ ] Wait for user confirmation
Example Dialogue
User: Help me cut the filler out of this podcast. The speakers are Maia and 响歌歌
AI: OK, I'll process this podcast.
1. Transcribing audio...
2. Detecting silence...
3. Generating transcript...
4. Analyzing content...
Review draft generated: podcast_审查稿.md
=== Content Outline (Topic-level) ===
| # | Topic | Duration | AI Suggestion |
|---|---|---|---|
| 1 | Opening small talk | 04:45 | 🗑️ Delete |
| 2 | Official opening | 02:15 | ✅ Keep |
| 3 | Chit-chat: Guest background | 01:13 | 🗑️ Delete |
| 4 | Topic discussion | 32:59 | ✅ Keep |
=== Sentence-level Statistics (By Type) ===
- Opening small talk: 12 instances
- Recording discussion: 8 instances
- Privacy information: 5 instances
- Off-topic chit-chat: 3 instances
Please check the deletion marks in the review draft, and tell me to run the cut once you have adjusted them.
User: [Adds/removes some deletion marks in the review draft] OK, cut according to the review draft
AI: OK, parsing deletion marks from the review draft...
- Found 25 deletion marks
- Total deletion duration: 06:32
Executing the cut...
Feedback Records
2026-01-31 (Late Night)
- Speaker ID misalignment caused by segmented transcription: with segmented transcription, speaker IDs reset independently in each segment, so after merging the same person may have different IDs
  - Cause: to avoid OOM, the 2-hour audio was split into 13 segments, and each segment's speaker IDs started from 0
  - Solution: prefer full transcription (2-hour audio takes about 16 minutes, no OOM)
  - Updated: added a "Segmented vs Full Transcription" section to tips/转录最佳实践.md
- FunASR may assign multiple IDs to the same person: a 3-person conversation produced 4 speaker IDs
  - Symptom: 响歌歌 was split into spk1 (60 sentences) and spk3 (560 sentences)
  - Solution: search for self-introduction phrases ("我是主播xxx", i.e. "I'm host xxx") to confirm the mapping, then merge the extra IDs
  - Updated: added the confirmation method and common issues to the "Speaker Diarization" section of SKILL.md
- Performance data updated: a 2-hour podcast measured 16 minutes and 3390 sentences (the previous estimate of 12 minutes and 800 sentences was too low)
  - Updated: performance reference table in tips/转录最佳实践.md
2026-01-31 (Evening)
- Incomplete review draft: the AI took a shortcut and wrote "(The following content is topic discussion, keep...)" instead of the full transcript
  - Updated: Section IV of the review draft format now states explicitly that the full content must be output, with no omissions
- Redundant transcript file: the user only needs one review draft (which already contains the full transcript)
  - Updated: removed `podcast_逐字稿.md` from the Output Files section and stated that only the review draft is output
2026-01-31
- Removed the "User Confirmation Method" section: in practice, users edit the deletion marks directly in the review draft, so command-style operations are unnecessary
  - Old workflow: users typed commands such as "Delete topics 1, 3" or "Delete all silence"
  - Actual workflow: users add/remove `~~strikethrough~~` in the review draft, then say "Cut according to the review draft"
  - Updated: flowchart and example dialogue; removed the command-style operation instructions
- Wrong FunASR call parameters caused transcription failure: simplified model names do not return `sentence_info`
  - Incorrect: `model="paraformer-zh"` + `spk_model="cam++"` + `sentence_timestamp=True`
  - Correct: full model path plus the VAD, Punc, and Speaker models (four models in total)
  - Cause: SKILL.md only listed simplified parameters, so during execution the AI improvised and used the wrong API
  - Updated: included the complete call code in SKILL.md, with the incorrect and correct versions side by side
  - Lesson: executable code must be written out in full in SKILL.md; do not list parameter names and leave the AI to assemble the call
2026-01-25
- Reverted to sentence-level timestamps: character-level timestamps + speaker diarization are unstable on long audio
  - Issue: after character-level transcription, merging in speaker information misaligned speakers ("是主播麦雅", i.e. "I'm host 麦雅", was attributed to 响歌歌)
  - Causes:
    - Speaker diarization ran out of memory on long audio (2-hour audio → 234MB WAV)
    - Segmented speaker diarization returned 0 sentences (API format issue)
    - Character-level transcription has no punctuation, so sentence boundaries are unnatural
  - Updated: reverted to sentence-level timestamps; this Skill only deletes whole sentences
  - Partial-sentence deletion is left to `/podcastcut-transcribe` (character-level)
2026-01-24 (晚上)
2026-01-24 (Evening)
- Attempted to upgrade to character-level timestamps, to fix sentence-level timestamps' inability to delete only part of a sentence
  - Issue: deleting "Um, let me talk about it, right" also deleted the second half of the sentence, "this episode's guest was actually invited by 响歌歌"
  - Attempt: 30 s segments + `timestamp_granularity="character"` to obtain character-level timestamps
  - Result: character-level transcription succeeded, but speaker diarization failed, causing speaker misalignment
  - Final decision: revert to sentence level (see the 2026-01-25 feedback)
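For reference, the 30 s segmentation in this (later reverted) attempt implies one bookkeeping step: timestamps returned for each segment are relative to that segment and must be offset by its start time to land on the full recording's timeline. A minimal sketch with hypothetical helper names, assuming millisecond timestamps:

```python
def chunk_bounds(total_ms: int, chunk_ms: int = 30_000) -> list[tuple[int, int]]:
    """Split [0, total_ms) into consecutive chunks of at most chunk_ms."""
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


def to_global_ms(chunk_start_ms: int,
                 local_stamps: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Shift chunk-relative (start, end) timestamps onto the full-recording timeline."""
    return [(chunk_start_ms + s, chunk_start_ms + e) for s, e in local_stamps]


print(chunk_bounds(75_000))                # [(0, 30000), (30000, 60000), (60000, 75000)]
print(to_global_ms(30_000, [(120, 380)]))  # [(30120, 30380)]
```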
2026-01-24
- Added silence detection: FFmpeg `silencedetect` identifies long blank stretches (≥3 seconds), which are listed separately in the review draft for the user to confirm before deletion
- Sentences marked for deletion in the review draft were not actually cut: a whole sentence was struck through in the review draft, but the deletion list contained only part of it
  - Case: the review draft had `~~Um, this is why specifically...~~`, but the deletion list only had `Um,`
  - Updated: emphasized that the review draft and the deletion list must stay in sync; re-parse the deletion marks from the review draft before executing the edit
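FFmpeg's `silencedetect` filter writes `silence_start` and `silence_end ... | silence_duration` lines to stderr, so detection amounts to running the filter and pairing those lines. A sketch assuming a `-30dB` noise floor (a guess; tune per recording) and the 3-second minimum from the note above; `detect_silences` and `parse_silences` are hypothetical names, not the Skill's actual code.

```python
import re
import subprocess

START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
END_RE = re.compile(
    r"silence_end:\s*(-?\d+(?:\.\d+)?)\s*\|\s*silence_duration:\s*(\d+(?:\.\d+)?)")


def parse_silences(ffmpeg_stderr: str,
                   min_duration: float = 3.0) -> list[tuple[float, float]]:
    """Pair silence_start/silence_end lines into (start, end) seconds, keeping gaps >= min_duration."""
    silences: list[tuple[float, float]] = []
    start = None
    for line in ffmpeg_stderr.splitlines():
        if m := START_RE.search(line):
            start = float(m.group(1))
        elif (m := END_RE.search(line)) and start is not None:
            end, duration = float(m.group(1)), float(m.group(2))
            if duration >= min_duration:
                silences.append((max(start, 0.0), end))
            start = None
    # A trailing silence_start with no matching silence_end is ignored.
    return silences


def detect_silences(path: str, noise: str = "-30dB",
                    min_duration: float = 3.0) -> list[tuple[float, float]]:
    """Run FFmpeg's silencedetect filter (it reports on stderr) and parse the result."""
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
           "-af", f"silencedetect=noise={noise}:d={min_duration}",
           "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_silences(proc.stderr, min_duration)
```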
2026-01-18 (Afternoon)
- Adjusted the transcript/review draft format: a speaker's content is joined into one paragraph instead of one line per sentence
  - Original format: one line per sentence
  - New format: all sentences in a speaker's turn in one paragraph
  - Advantage: more compact and easier to read
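Joining a speaker's sentences is a run-length grouping over the sentence list. A sketch, assuming each sentence is a dict with `spk` and `text` and that paragraphs are rendered as `speaker: text` (the real review-draft layout may differ). Chinese sentences join with no separator, hence the default `sep=""`:

```python
from itertools import groupby


def speaker_paragraphs(sentences: list[dict], sep: str = "") -> list[str]:
    """Join each run of consecutive same-speaker sentences into one paragraph.

    Assumes each sentence dict has "spk" (speaker label) and "text".
    A speaker who talks again after someone else starts a new paragraph.
    """
    return [
        f"{spk}: " + sep.join(s["text"] for s in run)
        for spk, run in groupby(sentences, key=lambda s: s["spk"])
    ]


demo = [
    {"spk": "Maia", "text": "Hi everyone."},
    {"spk": "Maia", "text": "Welcome back."},
    {"spk": "安安", "text": "Hi."},
]
print(speaker_paragraphs(demo, sep=" "))  # ['Maia: Hi everyone. Welcome back.', '安安: Hi.']
```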
2026-01-18
- Claude must be used for semantic analysis: rule-based keyword matching is not good enough
  - Issue: keyword rules cannot recognize off-topic talk or chit-chat at the semantic level, or recording discussions that are only implicit
  - Updated: added an "AI Analysis Method" section stating that Claude must analyze the transcript segment by segment
  - It covers: the analysis workflow, a Claude prompt template, and a detailed explanation of each deletion type
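Analyzing the transcript segment by segment requires splitting it into pieces small enough for one request. A minimal chunker, assuming speaker paragraphs as input and an arbitrary 8,000-character budget (neither the budget nor this function comes from the Skill):

```python
def chunk_paragraphs(paragraphs: list[str], max_chars: int = 8_000) -> list[list[str]]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own chunk rather
    than being split mid-sentence.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        if current and size + len(para) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append(current)
    return chunks
```

Because chunks are cut on paragraph boundaries, an off-topic stretch that straddles two chunks is seen in halves; overlapping chunks by one paragraph is one possible mitigation.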
2026-01-17 (Evening)
- Changed the format of Part 2 of the review draft to the full transcript with inline deletion marks
  - Original format: grouped by topic, with deletion suggestions listed under each topic as a table
  - New format: the full transcript as the main text, with deleted content marked inline as `~~strikethrough~~` followed by a `[删除: 原因]` ("Delete: reason") tag
  - Advantage: keeps the full context, so users can read the surrounding text before deciding whether to delete
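Re-parsing deletion marks straight from the review draft (as the 2026-01-24 fix requires) can be done with one regular expression. The sketch assumes marks look like `~~text~~` optionally followed by `[删除: reason]` or `[Delete: reason]`, with an ASCII or full-width colon; `parse_deletions` is a hypothetical name, and mapping each struck span back to sentence timestamps is left out.

```python
import re

# Assumed mark syntax: ~~deleted text~~ optionally followed by [删除: reason] / [Delete: reason]
MARK_RE = re.compile(r"~~(.+?)~~(?:\s*\[(?:删除|Delete)\s*[:：]\s*([^\]]*)\])?")


def parse_deletions(review_text: str) -> list[tuple[str, str]]:
    """Return (deleted_text, reason) pairs in document order; reason is '' if absent."""
    return [(m.group(1), (m.group(2) or "").strip())
            for m in MARK_RE.finditer(review_text)]


doc = "Maia: Hello. ~~Can you hear me?~~[Delete: recording check] Today... ~~Um~~"
print(parse_deletions(doc))  # [('Can you hear me?', 'recording check'), ('Um', '')]
```

Because the whole struck span is captured, a sentence struck through in full cannot silently shrink to its first word in the deletion list.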
2026-01-17 (Afternoon)
- Large deletions were not cut cleanly: consecutive sentences were all marked for deletion, but cutting them one sentence at a time left the gaps between them in place
  - Cause: each sentence was processed independently; consecutive deletions with the same reason were not merged
  - Updated: clarified the division of labor; this Skill focuses on sentence-level deletions, while filler words and verbal tics are handled by `/podcastcut-transcribe`
  - The editing rules have been synced to `/podcastcut-edit`
- Do not mark filler words at the sentence level: sentence-level timestamps are too coarse, so deleting a filler word easily cuts surrounding speech by mistake
  - Updated: deletion types are now split into "sentence-level" and "character-level", with a clear division of labor
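The merge rule that fixes this can be sketched as follows: runs of adjacent sentences deleted for the same reason collapse into one range from the first sentence's start to the last sentence's end, so the gaps between them are cut as well. This is an illustrative sketch, not the `/podcastcut-edit` implementation; the `Sentence` type and millisecond fields are assumptions, and sentences must be in time order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    start_ms: int
    end_ms: int
    delete_reason: Optional[str] = None  # None means keep


def merged_cut_ranges(sentences: list[Sentence]) -> list[tuple[int, int, str]]:
    """Merge runs of adjacent sentences deleted for the same reason into single ranges."""
    ranges: list[tuple[int, int, str]] = []
    prev_deleted = False
    for s in sentences:
        if s.delete_reason is None:
            prev_deleted = False  # a kept sentence breaks any run
            continue
        if prev_deleted and ranges[-1][2] == s.delete_reason:
            start, _, reason = ranges[-1]
            ranges[-1] = (start, s.end_ms, reason)  # extend over the gap
        else:
            ranges.append((s.start_ms, s.end_ms, s.delete_reason))
        prev_deleted = True
    return ranges
```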
2026-01-17 (Morning)
- Recording discussions and technical debugging can occur anywhere in the podcast, not just at the opening
  - Updated: recording-related detection now covers the whole episode, with added technical-issue keywords and consecutive-paragraph detection
- Chit-chat after the guest introduction (where they live, what year they arrived, what their school is like) is easily missed
  - Updated: off-topic/chit-chat detection now uses signal patterns (conversation about place names, school names, years)
- Sentence-by-sentence review is inefficient; users want to see the overall structure and delete whole blocks
  - Added: the review draft combines the topic-level outline and the sentence-level deletion suggestions, so review happens in a single file