podcastcut-content

Podcast Content Editing

Generate transcript → Generate review draft (topic-level outline + sentence-level deletion suggestions) → User review → Execute editing

Quick Start

User: Help me cut the nonsense in the podcast
User: content edit
User: Generate a transcript and mark the content to be deleted

Input

Audio/video files
(Optional) List of speaker names, e.g.,
```
["Maia", "响歌歌", "安安"]
```

Output

Review Draft - A file containing: content outline + full transcript + deletion marks
After Confirmation - Execute editing

Note: No separate transcript file will be output, as the review draft already includes the full transcript

Using the Two Levels Together

Level	Position	Granularity	Applicable Scenario
Topic-level	First part of the review draft (content outline)	5-30 minutes per block	Quick rough cut, delete entire off-topic/chit-chat segments
Sentence-level	Third part of the review draft (main text)	Inline sentence marking	Fine adjustment, view context

Workflow

1. Transcribe audio (FunASR + sentence-level timestamps + speaker diarization)
    ↓
2. Silence detection (FFmpeg silencedetect, identify long blank segments)
    ↓
3. Generate transcript (with speaker labels)
    ↓
4. AI analysis: Identify topic structure + mark suggested deletions
    ↓
5. Output review draft (content outline + silence segments + deletion suggestions)
    ↓
【User directly modifies deletion marks on the review draft】
    ↓
6. Execute editing → /podcastcut-edit (parse deletion marks from review draft)

Technical Notes

Feature	Implementation
Transcription	FunASR (must use full model path, see code below)
Timestamps	Sentence-level (automatically returns `sentence_info` )
Speaker Diarization	Built-in CAM++ model in FunASR
Silence Detection	FFmpeg `silencedetect` (threshold -40dB, minimum duration 3s)

⚠️ Must Use Script for Transcription

Do not write your own code; call the existing scripts directly:

```bash
# Transcribe (outputs podcast_transcript.json)
python ~/.claude/skills/podcastcut-content/scripts/transcribe.py <audio file> <output directory>

# Generate transcript (outputs podcast_逐字稿.md)
python ~/.claude/skills/podcastcut-content/scripts/generate_transcript.py \
  <transcript.json> <output.md> '{"0":"响歌歌","1":"麦雅","2":"安安"}'
```

**Why can't you write your own code?** FunASR needs the full model path plus the VAD, Punc, and Speaker models (four models in total) to return `sentence_info`. A simplified call (e.g., `model="paraformer-zh"`) causes transcription to fail.

> For performance references and common issues, see `tips/转录最佳实践.md`

**Why sentence-level instead of character-level?** Character-level timestamps combined with speaker diarization are unstable on long audio (OOM, alignment errors after segmentation).

**This Skill only deletes entire sentences.** Finer-grained deletions (half sentences, filler words) are left to `/podcastcut-transcribe`.

---

Transcript Format

**Maia** 00:05
Let's start.

**响歌歌** 00:06
Really?

**Maia** 00:08
Great, OK. Hello everyone, welcome to today's 5:15 Podcast, I'm host Maia. Today we're going to talk about something fun.

**响歌歌** 00:20
I'm host 响歌歌. Alright, let's get started.

Format Rules

Element	Format
Speaker	`Name` in bold
Timestamp	`MM:SS` or `HH:MM:SS` (start time of the speaker)
Content	Content from the same speaker is concatenated, no line breaks per sentence
Speaker Change	Add a blank line
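
The format rules above could be applied to the sentence list roughly as follows. This is a minimal sketch, not the bundled `generate_transcript.py`: the names `fmt_time`, `render_transcript`, and the `names` mapping are hypothetical, and the input is assumed to follow the sentence-level JSON shape (`text`, `start`, `spk`).

```python
def fmt_time(seconds):
    """Format seconds as MM:SS, or HH:MM:SS once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(sentences, names):
    """Render sentences as speaker blocks.

    names maps a speaker ID to a display name (hypothetical structure);
    IDs missing from the mapping fall back to a spkN label.
    """
    blocks = []      # each entry is [header_line, body_text]
    current = None   # display name of the block being built
    for s in sentences:
        name = names.get(s["spk"], f"spk{s['spk']}")
        if name != current:
            # Speaker change: start a new block stamped with its start time
            blocks.append([f"**{name}** {fmt_time(s['start'])}", s["text"]])
            current = name
        else:
            # Same speaker: append without a line break
            # (no separator, which suits Chinese text)
            blocks[-1][1] += s["text"]
    # A blank line separates speaker blocks
    return "\n\n".join(f"{head}\n{body}" for head, body in blocks)
```

Because grouping is done on the display name rather than the raw ID, mapping two IDs to the same name merges their consecutive sentences into one block.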

Review Draft Format

The review draft integrates topic-level outlines and sentence-level deletion suggestions, allowing users to complete the review in one file.

Podcast Review Draft

File: podcast.mp3 Total Duration: 2:08:07

I. Content Outline (Topic-level)

#	Topic	Time	Duration	AI Suggestion	Reason
1	Opening small talk + technical debugging	00:00 - 04:45	04:45	🗑️ Delete	Recording preparation, technical issues
2	Official opening + guest introduction	04:45 - 07:00	02:15	✅ Keep	Official start of the podcast
3	Chit-chat: Guest background	05:48 - 07:01	01:13	🗑️ Delete	Irrelevant to the topic
4	Topic discussion	07:01 - 40:00	32:59	✅ Keep	Core content
5	Recording discussion (middle)	49:59 - 51:13	01:14	🗑️ Delete	Discussion on editing matters

Statistics: Suggested to keep 2:03:21 | Suggested to delete 08:12

Operation:

Delete topics 1, 3, 5

or

Keep only topics 2, 4

II. Silence Segments

#	Time	Duration	Location Description
1	12:34 - 12:48	00:14	Between Topic 2 and Topic 3
2	35:20 - 35:58	00:38	Guest's thinking pause
3	1:02:15 - 1:02:45	00:30	Mid-recording disconnection/silence

Statistics: Total 3 silence segments, total duration 01:22

Operation:

Delete all silence

or

Delete silence 1, 3

(keep 2, which may be an intentional pause)

III. Statistics

Total sentences: 3390
Suggested deletions: 377 instances
Silence segments: 3 instances (01:22)

By Type

Opening small talk: 31 instances
Chit-chat - personal background: 23 instances
Technical debugging: 15 instances
Recording discussion: 6 instances
Privacy - company name: 5 instances
Privacy - location: 4 instances
Privacy - school name: 3 instances

IV. Main Text (Transcript + Deletion Marks)

⚠️ Must include the full transcript, from the first sentence to the last, no content can be omitted!

Incorrect practice:

(The following content is topic discussion, keep...)

← Not allowed! Correct practice: Output all sentences, regardless of whether they are marked for deletion

Full transcript, content suggested for deletion by AI is marked with ~~strikethrough~~ and the reason is noted. Content from the same speaker is concatenated, no line breaks per sentence.

响歌歌 00:00 ~~Alright, right, you shouldn't hear any noise. Because I remember last time we used this there was noise, just like that psychological counseling session, um, right, this time it should be better.~~

[Delete: Opening small talk]

麦雅 00:23 ~~I'll turn on this dog, okay~~

[Delete: Opening small talk]

安安 00:27 ~~I'll also turn on my self-introduction.~~

[Delete: Opening small talk]

...

麦雅 04:50 Hello everyone, welcome to today's 5:15 Podcast, I'm host 麦雅.

响歌歌 04:58 I'm host 响歌歌.

麦雅 05:02 Today we have a special guest, 安安.

安安 05:08 Hello everyone, I'm 安安.

...

安安 15:32 ~~When I worked at Google before~~

[Delete: Privacy - company name]

When I worked before, I encountered a similar situation.

...

麦雅 49:59 ~~Should we cut this segment?~~

[Delete: Recording discussion]

响歌歌 50:02 ~~Hmm, let's check later.~~

[Delete: Recording discussion]

安安 50:05 ~~I think we can keep it.~~

[Delete: Recording discussion]

...

(Full transcript continues...)

Structure Description

Section	Content	Purpose
I. Content Outline	Topic-level table	Quickly understand structure, delete entire blocks
II. Silence Segments	List of long blank segments	Delete silent paragraphs
III. Statistics	Summary of deletion counts by type	Quickly see the scale of deletions
IV. Main Text	Full transcript + inline deletion marks	View context, review sentence by sentence

Topic Identification Rules

Topic Type	Identification Method
Opening small talk	Content before the official opening ("Hello everyone")
Official opening	Paragraph starting with "Hello/Hello everyone"
Chit-chat	Discussion of personal background irrelevant to the topic
Topic discussion	Core content around the podcast topic
Recording discussion	Paragraphs discussing editing, content selection
Closing	Concluding remarks like "Alright, that's it for today"

Deletion Types

⚠️ Division of Labor Principle

Skill	Focus	Processed Content	Timestamp Granularity
`/podcastcut-content`	Content semantics	Opening, off-topic, privacy, redundancy, long silence	Sentence-level
`/podcastcut-transcribe`	Verbal slips (technical)	Filler words, verbal errors, short pauses, half-sentence deletion	Character-level

This Skill works at the content level: deciding what to delete and what to keep is a semantic judgment, and deletions are whole sentences. Verbal-error identification works at the technical level: it needs finer-grained rules (repeated characters, pause patterns) and uses character-level timestamps.

Why this division of labor?

Sentence-level transcription + speaker diarization = accurate speaker identification
Character-level transcription + speaker diarization = easy speaker misalignment (OOM for long audio, alignment difficulties after segmentation)
First delete large segments (sentence-level), then process the remaining content (character-level)

Content Deletion Types (Processed by This Skill)

Type	Mark	Example
Opening small talk	`[Delete: Opening small talk]`	"Shall we start?" "Can you hear me?"
Closing chit-chat	`[Delete: Closing chit-chat]`	"Alright, that's it" "Bye"
Recording-related	`[Delete: Recording-related]`	"Re-record this segment" "Cut this later"
Off-topic content	`[Delete: Off-topic]`	Discussion irrelevant to the topic
Redundant repetition	`[Delete: Redundant]`	Large segments repeating the same point
Privacy - company	`[Delete: Privacy - company name]`	"I work at Google"
Privacy - personal name	`[Delete: Privacy - personal name]`	"My colleague Zhang San said"
Privacy - location	`[Delete: Privacy - location]`	"I live in xxx"
Long silence	Listed separately in the second part of the review draft	Silence over 3 seconds

Verbal Error Deletion Types (Processed by /podcastcut-transcribe)

Type	Description
Fillers/modal particles	"Um", "I mean", "Then", "Right right right"
Verbal errors	Misspoken words and corrections
Short pauses	Small pauses within sentences (< 3 seconds)

Note: Long silence (≥3 seconds) is processed by this Skill; short pauses are processed by `/podcastcut-transcribe`.

Why not process fillers here? Identification of fillers requires more fine-grained rules (continuous repetition, pause patterns), which is technical rather than content semantic.

AI Analysis Method

⚠️ Must Use Claude for Semantic Analysis

Keyword matching is not sufficient! Rule-based methods cannot identify:

Semantic off-topic/chit-chat (no obvious keywords)
Chit-chat after guest introduction (where they live, graduation year, school conditions)
Hidden recording discussions (no keywords like "cut")

Must use Claude to analyze the transcript in segments, prioritize quality.

Analysis Workflow

1. Split the transcript into 15-minute segments
2. Send each segment to Claude for analysis, identify content suggested for deletion
3. Claude returns: sentence index + deletion type + reason
4. Merge results from all segments, generate review draft
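
Step 1 might look like the sketch below. It assumes each sentence carries a `start` time in seconds, as in the sentence-level JSON; `split_by_time` is a hypothetical helper name. Each sentence goes into the window that contains its start time, so no sentence is cut in half, and the global index is kept alongside it so that per-segment results can be merged back in step 4.

```python
def split_by_time(sentences, window=15 * 60):
    """Group sentences into consecutive time windows (default 15 minutes).

    Returns a list of segments; each segment is a list of
    (global_index, sentence) pairs in original order.
    """
    buckets = {}
    for index, sentence in enumerate(sentences):
        # Window number is decided by the sentence start time
        key = int(sentence["start"] // window)
        buckets.setdefault(key, []).append((index, sentence))
    # Empty windows (e.g. a long silence) simply produce no segment
    return [buckets[key] for key in sorted(buckets)]
```

Keeping the global index means a deletion suggestion returned for one segment can be applied to the full sentence list without re-matching on text.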

Claude Analysis Prompt

For each transcript segment, use the following prompt:

You are a podcast content review assistant. Analyze the following transcript and identify sentences suggested for deletion.

Deletion Types

Opening small talk: Chit-chat, technical debugging before the official opening ("Hello everyone")
Recording discussion: Discussion of editing, recording status, technical issues, "Should we cut this segment"
Privacy - company name: Mention of specific company names (Google, Meta, ByteDance, etc.)
Privacy - school name: Mention of specific school names (Stanford, Tsinghua, etc.)
Privacy - location: Mention of specific locations (Palo Alto, Silicon Valley, etc.)
Privacy - personal name: Mention of specific personal names (non-public figures)
Off-topic/chit-chat: Discussion irrelevant to the podcast topic (personal background chit-chat, geographic discussion, etc.)
Redundant repetition: Repeating the same point multiple times, large segments of repetition

Output Format

For each sentence suggested for deletion, output:

Sentence timestamp
Deletion type
Reason (brief description)

Only mark sentences that need deletion, skip those that don't.

Transcript

{transcript_segment}

Detailed Deletion Type Explanations

Type	Identification Key Points
Opening small talk	All content before the official opening, including technical debugging and chit-chat
Recording discussion	"Cut", "Did we record that?", "This is too sensitive", "Cut later"
Privacy information	Company names, school names, locations, personal names
Off-topic/chit-chat	Irrelevant to the topic: where they live, when they arrived, how the school is
Redundant repetition	The same meaning repeated more than 3 times

Chit-chat Detection Focus

Chit-chat after guest introduction is particularly easy to miss, watch for these signals:

Sudden appearance of place names, school names, years
"Which area do you live in" "When did you arrive" "How is it over there"
Multiple consecutive sentences discussing non-topic content (geography, school, city comparison)

Recording Discussion Detection Focus

Not just at the opening! May appear throughout the podcast:

Technical issues: "Can you hear me", "Disconnected", "Headphone battery dead"
Content concerns: "Too low-key", "Don't want to share", "Don't mention details"
Editing discussion: "Cut later", "Should we keep this"

Silence Detection Method

Use FFmpeg's `silencedetect` filter to detect long silent stretches.

Detection Command

```bash
ffmpeg -i video.mp4 -af "silencedetect=noise=-40dB:d=3" -f null - 2>&1 | grep silencedetect
```

Parameter	Description	Recommended Value
`noise`	Silence threshold (volume below this is considered silence)	-40dB
`d`	Minimum silence duration (seconds)	3 (content editing focuses on large blank segments)

Output Parsing

[silencedetect @ 0x...] silence_start: 752.341
[silencedetect @ 0x...] silence_end: 766.512 | silence_duration: 14.171

Parse `silence_start` and `silence_end` to generate the list of silence segments.
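
A minimal sketch of that parsing step, assuming the log lines have the two shapes shown above. `parse_silences` is a hypothetical helper name, not part of the Skill's scripts.

```python
import re

# The hex address after "@" differs per run, so only the key/value part is matched
START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
END_RE = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log_text):
    """Return a list of (start, end) pairs in seconds from silencedetect log text."""
    segments = []
    start = None
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = END_RE.search(line)
        if match and start is not None:
            # Pair each end with the most recent unmatched start
            segments.append((start, float(match.group(1))))
            start = None
    return segments


sample = """[silencedetect @ 0x...] silence_start: 752.341
[silencedetect @ 0x...] silence_end: 766.512 | silence_duration: 14.171"""
silences = parse_silences(sample)
```

A `silence_start` with no matching `silence_end` (for example, silence running to the end of the file) is dropped in this sketch; whether to keep it is a design choice left open here.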

Threshold Selection

Scenario	noise	d (minimum duration)
Content editing (this Skill)	-40dB	3 seconds
Verbal error identification (fine-grained)	-50dB	0.5 seconds

Why use 3 seconds? Pauses shorter than 3 seconds may be natural thinking gaps, not recommended for deletion.

Output Files

podcast_transcript.json        # Sentence-level timestamps + speakers (for editing use)
podcast_审查稿.md              # Review draft (includes full transcript + deletion marks)

⚠️ Only output the review draft, no separate transcript file

The fourth section "Main Text" of the review draft is the full transcript, no need to output separately.

Sentence-level JSON Format

```json
{
  "file": "podcast.mp3",
  "duration": 3600.5,
  "sentences": [
    {"text": "Hello everyone,", "start": 0.50, "end": 1.20, "spk": 0},
    {"text": "Welcome to today's podcast.", "start": 1.20, "end": 2.80, "spk": 0},
    {"text": "I'm host Xiao Ming.", "start": 2.80, "end": 3.90, "spk": 1},
    ...
  ]
}
```

⚠️ Review Draft and Deletion List Must Be Synchronized

Users may directly modify deletion marks in the review draft (add/remove strikethrough), making the deletion list outdated.

Rules:

The review draft is the final source for user review
Before executing editing, re-parse deletion marks from the review draft
Do not rely on the potentially outdated `podcast_删除清单.json`

Parsing Method: Scan the review draft for text marked with ~~strikethrough~~ and match it to the timestamps in transcript.json.
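
One way to sketch this re-parsing step is shown below. It is an illustration under stated assumptions, not the logic of `/podcastcut-edit`: `struck_spans` and `match_deletions` are hypothetical names, struck spans are assumed not to cross line breaks, and sentences are matched by plain substring containment.

```python
import re

# Non-greedy so that several struck spans on one line stay separate
STRIKE_RE = re.compile(r"~~(.+?)~~")


def struck_spans(review_text):
    """Return the text inside every ~~...~~ span of the review draft."""
    return [m.group(1).strip() for m in STRIKE_RE.finditer(review_text)]


def match_deletions(review_text, sentences):
    """Return (start, end) time ranges of sentences found inside a struck span.

    sentences follow the sentence-level JSON shape: text, start, end, spk.
    """
    spans = struck_spans(review_text)
    ranges = []
    for s in sentences:
        text = s["text"].strip()
        # Whole-sentence match only: the sentence text must sit inside a span
        if text and any(text in span for span in spans):
            ranges.append((s["start"], s["end"]))
    return ranges
```

Substring matching is the weak point of this sketch: a short sentence that recurs in the episode would be matched everywhere once any copy of it is struck. A sturdier version would also compare the speaker timestamp that precedes each block in the review draft.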

Relationship with Other Skills

/podcastcut-content     → Content editing (semantic level) ← This Skill
/podcastcut-edit        → Execute editing
/podcastcut-transcribe  → Verbal error identification (technical level, optional)
/podcastcut-subtitle    → Generate subtitles

Recommended Workflow:

Original video
    ↓
/podcastcut-content  ← Mark large segments (small talk, off-topic, redundancy, privacy)
    ↓
/podcastcut-edit     ← Execute deletion, output v2
    ↓
【Optional】Need to process verbal errors?
    ↓ Yes
/podcastcut-transcribe  ← Identify verbal errors, filler words, silence
    ↓
/podcastcut-edit        ← Execute deletion, output v3
    ↓
Completed

Why delete content first then process verbal errors?

After deleting large segments, the video becomes shorter
Verbal error identification transcription is faster, review scope is smaller
No need to process verbal errors in deleted large segments

Speaker Diarization

FunASR has built-in speaker diarization (`spk_model="cam++"`) and automatically outputs speaker IDs.

Workflow

FunASR transcription (enable spk_model)
    ↓
Output sentences with speaker IDs (speaker 0, speaker 1...)
    ↓
Search self-introduction phrases to confirm the real name corresponding to the ID
    ↓
Replace with real names when generating the review draft

⚠️ Speaker Mapping Confirmation Method

Do not rely on the order the user provided! Search the transcription results for self-introduction phrases to confirm the mapping:

```python
import json

with open("podcast_transcript.json", encoding="utf-8") as f:
    sentences = json.load(f)["sentences"]

# Search key phrases (kept in Chinese so they match the transcript text) to determine the speaker mapping
key_phrases = ["我是主播", "我是xxx", "大家好我是"]
for s in sentences:
    for phrase in key_phrases:
        if phrase in s['text']:
            print(f"spk{s['spk']}: {s['text']}")  # confirm who each spk ID corresponds to
```

Common Issues

Issue	Cause	Solution
Same person divided into multiple IDs	FunASR identification instability	Map multiple IDs to the same name
More IDs than actual speakers	As above	Merge redundant IDs based on self-introduction
Speaker ID misalignment after segmented transcription	Each segment's ID resets independently	Prioritize full transcription, avoid segmentation
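
To spot a split speaker before building the mapping, counting sentences per raw ID can help: an ID with far fewer sentences than the others is a candidate for merging. The sketch below is illustrative only; `sentences_per_speaker`, `merge_speaker_ids`, and the added `speaker` field are hypothetical names.

```python
from collections import Counter


def sentences_per_speaker(sentences):
    """Count how many sentences each raw speaker ID produced."""
    return Counter(s["spk"] for s in sentences)


def merge_speaker_ids(sentences, id_to_name):
    """Return copies of the sentences with a 'speaker' name attached.

    Several IDs may map to the same name; unmapped IDs keep a spkN label.
    """
    return [
        {**s, "speaker": id_to_name.get(s["spk"], f"spk{s['spk']}")}
        for s in sentences
    ]
```

The mapping itself should still come from the self-introduction search described above; the counts only indicate which IDs deserve a closer look.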

Limitations

Condition	Effect
2-10 person conversation	Good effect
Audio < 30s	Decreased effect
More than 10 people	Decreased effect
Segmented transcription	Speaker IDs may be inconsistent

Progress TodoList

Create at startup:

- [ ] Transcribe audio (FunASR, sentence-level + speaker diarization)
- [ ] Silence detection (FFmpeg silencedetect)
- [ ] Generate transcript
- [ ] AI analysis: Identify topic structure + mark suggested deletions
- [ ] Output review draft (including silence segments)
- [ ] Wait for user confirmation

Example Dialogue

User: Help me cut the nonsense in the podcast, speakers are Maia and 响歌歌

AI: Alright, I'll process this podcast.
    1. Transcribing audio...
    2. Detecting silence...
    3. Generating transcript...
    4. Analyzing content...

    Review draft generated: podcast_审查稿.md

    === Content Outline (Topic-level) ===
    | # | Topic | Duration | AI Suggestion |
    |---|------|------|--------|
    | 1 | Opening small talk | 04:45 | 🗑️ Delete |
    | 2 | Official opening | 02:15 | ✅ Keep |
    | 3 | Chit-chat: Guest background | 01:13 | 🗑️ Delete |
    | 4 | Topic discussion | 32:59 | ✅ Keep |

    === Sentence-level Statistics (By Type) ===
    - Opening small talk: 12 instances
    - Recording discussion: 8 instances
    - Privacy information: 5 instances
    - Off-topic chit-chat: 3 instances

    Please check the deletion marks in the review draft, and tell me to execute editing after adjustment.

User: [Added/removed some deletion marks in the review draft] Alright, cut according to the review draft

AI: Alright, parsing deletion marks from the review draft...
    - Found 25 deletion marks
    - Total deletion duration: 06:32
    Executing editing...

Feedback Records

2026-01-31 (Late Night)

Speaker ID misalignment caused by segmented transcription: Speaker IDs reset independently for each segment during segmented transcription, resulting in different IDs for the same person after merging
- Cause: 2-hour audio split into 13 segments to avoid OOM, each segment's speaker ID starts from 0
- Solution: Prioritize full transcription (2-hour audio takes about 16 minutes, no OOM)
- Updated: Added "Segmented vs Full Transcription" section to tips/转录最佳实践.md
FunASR may identify the same person as multiple IDs: 3-person conversation identified as 4 speaker IDs
- Symptom: 响歌歌 was split into spk1 (60 sentences) and spk3 (560 sentences)
- Solution: Search self-introduction phrases ("I'm host xxx") to confirm mapping, merge multiple IDs
- Updated: Added confirmation method and common issues to the "Speaker Diarization" section of SKILL.md
Updated performance data: 2-hour podcast tested to take 16 minutes, 3390 sentences (previous estimate of 12 minutes, 800 sentences was low)
- Updated: Performance reference table in tips/转录最佳实践.md

2026-01-31 (Evening)

Incomplete review draft content: AI took a shortcut and wrote "(The following content is topic discussion, keep...)" instead of the full transcript
- Updated: Clearly marked in the "Section IV" of the review draft format that full content must be output, no omissions allowed
Redundant transcript file output: Users only need one review draft (which already includes the full transcript)
- Updated: Removed `podcast_逐字稿.md` from the output files section, clearly stated to only output the review draft

2026-01-31

Deleted "User Confirmation Method" section: Users actually operate by directly modifying deletion marks in the review draft, no need for command-based operations
- Old workflow: Users input commands like "Delete topics 1, 3" "Delete all silence"
- Actual workflow: Users add/remove `~~strikethrough~~` in the review draft, then say "Cut according to the review draft"
- Updated: Flowchart, example dialogue, removed command-based operation instructions
FunASR call parameter error caused transcription failure: a call with simplified model names does not return `sentence_info`
- Incorrect writing: `model="paraformer-zh"` + `spk_model="cam++"` + `sentence_timestamp=True`
- Correct writing: Must use the full model path plus the VAD, Punc, and Speaker models (four models in total)
- Cause: SKILL.md only wrote simplified parameters, actual execution used incorrect API due to "free play"
- Updated: Included complete call code in SKILL.md, marked comparison between incorrect and correct writing
- Lesson: Executable code must be fully written in SKILL.md, cannot only write parameter names and let AI assemble it

2026-01-25

Reverted to sentence-level timestamps: Character-level + speaker diarization is unstable for long audio
- Issue: After character-level transcription and merging speaker information, speaker alignment errors occurred ("I'm host 麦雅" was attributed to 响歌歌)
- Cause:
  1. Speaker diarization OOM for long audio (2-hour audio → 234MB WAV)
  2. Segmented speaker diarization returned 0 sentences (API format issue)
  3. Character-level transcription has no punctuation, sentence boundaries are unnatural
- Updated: Reverted to sentence-level timestamps, this Skill only deletes entire sentences
- Half-sentence deletion is left to `/podcastcut-transcribe` (character-level)

2026-01-24 (Evening)

Attempted to upgrade to character-level timestamps: intended to solve the problem that sentence-level timestamps cannot delete part of a sentence precisely
- Issue: Deleting "Um, I can talk about why" would also delete the second half "This episode's guest was actually invited by 响歌歌"
- Attempt: Use 30s segmentation + `timestamp_granularity="character"` to obtain character-level timestamps
- Result: Character-level transcription succeeded, but speaker diarization failed, leading to speaker alignment errors
- Final decision: Revert to sentence-level (see 2026-01-25 feedback)

2026-01-24

新增静音检测功能：使用 FFmpeg silencedetect 识别大段空白（≥3秒），在审查稿中单独列出供用户确认删除
审查稿标记删除但实际没剪掉：审查稿中整句标记删除，但删除清单只有部分内容
- 案例：审查稿
```
~~嗯，这个要就是具体为什么...~~
```
  ，删除清单只有
```
嗯，
```
- 已更新：强调审查稿和删除清单必须同步，执行剪辑前从审查稿重新解析

Added silence detection: FFmpeg silencedetect identifies long silent segments (≥3 seconds), which are listed separately in the review draft for the user to confirm deletion
Marked for deletion in the review draft but not actually cut: An entire sentence was marked for deletion in the review draft, but the deletion list contained only part of it
- Case: The review draft had
```
~~Um, this is why specifically...~~
```
  , but the deletion list only had
```
Um,
```
- Updated: Emphasized that the review draft and the deletion list must stay in sync; the deletion marks are re-parsed from the review draft before editing is executed
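
The silence detection step described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual code: it assumes the FFmpeg log has already been captured as text, the helper name parse_silences is hypothetical, and the command shown in the comment (including the -30dB noise threshold and the file name) is only an example.

```python
import re

# Assumed command (illustrative only; threshold and file name are assumptions):
#   ffmpeg -i episode.wav -af silencedetect=noise=-30dB:d=3 -f null -
# silencedetect reports silence_start / silence_end lines in FFmpeg's log output.
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(
    r"silence_end:\s*(-?\d+(?:\.\d+)?)\s*\|\s*silence_duration:\s*(\d+(?:\.\d+)?)"
)


def parse_silences(log, min_duration=3.0):
    """Return (start, end) pairs, in seconds, for silences of at least min_duration."""
    silences = []
    start = None
    for line in log.splitlines():
        match = SILENCE_START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            end = float(match.group(1))
            duration = float(match.group(2))
            if duration >= min_duration:
                silences.append((start, end))
            start = None
    return silences


sample_log = """\
[silencedetect @ 0x5581] silence_start: 12.5
[silencedetect @ 0x5581] silence_end: 16.0 | silence_duration: 3.5
[silencedetect @ 0x5581] silence_start: 40.0
[silencedetect @ 0x5581] silence_end: 41.25 | silence_duration: 1.25
"""

for start, end in parse_silences(sample_log):
    print(f"silence from {start}s to {end}s")
```

Filtering by duration in the parser keeps the ≥3 second rule in one place even if the FFmpeg filter is run with a shorter minimum.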

2026-01-18 (下午)

2026-01-18 (Afternoon)

逐字稿/审查稿格式调整：同一说话人的内容连在一起，不逐句换行
- 原格式：每句一行
- 新格式：同一说话人的所有句子连在一段里
- 优点：更紧凑，阅读体验更好

Adjusted transcript/review draft format: Content from the same speaker is joined together instead of breaking the line after every sentence
- Original format: One line per sentence
- New format: All consecutive sentences from the same speaker form one paragraph
- Advantage: More compact and easier to read
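
The new paragraph format can be sketched as grouping consecutive sentences by speaker. This is a hedged illustration only: the field names spk, start, end and text are assumptions about the sentence records, and merge_by_speaker is a hypothetical helper, not code from this Skill.

```python
from itertools import groupby


def merge_by_speaker(sentences, sep=""):
    """Join consecutive sentences from the same speaker into one paragraph.

    sep is the string placed between sentences: "" suits Chinese text,
    " " suits English text.
    """
    paragraphs = []
    # groupby only groups adjacent items, so speaker turns stay in order.
    for speaker, run in groupby(sentences, key=lambda s: s["spk"]):
        run = list(run)
        paragraphs.append({
            "spk": speaker,
            "start": run[0]["start"],
            "end": run[-1]["end"],
            "text": sep.join(s["text"] for s in run),
        })
    return paragraphs


sentences = [
    {"spk": 0, "start": 0.0, "end": 1.5, "text": "Welcome back."},
    {"spk": 0, "start": 1.6, "end": 3.0, "text": "Today we talk about editing."},
    {"spk": 1, "start": 3.2, "end": 4.0, "text": "Thanks for having me."},
    {"spk": 0, "start": 4.1, "end": 5.0, "text": "Let's start."},
]

for paragraph in merge_by_speaker(sentences, sep=" "):
    print(paragraph["spk"], paragraph["text"])
```

Keeping the first start and last end on each paragraph means the per-sentence timestamps can still be looked up when a deletion is applied.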

2026-01-18

必须使用 Claude 做语义分析：基于规则的关键词匹配质量不够
- 问题：无法识别语义层面的跑题/闲聊、隐藏的录制讨论
- 已更新：新增「AI分析方法」章节，明确必须用 Claude 分段分析逐字稿
- 包含：分析流程、Claude prompt 模板、删除类型详解

Claude must be used for semantic analysis: Rule-based keyword matching is not good enough
- Issue: Keyword rules cannot identify off-topic talk or chit-chat at the semantic level, or recording discussions hidden inside the conversation
- Updated: Added an "AI Analysis Method" section, which states explicitly that Claude must analyze the transcript segment by segment
- Includes: Analysis workflow, Claude prompt template, detailed explanation of deletion types
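
Analyzing the transcript in segments implies splitting it before each segment is sent for analysis. The sketch below shows one possible way to do that; the character budget, the choice to split on paragraph boundaries, and the name split_into_segments are all assumptions, and the actual prompt and model call are intentionally left out.

```python
def split_into_segments(paragraphs, max_chars=4000):
    """Group whole paragraphs into segments of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own segment
    rather than being cut in the middle.
    """
    segments = []
    current = []
    size = 0
    for paragraph in paragraphs:
        if current and size + len(paragraph) > max_chars:
            segments.append(current)
            current = []
            size = 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        segments.append(current)
    return segments


paragraphs = ["a" * 30, "b" * 30, "c" * 30]
for index, segment in enumerate(split_into_segments(paragraphs, max_chars=60)):
    print(index, [len(p) for p in segment])
```

Splitting on paragraph boundaries keeps each speaker turn intact, so a segment never ends in the middle of a sentence.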

2026-01-17 (晚上)

2026-01-17 (Evening)

审查稿第二部分格式调整：改为完整逐字稿 + 内联删除标记
- 原格式：按话题分组 → 每个话题下列出删除建议（表格形式）
- 新格式：完整逐字稿（正文），删除内容用
```
~~删除线~~
```
  +
```
[删除: 原因]
```
  内联标记
- 优点：保留完整上下文，用户可以看到前后文再决定是否删除

Adjusted the format of the second part of the review draft: Changed to full transcript + inline deletion marks
- Original format: Grouped by topic → deletion suggestions listed under each topic (table form)
- New format: Full transcript (main text), with content to delete marked by
```
~~strikethrough~~
```
  +
```
[Delete: Reason]
```
  inline
- Advantage: Keeps the full context, so users can read the surrounding text before deciding whether to delete
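
Because the marks are inline, the deletion list can be rebuilt directly from the review draft. The following is a minimal sketch under the assumption that every mark has the form of a strikethrough immediately followed by a bracketed reason, as described above; the regular expression and the name parse_deletions are illustrative, not the Skill's actual parser.

```python
import re

# Matches ~~text~~ followed by [Delete: reason] or [删除: 原因].
# Both the ASCII colon and the full-width colon are accepted.
DELETION_MARK = re.compile(
    r"~~(.+?)~~\s*\[(?:Delete|删除)\s*[:：]\s*([^\]]+)\]"
)


def parse_deletions(review_text):
    """Return (deleted_text, reason) pairs in document order."""
    return [
        (match.group(1), match.group(2).strip())
        for match in DELETION_MARK.finditer(review_text)
    ]


review_text = (
    "Maia: Welcome back. ~~Um, is it recording?~~ [Delete: recording talk] "
    "Today we cover editing. ~~我们住在哪~~ [删除: 闲聊]"
)

for text, reason in parse_deletions(review_text):
    print(f"{reason}: {text}")
```

Parsing the whole struck-through span, rather than matching against a separately maintained list, is one way to avoid the review draft and the deletion list drifting apart.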

2026-01-17 (下午)

2026-01-17 (Afternoon)

大块删除剪不干净：连续句子都标记删除，但剪辑时逐句删除，保留了句间空白
- 原因：每句独立处理，没有合并连续同理由的删除
- 已更新：明确分工，本 Skill 聚焦句子级，语气词/口头禅由
```
/podcastcut-transcribe
```
  处理
- 剪辑规则已同步更新到
```
/podcastcut-edit
```
不要在句子级标记语气词：句子级时间戳不够精确，删语气词容易误删
- 已更新：删除类型分为「句子级」和「字符级」，明确分工

Large-block deletions not cut cleanly: Consecutive sentences were all marked for deletion, but they were removed one sentence at a time, which left the gaps between sentences in place
- Cause: Each sentence was processed independently; consecutive deletions with the same reason were not merged
- Updated: Clarified the division of labor: this Skill focuses on sentence-level deletions, while filler words and verbal tics are handled by
```
/podcastcut-transcribe
```
- Editing rules have been synchronized to
```
/podcastcut-edit
```
Do not mark filler words at the sentence level: Sentence-level timestamps are not precise enough, so deleting filler words easily removes neighboring content by mistake
- Updated: Deletion types are now divided into "sentence-level" and "character-level", with a clear division of labor
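
The fix for the leftover gaps, merging consecutive deletions that share a reason, can be sketched as below. This is an assumption-laden illustration: the sentence records and their delete field are hypothetical, and merge_consecutive_deletions is not the actual rule set in /podcastcut-edit.

```python
from itertools import groupby


def merge_consecutive_deletions(sentences):
    """Merge runs of adjacent deleted sentences with the same reason.

    Each run becomes one cut from the start of its first sentence to the
    end of its last sentence, so the pauses between them are removed too.
    """
    cuts = []
    for reason, run in groupby(sentences, key=lambda s: s.get("delete")):
        if reason is None:
            continue  # sentences without a deletion mark are kept
        run = list(run)
        cuts.append((run[0]["start"], run[-1]["end"], reason))
    return cuts


sentences = [
    {"start": 0.0, "end": 2.0},
    {"start": 2.5, "end": 4.0, "delete": "chit-chat"},
    {"start": 4.6, "end": 7.0, "delete": "chit-chat"},
    {"start": 7.2, "end": 9.0},
    {"start": 9.5, "end": 11.0, "delete": "recording talk"},
]

for start, end, reason in merge_consecutive_deletions(sentences):
    print(f"cut {start}s to {end}s ({reason})")
```

In this example the two chit-chat sentences become a single cut from 2.5s to 7.0s, which also removes the 0.6 second pause between them.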

2026-01-17 (上午)

2026-01-17 (Morning)

录制讨论和技术调试可能出现在播客任何位置，不只是片头
- 已更新：录制相关检测改为全程检测，增加技术问题关键词和连续段落检测
嘉宾介绍后的闲聊（住在哪、哪年来的、学校怎么样）容易漏检
- 已更新：跑题/闲聊检测增加信号模式（地名、学校名、年份相关对话）
逐句审核效率低，用户希望能看到全局结构、整块删除
- 已新增：审查稿整合话题级大纲和句子级删除建议，一个文件完成审核

Recording discussions and technical debugging may appear anywhere in the podcast, not just at the opening
- Updated: Recording-related detection now covers the whole episode; added technical-issue keywords and detection of consecutive paragraphs
Chit-chat after the guest introduction (where they live, which year they arrived, how the school is) is easily missed
- Updated: Added signal patterns (place names, school names, year-related conversations) to off-topic/chit-chat detection
Sentence-by-sentence review is inefficient; users want to see the overall structure and delete entire blocks
- Added: The review draft now combines the topic-level outline with sentence-level deletion suggestions, so the whole review is done in one file