clipify-video-clip-generator
Clipify Video Clip Generator
Skill by ara.so — Devtools Skills collection.
Clipify is a Claude Code skill that automatically turns long-form videos into social-ready clips. It transcribes video, identifies clip-worthy moments, reframes 16:9 to 9:16 with face-tracking pans, and burns opus-style word-by-word captions.
Key capabilities:
- Auto-detect punchlines, reversals, and awkward pauses via Whisper transcription
- Face-tracking pan for 9:16 vertical clips (no ML models — uses motion energy)
- Opus/karaoke/minimal subtitle styles with word-level highlighting
- Hardware-accelerated rendering (VideoToolbox on macOS)
- ~20s render time for 20s clips on Apple Silicon
Installation
```bash
# Clone to Claude Code skills directory
git clone https://github.com/louisedesadeleer/clipify.git ~/.claude/skills/clipify

# Install dependencies
brew install ffmpeg
pip install openai-whisper numpy
```
**Requirements:**
- macOS (or Linux/Windows with `-hwaccel videotoolbox` removed from SKILL.md)
- ffmpeg with libx264
- Python 3 with numpy
- Whisper (openai-whisper)
Restart Claude Code after installation. The `/clipify` slash command will then be available.

Usage Workflow
1. Invoke the skill
In Claude Code:

```
/clipify
```

Provide the path to your source video when prompted:

```
/path/to/long-interview.mp4
```

2. Review proposed clips
Clipify transcribes the video and proposes 3-5 candidates with:
- Timestamp range
- Title/description
- Reason (punchline, reversal, audio peak, awkward pause)
Example output:
```
Clip 1: "The worst product advice" (02:34 - 02:51)
  Reason: Reversal after awkward pause
Clip 2: "We burned $2M on this" (08:12 - 08:29)
  Reason: Audio peak + punchline
Clip 3: "My co-founder quit on Zoom" (15:03 - 15:24)
  Reason: Punchline
```

3. Select clip and format
Choose which clip to cut, then specify:
- Aspect ratio: 9:16 (vertical), 16:9 (horizontal), 1:1 (square)
- Reframe style (if 9:16 from 16:9 with two speakers): pan (follow speaker) or split-screen
- Subtitle style: opus (bold white + yellow highlight), karaoke (word-by-word), minimal, or paste a reference image
4. Output
Final clips are saved to:

```
<source-video-dir>/clipify_out/clip_<timestamp>.mp4
```

Scripts Reference
Clipify uses standalone Python scripts for each processing step. You can call these directly for custom workflows.
analyze.py: Speaker timeline from motion energy

```bash
# Generate motion energy files for two face regions
ffmpeg -i video.mp4 -vf "crop=300:200:100:50,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray - | python scripts/analyze.py motion_left.bin

ffmpeg -i video.mp4 -vf "crop=300:200:1000:50,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray - | python scripts/analyze.py motion_right.bin

# Analyze both to generate speaker timeline
python scripts/analyze.py motion_left.bin motion_right.bin --fps 30 > timeline.txt
```

**Output format (timeline.txt):**

```
0.00-2.34:left
2.34-5.67:right
5.67-8.12:left
```
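One way to picture how a timeline in this format could be derived: compare per-frame motion energy for the two face regions, label each frame with the busier side, and merge consecutive frames into ranges. The sketch below is only an illustration under those assumptions, not the contents of `analyze.py`; the names `speaker_timeline` and `format_timeline` and the sample energy values are hypothetical.

```python
def speaker_timeline(left_energy, right_energy, fps=30):
    """Label each frame with the side that moved more, then merge runs.

    Hypothetical helper: inputs are per-frame motion energy values
    (for example, the mean of each difference frame), one list per region.
    """
    segments = []  # list of (start_seconds, end_seconds, side)
    for i, (left, right) in enumerate(zip(left_energy, right_energy)):
        side = "left" if left >= right else "right"
        frame_end = (i + 1) / fps
        if segments and segments[-1][2] == side:
            # Same speaker as the previous frame: extend the current range.
            segments[-1] = (segments[-1][0], frame_end, side)
        else:
            segments.append((i / fps, frame_end, side))
    return segments


def format_timeline(segments):
    """Render segments in the start-end:side text format."""
    return [f"{start:.2f}-{end:.2f}:{side}" for start, end, side in segments]


# Made-up energies: left region active for 3 frames, then right for 3.
left = [5, 6, 7, 1, 1, 1]
right = [1, 1, 1, 8, 9, 7]
lines = format_timeline(speaker_timeline(left, right, fps=2))
print("\n".join(lines))
```

A per-frame decision like this flips on every noisy frame, which is why a threshold or minimum segment duration is usually layered on top.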
build_pan.py: Generate ffmpeg crop expression
```bash
# From speaker timeline, build hard-cut pan expression
python scripts/build_pan.py timeline.txt --left-x 100 --right-x 1000 --width 608 > pan_expr.txt

# Use in ffmpeg crop filter
ffmpeg -i source.mp4 -vf "crop=608:1080:'$(cat pan_expr.txt)':0" output.mp4
```

**Arguments:**
- `--left-x`: X coordinate of left speaker's face center
- `--right-x`: X coordinate of right speaker's face center
- `--width`: Width of the 9:16 crop window (e.g., 608 for 1080p)

**Output:** ffmpeg expression string like:

```
if(between(t,0,2.34),100,if(between(t,2.34,5.67),1000,if(between(t,5.67,8.12),100,1000)))
```
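To make the shape of that expression concrete, here is a minimal sketch of turning timeline lines into nested `if(between(t,...))` terms. It is not the code of `build_pan.py`: `parse_timeline` and `build_pan_expr` are hypothetical names, and this version holds the last segment's position once the timeline ends, which may differ from the fallback value the real script emits.

```python
def parse_timeline(lines):
    """Parse 'start-end:side' lines into (start, end, side) tuples."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        span, side = line.split(":")
        start, end = span.split("-")
        segments.append((float(start), float(end), side))
    return segments


def build_pan_expr(segments, left_x, right_x):
    """Build a nested ffmpeg expression that hard-cuts between two x values."""
    x_for = {"left": left_x, "right": right_x}
    # Assumption: after the final segment, stay on the last speaker.
    expr = str(x_for[segments[-1][2]])
    # Wrap from the last segment outward so the first segment is tested first.
    for start, end, side in reversed(segments):
        expr = f"if(between(t,{start:g},{end:g}),{x_for[side]},{expr})"
    return expr


timeline = ["0.00-2.34:left", "2.34-5.67:right", "5.67-8.12:left"]
expr = build_pan_expr(parse_timeline(timeline), left_x=100, right_x=1000)
print(expr)
```

The `:g` format keeps six significant digits, so very long timelines may want an explicit precision instead.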
build_ass.py: Generate ASS subtitle file
```bash
# From Whisper JSON output, create opus-style captions
python scripts/build_ass.py whisper_output.json --style opus > captions.ass

# Burn into video
ffmpeg -i video.mp4 -vf "ass=captions.ass" output.mp4
```

**Whisper JSON format (input):**

```json
{
  "segments": [
    {
      "start": 0.5,
      "end": 2.3,
      "text": "This is the worst advice",
      "words": [
        {"word": "This", "start": 0.5, "end": 0.7},
        {"word": "is", "start": 0.7, "end": 0.85},
        {"word": "the", "start": 0.85, "end": 1.0},
        {"word": "worst", "start": 1.0, "end": 1.4},
        {"word": "advice", "start": 1.4, "end": 2.3}
      ]
    }
  ]
}
```

Styles:
- `opus`: Bold white text, yellow active-word highlight, centered top
- `karaoke`: Word-by-word color change, bottom positioned
- `minimal`: Clean white text, no highlights
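As a rough illustration of how word-level timings can become a karaoke-style ASS line, the sketch below converts one Whisper segment into a `Dialogue` event using `\k` tags, whose durations are in centiseconds. It is an assumption about the general approach, not the logic of `build_ass.py`; `ass_time` and `karaoke_dialogue` are hypothetical names, and gaps between words are ignored.

```python
def ass_time(seconds):
    """Format seconds as an ASS timestamp (H:MM:SS.cc)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_dialogue(segment, style="Default"):
    """Build one Dialogue line with a \\k tag per word (centiseconds)."""
    parts = []
    for w in segment["words"]:
        dur_cs = round((w["end"] - w["start"]) * 100)
        # Whisper words often carry a leading space, so strip them.
        parts.append(f"{{\\k{dur_cs}}}{w['word'].strip()}")
    text = " ".join(parts)
    start, end = ass_time(segment["start"]), ass_time(segment["end"])
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{text}"


segment = {
    "start": 0.5,
    "end": 2.3,
    "words": [
        {"word": "This", "start": 0.5, "end": 0.7},
        {"word": "is", "start": 0.7, "end": 0.85},
        {"word": "the", "start": 0.85, "end": 1.0},
        {"word": "worst", "start": 1.0, "end": 1.4},
        {"word": "advice", "start": 1.4, "end": 2.3},
    ],
}
line = karaoke_dialogue(segment)
print(line)
```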
audio_align.py: Find clip offset in source
```bash
# Find where a 20s clip appears in a 2-hour source video
python scripts/audio_align.py source.mp4 clip.mp4
# Output: 00:15:34.2 (offset timestamp)
```

Uses audio cross-correlation. Useful for re-linking edited clips to source timestamps.
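For readers unfamiliar with the technique, the following sketch shows normalized cross-correlation on plain lists of samples: slide the clip over the source and keep the offset with the highest similarity score. It is a brute-force illustration with synthetic data, not the implementation of `audio_align.py`; `find_offset` is a hypothetical name, and real audio would normally be downsampled and correlated with an FFT-based method for speed.

```python
import math
import random


def find_offset(source, clip):
    """Return the sample offset where clip best matches source."""
    n = len(clip)
    clip_norm = math.sqrt(sum(c * c for c in clip))
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(source) - n + 1):
        window = source[offset:offset + n]
        win_norm = math.sqrt(sum(w * w for w in window))
        if win_norm == 0 or clip_norm == 0:
            continue  # silent window: correlation is undefined
        # Normalized dot product: 1.0 means the window matches the clip shape.
        score = sum(w * c for w, c in zip(window, clip)) / (win_norm * clip_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


# Synthetic "audio": 20 s of noise at 100 samples/s, clip cut from 7.34 s.
rng = random.Random(0)
sample_rate = 100
source = [rng.uniform(-1, 1) for _ in range(2000)]
clip = source[734:934]
offset = find_offset(source, clip)
print(f"clip starts at {offset / sample_rate:.2f}s")
```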
Common Patterns
Extract clip manually (without auto-detection)
```bash
# 1. Transcribe with Whisper
whisper source.mp4 --model base --output_format json --output_dir ./

# 2. Cut segment (03:15 to 03:42)
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 -c copy raw_clip.mp4

# 3. Generate captions for this segment
python scripts/build_ass.py source.json --start 195 --end 222 --style opus > clip.ass

# 4. Reframe to 9:16 with center crop (no pan)
ffmpeg -i raw_clip.mp4 -vf "crop=608:1080:656:0,ass=clip.ass" final_clip.mp4
```
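The numbers in `crop=608:1080:656:0` follow from simple arithmetic on a 1920x1080 source: a full-height 9:16 window is 1080 * 9 / 16 = 607.5 pixels wide, rounded to the even value 608, and centering it gives x = (1920 - 608) / 2 = 656. The helper below is a small sketch of that calculation; `center_crop_filter` is a hypothetical name, not part of Clipify.

```python
def center_crop_filter(src_w, src_h, ratio_w=9, ratio_h=16):
    """Return an ffmpeg crop filter for a centered, full-height window."""
    # Round the width to an even number, since yuv420p H.264 output
    # needs even dimensions.
    crop_w = round(src_h * ratio_w / ratio_h / 2) * 2
    crop_w = min(crop_w, src_w)  # never wider than the source
    x = (src_w - crop_w) // 2    # equal margin on both sides
    return f"crop={crop_w}:{src_h}:{x}:0"


print(center_crop_filter(1920, 1080))        # 9:16 vertical
print(center_crop_filter(1920, 1080, 1, 1))  # 1:1 square
```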
Two-speaker pan with manual face coordinates
```bash
# 1. Identify face regions on a sample frame
ffplay -ss 00:01:00 source.mp4  # visual inspection
# Left face:  x=200, y=100, width=300, height=200
# Right face: x=1100, y=100, width=300, height=200

# 2. Generate motion energy files
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 \
  -vf "crop=300:200:200:100,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray motion_left.bin

ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 \
  -vf "crop=300:200:1100:100,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray motion_right.bin

# 3. Analyze speaker timeline
python scripts/analyze.py motion_left.bin motion_right.bin --fps 30 > timeline.txt

# 4. Build pan expression (faces centered at x=350 and x=1250)
python scripts/build_pan.py timeline.txt --left-x 350 --right-x 1250 --width 608 > pan.txt

# 5. Apply crop with pan
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 \
  -vf "crop=608:1080:'$(cat pan.txt)':0" panned_clip.mp4
```
Custom subtitle styling

Edit the ASS file generated by `build_ass.py`:

```ass
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Impact,68,&H00FFFFFF,&H0000FFFF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,3,0,2,10,10,120,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.50,0:00:02.30,Default,,0,0,0,,{\k70}This {\k15}is {\k15}the {\k40}worst {\k90}advice
```

Customize:
- `Fontname`: Impact, Arial, Montserrat
- `Fontsize`: 68 for 1080p vertical
- `PrimaryColour`: `&H00FFFFFF` (white in BGR hex)
- `SecondaryColour`: `&H0000FFFF` (yellow highlight)
- `Outline`: Border thickness (3 = thick black outline)
- `Alignment`: 2=bottom center, 8=top center
- `MarginV`: Vertical margin from edge
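Because ASS colours are written as `&HAABBGGRR` (alpha first, then blue, green, red), it is easy to swap channels when picking a custom highlight colour. The helper below is a small sketch for converting ordinary RGB values; `ass_colour` is a hypothetical name and not part of Clipify.

```python
def ass_colour(r, g, b, alpha=0):
    """Convert RGB (0-255 each) to an ASS &HAABBGGRR colour string.

    alpha=0 is fully opaque; 255 is fully transparent.
    """
    return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"


print(ass_colour(255, 255, 255))  # white, as used for PrimaryColour
print(ass_colour(255, 255, 0))    # yellow, as used for SecondaryColour
```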
Batch processing multiple clips
```python
import json
import subprocess

# Load Whisper transcript
with open("source.json") as f:
    data = json.load(f)

# Define clip ranges (seconds from the start of source.mp4)
clips = [
    {"start": 154, "end": 171, "title": "clip1"},
    {"start": 492, "end": 509, "title": "clip2"},
    {"start": 903, "end": 924, "title": "clip3"},
]

for clip in clips:
    # Cut raw clip
    subprocess.run([
        "ffmpeg", "-i", "source.mp4",
        "-ss", str(clip["start"]),
        "-to", str(clip["end"]),
        "-c", "copy",
        f"raw_{clip['title']}.mp4",
    ], check=True)

    # Generate captions; the with-block closes the file before ffmpeg reads it
    with open(f"{clip['title']}.ass", "w") as ass_file:
        subprocess.run([
            "python", "scripts/build_ass.py", "source.json",
            "--start", str(clip["start"]),
            "--end", str(clip["end"]),
            "--style", "opus",
        ], stdout=ass_file, check=True)

    # Reframe and burn captions
    subprocess.run([
        "ffmpeg", "-i", f"raw_{clip['title']}.mp4",
        "-vf", f"crop=608:1080:656:0,ass={clip['title']}.ass",
        f"{clip['title']}_final.mp4",
    ], check=True)
```

Configuration
Hardware acceleration
macOS (default):

```bash
ffmpeg -hwaccel videotoolbox -i input.mp4 ...
```

Linux with NVIDIA:

```bash
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 ...
```

Windows:

```bash
ffmpeg -hwaccel dxva2 -i input.mp4 ...
```

Disable (CPU only): Remove `-hwaccel` flags from SKILL.md ffmpeg commands.

Whisper model size
Faster but less accurate:

```bash
whisper video.mp4 --model tiny    # ~1GB, 10x faster
```

More accurate but slower:

```bash
whisper video.mp4 --model medium  # ~1.5GB, 2x slower
whisper video.mp4 --model large   # ~3GB, 4x slower
```

Default in SKILL.md: `base` (good balance for dialogue).

Output quality settings
High quality (larger file):

```bash
ffmpeg -i input.mp4 -c:v libx264 -preset slow -crf 18 -c:a aac -b:a 192k output.mp4
```

Fast encode (lower quality):

```bash
ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 23 -c:a aac -b:a 128k output.mp4
```

Social media optimized (SKILL.md default):

```bash
ffmpeg -i input.mp4 -c:v libx264 -preset medium -crf 20 -c:a aac -b:a 160k output.mp4
```

Troubleshooting

"No motion detected in face regions"
The face crop coordinates are probably wrong. Verify them on a sample frame:

```bash
# Extract frame at 1 minute mark
ffmpeg -ss 00:01:00 -i source.mp4 -frames:v 1 sample.png

# Overlay crop rectangles (adjust x,y,w,h)
ffmpeg -i source.mp4 -ss 00:01:00 -frames:v 1 \
  -vf "drawbox=x=200:y=100:w=300:h=200:color=red:t=5,drawbox=x=1100:y=100:w=300:h=200:color=blue:t=5" \
  sample_boxes.png
```

Red = left face, blue = right face. Adjust coordinates until the boxes frame each person's mouth/chin.

"Captions out of sync"
Whisper timestamps can drift on long videos. Transcribe a shorter segment instead:

```bash
# Transcribe only the relevant 5-minute section
ffmpeg -i source.mp4 -ss 00:15:00 -to 00:20:00 -c copy segment.mp4
whisper segment.mp4 --model base --output_format json
```
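A transcript produced from a cut segment starts at zero, so its timestamps need the segment's start time added back before they line up with the source video. The sketch below shows that shift for the Whisper JSON layout used in this document; `shift_timestamps` is a hypothetical helper, and because `-c copy` cuts on keyframes the true offset may differ slightly from the requested 15:00.

```python
import copy


def shift_timestamps(transcript, offset_seconds):
    """Return a copy of a Whisper transcript with all times moved by offset."""
    shifted = copy.deepcopy(transcript)  # leave the input untouched
    for segment in shifted["segments"]:
        segment["start"] = round(segment["start"] + offset_seconds, 3)
        segment["end"] = round(segment["end"] + offset_seconds, 3)
        # Word entries exist only when word-level timestamps were enabled.
        for word in segment.get("words", []):
            word["start"] = round(word["start"] + offset_seconds, 3)
            word["end"] = round(word["end"] + offset_seconds, 3)
    return shifted


segment_transcript = {
    "segments": [
        {
            "start": 0.5,
            "end": 2.25,
            "text": "This is the worst advice",
            "words": [{"word": "This", "start": 0.5, "end": 0.75}],
        }
    ]
}
# The segment was cut starting at 00:15:00 in the source.
source_transcript = shift_timestamps(segment_transcript, offset_seconds=15 * 60)
print(source_transcript["segments"][0]["start"])
```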
Or enable Whisper's word-level timestamps:

```bash
whisper video.mp4 --model base --word_timestamps True
```

"Pan cuts too frequently"
Increase the motion threshold in `analyze.py`:

```python
# Default threshold
MOTION_THRESHOLD = 0.1

# Higher = less sensitive (fewer speaker changes)
MOTION_THRESHOLD = 0.3
```

Or add a minimum duration between cuts in `build_pan.py`:

```python
# Ignore speaker changes shorter than 2 seconds
MIN_SEGMENT_DURATION = 2.0
```
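To illustrate what a minimum segment duration does to a timeline, the sketch below folds any segment shorter than the limit into the segment before it and merges neighbours that end up on the same side. This is one plausible way to apply such a setting, not the actual logic in `build_pan.py`; `merge_short_segments` is a hypothetical name.

```python
def merge_short_segments(segments, min_duration=2.0):
    """Absorb segments shorter than min_duration into the previous one.

    segments: list of (start, end, side) tuples in time order.
    """
    merged = []
    for start, end, side in segments:
        too_short = (end - start) < min_duration
        if merged and (too_short or merged[-1][2] == side):
            # Extend the previous segment instead of cutting to a new one.
            prev_start, _, prev_side = merged[-1]
            merged[-1] = (prev_start, end, prev_side)
        else:
            merged.append((start, end, side))
    return merged


timeline = [
    (0.0, 3.0, "left"),
    (3.0, 3.5, "right"),  # 0.5 s blip: too short to justify a cut
    (3.5, 6.0, "left"),
    (6.0, 9.0, "right"),
]
smoothed = merge_short_segments(timeline, min_duration=2.0)
print(smoothed)
```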
"ffmpeg not found"
Ensure ffmpeg is in PATH:

```bash
which ffmpeg
# Should print: /usr/local/bin/ffmpeg or similar

# If not installed:
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu
```

"Whisper import error"
```bash
# Uninstall conflicting whisper packages
pip uninstall whisper openai-whisper

# Reinstall correct package
pip install openai-whisper

# Verify
python -c "import whisper; print(whisper.__version__)"
```

"VideoToolbox acceleration failed"
macOS-specific. Fall back to CPU:

```bash
# Remove -hwaccel videotoolbox from all ffmpeg commands
ffmpeg -i input.mp4 ...  # (no -hwaccel flag)
```

Or use software decode explicitly:

```bash
ffmpeg -hwaccel none -i input.mp4 ...
```

Advanced: Integration with Custom Workflows
Use Clipify detection with external editor
```python
# 1. Run Clipify's detection logic (without cutting)
# This would be in your custom script:
import json
import subprocess

# Transcribe
subprocess.run(
    ["whisper", "source.mp4", "--model", "base", "--output_format", "json"],
    check=True,
)

# Load transcript
with open("source.json") as f:
    transcript = json.load(f)

# Simple punchline detector (look for laughter indicators)
candidates = []
for segment in transcript["segments"]:
    text = segment["text"].lower()
    if any(word in text for word in ["haha", "lol", "crazy", "worst", "insane"]):
        candidates.append({
            "start": segment["start"],
            "end": segment["end"],
            "text": segment["text"],
        })

print(json.dumps(candidates, indent=2))
# Output to DaVinci Resolve, Premiere, etc.
```

Export timeline for manual editing
```python
# Generate EDL (Edit Decision List) from Clipify candidates
def frames_to_tc(frames, fps):
    """Convert a frame count to HH:MM:SS:FF non-drop-frame timecode."""
    h = frames // (fps * 3600)
    m = (frames % (fps * 3600)) // (fps * 60)
    s = (frames % (fps * 60)) // fps
    f = frames % fps
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"


def to_edl(clips, fps=30):
    edl = ["TITLE: Clipify Export", "FCM: NON-DROP FRAME", ""]
    record_in = 0  # running position on the output timeline, in frames
    for i, clip in enumerate(clips, 1):
        src_in = int(clip["start"] * fps)
        src_out = int(clip["end"] * fps)
        record_out = record_in + (src_out - src_in)
        # Columns: source in, source out, record in, record out
        edl.append(
            f"{i:03d}  AX       V     C        "
            f"{frames_to_tc(src_in, fps)} {frames_to_tc(src_out, fps)} "
            f"{frames_to_tc(record_in, fps)} {frames_to_tc(record_out, fps)}"
        )
        record_in = record_out  # place the next clip right after this one
    return "\n".join(edl)


# Usage
clips = [{"start": 154.2, "end": 171.8}, {"start": 492.5, "end": 509.1}]
print(to_edl(clips))
```

Custom caption animations
Modify the ASS file for animated entrances:

```ass
Dialogue: 0,0:00:00.50,0:00:02.30,Default,,0,0,0,,{\fad(200,200)\move(640,1000,640,900)}This is animated text
```

- `\fad(200,200)`: 200ms fade in/out
- `\move(x1,y1,x2,y2)`: Slide from bottom to center
- `\t(0,500,\fscx120\fscy120)`: Scale animation over 500ms
Performance Tips
- Use proxy files for preview: Transcode to a lower resolution before Clipify analysis.

  ```bash
  ffmpeg -i source.mp4 -vf scale=960:540 -c:v libx264 -crf 28 proxy.mp4
  ```

- Skip transcription on re-runs: Cache the Whisper JSON output.

  ```bash
  if [ ! -f source.json ]; then
    whisper source.mp4 --model base --output_format json
  fi
  ```

- Parallel clip rendering: Process multiple clips simultaneously.

  ```bash
  for clip in clip1 clip2 clip3; do
    ffmpeg -i "$clip.mp4" -vf "..." "${clip}_out.mp4" &
  done
  wait
  ```

- GPU acceleration: Use NVIDIA NVENC for faster encoding.

  ```bash
  ffmpeg -hwaccel cuda -i input.mp4 -c:v h264_nvenc -preset p4 -cq 20 output.mp4
  ```