Clipify Video Clip Generator

Skill by ara.so — Devtools Skills collection.
Clipify is a Claude Code skill that automatically turns long-form videos into social-ready clips. It transcribes video, identifies clip-worthy moments, reframes 16:9 to 9:16 with face-tracking pans, and burns opus-style word-by-word captions.
Key capabilities:
  • Auto-detect punchlines, reversals, and awkward pauses via Whisper transcription
  • Face-tracking pan for 9:16 vertical clips (no ML models — uses motion energy)
  • Opus/karaoke/minimal subtitle styles with word-level highlighting
  • Hardware-accelerated rendering (VideoToolbox on macOS)
  • ~20s render time for 20s clips on Apple Silicon

Installation

```bash
# Clone to Claude Code skills directory
git clone https://github.com/louisedesadeleer/clipify.git ~/.claude/skills/clipify

# Install dependencies
brew install ffmpeg
pip install openai-whisper numpy
```

**Requirements:**
- macOS (or Linux/Windows with `-hwaccel videotoolbox` removed from SKILL.md)
- ffmpeg with libx264
- Python 3 with numpy
- Whisper (openai-whisper)

Restart Claude Code after installation. The `/clipify` slash command will be available.

Usage Workflow

1. Invoke the skill

In Claude Code:
/clipify
Provide the path to your source video when prompted:
/path/to/long-interview.mp4

2. Review proposed clips

Clipify transcribes the video and proposes 3-5 candidates with:
  • Timestamp range
  • Title/description
  • Reason (punchline, reversal, audio peak, awkward pause)
Example output:
Clip 1: "The worst product advice" (02:34 - 02:51)
  Reason: Reversal after awkward pause

Clip 2: "We burned $2M on this" (08:12 - 08:29)
  Reason: Audio peak + punchline

Clip 3: "My co-founder quit on Zoom" (15:03 - 15:24)
  Reason: Punchline
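
The proposed ranges are printed as `MM:SS`, while the script examples in this document take offsets in seconds. A minimal sketch of that conversion, assuming the `MM:SS - MM:SS` layout shown in the example output; `to_seconds` and `parse_range` are hypothetical helper names, not part of Clipify:

```python
def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' to whole seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_range(text: str) -> tuple[int, int]:
    """Split a range such as '02:34 - 02:51' into (start, end) seconds."""
    start, end = text.split(" - ")
    return to_seconds(start), to_seconds(end)


print(parse_range("02:34 - 02:51"))  # (154, 171)
```

The resulting values can be passed to flags such as `--start` and `--end`, or to ffmpeg's `-ss` and `-to`.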

3. Select clip and format

Choose which clip to cut, then specify:
  • Aspect ratio: 9:16 (vertical), 16:9 (horizontal), 1:1 (square)
  • Reframe style (if 9:16 from 16:9 with two speakers): pan (follow speaker) or split-screen
  • Subtitle style: opus (bold white + yellow highlight), karaoke (word-by-word), minimal, or paste reference image

4. Output

Final clips are saved to:
<source-video-dir>/clipify_out/clip_<timestamp>.mp4

Scripts Reference

Clipify uses standalone Python scripts for each processing step. You can call these directly for custom workflows.

analyze.py — Speaker timeline from motion energy

```bash
# Generate motion energy files for two face regions
ffmpeg -i video.mp4 -vf "crop=300:200:100:50,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray - | python scripts/analyze.py motion_left.bin
ffmpeg -i video.mp4 -vf "crop=300:200:1000:50,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray - | python scripts/analyze.py motion_right.bin

# Analyze both to generate speaker timeline
python scripts/analyze.py motion_left.bin motion_right.bin --fps 30 > timeline.txt
```

**Output format (timeline.txt):**
```
0.00-2.34:left
2.34-5.67:right
5.67-8.12:left
```
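
For custom workflows it can help to load the timeline into Python. The sketch below assumes only the `start-end:speaker` entry format shown for `timeline.txt`, with entries separated by whitespace or newlines; `Segment` and `parse_timeline` are hypothetical names, not part of Clipify.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float
    end: float
    speaker: str


def parse_timeline(text: str) -> list[Segment]:
    """Parse entries like '0.00-2.34:left' separated by any whitespace."""
    segments = []
    for entry in text.split():
        span, speaker = entry.rsplit(":", 1)
        start, end = span.split("-")
        segments.append(Segment(float(start), float(end), speaker))
    return segments


timeline = parse_timeline("0.00-2.34:left 2.34-5.67:right 5.67-8.12:left")
print(timeline[1])  # Segment(start=2.34, end=5.67, speaker='right')
```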

build_pan.py — Generate ffmpeg crop expression

```bash
# From speaker timeline, build hard-cut pan expression
python scripts/build_pan.py timeline.txt --left-x 100 --right-x 1000 --width 608 > pan_expr.txt

# Use in ffmpeg crop filter
ffmpeg -i source.mp4 -vf "crop=608:1080:'$(cat pan_expr.txt)':0" output.mp4
```

**Arguments:**
- `--left-x`: X coordinate of left speaker's face center
- `--right-x`: X coordinate of right speaker's face center
- `--width`: Width of the 9:16 crop window (e.g., 608 for 1080p)

**Output:** ffmpeg expression string like:
```
if(between(t,0,2.34),100,if(between(t,2.34,5.67),1000,if(between(t,5.67,8.12),100,1000)))
```
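
One way such a nested expression could be assembled from timeline segments is sketched below. This is an illustration under assumptions, not the actual `build_pan.py` logic: the function name, the tuple layout, and the `fallback_x` parameter (the position used outside every listed segment) are all hypothetical.

```python
def build_pan_expr(segments, left_x, right_x, fallback_x):
    """Nest if(between(t,start,end),x,...) terms, first segment outermost.

    segments: (start, end, speaker) tuples in time order, speaker is
    'left' or 'right'. Times are formatted with 6 significant digits,
    which is enough for clip-relative timestamps.
    """
    x_for = {"left": left_x, "right": right_x}
    expr = str(fallback_x)
    # Wrap from the last segment backwards so the first ends up outermost.
    for start, end, speaker in reversed(segments):
        expr = f"if(between(t,{start:g},{end:g}),{x_for[speaker]},{expr})"
    return expr


segments = [(0.0, 2.34, "left"), (2.34, 5.67, "right"), (5.67, 8.12, "left")]
print(build_pan_expr(segments, left_x=100, right_x=1000, fallback_x=1000))
```

With these inputs the sketch reproduces the example expression string shown for `build_pan.py`.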

build_ass.py — Generate ASS subtitle file

```bash
# From Whisper JSON output, create opus-style captions
python scripts/build_ass.py whisper_output.json --style opus > captions.ass

# Burn into video
ffmpeg -i video.mp4 -vf "ass=captions.ass" output.mp4
```

**Whisper JSON format (input):**
```json
{
  "segments": [
    {
      "start": 0.5,
      "end": 2.3,
      "text": "This is the worst advice",
      "words": [
        {"word": "This", "start": 0.5, "end": 0.7},
        {"word": "is", "start": 0.7, "end": 0.85},
        {"word": "the", "start": 0.85, "end": 1.0},
        {"word": "worst", "start": 1.0, "end": 1.4},
        {"word": "advice", "start": 1.4, "end": 2.3}
      ]
    }
  ]
}
```

Styles:
  • `opus`: Bold white text, yellow active-word highlight, centered top
  • `karaoke`: Word-by-word color change, bottom positioned
  • `minimal`: Clean white text, no highlights
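
Word-level highlighting in ASS is commonly expressed with `\k` tags, whose durations are in centiseconds. The sketch below shows how the `words` array could map to a `Dialogue` text field and ASS timestamps. It is an assumption about the general technique, not the actual `build_ass.py` code, and both function names are hypothetical.

```python
def ass_time(seconds):
    """Format seconds as an ASS timestamp, H:MM:SS.cc."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_text(words):
    """Prefix each word with a karaoke tag holding its duration."""
    parts = []
    for w in words:
        # Karaoke durations are expressed in centiseconds.
        cs = round((w["end"] - w["start"]) * 100)
        parts.append(f"{{\\k{cs}}}{w['word'].strip()}")
    return " ".join(parts)


words = [
    {"word": "This", "start": 0.5, "end": 0.7},
    {"word": "is", "start": 0.7, "end": 0.85},
    {"word": "the", "start": 0.85, "end": 1.0},
    {"word": "worst", "start": 1.0, "end": 1.4},
    {"word": "advice", "start": 1.4, "end": 2.3},
]
print(ass_time(words[0]["start"]), ass_time(words[-1]["end"]))
print(karaoke_text(words))
```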

audio_align.py — Find clip offset in source

```bash
# Find where a 20s clip appears in a 2-hour source video
python scripts/audio_align.py source.mp4 clip.mp4
# Output: 00:15:34.2 (offset timestamp)
```

Uses audio cross-correlation. Useful for re-linking edited clips to source timestamps.
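
The idea behind cross-correlation alignment can be shown on toy data: slide the clip along the source and keep the lag with the highest sum of products. The sketch below is a brute-force illustration only. How `audio_align.py` actually implements it is not documented here, and a real two-hour source would call for an FFT-based correlation on downsampled audio.

```python
def best_offset(source, clip):
    """Return the sample offset where clip correlates best with source."""
    best_index, best_score = 0, float("-inf")
    for offset in range(len(source) - len(clip) + 1):
        # Cross-correlation at this lag: sum of element-wise products.
        score = sum(s * c for s, c in zip(source[offset:], clip))
        if score > best_score:
            best_index, best_score = offset, score
    return best_index


# Toy "audio": the clip is an exact excerpt starting at sample 3.
source = [0, 0, 1, 3, -2, 4, 0, 0, 1, 0]
clip = [3, -2, 4]
sample_rate = 1  # samples per second, chosen only for this toy example

offset = best_offset(source, clip)
print(offset / sample_rate)  # 3.0
```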

Common Patterns

Extract clip manually (without auto-detection)

```bash
# 1. Transcribe with Whisper
whisper source.mp4 --model base --output_format json --output_dir ./

# 2. Cut segment (03:15 to 03:42)
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 -c copy raw_clip.mp4

# 3. Generate captions for this segment
python scripts/build_ass.py source.json --start 195 --end 222 --style opus > clip.ass

# 4. Reframe to 9:16 with center crop (no pan)
ffmpeg -i raw_clip.mp4 -vf "crop=608:1080:656:0,ass=clip.ass" final_clip.mp4
```
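
The crop values in step 4 follow from the frame size. Assuming a 1920x1080 source (an assumption consistent with the 608x1080 crop and the x offset of 656), the arithmetic can be sketched as follows; `center_crop` is a hypothetical helper name.

```python
import math


def center_crop(src_w, src_h, ratio_w=9, ratio_h=16):
    """Return (width, height, x, y) for a centered full-height crop."""
    crop_w = src_h * ratio_w / ratio_h
    # Round up to an even width: libx264 with yuv420p needs even dimensions.
    crop_w = 2 * math.ceil(crop_w / 2)
    x = (src_w - crop_w) // 2
    return crop_w, src_h, x, 0


w, h, x, y = center_crop(1920, 1080)
print(f"crop={w}:{h}:{x}:{y}")  # crop=608:1080:656:0
```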

Two-speaker pan with manual face coordinates

```bash
# 1. Identify face regions on a sample frame
ffplay -ss 00:01:00 source.mp4  # visual inspection
# Left face:  x=200,  y=100, width=300, height=200
# Right face: x=1100, y=100, width=300, height=200

# 2. Generate motion energy files
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 \
  -vf "crop=300:200:200:100,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray motion_left.bin
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 \
  -vf "crop=300:200:1100:100,format=gray,tblend=all_mode=difference" \
  -f rawvideo -pix_fmt gray motion_right.bin

# 3. Analyze speaker timeline
python scripts/analyze.py motion_left.bin motion_right.bin --fps 30 > timeline.txt

# 4. Build pan expression (faces centered at x=350 and x=1250)
python scripts/build_pan.py timeline.txt --left-x 350 --right-x 1250 --width 608 > pan.txt

# 5. Apply crop with pan
ffmpeg -i source.mp4 -ss 00:03:15 -to 00:03:42 \
  -vf "crop=608:1080:'$(cat pan.txt)':0" panned_clip.mp4
```
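
The face centers in step 4 come from the boxes in step 1 (x plus half the width). ffmpeg's `crop` filter takes the left edge of the window rather than its center, so a centered window has to be shifted by half the crop width and kept inside the frame. Whether `build_pan.py` does this internally is not stated here; the sketch below only illustrates the arithmetic, assuming a 1920-pixel-wide source, and both helper names are hypothetical.

```python
def face_center_x(box_x, box_w):
    """Horizontal center of a face box given its left edge and width."""
    return box_x + box_w // 2


def crop_left_edge(center_x, crop_w, src_w):
    """Left edge that centers the crop on center_x, clamped to the frame."""
    return max(0, min(center_x - crop_w // 2, src_w - crop_w))


left = face_center_x(200, 300)    # 350
right = face_center_x(1100, 300)  # 1250
print(crop_left_edge(left, 608, 1920), crop_left_edge(right, 608, 1920))  # 46 946
```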

Custom subtitle styling

Edit the ASS file generated by `build_ass.py`:
```ass
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Impact,68,&H00FFFFFF,&H0000FFFF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,3,0,2,10,10,120,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.50,0:00:02.30,Default,,0,0,0,,{\k20}This {\k15}is {\k15}the {\k40}worst {\k90}advice
```
The `\k` values are per-word durations in centiseconds and should add up to the length of the `Dialogue` line (here 180 for 0.50 to 2.30).

Customize:
  • `Fontname`: Impact, Arial, Montserrat
  • `Fontsize`: 68 for 1080p vertical
  • `PrimaryColour`: `&H00FFFFFF` (white; ASS colours are written in `&HAABBGGRR` order)
  • `SecondaryColour`: `&H0000FFFF` (yellow highlight)
  • `Outline`: Border thickness (3 = thick black outline)
  • `Alignment`: 2=bottom center, 8=top center
  • `MarginV`: Vertical margin from edge
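
Because ASS colours put blue before red, converting from the more familiar RGB order is easy to get wrong by hand. A small sketch of the conversion; `ass_colour` is a hypothetical helper, not part of Clipify:

```python
def ass_colour(r, g, b, alpha=0):
    """Format an RGB colour as an ASS &HAABBGGRR string (alpha 0 = opaque)."""
    return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"


print(ass_colour(255, 255, 255))  # &H00FFFFFF (white)
print(ass_colour(255, 255, 0))    # &H0000FFFF (yellow)
```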

Batch processing multiple clips

```python
import json
import subprocess
import sys

# Load Whisper transcript
with open("source.json") as f:
    data = json.load(f)

# Define clip ranges (seconds)
clips = [
    {"start": 154, "end": 171, "title": "clip1"},
    {"start": 492, "end": 509, "title": "clip2"},
    {"start": 903, "end": 924, "title": "clip3"},
]

for clip in clips:
    title = clip["title"]
    start, end = str(clip["start"]), str(clip["end"])

    # Cut raw clip
    subprocess.run(
        ["ffmpeg", "-i", "source.mp4", "-ss", start, "-to", end,
         "-c", "copy", f"raw_{title}.mp4"],
        check=True,
    )

    # Generate captions (the context manager closes the file afterwards)
    with open(f"{title}.ass", "w") as ass_file:
        subprocess.run(
            [sys.executable, "scripts/build_ass.py", "source.json",
             "--start", start, "--end", end, "--style", "opus"],
            stdout=ass_file,
            check=True,
        )

    # Reframe and burn captions
    subprocess.run(
        ["ffmpeg", "-i", f"raw_{title}.mp4",
         "-vf", f"crop=608:1080:656:0,ass={title}.ass",
         f"{title}_final.mp4"],
        check=True,
    )
```

Configuration

Hardware acceleration

macOS (default):
```bash
ffmpeg -hwaccel videotoolbox -i input.mp4 ...
```
Linux with NVIDIA:
```bash
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 ...
```
Windows:
```bash
ffmpeg -hwaccel dxva2 -i input.mp4 ...
```
Disable (CPU only): Remove `-hwaccel` flags from SKILL.md ffmpeg commands.
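
In a custom script, the per-platform flags listed in this section could be chosen at runtime. A minimal sketch, assuming the caller knows whether an NVIDIA GPU is present; `hwaccel_args` and its `has_nvidia` parameter are hypothetical and not part of Clipify:

```python
import platform


def hwaccel_args(system=None, has_nvidia=False):
    """Return ffmpeg hardware-decode flags for the given OS name."""
    system = system or platform.system()
    if system == "Darwin":
        return ["-hwaccel", "videotoolbox"]
    if system == "Linux" and has_nvidia:
        return ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"]
    if system == "Windows":
        return ["-hwaccel", "dxva2"]
    return []  # CPU only


cmd = ["ffmpeg", *hwaccel_args("Darwin"), "-i", "input.mp4"]
print(" ".join(cmd))  # ffmpeg -hwaccel videotoolbox -i input.mp4
```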

Whisper model size

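Faster but less accurate:
```bash
whisper video.mp4 --model tiny  # ~1 GB VRAM, roughly 10x the speed of large
```
More accurate but slower:
```bash
whisper video.mp4 --model medium  # ~5 GB VRAM, roughly 2x the speed of large
whisper video.mp4 --model large   # ~10 GB VRAM, slowest and most accurate
```
The memory and speed figures are the approximate values listed in the Whisper README and vary with hardware and language.

Default in SKILL.md: `base` (good balance for dialogue).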

Output quality settings

High quality (larger file):
```bash
ffmpeg -i input.mp4 -c:v libx264 -preset slow -crf 18 -c:a aac -b:a 192k output.mp4
```
Fast encode (lower quality):
```bash
ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 23 -c:a aac -b:a 128k output.mp4
```
Social media optimized (SKILL.md default):
```bash
ffmpeg -i input.mp4 -c:v libx264 -preset medium -crf 20 -c:a aac -b:a 160k output.mp4
```

Troubleshooting

"No motion detected in face regions"

Face crop coordinates are wrong. Verify on a sample frame:
```bash
# Extract frame at 1 minute mark
ffmpeg -ss 00:01:00 -i source.mp4 -frames:v 1 sample.png

# Overlay crop rectangles (adjust x,y,w,h)
ffmpeg -i source.mp4 -ss 00:01:00 -frames:v 1 \
  -vf "drawbox=x=200:y=100:w=300:h=200:color=red:t=5,drawbox=x=1100:y=100:w=300:h=200:color=blue:t=5" \
  sample_boxes.png
```

Red = left face, blue = right face. Adjust coordinates until boxes frame each person's mouth/chin.

"Captions out of sync"

Whisper timestamps drift on long videos. Use smaller segments:
```bash
# Transcribe only the relevant 5-minute section
ffmpeg -i source.mp4 -ss 00:15:00 -to 00:20:00 -c copy segment.mp4
whisper segment.mp4 --model base --output_format json
```

Or enable Whisper's word-level timestamps:
```bash
whisper video.mp4 --model base --word_timestamps True
```
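
When only a section is transcribed, the resulting timestamps are relative to the start of that section (00:15:00, or 900 seconds, in the command that cuts `segment.mp4`). If captions need to line up with the full source again, the times can be shifted back. A minimal sketch assuming the Whisper JSON layout documented for `build_ass.py`; `shift_transcript` is a hypothetical helper:

```python
import copy


def shift_transcript(transcript, offset):
    """Return a copy of the transcript with every time moved by offset seconds."""
    shifted = copy.deepcopy(transcript)
    for segment in shifted["segments"]:
        segment["start"] += offset
        segment["end"] += offset
        # Word entries only exist when word-level timestamps were requested.
        for word in segment.get("words", []):
            word["start"] += offset
            word["end"] += offset
    return shifted


section = {"segments": [{"start": 1.0, "end": 2.5, "text": "hello",
                         "words": [{"word": "hello", "start": 1.0, "end": 2.5}]}]}
full = shift_transcript(section, 900)
print(full["segments"][0]["start"])  # 901.0
```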

"Pan cuts too frequently"

Increase the motion threshold in `analyze.py`:
```python
# Default threshold
MOTION_THRESHOLD = 0.1

# Higher = less sensitive (fewer speaker changes)
MOTION_THRESHOLD = 0.3
```

Or add a minimum duration between cuts in `build_pan.py`:

```python
# Ignore speaker changes shorter than 2 seconds
MIN_SEGMENT_DURATION = 2.0
```
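
A minimum-duration rule of this kind could be applied to the timeline before the pan expression is built. The sketch below is one possible interpretation, not the logic in `build_pan.py`: segments shorter than the threshold are absorbed into the previous segment, and neighbours that then share a speaker are merged. The function name and tuple layout are hypothetical.

```python
MIN_SEGMENT_DURATION = 2.0


def drop_short_segments(segments, min_duration=MIN_SEGMENT_DURATION):
    """segments: (start, end, speaker) tuples in time order."""
    merged = []
    for start, end, speaker in segments:
        too_short = end - start < min_duration
        if merged and (too_short or merged[-1][2] == speaker):
            # Extend the previous segment instead of cutting to a new one.
            prev_start, _, prev_speaker = merged[-1]
            merged[-1] = (prev_start, end, prev_speaker)
        else:
            # A short first segment is kept, since nothing precedes it.
            merged.append((start, end, speaker))
    return merged


timeline = [(0, 3, "left"), (3, 3.5, "right"), (3.5, 7, "left"), (7, 10, "right")]
print(drop_short_segments(timeline))  # [(0, 7, 'left'), (7, 10, 'right')]
```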

"ffmpeg not found"

Ensure ffmpeg is in PATH:
```bash
which ffmpeg
# Should print: /usr/local/bin/ffmpeg or similar

# If not installed:
brew install ffmpeg      # macOS
sudo apt install ffmpeg  # Ubuntu
```

"Whisper import error"

```bash
# Uninstall conflicting whisper packages
pip uninstall whisper openai-whisper

# Reinstall correct package
pip install openai-whisper

# Verify
python -c "import whisper; print(whisper.__version__)"
```

"VideoToolbox acceleration failed"

macOS-specific. Fall back to CPU:
```bash
# Remove -hwaccel videotoolbox from all ffmpeg commands
ffmpeg -i input.mp4 ...  # (no -hwaccel flag)
```

Or use software decode explicitly:
```bash
ffmpeg -hwaccel none -i input.mp4 ...
```

Advanced: Integration with Custom Workflows

Use Clipify detection with external editor

```python
# 1. Run Clipify's detection logic (without cutting)
# This would be in your custom script:
import json
import subprocess

# Transcribe
subprocess.run(["whisper", "source.mp4", "--model", "base", "--output_format", "json"])

# Load transcript
with open("source.json") as f:
    transcript = json.load(f)

# Simple punchline detector (look for laughter indicators)
candidates = []
for segment in transcript["segments"]:
    text = segment["text"].lower()
    if any(word in text for word in ["haha", "lol", "crazy", "worst", "insane"]):
        candidates.append({
            "start": segment["start"],
            "end": segment["end"],
            "text": segment["text"],
        })

print(json.dumps(candidates, indent=2))

# Output to DaVinci Resolve, Premiere, etc.
```

Export timeline for manual editing

```python
# Generate EDL (Edit Decision List) from Clipify candidates
def frames_to_tc(frames, fps):
    """Convert a frame count to HH:MM:SS:FF non-drop-frame timecode."""
    h = frames // (fps * 3600)
    m = (frames % (fps * 3600)) // (fps * 60)
    s = (frames % (fps * 60)) // fps
    f = frames % fps
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"


def to_edl(clips, fps=30):
    edl = ["TITLE: Clipify Export", "FCM: NON-DROP FRAME", ""]
    record_pos = 0  # running position on the record (timeline) side, in frames

    for i, clip in enumerate(clips, 1):
        start = int(clip["start"] * fps)
        end = int(clip["end"] * fps)
        src_in, src_out = frames_to_tc(start, fps), frames_to_tc(end, fps)
        # Record in/out must span the same duration as source in/out,
        # and each event starts where the previous one ended.
        rec_in = frames_to_tc(record_pos, fps)
        record_pos += end - start
        rec_out = frames_to_tc(record_pos, fps)
        edl.append(f"{i:03d}  AX       V     C        {src_in} {src_out} {rec_in} {rec_out}")

    return "\n".join(edl)


# Usage
clips = [{"start": 154.2, "end": 171.8}, {"start": 492.5, "end": 509.1}]
print(to_edl(clips))
```

Custom caption animations

Modify the ASS file for animated entrances:
```ass
Dialogue: 0,0:00:00.50,0:00:02.30,Default,,0,0,0,,{\fad(200,200)\move(640,1000,640,900)}This is animated text
```
  • `\fad(200,200)`: 200ms fade in/out
  • `\move(x1,y1,x2,y2)`: Slide from bottom to center
  • `\t(0,500,\fscx120\fscy120)`: Scale animation over 500ms

Performance Tips

  1. Use proxy files for preview: transcode to a lower resolution before Clipify analysis.
    ```bash
    ffmpeg -i source.mp4 -vf scale=960:540 -c:v libx264 -crf 28 proxy.mp4
    ```
  2. Skip transcription on re-runs: cache the Whisper JSON output.
    ```bash
    if [ ! -f source.json ]; then
      whisper source.mp4 --model base --output_format json
    fi
    ```
  3. Parallel clip rendering: process multiple clips simultaneously.
    ```bash
    for clip in clip1 clip2 clip3; do
      ffmpeg -i "$clip.mp4" -vf "..." "${clip}_out.mp4" &
    done
    wait
    ```
  4. GPU acceleration: use NVIDIA NVENC for faster encoding. NVENC has no `-crf` option, so constant-quality mode is set with `-cq` instead.
    ```bash
    ffmpeg -hwaccel cuda -i input.mp4 -c:v h264_nvenc -preset p4 -cq 20 output.mp4
    ```