# moviepy for Video Production

moviepy is the toolkit's go-to library for putting deterministic text on top of AI-generated video and for building short, single-file Python video projects without a Remotion toolchain.
The deeper principle is trustworthy text: any genre where text has to be readable, accurate, and consistent (legally, editorially, or commercially) is a genre where AI-rendered in-frame text is unacceptable and a moviepy overlay step is the natural fix. Names must be spelled right. Prices must be exact. Source attributions must be pixel-perfect. AI generation models cannot guarantee any of that.

## When to use moviepy vs. Remotion

| Use moviepy when… | Use Remotion when… |
|---|---|
| Overlaying text/labels on an LTX-2 or SadTalker output | Building long-form sprint reviews or product demos |
| Building sub-30s ad-style spots in a single `build.py` | Multi-template, multi-brand, design-heavy work |
| Compositing data-driven visuals (matplotlib `FuncAnimation` → mp4) | Anything needing React components or design system reuse |
| One-off transformations on existing video files | Anything where the project lifecycle (planning → render) matters |
| You want zero Node.js / no React mental overhead | You want hot-reload preview in Remotion Studio |
Two runnable references for everything in this skill live in `examples/`:

  • `examples/quick-spot/build.py` — 15-second ad-style spot. Audio-anchored timeline, text overlay, optional VO + ducked music. Renders silent out of the box with zero external assets.
  • `examples/data-viz-chart/build.py` — animated time-series chart with deterministic title and source attribution. Demonstrates the matplotlib (data) + moviepy (trustworthy text) split.

Both run with `python3 build.py` and produce a real `out.mp4` immediately. Read them alongside this skill — every pattern below is shown working there.

**Dependencies.** `moviepy`, `Pillow`, and `matplotlib` are declared in `tools/requirements.txt` and installed with the toolkit's one-line Python setup: `python3 -m pip install -r tools/requirements.txt`. If you hit `Missing dependency` when running an example, run that command from the repo root — the examples' `build.py` files will tell you the same thing in their error message and exit cleanly rather than printing a bare traceback.
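The clean-exit behaviour works like a small import guard. This is a sketch of the idea, not the exact code in the example `build.py` files; the `require` helper name is hypothetical:

```python
import importlib.util
import sys

def require(*modules):
    """Exit with a friendly message instead of a traceback when deps are missing."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        sys.exit("Missing dependency: " + ", ".join(missing)
                 + ". Run: python3 -m pip install -r tools/requirements.txt")

# In a real build.py this would be: require("moviepy", "PIL", "matplotlib")
require("json", "pathlib")  # stdlib stand-ins so the sketch runs anywhere
```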

## The main use case: text on AI-generated video

Both LTX-2 and SadTalker output bare visuals:

  • LTX-2 cannot reliably render readable text (the model hallucinates letterforms — see the ltx2 skill's "Bad Prompts").
  • SadTalker outputs a talking head with no captions, labels, lower thirds, or context.

The fix is to generate the visual cleanly, then composite text over it deterministically with moviepy. This is the canonical pattern in this toolkit:

```python
from moviepy import VideoFileClip, ImageClip, CompositeVideoClip

# 1. AI-generated visual (LTX-2 or SadTalker output)
bg = VideoFileClip("lugh_ltx.mp4").without_audio()

# 2. Text rendered via PIL → ImageClip (see "Text rendering" below)
title = (
    ImageClip("text_cache/intro_title.png")
    .with_duration(2.0)
    .with_start(0.5)
    .with_position(("center", 880))
)

# 3. Composite
final = CompositeVideoClip([bg, title], size=(1920, 1080))
final.write_videofile("lugh_with_caption.mp4", fps=30, codec="libx264")
```

Common shapes this takes:

| Shape | LTX-2 use | SadTalker use |
|-------|-----------|---------------|
| Title card over hero footage | "INTRODUCING LONGARM" over a cinematic LTX-2 b-roll | n/a |
| Lower third / name plate | n/a | "Lugh — Ancient Warrior God" under a talking head |
| Quote caption | "I am going home." over an LTX-2 character cameo | Same, over a SadTalker talking head |
| Brand attribution | Logo + URL fade-in over the last second | Same |
| Tinted overlay for contrast | Dark navy semi-transparent layer behind text | Same |

## Genres where this shines

The "AI-visual + deterministic text overlay" pattern is the natural production pipeline for several styles of video. If the request matches one of these, reach for moviepy by default:

| Genre | What you overlay | Why moviepy is the right call |
|---|---|---|
| News / talking-head journalism | Speaker name plates, location bars, breaking-news banners, source attribution, pull quotes | Names must be spelled right (editorial / legal). The biggest category by volume. |
| Documentary segments | Interviewee lower thirds, chapter titles, archival source credits, location stamps | Same trust requirement as news. |
| Trailers / promo spots | Title cards, credit overlays ("FROM THE DIRECTOR OF…"), date stings, quote cards, CTAs | Tightly timed, text-heavy, every frame matters. The `q2-townhall-longarm-ad` example is exactly this. |
| Social short-form (Reels, TikTok, Shorts) | Word-accurate captions for sound-off viewing, hashtag overlays | Most social viewing is muted; captions are non-negotiable. |
| Product demos with annotations | Pricing callouts, feature labels, "click here" pointers over screen recordings, before/after labels | Prices and product names must be exact. |
| Tutorials / explainers | Step number overlays, terminal-command captions, keyboard-shortcut callouts | Step numbers must be sequential, commands must be copy-pasteable. |

Lesser-but-real fits: music videos (lyric overlays), reaction videos (source attribution), sports recaps (score overlays), real-estate tours (price / sqft), conference talks (speaker + session plate).

For full SRT-driven subtitling (long-form, time-coded, multilingual) moviepy is workable but not ideal — reach for `ffmpeg` with the `subtitles` filter or a dedicated subtitle tool. moviepy is best for hand-placed overlays, not bulk caption tracks.

## Text rendering — use PIL, not `TextClip`

**Critical gotcha:** moviepy 2.x's `TextClip(method='label')` has a tight-bbox bug that clips letter ascenders and descenders (the tops of capitals, the tails of g/p/y). On Apple Silicon you'll see characters with sliced edges and not realise what's wrong for hours.

**The workaround:** render text to a transparent PNG via PIL, then load it as an `ImageClip`. Cache the result by content hash so re-builds are free.
```python
import hashlib
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont

ARIAL_BOLD = "/System/Library/Fonts/Supplemental/Arial Bold.ttf"

def render_text_png(txt, size, hex_color, cache_dir="./text_cache"):
    cache = Path(cache_dir); cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha1(f"{txt}|{size}|{hex_color}".encode()).hexdigest()[:16]
    path = cache / f"{key}.png"
    if path.exists():
        return str(path)

    font = ImageFont.truetype(ARIAL_BOLD, size)
    bbox = ImageDraw.Draw(Image.new("RGBA", (1, 1))).textbbox((0, 0), txt, font=font)
    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
    pad = max(20, size // 4)

    img = Image.new("RGBA", (tw + pad * 2, th + pad * 2), (0, 0, 0, 0))
    rgb = tuple(int(hex_color.lstrip("#")[i:i+2], 16) for i in (0, 2, 4))
    ImageDraw.Draw(img).text((pad - bbox[0], pad - bbox[1]), txt, font=font, fill=(*rgb, 255))
    img.save(path)
    return str(path)
```

The full helper (with kwargs for bold, position, fades, and cleaner ergonomics) is in `examples/quick-spot/build.py` — copy it rather than re-implementing.
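The caching step is worth internalizing: the file name is a content hash of everything that affects the rendered pixels, so identical calls hit the cache and any change to text, size, or color produces a new file. A minimal check of that property, using the same key derivation as `render_text_png`:

```python
import hashlib

def cache_key(txt, size, hex_color):
    # Same key derivation as render_text_png above
    return hashlib.sha1(f"{txt}|{size}|{hex_color}".encode()).hexdigest()[:16]

assert cache_key("HELLO", 72, "#FFFFFF") == cache_key("HELLO", 72, "#FFFFFF")  # stable, so cache hits
assert cache_key("HELLO", 72, "#FFFFFF") != cache_key("HELLO", 73, "#FFFFFF")  # any change, new file
```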

## Audio-anchored timeline pattern

For ad-style edits where every frame matters, generate per-scene VO first and anchor every visual to known absolute timestamps. This eliminates timing drift entirely. See CLAUDE.md → Video Timing → Audio-Anchored Timelines for the full pattern. The short version:

```python
# Audio-anchored timeline (25s):
#   Scene 1  tired    0.3 → 3.74  (audio 3.44s)
#   Scene 2  worries  4.0 → 8.88  (audio 4.88s)
text_clip("TIRED OF", start=0.5, duration=1.2)
text_clip("THIRD-PARTY", start=1.0, duration=1.8)
vo_clip("01_tired.mp3", start=0.3)
vo_clip("02_worries.mp3", start=4.0)
```
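The absolute starts aren't hand-tuned: each scene begins where the previous VO ends plus a fixed breathing gap. A sketch of that derivation (the 0.26s gap is inferred from the timestamps in the timeline, not a documented constant):

```python
def scene_starts(vo_durations, first_start=0.3, gap=0.26):
    """Derive absolute scene start times from measured VO durations."""
    starts, t = [], first_start
    for d in vo_durations:
        starts.append(round(t, 2))
        t += d + gap  # next scene starts after this VO plus a breathing gap
    return starts

print(scene_starts([3.44, 4.88]))  # → [0.3, 4.0], matching the timeline above
```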

## Common recipes

### Text on a single AI-generated clip

```python
from moviepy import VideoFileClip, ImageClip, CompositeVideoClip

bg = VideoFileClip("ltx_hero.mp4").without_audio()
caption = (
    ImageClip(render_text_png("THE FUTURE OF AGENTS", 140, "#FFFFFF"))
    .with_duration(bg.duration)
    .with_position(("center", 880))
)
CompositeVideoClip([bg, caption], size=bg.size).write_videofile("captioned.mp4", fps=30)
```

### Lower third over a SadTalker talking head

```python
from moviepy import VideoFileClip, ImageClip, ColorClip, CompositeVideoClip

talking = VideoFileClip("narrator_sadtalker.mp4")
W, H = talking.size

# Semi-transparent bar across the bottom for contrast
bar = (
    ColorClip((W, 140), color=(20, 24, 38))
    .with_duration(talking.duration)
    .with_opacity(0.75)
    .with_position(("center", H - 160))
)
name = (
    ImageClip(render_text_png("LUGH", 72, "#F06859"))
    .with_duration(talking.duration)
    .with_position((80, H - 150))
)
title = (
    ImageClip(render_text_png("Ancient Warrior God", 36, "#FFFFFF"))
    .with_duration(talking.duration)
    .with_position((80, H - 80))
)
CompositeVideoClip([talking, bar, name, title]).write_videofile("with_lower_third.mp4", fps=30)
```

### Tinted overlay for text contrast over busy footage

LTX-2 b-roll is often too visually busy for legible text. Drop a semi-transparent navy layer between the video and the text:

```python
from moviepy import ColorClip

tint = (
    ColorClip((W, H), color=(20, 24, 38))
    .with_duration(duration)
    .with_opacity(0.55)
)

# Composite order: bg → tint → text
CompositeVideoClip([bg, tint, text_clip])
```

### Side-by-side composite

```python
from moviepy import VideoFileClip, CompositeVideoClip, ColorClip

left  = VideoFileClip("demo_a.mp4").resized(width=960).with_position((  0, "center"))
right = VideoFileClip("demo_b.mp4").resized(width=960).with_position((960, "center"))
bg    = ColorClip((1920, 1080), color=(0, 0, 0)).with_duration(max(left.duration, right.duration))
CompositeVideoClip([bg, left, right]).write_videofile("split.mp4", fps=30)
```

### Mix per-scene VO with ducked music

```python
from moviepy import AudioFileClip, CompositeAudioClip
from moviepy.audio.fx.MultiplyVolume import MultiplyVolume
from moviepy.audio.fx.AudioFadeIn import AudioFadeIn
from moviepy.audio.fx.AudioFadeOut import AudioFadeOut

music = AudioFileClip("music.mp3").with_effects([
    MultiplyVolume(0.22),  # duck under VO
    AudioFadeIn(0.5),
    AudioFadeOut(1.5),
])
vo = [
    AudioFileClip(f"scenes/0{i}.mp3").with_effects([MultiplyVolume(1.15)]).with_start(start)
    for i, start in [(1, 0.3), (2, 4.0), (3, 9.1)]
]
final_audio = CompositeAudioClip([music] + vo)
```

## Gotchas

  • moviepy 2.x renamed methods. Use `subclipped` (not `subclip`), `with_duration` / `with_start` / `with_position` (not `set_duration` etc.), and `with_effects([...])` instead of `.fadein()` / `.fadeout()`. Many tutorials online still show 1.x syntax — be skeptical.
  • `TextClip(method='label')` clips ascenders/descenders. Always use the PIL workaround above.
  • `OffthreadVideo` is Remotion-only. moviepy uses `VideoFileClip`. Don't mix the two.
  • Resizing requires Pillow ≥ 10.0 for the LANCZOS resample. If you see `ANTIALIAS` errors, upgrade Pillow.
  • `ColorClip` takes RGB tuples, not hex strings. Use `(20, 24, 38)`, not `"#141826"`.
  • Audio in `VideoFileClip` is loaded by default. Call `.without_audio()` if you only want the visual — composing with audio you don't want will cause silent VO drops in `CompositeAudioClip`.
  • Always set `size=(W, H)` on `CompositeVideoClip`. Without it, output dimensions follow the first clip, which can be smaller than your target.
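Since `ColorClip` wants RGB tuples while the text helper wants hex strings, a tiny converter lets one palette definition serve both. A convenience sketch, not part of the library:

```python
def hex_to_rgb(hex_color):
    """Convert "#141826"-style hex to the (r, g, b) tuple ColorClip expects."""
    s = hex_color.lstrip("#")
    return tuple(int(s[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#141826"))  # → (20, 24, 38), the navy used throughout this skill
```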

## When to reach for what

| Task | Tool |
|---|---|
| Animate a still image | `tools/ltx2.py --input` |
| Talking head from photoreal portrait | `tools/sadtalker.py` |
| Talking head from stylized character | `tools/ltx2.py --input` (see ltx2 skill) |
| Add a label/caption/lower third to either of the above | moviepy + PIL (this skill) |
| Convert / compress / resize an existing file | `ffmpeg` (see ffmpeg skill) |
| Long-form, design-system-driven video | Remotion (see remotion skill) |

## References

  • Runnable example — short ad-style spot: `examples/quick-spot/build.py`
  • Runnable example — data-viz with text overlay: `examples/data-viz-chart/build.py`
  • Audio-anchored timelines: CLAUDE.md → Video Timing → Audio-Anchored Timelines
  • Related skills: `ltx2`, `ffmpeg`, `remotion`