ltx2

LTX-2.3 Video Generation

Generate ~5 second video clips from text prompts or images using the LTX-2.3 22B DiT model. Runs on Modal (A100-80GB). Requires `MODAL_LTX2_ENDPOINT_URL` in `.env`.

Quick Reference

```bash
# Text-to-video
python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4

# Image-to-video (animate a still image)
python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4

# Custom resolution and duration
python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4

# Fast mode (fewer steps, quicker)
python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4

# Reproducible output
python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4
```
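
For batch or scripted use, the same commands can be assembled in Python. This is a minimal sketch, assuming the flags behave as documented in this file; `build_ltx2_command` and `run_ltx2` are hypothetical helpers for illustration, not part of the toolkit.

```python
import subprocess
from typing import List, Optional


def build_ltx2_command(
    prompt: str,
    output: str,
    input_image: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    num_frames: Optional[int] = None,
    quality: Optional[str] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for tools/ltx2.py (hypothetical helper)."""
    cmd = ["python3", "tools/ltx2.py", "--prompt", prompt, "--output", output]
    # Only pass flags the caller set, so the tool's own defaults apply otherwise.
    optional_flags = [
        ("--input", input_image),
        ("--width", width),
        ("--height", height),
        ("--num-frames", num_frames),
        ("--quality", quality),
        ("--seed", seed),
    ]
    for flag, value in optional_flags:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_ltx2(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain commas or quotes.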

Parameters

| Parameter | Default | Description |
|---|---|---|
| `--prompt` | (required) | Text description of the video |
| `--input` | - | Input image for image-to-video |
| `--width` | 768 | Video width (divisible by 64) |
| `--height` | 512 | Video height (divisible by 64) |
| `--num-frames` | 121 | Frame count, must satisfy `(n-1) % 8 == 0` |
| `--fps` | 24 | Frames per second |
| `--quality` | standard | `standard` (30 steps) or `fast` (15 steps) |
| `--steps` | 30 | Override inference steps directly |
| `--seed` | random | Seed for reproducibility |
| `--output` | auto | Output file path |
| `--negative-prompt` | sensible default | What to avoid |

Valid Frame Counts

`(n - 1) % 8 == 0`: 25 (~1s), 49 (~2s), 73 (~3s), 97 (~4s), 121 (~5s, default), 161 (~6.7s), 193 (~8s, max practical).
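
To pick a frame count for a target duration, the rule above can be turned into a small helper. This is a sketch, assuming the default 24 fps and treating 193 frames as the upper bound; the function names are hypothetical and not part of `ltx2.py`.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n satisfies the (n - 1) % 8 == 0 rule."""
    return n >= 1 and (n - 1) % 8 == 0


def frames_for_duration(seconds: float, fps: int = 24, max_frames: int = 193) -> int:
    """Nearest valid frame count for a target duration, capped at max_frames."""
    # Valid counts have the form 8 * k + 1, so solve for the nearest k.
    k = round((seconds * fps - 1) / 8)
    k_max = (max_frames - 1) // 8  # largest k that stays within the cap
    k = max(0, min(k, k_max))
    return 8 * k + 1


def duration_seconds(num_frames: int, fps: int = 24) -> float:
    """Clip length in seconds for a given frame count."""
    return num_frames / fps
```

For example, `frames_for_duration(5)` returns 121 (the default), and any request longer than about 8 seconds is clamped to 193.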

Common Resolutions

| Resolution | Ratio | Notes |
|---|---|---|
| 768x512 | 3:2 | Default, good balance |
| 512x512 | 1:1 | Square, fastest |
| 1024x576 | 16:9 | Widescreen |
| 576x1024 | 9:16 | Portrait/vertical |
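
For resolutions not in the table, the divisible-by-64 constraint can be checked before submitting a job. The sketch below uses hypothetical helper names and assumes the only constraint is the one listed under Parameters.

```python
from math import gcd


def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value to the nearest multiple (at least one multiple)."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def aspect_ratio(width: int, height: int) -> str:
    """Reduced aspect ratio, e.g. 1024x576 -> '16:9'."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


def check_resolution(width: int, height: int, multiple: int = 64) -> None:
    """Raise ValueError if either dimension is not divisible by multiple."""
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % multiple != 0:
            suggestion = snap_to_multiple(value, multiple)
            raise ValueError(
                f"{name}={value} is not divisible by {multiple}; try {suggestion}"
            )
```

All four resolutions in the table pass `check_resolution`, and `aspect_ratio` reproduces their Ratio column.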

Prompting Guide

LTX-2 responds well to cinematographic descriptions. Layer these dimensions:
  • Camera: "Slow dolly forward", "Aerial drone shot", "Tracking shot", "Static wide angle"
  • Lighting: "Golden hour", "Cinematic lighting", "Neon-lit", "Soft diffused light"
  • Motion: "Timelapse of...", "Slow motion", "Gentle camera drift", "Gradually transitions"
  • Style: "Shot on 35mm film", "Documentary style", "Clean minimal aesthetic"
  • Negative: The default negative prompt already avoids "worst quality, blurry, jittery, watermark, text, logo"
Keep prompts under 200 words. Be specific about the scene.
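
The layering approach above can be sketched as a small prompt builder. This is illustrative only: `build_prompt` is a hypothetical helper, the comma-joined ordering is an assumption, and the word limit simply mirrors the "under 200 words" guideline.

```python
from typing import Optional


def build_prompt(
    subject: str,
    camera: Optional[str] = None,
    lighting: Optional[str] = None,
    motion: Optional[str] = None,
    style: Optional[str] = None,
    max_words: int = 200,
) -> str:
    """Join the scene description with optional cinematographic layers."""
    layers = (camera, subject, motion, lighting, style)
    parts = [p.strip() for p in layers if p and p.strip()]
    prompt = ", ".join(parts)
    word_count = len(prompt.split())
    # The guide asks for prompts under the limit, so reject anything at or above it.
    if word_count >= max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under {max_words}")
    return prompt
```

Calling `build_prompt("turquoise ocean waves breaking on white sand", camera="Aerial drone shot", lighting="golden hour", style="cinematic")` yields a prompt in the same shape as the examples in this guide.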

Good Prompts

```
# Atmospheric b-roll
"Aerial drone shot slowly flying over turquoise ocean waves breaking on white sand, golden hour sunlight, cinematic"

# Product/tech scene
"Close-up of hands typing on a mechanical keyboard, shallow depth of field, soft desk lamp lighting, cozy atmosphere"

# Abstract background
"Dark moody abstract background with flowing blue light streaks, subtle geometric grid, bokeh particles floating, cinematic tech atmosphere"

# Animate a portrait
"Professional headshot, subtle natural head movement, confident warm expression, studio lighting, shallow depth of field"

# Animate a slide/screenshot
"Gentle subtle particle effects floating across a presentation slide, soft ambient light shifts, very slight camera drift"
```

Bad Prompts

```
# Too vague
"A cool video"

# Too many competing ideas
"A cat riding a skateboard while juggling fire on the moon during a thunderstorm"

# Describing text/UI (model can't render text reliably)
"A website showing the text 'Welcome to our platform'"
```

Video Production Use Cases

B-Roll Clips

Generate atmospheric 5s shots for cutaways between narrated scenes:
```bash
python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4
python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4
```

Animated Slide Backgrounds

Feed a slide screenshot and add subtle motion:
```bash
python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4
```

Animated Portraits

Bring still headshots to life:
```bash
python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4
```

Branded Intro/Outro

Generate abstract motion backgrounds for title cards:
```bash
python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4
```

Combining with Other Tools

LTX-2 generates raw clips. Combine with the rest of the toolkit:

| Workflow | Tools |
|---|---|
| Generate clip → upscale | `ltx2.py` → `upscale.py` |
| Generate clip → add to Remotion | `ltx2.py` → use as `<OffthreadVideo>` in composition |
| Generate image → animate | `flux2.py` → `ltx2.py --input` |
| Generate clip → extract audio | `ltx2.py` → `ffmpeg -i clip.mp4 -vn audio.wav` |
| Generate clip → add voiceover | `ltx2.py` → mix with `qwen3_tts.py` output |

Technical Details

  • Model: LTX-2.3 22B DiT (Lightricks), bf16
  • GPU: A100-80GB on Modal (~$4.68/hr)
  • Inference: ~2.5 min per clip (768x512, 121 frames, 30 steps)
  • Cost: ~$0.20-0.25 per 5s clip
  • Cold start: ~60-90s (loading ~55GB weights)
  • Output: H.264 MP4 with synchronized ambient audio (24fps)
  • Max duration: ~8s (193 frames) per clip
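
The per-clip cost follows from the GPU rate and the run time. Below is a minimal sketch of that arithmetic, assuming the charge is proportional to GPU seconds at the quoted ~$4.68/hr; the billing model is an assumption here, and `estimate_cost` is a hypothetical helper.

```python
GPU_RATE_PER_HOUR = 4.68  # A100-80GB rate quoted above, in USD


def estimate_cost(
    inference_seconds: float,
    cold_start_seconds: float = 0.0,
    rate_per_hour: float = GPU_RATE_PER_HOUR,
) -> float:
    """Approximate USD cost of one clip, assuming per-second GPU billing."""
    billed_seconds = inference_seconds + cold_start_seconds
    return billed_seconds * rate_per_hour / 3600


# 2.5 minutes of inference on an already-warm container
warm_cost = estimate_cost(150)
print(f"${warm_cost:.3f}")  # prints $0.195
```

The warm-container figure sits at the lower end of the quoted per-clip range; any cold-start time billed to the same request raises it.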

Known Limitations

  • Training data artifacts: ~30% of generations may have unwanted logos/text from training data. Re-run with a different `--seed`.
  • Text rendering: Cannot reliably generate readable text in video. Use Remotion overlays instead.
  • Max duration: ~8s per clip. Longer content needs stitching.
  • Audio: Generated audio is ambient/environmental only. Use voiceover/music tools for speech and music.
  • License: Community License. Free under $10M in annual revenue; a commercial license is needed above that.
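
For content longer than one clip, stitching can be done with ffmpeg's concat demuxer. The sketch below only builds the list file text and the command; the helper names are hypothetical, and stream copy (`-c copy`) assumes every clip was generated with the same resolution and frame rate.

```python
from typing import List, Sequence


def concat_list_text(clips: Sequence[str]) -> str:
    """Contents of an ffmpeg concat list file, one `file '...'` line per clip."""
    # Note: paths containing single quotes would need escaping; not handled here.
    return "".join(f"file '{clip}'\n" for clip in clips)


def build_concat_command(list_path: str, output: str) -> List[str]:
    """ffmpeg invocation that joins the listed clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", list_path,
        "-c", "copy",     # copy streams as-is; requires matching codec parameters
        output,
    ]
```

Write the result of `concat_list_text` to a file such as `clips.txt`, then run the command returned by `build_concat_command("clips.txt", "long.mp4")`. Relative clip paths are resolved against the list file's directory.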

Setup

```bash
# 1. Create Modal secret for HuggingFace (one-time)
modal secret create huggingface-token HF_TOKEN=hf_your_token

# 2. Deploy (downloads ~55GB of weights, takes ~10 min)
modal deploy docker/modal-ltx2/app.py

# 3. Save endpoint URL to .env (as MODAL_LTX2_ENDPOINT_URL)

# 4. Test
python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4
```

**Important:** The HuggingFace token needs read-access scope. Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) before deploying. Unauthenticated downloads are severely rate-limited.
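
Before running the test command, it can help to confirm that `.env` actually contains the endpoint variable. This is a hedged sketch with a deliberately simple `KEY=VALUE` parser; how `ltx2.py` itself loads `.env` is not specified here, and the `https://` check is an assumption about the deployed URL.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def get_endpoint_url(env_text: str) -> str:
    """Return MODAL_LTX2_ENDPOINT_URL or raise ValueError if missing or malformed."""
    url = parse_env(env_text).get("MODAL_LTX2_ENDPOINT_URL", "")
    if not url.startswith("https://"):  # assumption: the endpoint is served over HTTPS
        raise ValueError("MODAL_LTX2_ENDPOINT_URL is missing or not an https:// URL")
    return url
```

Typical use would be `get_endpoint_url(open(".env").read())` from the project root.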