ltx2

LTX-2.3 Video Generation

Generate ~5 second video clips from text prompts or images using the LTX-2.3 22B DiT model. Runs on Modal (A100-80GB). Requires `MODAL_LTX2_ENDPOINT_URL` in `.env`.

Quick Reference

Text-to-video

python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4

Image-to-video (animate a still image)

python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4

Custom resolution and duration

python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4

Fast mode (fewer steps, quicker)

python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4

Reproducible output

python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4

Parameters

Parameter	Default	Description
`--prompt`	(required)	Text description of the video
`--input`	-	Input image for image-to-video
`--width`	768	Video width (divisible by 64)
`--height`	512	Video height (divisible by 64)
`--num-frames`	121	Frame count, must satisfy `(n-1) % 8 == 0`
`--fps`	24	Frames per second
`--quality`	standard	`standard` (30 steps) or `fast` (15 steps)
`--steps`	30	Override inference steps directly
`--seed`	random	Seed for reproducibility
`--output`	auto	Output file path
`--negative-prompt`	sensible default	What to avoid
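
The numeric constraints in the table can be checked locally before a request is sent to the GPU. The following is a minimal sketch, not part of `tools/ltx2.py`: the helper name `check_generation_args` is hypothetical, and it encodes only the two rules stated in the table (dimensions divisible by 64, frame count satisfying `(n-1) % 8 == 0`).

```python
def check_generation_args(width=768, height=512, num_frames=121):
    """Return a list of problems; an empty list means the values pass.

    Hypothetical helper: it encodes only the rules from the parameter table.
    """
    problems = []
    if width % 64 != 0:
        problems.append(f"--width {width} is not divisible by 64")
    if height % 64 != 0:
        problems.append(f"--height {height} is not divisible by 64")
    if (num_frames - 1) % 8 != 0:
        problems.append(f"--num-frames {num_frames} does not satisfy (n-1) % 8 == 0")
    return problems


print(check_generation_args())                # defaults from the table
print(check_generation_args(1000, 576, 120))  # bad width and bad frame count
```

Returning a list rather than raising on the first failure lets a caller report every bad flag at once.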

Valid Frame Counts

Frame counts must satisfy `(n - 1) % 8 == 0`: 25 (~1s), 49 (~2s), 73 (~3s), 97 (~4s), 121 (~5s default), 161 (~6.7s), 193 (~8s max practical).
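
The durations quoted above follow from dividing the frame count by the default 24 fps. As a rough illustration, the sketch below snaps a target duration to the closest valid frame count. The function names are hypothetical, and the cap of 193 frames is taken from the "max practical" value in the list rather than from any limit enforced by the tool.

```python
FPS = 24  # default --fps from the parameter table


def is_valid_frame_count(n):
    """True when n has the form 8k + 1."""
    return n >= 1 and (n - 1) % 8 == 0


def nearest_valid_frame_count(seconds, fps=FPS, max_frames=193):
    """Snap a target duration to the closest frame count of the form 8k + 1."""
    target = seconds * fps
    k = max(1, round((target - 1) / 8))
    return min(8 * k + 1, max_frames)


# Reproduce the list above: frame count and approximate duration in seconds.
for n in (25, 49, 73, 97, 121, 161, 193):
    print(n, round(n / FPS, 2))

# Ask for 5 seconds and pass the result to --num-frames.
print(nearest_valid_frame_count(5))
```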

Common Resolutions

Resolution	Ratio	Notes
768x512	3:2	Default, good balance
512x512	1:1	Square, fastest
1024x576	16:9	Widescreen
576x1024	9:16	Portrait/vertical
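
Each resolution in the table is divisible by 64 in both dimensions, and its ratio can be recovered with a greatest common divisor. The sketch below double-checks the table and prints the matching flags. The `RESOLUTIONS` mapping and `aspect_ratio` helper are illustrative names, not part of the toolkit.

```python
from math import gcd

# Width and height pairs copied from the table, keyed by ratio.
RESOLUTIONS = {
    "3:2": (768, 512),
    "1:1": (512, 512),
    "16:9": (1024, 576),
    "9:16": (576, 1024),
}


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 1024x576 -> '16:9'."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


for label, (w, h) in RESOLUTIONS.items():
    assert aspect_ratio(w, h) == label
    assert w % 64 == 0 and h % 64 == 0
    print(f"--width {w} --height {h}  # {label}")
```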

Prompting Guide

LTX-2 responds well to cinematographic descriptions. Layer these dimensions:

Camera: "Slow dolly forward", "Aerial drone shot", "Tracking shot", "Static wide angle"
Lighting: "Golden hour", "Cinematic lighting", "Neon-lit", "Soft diffused light"
Motion: "Timelapse of...", "Slow motion", "Gentle camera drift", "Gradually transitions"
Style: "Shot on 35mm film", "Documentary style", "Clean minimal aesthetic"
Negative: The default negative prompt already avoids "worst quality, blurry, jittery, watermark, text, logo"

Keep prompts under 200 words. Be specific about the scene.
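
One way to apply the layering advice is to assemble the prompt from separate fields and check its length before use. This is only a sketch: `build_prompt` is a hypothetical helper, and the order in which it joins the layers (camera, subject, motion, lighting, style) is an assumption rather than something the model requires.

```python
def build_prompt(subject, camera=None, lighting=None, motion=None, style=None,
                 max_words=200):
    """Join the non-empty layers into one comma-separated prompt."""
    layers = (camera, subject, motion, lighting, style)
    parts = [p.strip() for p in layers if p and p.strip()]
    prompt = ", ".join(parts)
    word_count = len(prompt.split())
    if word_count >= max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under {max_words}")
    return prompt


print(build_prompt(
    subject="turquoise ocean waves breaking on white sand",
    camera="Aerial drone shot",
    lighting="golden hour",
    style="cinematic",
))
```

The result can be passed directly as the `--prompt` value.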

Good Prompts

Atmospheric b-roll

"Aerial drone shot slowly flying over turquoise ocean waves breaking on white sand, golden hour sunlight, cinematic"

Product/tech scene

"Close-up of hands typing on a mechanical keyboard, shallow depth of field, soft desk lamp lighting, cozy atmosphere"

Abstract background

"Dark moody abstract background with flowing blue light streaks, subtle geometric grid, bokeh particles floating, cinematic tech atmosphere"

Animate a portrait

"Professional headshot, subtle natural head movement, confident warm expression, studio lighting, shallow depth of field"

Animate a slide/screenshot

"Gentle subtle particle effects floating across a presentation slide, soft ambient light shifts, very slight camera drift"

Bad Prompts

Too vague

"A cool video"

Too many competing ideas

"A cat riding a skateboard while juggling fire on the moon during a thunderstorm"

Describing text/UI (model can't render text reliably)

"A website showing the text 'Welcome to our platform'"

Video Production Use Cases

B-Roll Clips

Generate atmospheric 5s shots for cutaways between narrated scenes:

bash

python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4
python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4
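
When several b-roll clips are needed, the commands can be generated from a small table instead of being typed by hand. The sketch below only builds and prints the command lines, using flags documented in the Parameters table. The `BROLL` mapping and `ltx2_command` helper are illustrative names, and the `subprocess` call is left as a comment so nothing is billed by accident.

```python
import shlex

# Output file -> prompt, taken from the two commands above.
BROLL = {
    "broll_tech.mp4": "Futuristic holographic interface, glowing data "
                      "visualizations, clean workspace, cinematic",
    "broll_europe.mp4": "Aerial view of European city at golden hour, "
                        "modern architecture",
}


def ltx2_command(prompt, output, seed=None):
    """Build the argv list for one tools/ltx2.py invocation."""
    argv = ["python3", "tools/ltx2.py", "--prompt", prompt, "--output", output]
    if seed is not None:
        argv += ["--seed", str(seed)]
    return argv


commands = [ltx2_command(prompt, output) for output, prompt in BROLL.items()]
for argv in commands:
    print(shlex.join(argv))
    # To execute for real: subprocess.run(argv, check=True)
```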

Animated Slide Backgrounds

Feed a slide screenshot and add subtle motion:

bash

python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4

Animated Portraits

Bring still headshots to life:

bash

python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4

Branded Intro/Outro

Generate abstract motion backgrounds for title cards:

bash

python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4

Combining with Other Tools

LTX-2 generates raw clips. Combine with the rest of the toolkit:

Workflow	Tools
Generate clip → upscale	`ltx2.py` → `upscale.py`
Generate clip → add to Remotion	`ltx2.py` → use as `<OffthreadVideo>` in composition
Generate image → animate	`flux2.py` → `ltx2.py --input`
Generate clip → extract audio	`ltx2.py` → `ffmpeg -i clip.mp4 -vn audio.wav`
Generate clip → add voiceover	`ltx2.py` → mix with `qwen3_tts.py` output
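
As an illustration of chaining, the "generate clip → extract audio" row can be written as two commands run in order. The sketch below builds both argv lists from the commands shown in this document; `clip_then_audio` is a hypothetical helper and the file names are placeholders. The other rows are not sketched because the flags of `upscale.py`, `flux2.py`, and `qwen3_tts.py` are not documented here.

```python
import shlex


def clip_then_audio(prompt, clip="clip.mp4", audio="audio.wav"):
    """Return the two commands for the generate -> extract audio row, in order."""
    generate = ["python3", "tools/ltx2.py", "--prompt", prompt, "--output", clip]
    extract = ["ffmpeg", "-i", clip, "-vn", audio]
    return [generate, extract]


for argv in clip_then_audio("A candle flickering on a dark table, cinematic"):
    print(shlex.join(argv))
    # To execute for real: subprocess.run(argv, check=True)
```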

Technical Details

Model: LTX-2.3 22B DiT (Lightricks), bf16
GPU: A100-80GB on Modal (~$4.68/hr)
Inference: ~2.5 min per clip (768x512, 121 frames, 30 steps)
Cost: ~$0.20-0.25 per 5s clip
Cold start: ~60-90s (loading ~55GB weights)
Output: H.264 MP4 with synchronized ambient audio (24fps)
Max duration: ~8s (193 frames) per clip
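
The per-clip cost is roughly the hourly GPU rate multiplied by the time the container runs. The arithmetic below uses only the figures listed above. Whether cold-start time is billed at the same rate is an assumption here, so the second line is an upper-bound estimate rather than a quoted price.

```python
GPU_USD_PER_HOUR = 4.68       # A100-80GB rate quoted above
INFERENCE_MINUTES = 2.5       # 768x512, 121 frames, 30 steps
COLD_START_SECONDS = (60, 90)


def clip_cost(minutes, usd_per_hour=GPU_USD_PER_HOUR):
    """Cost in USD of keeping the GPU busy for the given number of minutes."""
    return usd_per_hour * minutes / 60


warm = clip_cost(INFERENCE_MINUTES)
cold = [clip_cost(INFERENCE_MINUTES + s / 60) for s in COLD_START_SECONDS]

print(f"warm container:  ${warm:.3f}")
print(f"with cold start: ${cold[0]:.3f} to ${cold[1]:.3f}")
```

The warm figure sits at the low end of the quoted ~$0.20-0.25 range, which suggests the range allows for some overhead beyond pure inference.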

Known Limitations

Training data artifacts: ~30% of generations may have unwanted logos/text from training data. Re-run with a different `--seed`.
Text rendering: Cannot reliably generate readable text in video. Use Remotion overlays instead.
Max duration: ~8s per clip. Longer content needs stitching.
Audio: Generated audio is ambient/environmental only. Use voiceover/music tools for speech and music.
License: Community License, free under $10M in revenue; a commercial license is needed above that.
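
Because roughly 30% of generations are reported to contain artifacts, it can help to plan a few seeds up front. The sketch below estimates the chance that at least one of `k` attempts is clean and builds one command per seed. It assumes that attempts with different seeds fail independently, which is a simplification, and `seed_sweep` is a hypothetical helper name.

```python
ARTIFACT_RATE = 0.30  # "~30% of generations" from the list above; approximate


def p_at_least_one_clean(attempts, artifact_rate=ARTIFACT_RATE):
    """Probability that at least one attempt is artifact-free (independence assumed)."""
    return 1 - artifact_rate ** attempts


def seed_sweep(prompt, output_stem, seeds):
    """One tools/ltx2.py command per seed, each writing to its own file."""
    return [
        ["python3", "tools/ltx2.py", "--prompt", prompt, "--seed", str(s),
         "--output", f"{output_stem}_seed{s}.mp4"]
        for s in seeds
    ]


for k in (1, 2, 3):
    print(k, round(p_at_least_one_clean(k), 3))

for argv in seed_sweep("A candle flickering on a dark table, cinematic", "candle", [1, 2, 3]):
    print(" ".join(argv[-4:]))
```

Each extra seed costs another full generation, so the gain from a third attempt should be weighed against the per-clip cost.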

Setup

1. Create Modal secret for HuggingFace (one-time)

modal secret create huggingface-token HF_TOKEN=hf_your_token

2. Deploy (downloads ~55GB of weights, takes ~10 min)

modal deploy docker/modal-ltx2/app.py

3. Save endpoint URL to .env

echo "MODAL_LTX2_ENDPOINT_URL=https://yourname--video-toolkit-ltx2-ltx2-generate.modal.run" >> .env

4. Test

python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4


**Important:** HuggingFace token needs read-access scope. Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) before deploying. Unauthenticated downloads are severely rate-limited.
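
Before running the test command, it may be worth confirming that the endpoint URL actually landed in `.env`. The sketch below is a minimal reader for simple `KEY=VALUE` lines. It is not how `tools/ltx2.py` loads its configuration (that is not described here), and `read_env_value` is a hypothetical helper.

```python
from pathlib import Path


def read_env_value(key, env_path=".env"):
    """Return the last value assigned to key in a simple KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.exists():
        return None
    value = None
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = raw.strip().strip('"').strip("'")
    return value


url = read_env_value("MODAL_LTX2_ENDPOINT_URL")
if url and url.startswith("https://"):
    print("endpoint configured:", url)
else:
    print("MODAL_LTX2_ENDPOINT_URL is missing or not an https URL in .env")
```

The reader keeps the last matching line because the `echo ... >> .env` step appends, so a redeploy can leave an older entry earlier in the file.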
