ltx2
LTX-2.3 Video Generation
Generate ~5 second video clips from text prompts or images using the LTX-2.3 22B DiT model.
Runs on Modal (A100-80GB). Requires `MODAL_LTX2_ENDPOINT_URL` in `.env`.
Quick Reference
```bash
# Text-to-video
python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4

# Image-to-video (animate a still image)
python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4

# Custom resolution and duration
python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4

# Fast mode (fewer steps, quicker)
python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4

# Reproducible output
python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4
```
Parameters
| Parameter | Default | Description |
|---|---|---|
| `--prompt` | (required) | Text description of the video |
| `--input` | - | Input image for image-to-video |
| `--width` | 768 | Video width (divisible by 64) |
| `--height` | 512 | Video height (divisible by 64) |
| `--num-frames` | 121 | Frame count, must satisfy `(n - 1) % 8 == 0` |
| | 24 | Frames per second |
| `--quality` | standard | Quality preset; `fast` uses fewer steps |
| | 30 | Override inference steps directly |
| `--seed` | random | Seed for reproducibility |
| `--output` | auto | Output file path |
| | sensible default | What to avoid (negative prompt) |
Valid Frame Counts
Frame counts must satisfy `(n - 1) % 8 == 0`. Valid values: 25 (~1s), 49 (~2s), 73 (~3s), 97 (~4s), 121 (~5s, default), 161 (~6.7s), 193 (~8s, practical maximum).
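
The frame-count rule can be checked before a request is sent. The following is a minimal sketch, assuming the default 24 fps and the 193-frame practical maximum quoted in this document; the helper names `is_valid_frame_count` and `nearest_valid_frame_count` are hypothetical and not part of `tools/ltx2.py`.

```python
FPS = 24          # default frame rate from the parameter table
MAX_FRAMES = 193  # practical maximum quoted in this document


def is_valid_frame_count(n: int) -> bool:
    """Return True when n is positive and satisfies (n - 1) % 8 == 0."""
    return n >= 1 and (n - 1) % 8 == 0


def nearest_valid_frame_count(seconds: float, fps: int = FPS,
                              max_frames: int = MAX_FRAMES) -> int:
    """Round a target duration to the closest valid frame count, capped at max_frames."""
    target = seconds * fps
    k = max(1, round((target - 1) / 8))  # number of 8-frame steps above the first frame
    return min(8 * k + 1, max_frames)


if __name__ == "__main__":
    for secs in (1, 3, 5, 6.7, 8):
        n = nearest_valid_frame_count(secs)
        print(f"{secs}s -> --num-frames {n} ({n / FPS:.2f}s at {FPS} fps)")
```

The result can be passed directly to `--num-frames`; requests longer than about 8 seconds are clamped to 193 frames rather than rejected, which is a design choice of this sketch.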
Common Resolutions
| Resolution | Ratio | Notes |
|---|---|---|
| 768x512 | 3:2 | Default, good balance |
| 512x512 | 1:1 | Square, fastest |
| 1024x576 | 16:9 | Widescreen |
| 576x1024 | 9:16 | Portrait/vertical |
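
Custom sizes outside this table still need both dimensions divisible by 64. The sketch below validates a width/height pair and reports its reduced aspect ratio; `check_resolution` is a hypothetical helper written for illustration, not something the toolkit is known to ship.

```python
from math import gcd


def check_resolution(width: int, height: int) -> str:
    """Validate the divisible-by-64 rule and return the reduced aspect ratio."""
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 64 != 0:
            raise ValueError(f"{name}={value} is not a positive multiple of 64")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


if __name__ == "__main__":
    for w, h in ((768, 512), (512, 512), (1024, 576), (576, 1024)):
        print(f"{w}x{h} -> {check_resolution(w, h)}")
```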
Prompting Guide
LTX-2 responds well to cinematographic descriptions. Layer these dimensions:
- Camera: "Slow dolly forward", "Aerial drone shot", "Tracking shot", "Static wide angle"
- Lighting: "Golden hour", "Cinematic lighting", "Neon-lit", "Soft diffused light"
- Motion: "Timelapse of...", "Slow motion", "Gentle camera drift", "Gradually transitions"
- Style: "Shot on 35mm film", "Documentary style", "Clean minimal aesthetic"
- Negative: The default negative prompt already avoids "worst quality, blurry, jittery, watermark, text, logo"
Keep prompts under 200 words. Be specific about the scene.
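
One way to apply the layering consistently is to assemble prompts from named parts and enforce the word limit in code. This is only an illustrative sketch: `build_prompt` and its argument names are hypothetical, and the ordering of the layers is an assumption, not a documented requirement of the model.

```python
def build_prompt(subject: str, camera: str = "", lighting: str = "",
                 motion: str = "", style: str = "", max_words: int = 200) -> str:
    """Join the non-empty layers into one comma-separated prompt under max_words."""
    layers = (camera, subject, lighting, motion, style)
    parts = [layer.strip() for layer in layers if layer.strip()]
    prompt = ", ".join(parts)
    word_count = len(prompt.split())
    if word_count >= max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under {max_words}")
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        subject="turquoise ocean waves breaking on white sand",
        camera="Aerial drone shot",
        lighting="golden hour sunlight",
        style="cinematic",
    ))
```

The returned string is what would be passed to `--prompt`.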
Good Prompts
```
# Atmospheric b-roll
"Aerial drone shot slowly flying over turquoise ocean waves breaking on white sand, golden hour sunlight, cinematic"

# Product/tech scene
"Close-up of hands typing on a mechanical keyboard, shallow depth of field, soft desk lamp lighting, cozy atmosphere"

# Abstract background
"Dark moody abstract background with flowing blue light streaks, subtle geometric grid, bokeh particles floating, cinematic tech atmosphere"

# Animate a portrait
"Professional headshot, subtle natural head movement, confident warm expression, studio lighting, shallow depth of field"

# Animate a slide/screenshot
"Gentle subtle particle effects floating across a presentation slide, soft ambient light shifts, very slight camera drift"
```
Bad Prompts
```
# Too vague
"A cool video"

# Too many competing ideas
"A cat riding a skateboard while juggling fire on the moon during a thunderstorm"

# Describing text/UI (model can't render text reliably)
"A website showing the text 'Welcome to our platform'"
```
Video Production Use Cases
B-Roll Clips
Generate atmospheric 5s shots for cutaways between narrated scenes:
```bash
python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4
python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4
```
Animated Slide Backgrounds
Feed a slide screenshot and add subtle motion:
```bash
python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4
```
Animated Portraits
Bring still headshots to life:
```bash
python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4
```
Branded Intro/Outro
Generate abstract motion backgrounds for title cards:
```bash
python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4
```
Combining with Other Tools
LTX-2 generates raw clips. Combine with the rest of the toolkit:
| Workflow | Tools |
|---|---|
| Generate clip → upscale | |
| Generate clip → add to Remotion | |
| Generate image → animate | |
| Generate clip → extract audio | |
| Generate clip → add voiceover | |
Technical Details
- Model: LTX-2.3 22B DiT (Lightricks), bf16
- GPU: A100-80GB on Modal (~$4.68/hr)
- Inference: ~2.5 min per clip (768x512, 121 frames, 30 steps)
- Cost: ~$0.20-0.25 per 5s clip
- Cold start: ~60-90s (loading ~55GB weights)
- Output: H.264 MP4 with synchronized ambient audio (24fps)
- Max duration: ~8s (193 frames) per clip
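
The per-clip cost follows from the GPU rate and the inference time listed above. The arithmetic below is a rough sketch; treating the upper end of the quoted range as extra billed overhead (for example part of a cold start) is an assumption of this example, not something the source states.

```python
GPU_RATE_PER_HOUR = 4.68  # A100-80GB on Modal, USD, as quoted above
INFERENCE_MINUTES = 2.5   # 768x512, 121 frames, 30 steps


def clip_cost(minutes: float, overhead_seconds: float = 0.0,
              rate_per_hour: float = GPU_RATE_PER_HOUR) -> float:
    """Estimate USD cost for one clip from billed GPU time."""
    billed_minutes = minutes + overhead_seconds / 60
    return rate_per_hour * billed_minutes / 60


if __name__ == "__main__":
    warm = clip_cost(INFERENCE_MINUTES)
    # Assumption: about 30 s of additional billed time per request.
    padded = clip_cost(INFERENCE_MINUTES, overhead_seconds=30)
    print(f"warm container: ${warm:.3f}")
    print(f"with 30 s overhead: ${padded:.3f}")
```

At the quoted rate, 2.5 minutes of inference comes to roughly $0.195, which matches the lower end of the stated $0.20-0.25 range.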
Known Limitations
- Training data artifacts: ~30% of generations may have unwanted logos/text from training data. Re-run with a different `--seed`.
- Text rendering: Cannot reliably generate readable text in video. Use Remotion overlays instead.
- Max duration: ~8s per clip. Longer content needs stitching.
- Audio: Generated audio is ambient/environmental only. Use voiceover/music tools for speech and music.
- License: Community License, free under $10M revenue; a commercial license is needed above that.
Setup
```bash
# 1. Create Modal secret for HuggingFace (one-time)
modal secret create huggingface-token HF_TOKEN=hf_your_token

# 2. Deploy (downloads ~55GB of weights, takes ~10 min)
modal deploy docker/modal-ltx2/app.py

# 3. Save endpoint URL to .env
echo "MODAL_LTX2_ENDPOINT_URL=https://yourname--video-toolkit-ltx2-ltx2-generate.modal.run" >> .env

# 4. Test
python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4
```
**Important:** The HuggingFace token needs read-access scope. Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) before deploying. Unauthenticated downloads are severely rate-limited.
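
Before running the test command, it may be worth confirming that step 3 actually wrote a usable value. The following is a small sketch that parses `.env` by hand; how `tools/ltx2.py` itself loads the file is not described here, so the parsing rules (last assignment wins, optional quotes stripped) are assumptions, and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

KEY = "MODAL_LTX2_ENDPOINT_URL"


def read_env_value(text: str, key: str) -> Optional[str]:
    """Return the last KEY=value assignment found in .env-style text, or None."""
    found = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            found = value.strip().strip('"').strip("'")
    return found


def looks_like_endpoint(url: Optional[str]) -> bool:
    """Loose sanity check: an https URL with a host part."""
    if not url:
        return False
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        url = read_env_value(env_file.read_text(encoding="utf-8"), KEY)
        print(f"{KEY} looks valid" if looks_like_endpoint(url) else f"{KEY} missing or malformed")
    else:
        print(".env not found")
```

Since step 3 appends with `>>`, running it twice leaves two assignments in the file, which is why this sketch keeps the last one.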