ai-avatar-video
AI Avatar & Talking Head Videos
Create AI avatars and talking head videos via inference.sh CLI.

Quick Start
```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
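When the quick start is embedded in a script, it can help to check that the CLI is on `PATH` before calling it. The following is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of the inference.sh CLI.

```bash
# require_cmd NAME: succeed if NAME is on PATH, otherwise print a hint and fail.
# (Hypothetical helper for illustration; not provided by infsh.)
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1" >&2
    return 1
  fi
}

# Usage (commented out so the sketch has no side effects):
# require_cmd infsh || { curl -fsSL https://cli.inference.sh | sh && infsh login; }
```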
Available Models
| Model | App ID | Best For |
|---|---|---|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, best quality |
| OmniHuman 1.0 | | Single character |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic |
Search Avatar Apps
```bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```
Examples
OmniHuman 1.5 (Multi-Character)
```bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
Supports specifying which character to drive in multi-person images.
Fabric 1.0 (Image Talks)
```bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```
PixVerse Lipsync
```bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```
Generates highly realistic lipsync from any audio.
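The three examples above share the same `image_url` / `audio_url` input shape, so one portrait and one audio clip can be tried against each model in a loop. This sketch only prints the commands (a dry run); `avatar_cmd` is a hypothetical helper name, and the URLs are the placeholders used above.

```bash
# avatar_cmd APP_ID IMAGE_URL AUDIO_URL: print (not run) the infsh command.
# Hypothetical helper; assumes the URLs contain no single or double quotes.
avatar_cmd() {
  printf "infsh app run %s --input '{\"image_url\": \"%s\", \"audio_url\": \"%s\"}'\n" "$1" "$2" "$3"
}

# Emit one command per model for the same inputs.
for app in bytedance/omnihuman-1-5 falai/fabric-1-0 falai/pixverse-lipsync; do
  avatar_cmd "$app" "https://portrait.jpg" "https://speech.mp3"
done
```

Review the printed commands first; piping the output to `sh` would execute them.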
Full Workflow: TTS + Avatar
```bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```
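The two steps can be glued together by pulling the audio URL out of `speech.json` instead of copying it by hand. The output schema of the TTS app is not shown here, so this sketch assumes only that the file contains the audio URL as a JSON string and that it is the first URL in the file; inspect `speech.json` to confirm. `first_url` and `avatar_input` are hypothetical helper names.

```bash
# first_url FILE: print the first http(s) URL that appears anywhere in FILE.
# Assumption: the audio URL is the first URL in the saved TTS output.
first_url() {
  grep -Eo 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# avatar_input IMAGE_URL AUDIO_URL: print the JSON payload for the avatar app.
avatar_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}\n' "$1" "$2"
}

# Usage (commented out so the sketch has no side effects):
# audio_url=$(first_url speech.json)
# infsh app run bytedance/omnihuman-1-5 \
#   --input "$(avatar_input "https://presenter-photo.jpg" "$audio_url")"
```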
Full Workflow: Dub Video in Another Language
```bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```
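Translated text often contains double quotes or line breaks, which would break a hand-written `--input` payload in step 3. The sketch below builds the payload from a shell variable instead; `json_escape` and `tts_input` are hypothetical helper names, and only the `text` field is taken from the example above.

```bash
# json_escape STRING: escape backslashes and double quotes for use inside a
# JSON string; newlines are flattened to spaces. Other control characters
# (tabs, etc.) are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_input TEXT: print the JSON payload for the TTS app.
tts_input() {
  printf '{"text": "%s"}\n' "$(json_escape "$1")"
}

# Usage (commented out so the sketch has no side effects):
# translated='She said "hello" to everyone.'
# infsh app run infsh/kokoro-tts --input "$(tts_input "$translated")" > new_speech.json
```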
Use Cases
- Marketing: Product demos with AI presenter
- Education: Course videos, explainers
- Localization: Dub content in multiple languages
- Social Media: Consistent virtual influencer
- Corporate: Training videos, announcements
Tips
- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
Related Skills
```bash
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
```
Browse all video apps: `infsh app list --category video`
Documentation
- Running Apps - How to run apps via CLI
- Content Pipeline Example - Building media workflows
- Streaming Results - Real-time progress updates