ai-avatar-video

AI Avatar & Talking Head Videos

Create AI avatars and talking-head videos with the inference.sh CLI.

Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{"image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3"}'
```
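
The `--input` payloads above are single-quoted JSON, so shell variables are not expanded inside them. When the image and audio URLs live in variables, one option is to build the payload with `printf`. This is a minimal sketch: `json_escape` and `avatar_input` are hypothetical helper names introduced here, not part of the `infsh` CLI, and the escaping covers only backslashes and double quotes, which is enough for ordinary URLs but is not a complete JSON encoder.

```bash
# Hypothetical helpers (not part of infsh): build the --input JSON from shell variables.

# Escape backslashes and double quotes so a value can be placed inside a JSON string.
# Assumption: values contain no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the {"image_url": ..., "audio_url": ...} payload used by the avatar apps.
avatar_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage (requires infsh and a prior `infsh login`):
# IMAGE_URL="https://portrait.jpg"
# AUDIO_URL="https://speech.mp3"
# infsh app run bytedance/omnihuman-1-5 --input "$(avatar_input "$IMAGE_URL" "$AUDIO_URL")"
```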

Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character scenes, best quality |
| OmniHuman 1.0 | `bytedance/omnihuman-1-0` | Single character |
| Fabric 1.0 | `falai/fabric-1-0` | Making a still image talk, with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync |
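
To pick a model from a script, the table can be encoded as a `case` statement. This is a sketch under stated assumptions: `pick_avatar_app` and its keywords (`multi`, `single`, `image`, `realistic`) are invented here for illustration; only the App IDs come from the table.

```bash
# Hypothetical helper: map a use case from the model table to an inference.sh App ID.
pick_avatar_app() {
  case "$1" in
    multi)     echo "bytedance/omnihuman-1-5" ;;  # multi-character, best quality
    single)    echo "bytedance/omnihuman-1-0" ;;  # single character
    image)     echo "falai/fabric-1-0" ;;         # still image talks, with lipsync
    realistic) echo "falai/pixverse-lipsync" ;;   # highly realistic lipsync
    *)
      echo "unknown use case: $1 (expected multi|single|image|realistic)" >&2
      return 1
      ;;
  esac
}

# Usage (requires infsh):
# infsh app run "$(pick_avatar_app multi)" --input '{"image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3"}'
```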

Search Avatar Apps

```bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```

Examples

OmniHuman 1.5 (Multi-Character)

```bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

In images with several people, you can specify which character to animate.

Fabric 1.0 (Image Talks)

```bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```

PixVerse Lipsync

```bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Generates highly realistic lipsync from any audio.

Full Workflow: TTS + Avatar

```bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our product demo. Today I will show you..."}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{"image_url": "https://presenter-photo.jpg", "audio_url": "<audio-url-from-step-1>"}'
```
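
Step 2 needs the audio URL produced by step 1, which was saved to `speech.json`. The output schema of `infsh/kokoro-tts` is not shown here, so the sketch below rests on an assumption: that the JSON contains the generated audio as an `http(s)` URL ending in a common audio extension, optionally followed by a query string. `first_audio_url` is a hypothetical helper; once you know the real field name, a JSON tool such as `jq` is the more robust choice.

```bash
# Hypothetical helper: print the first http(s) URL in a file that ends in a common
# audio extension, keeping any query string (e.g. a signed-URL token).
# Assumption: the TTS app's JSON output contains the generated audio as such a URL.
first_audio_url() {
  grep -oE 'https?://[^"[:space:]]+\.(mp3|wav|m4a|ogg|flac)(\?[^"[:space:]]*)?' "$1" | head -n 1
}

# Usage (requires infsh):
# infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our product demo."}' > speech.json
# AUDIO_URL="$(first_audio_url speech.json)"
# [ -n "$AUDIO_URL" ] || { echo "no audio URL found in speech.json" >&2; exit 1; }
# infsh app run bytedance/omnihuman-1-5 \
#   --input "{\"image_url\": \"https://presenter-photo.jpg\", \"audio_url\": \"$AUDIO_URL\"}"
```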

Full Workflow: Dub Video in Another Language

```bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{"video_url": "https://original-video.mp4", "audio_url": "<new-audio-url>"}'
```
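
Translated text frequently contains apostrophes (French `aujourd'hui`, English contractions), and an apostrophe ends the single-quoted `--input` string used in step 3. A minimal sketch of one workaround, assuming the step 2 translation is saved to a plain-text file: escape the text into a JSON string and pass the payload in double quotes. `tts_input_from_file` and `translated.txt` are hypothetical names; the helper escapes backslashes, double quotes, and line breaks, but not other control characters.

```bash
# Hypothetical helper: print {"text": "..."} for infsh/kokoro-tts from a plain-text file.
# Escapes backslashes and double quotes, and joins lines with a JSON "\n" escape.
# Assumption: the file contains no tabs, carriage returns, or other control characters.
tts_input_from_file() {
  printf '{"text": "%s"}' "$(
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' "$1" |
      awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
  )"
}

# Usage (requires infsh; translated.txt holds the text from step 2):
# infsh app run infsh/kokoro-tts --input "$(tts_input_from_file translated.txt)" > new_speech.json
```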

Use Cases

- Marketing: product demos with an AI presenter
- Education: course videos, explainers
- Localization: dubbing content into multiple languages
- Social media: a consistent virtual influencer
- Corporate: training videos, announcements

Tips

- Use high-quality portrait photos (front-facing, good lighting)
- Keep audio clear, with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync (`infsh/latentsync-1-6`) is best for syncing existing videos to new audio

Related Skills

```bash
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
```

Browse all video apps: `infsh app list --category video`

Documentation

- Running Apps: how to run apps via the CLI
- Content Pipeline Example: building media workflows
- Streaming Results: real-time progress updates
