ai-avatar-video

AI Avatar & Talking Head Videos

Create AI avatars and talking-head videos with the inference.sh CLI.

Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3" }'
```

Available Models

| Model | App ID | Best For |
|---|---|---|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multiple characters, best quality |
| OmniHuman 1.0 | `bytedance/omnihuman-1-0` | A single character |
| Fabric 1.0 | `falai/fabric-1-0` | Making a still image talk, with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync |

Search Avatar Apps

```bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```

Examples

OmniHuman 1.5 (Multi-Character)

```bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Supports specifying which character to drive in a multi-person image.

Fabric 1.0 (Image Talks)

```bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```

PixVerse Lipsync

```bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Generates highly realistic lipsync from any audio.
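All three examples above pass the same two fields, `image_url` and `audio_url`, so one small wrapper can cover every model in the table. This is a minimal sketch that assumes none of these apps requires additional input fields. `is_avatar_app`, `avatar_input`, and `run_avatar` are hypothetical helpers, not part of the infsh CLI.

```bash
#!/usr/bin/env bash
set -euo pipefail

# App IDs from the "Available Models" table. Assumption: each accepts
# image_url + audio_url as in the examples, with no other required fields.
AVATAR_APPS=(
  bytedance/omnihuman-1-5
  bytedance/omnihuman-1-0
  falai/fabric-1-0
  falai/pixverse-lipsync
)

# Hypothetical helper: succeed only if $1 is one of the app IDs above.
is_avatar_app() {
  local app
  for app in "${AVATAR_APPS[@]}"; do
    [[ $app == "$1" ]] && return 0
  done
  return 1
}

# Hypothetical helper: print the JSON input for an image + audio app.
# URLs are inserted verbatim, so they must not contain " or \.
avatar_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}' "$1" "$2"
}

# Hypothetical wrapper: run_avatar <app-id> <image-url> <audio-url>
run_avatar() {
  if ! is_avatar_app "$1"; then
    echo "unknown avatar app: $1" >&2
    return 1
  fi
  infsh app run "$1" --input "$(avatar_input "$2" "$3")"
}
```

Example: `run_avatar falai/fabric-1-0 https://face.jpg https://audio.mp3`. Checking the app ID against the table first catches a mistyped ID locally.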

Full Workflow: TTS + Avatar

```bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{ "text": "Welcome to our product demo. Today I will show you..." }' > speech.json

# 2. Create avatar video with the speech
#    (replace the placeholder with the audio URL found in speech.json)
infsh app run bytedance/omnihuman-1-5 --input '{ "image_url": "https://presenter-photo.jpg", "audio_url": "<audio-url-from-step-1>" }'
```
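Step 2 needs the audio URL produced by step 1. The output schema of `infsh/kokoro-tts` is not documented here, so this sketch avoids guessing a field name: it scans the saved JSON for the first URL that ends in a common audio extension. That is an assumption about the output (it also assumes slashes are not escaped as `\/`); if you know the field name, `jq -r` on that field is more precise. `first_audio_url` and `tts_to_avatar` are hypothetical helpers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first http(s) URL in file $1 that ends in a
# common audio extension, optionally followed by a query string.
# Returns 1 if no such URL is found.
first_audio_url() {
  local re='(https?://[^"[:space:]]+\.(mp3|wav|m4a|ogg|flac)(\?[^"[:space:]]*)?)'
  local text
  text=$(<"$1")
  if [[ $text =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    echo "no audio URL found in $1" >&2
    return 1
  fi
}

# Hypothetical two-step pipeline: tts_to_avatar "<text>" <presenter-image-url>
# The text is inserted into the JSON as-is, so it must not contain double quotes.
tts_to_avatar() {
  local audio_url
  infsh app run infsh/kokoro-tts --input "{\"text\": \"$1\"}" > speech.json
  audio_url=$(first_audio_url speech.json)
  infsh app run bytedance/omnihuman-1-5 \
    --input "{\"image_url\": \"$2\", \"audio_url\": \"$audio_url\"}"
}
```

Example: `tts_to_avatar "Welcome to our product demo." https://presenter-photo.jpg`.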

Full Workflow: Dub Video in Another Language

```bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{ "video_url": "https://original-video.mp4", "audio_url": "<new-audio-url>" }'
```
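In the dubbing workflow, the translated text is pasted into a single-quoted JSON string in step 3. That breaks if the translation contains an apostrophe (which ends the shell quote) or a double quote or newline (which ends or corrupts the JSON string). This minimal sketch escapes the text before building the input and assumes bash 4 or later. `json_escape`, `tts_input`, and `lipsync_input` are hypothetical helpers, and `translated.txt` is a hypothetical file holding the step 2 output.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab; other
# control characters are left unchanged.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}    # \        -> \\  (must run first)
  s=${s//"$q"/"$bs$q"}      # "        -> \"
  s=${s//$'\n'/"${bs}n"}    # newline  -> \n
  s=${s//$'\r'/"${bs}r"}    # CR       -> \r
  s=${s//$'\t'/"${bs}t"}    # tab      -> \t
  printf '%s' "$s"
}

# Step 3 input: {"text": "..."} with the translation escaped.
tts_input() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Step 4 input: original video URL plus new audio URL (inserted verbatim).
lipsync_input() {
  printf '{"video_url": "%s", "audio_url": "%s"}' "$1" "$2"
}
```

For example, step 3 becomes `infsh app run infsh/kokoro-tts --input "$(tts_input "$(<translated.txt)")" > new_speech.json`, which works even when the translation contains quotes or line breaks.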

Use Cases

- Marketing: product demos with an AI presenter
- Education: course videos and explainers
- Localization: dubbing content into multiple languages
- Social media: a consistent virtual influencer
- Corporate: training videos and announcements

Tips

- Use high-quality portrait photos: front-facing, with good lighting.
- Use clear audio with minimal background noise.
- OmniHuman 1.5 supports multiple people in one image.
- LatentSync (`infsh/latentsync-1-6`) is best for syncing an existing video to new audio.
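Together with the Quick Start default, the last two tips amount to a simple routing rule: animate a still photo with OmniHuman 1.5, but use LatentSync when the source is an existing video. This sketch assumes the media type can be inferred from the file extension. `pick_avatar_app` and its extension lists are illustrative assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical router: pick_avatar_app <source-url-or-path>
# Prints LatentSync for an existing video, or OmniHuman 1.5 (the Quick Start
# model, which also handles several people in one image) for a still image.
pick_avatar_app() {
  local src=${1%%\?*}   # drop any query string before reading the extension
  local ext=${src##*.}
  ext=${ext,,}          # lowercase (bash 4+)
  case $ext in
    mp4|mov|webm|mkv)
      echo infsh/latentsync-1-6 ;;
    jpg|jpeg|png|webp)
      echo bytedance/omnihuman-1-5 ;;
    *)
      echo "cannot infer media type from: $1" >&2
      return 1 ;;
  esac
}
```

Example: `pick_avatar_app https://original-video.mp4` prints `infsh/latentsync-1-6`. Routing by extension is crude; a URL without a recognizable extension is rejected rather than guessed.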

Related Skills

```bash
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation
```

Browse all video apps: `infsh app list --category video`

Documentation

- Running Apps: how to run apps via the CLI
- Content Pipeline Example: building media workflows
- Streaming Results: real-time progress updates
