ai-avatar-video

AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via the inference.sh CLI.

Quick Start

bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Available Models

可用模型

| Model | App ID | Best For |
|-------|--------|----------|
| OmniHuman 1.5 | bytedance/omnihuman-1-5 | Multi-character, best quality |
| OmniHuman 1.0 | bytedance/omnihuman-1-0 | Single character |
| Fabric 1.0 | falai/fabric-1-0 | Image talks with lipsync |
| PixVerse Lipsync | falai/pixverse-lipsync | Highly realistic |
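
When scripting against several models, it can help to keep the app IDs from the table above in one place. The sketch below is a hypothetical helper, `pick_avatar_app`, that is not part of the `infsh` CLI. The short need names (`multi`, `single`, `talk`, `realistic`) are made up for illustration; only the app IDs come from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an infsh feature): map a need to an app ID.
# The need names are illustrative; the app IDs are taken from the model table.
pick_avatar_app() {
  case "$1" in
    multi)     echo "bytedance/omnihuman-1-5" ;;  # multi-character, best quality
    single)    echo "bytedance/omnihuman-1-0" ;;  # single character
    talk)      echo "falai/fabric-1-0" ;;         # image talks with lipsync
    realistic) echo "falai/pixverse-lipsync" ;;   # highly realistic
    *)         echo "unknown need: $1" >&2; return 1 ;;
  esac
}
```

The result can be passed straight to the run command, for example `infsh app run "$(pick_avatar_app multi)" --input "$input_json"`, where `input_json` holds the same JSON shown in the examples.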

Search Avatar Apps

bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"

Examples

OmniHuman 1.5 (Multi-Character)

bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
OmniHuman 1.5 supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'

PixVerse Lipsync

bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
Generates highly realistic lipsync from any audio.

Full Workflow: TTS + Avatar

bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
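
Step 2 needs the audio URL that step 1 wrote to `speech.json`. The layout of that file is not described here, so this sketch simply takes the first http(s) URL it finds in the file. `first_url` and `avatar_input` are hypothetical helpers, not `infsh` features. The sketch assumes the URL is stored as a plain, unescaped string; a JSON-aware tool would be more robust once the actual field name is known.

```shell
#!/usr/bin/env bash
# Sketch only: the structure of speech.json is an assumption.

# Print the first http(s) URL found in a file (hypothetical helper).
# Stops at a double quote or whitespace, so it works on typical JSON strings.
first_url() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Build the --input JSON from an image URL and an audio URL (hypothetical helper).
# Assumes neither URL contains double quotes or backslashes.
avatar_input() {
  printf '{ "image_url": "%s", "audio_url": "%s" }' "$1" "$2"
}

# Usage, once infsh is installed and logged in:
#   audio_url=$(first_url speech.json)
#   infsh app run bytedance/omnihuman-1-5 \
#     --input "$(avatar_input "https://presenter-photo.jpg" "$audio_url")"
```

If `first_url` prints nothing, inspect `speech.json` by hand before running step 2, since the output may hold the audio in a different form.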

Full Workflow: Dub Video in Another Language

bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
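
Translated text often contains double quotes, which would break a hand-written `--input` string in step 3. Below is a minimal sketch of building that JSON more safely. `json_escape` and `tts_input` are hypothetical helpers, not `infsh` features, and they only handle backslashes and double quotes, so the text is assumed to be a single line without control characters.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 3; not part of the infsh CLI.

# Escape backslashes first, then double quotes, for use inside a JSON string.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the --input JSON for the TTS app from raw text.
tts_input() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Usage, once infsh is installed and logged in:
#   infsh app run infsh/kokoro-tts \
#     --input "$(tts_input "$translated_text")" > new_speech.json
```

For multi-line transcripts, a JSON-aware tool is the safer choice, because newlines must be encoded as `\n` inside JSON strings.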

Use Cases

  • Marketing: Product demos with AI presenter
  • Education: Course videos, explainers
  • Localization: Dub content in multiple languages
  • Social Media: Consistent virtual influencer
  • Corporate: Training videos, announcements

Tips

  • Use high-quality portrait photos (front-facing, good lighting)
  • Audio should be clear with minimal background noise
  • OmniHuman 1.5 supports multiple people in one image
  • LatentSync is best for syncing existing videos to new audio

Related Skills

bash
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation

Browse all video apps: `infsh app list --category video`

Documentation
