bailian-multimodal-skills
Generate images, video, speech, and transcribe audio using Aliyun Bailian models.
NPX Install
npx skill4agent add cclank/openclaw_provider_plugins bailian-multimodal-skills
Bailian Multimodal Skills
Generate images, audio, video, and transcribe speech using Aliyun Bailian (Qwen/Wan/CosyVoice) models.
Features
- Image Generation: z-image-turbo, wan2.6-t2i
- ASR (Speech-to-Text): qwen3-asr-flash
- TTS (Text-to-Speech): qwen3-tts-flash
- Text-to-Video: wan2.6-t2v
- Image-to-Video: wan2.6-i2v-flash, wan2.6-i2v
- Reference-to-Video: wan2.6-r2v-flash, wan2.6-r2v
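The feature list amounts to a small mode-to-model table. The sketch below writes it out in Python so a caller can check a mode/model pair before invoking the script. The mode keys are the `--mode` values shown in the Usage section; `SUPPORTED_MODELS` and `check_model` are hypothetical names for illustration and are not taken from `run_multimodal.py`.

```python
# Mode-to-model table transcribed from the feature list.
# SUPPORTED_MODELS and check_model are illustrative names (assumptions).
SUPPORTED_MODELS = {
    "image": ["z-image-turbo", "wan2.6-t2i"],
    "asr": ["qwen3-asr-flash"],
    "tts": ["qwen3-tts-flash"],
    "t2v": ["wan2.6-t2v"],
    "i2v": ["wan2.6-i2v-flash", "wan2.6-i2v"],
    "r2v": ["wan2.6-r2v-flash", "wan2.6-r2v"],
}


def check_model(mode: str, model: str) -> None:
    """Raise ValueError if the mode is unknown or the model is not listed for it."""
    if mode not in SUPPORTED_MODELS:
        raise ValueError(f"unknown mode: {mode!r}")
    if model not in SUPPORTED_MODELS[mode]:
        allowed = ", ".join(SUPPORTED_MODELS[mode])
        raise ValueError(f"model {model!r} is not listed for mode {mode!r} (listed: {allowed})")
```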
Usage
1. Image Generation
Generate images from text.
bash
uv run {baseDir}/scripts/run_multimodal.py --mode image --model z-image-turbo --prompt "A futuristic city" --output "city.png"
Models: z-image-turbo, wan2.6-t2i
2. ASR (Speech Recognition)
Transcribe audio files or URLs to text.
bash
uv run {baseDir}/scripts/run_multimodal.py --mode asr --model qwen3-asr-flash --input-audio "https://example.com/audio.mp3"
3. TTS (Speech Synthesis)
Convert text to speech.
bash
uv run {baseDir}/scripts/run_multimodal.py --mode tts --model qwen3-tts-flash --text "Hello world" --output "hello.wav"
4. Text-to-Video (T2V)
Generate a video from a text prompt. The request runs as an asynchronous task that the script polls automatically.
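Auto-polling presumably means the script submits a task and then checks its status at intervals until it finishes. The following is a minimal sketch of that pattern, not the script's actual code: `poll_until_done` and the `fetch_status` callback are hypothetical, and the status names in `TERMINAL_STATES` are assumptions about what the service reports.

```python
import time
from typing import Callable

# Assumed terminal status names; the real service may use different ones.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() repeatedly until it returns a terminal state.

    Raises TimeoutError if the task is still running once `timeout`
    seconds have passed since polling started.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout}s")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting; in normal use the default `time.sleep` applies.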
bash
uv run {baseDir}/scripts/run_multimodal.py --mode t2v --model wan2.6-t2v --prompt "一只小猫在月光下奔跑" --duration 10 --size "1280*720" --output "cat.mp4"
Models: wan2.6-t2v
Options: --size (e.g., 1280*720, 1920*1080), --duration (2-15s), --prompt-extend/--no-prompt-extend, --shot-type single|multi, --negative-prompt, --audio-url, --watermark, --seed
5. Image-to-Video (I2V)
Generate video from a reference image (first frame).
bash
uv run {baseDir}/scripts/run_multimodal.py --mode i2v --model wan2.6-i2v-flash --img-url "https://example.com/cat.png" --prompt "A cat running" --resolution 720P --duration 5 --output "cat_run.mp4"
Models: wan2.6-i2v-flash, wan2.6-i2v
Options: --img-url (required, image URL or base64), --prompt, --resolution (480P/720P/1080P), --duration, --prompt-extend/--no-prompt-extend, --shot-type single|multi, --negative-prompt, --audio-url, --watermark, --seed
6. Reference-to-Video (R2V)
Generate video with character/object references (images or videos as actors).
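In this mode the prompt refers to references as character1, character2, and so on, and at most five references may be supplied. The sketch below checks those two constraints before a call. It assumes that characterN maps to the N-th reference URL by position, which this document does not state explicitly; `check_r2v_inputs` is a hypothetical helper, not part of the script.

```python
import re
from typing import List

MAX_REFERENCES = 5  # the options list says "up to 5"


def check_r2v_inputs(prompt: str, reference_urls: List[str]) -> List[int]:
    """Return the sorted character numbers used in the prompt.

    Raises ValueError if the number of references is out of range or if the
    prompt mentions characterN with no N-th reference (positional mapping
    is an assumption).
    """
    count = len(reference_urls)
    if not 1 <= count <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} references, got {count}")
    used = sorted({int(n) for n in re.findall(r"character(\d+)", prompt)})
    unmatched = [n for n in used if not 1 <= n <= count]
    if unmatched:
        raise ValueError(f"prompt mentions characters without a reference: {unmatched}")
    return used
```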
bash
uv run {baseDir}/scripts/run_multimodal.py --mode r2v --model wan2.6-r2v-flash --prompt "character1 在公园里散步" --reference-urls "https://example.com/person.png" --size "1280*720" --duration 5 --output "walk.mp4"
Multi-character example:
bash
uv run {baseDir}/scripts/run_multimodal.py --mode r2v --model wan2.6-r2v-flash --prompt "character1 对 character2 说你好" --reference-urls "https://example.com/role1.mp4" "https://example.com/role2.png" --shot-type multi --output "dialog.mp4"
Models: wan2.6-r2v-flash, wan2.6-r2v
Options: --reference-urls (required, space-separated, up to 5), --prompt (required, use character1/character2 to map references), --size, --duration (2-10s), --shot-type single|multi, --negative-prompt, --no-audio (silent, r2v-flash only), --watermark, --seed
Configuration
The API key is read from the following sources, in order of priority:
- Command-line argument: --api-key
- Environment variable: DASHSCOPE_API_KEY
- Config file: ~/.config/bailian-multimodal/api_key.txt
bash
# Option 1: environment variable
export DASHSCOPE_API_KEY="sk-..."
# Option 2: config file
mkdir -p ~/.config/bailian-multimodal
echo "sk-..." > ~/.config/bailian-multimodal/api_key.txt
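The lookup order for the API key (command-line argument, then environment variable, then config file) can be sketched as a simple fallback chain. This is an illustration of the documented priority, not the script's actual implementation; `resolve_api_key` and its parameters are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Config file path as given in the Configuration section.
DEFAULT_KEY_FILE = Path.home() / ".config" / "bailian-multimodal" / "api_key.txt"


def resolve_api_key(
    cli_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    key_file: Path = DEFAULT_KEY_FILE,
) -> Optional[str]:
    """Return the first key found: CLI argument, then env var, then file."""
    if cli_key:
        return cli_key
    env = os.environ if env is None else env
    env_key = env.get("DASHSCOPE_API_KEY")
    if env_key:
        return env_key
    if key_file.is_file():
        file_key = key_file.read_text(encoding="utf-8").strip()
        if file_key:
            return file_key
    return None
```

Returning `None` when nothing is found leaves it to the caller to decide how to report a missing key.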