Loading...
Loading...
Found 156 Skills
Use when creating cloned voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from reference audio and then reusing the returned voice_id in later TTS calls.
Use when generating a non-interactive cutscene clip — opening scene, story beat, character intro, ending. Locks the look with a reference image, image-to-videos a 5-10s shot, optionally adds TTS dialogue, and wires it as a VideoStreamPlayer that fades in/out. Trigger on "cutscene", "intro cinematic", "opening scene", "ending cinematic", "story beat video", "character intro video", "in-engine cinematic", "non-playable scene".
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports multiple voices and streaming. Triggers on "text to speech", "TTS", "generate audio", "voice synthesis", "speak this text".
Inworld TTS API. Covers voice cloning, audio markups, timestamps. Keywords: text-to-speech, visemes.
One-time bootstrap for Kokoro TTS engine, Telegram bot, and BotFather setup. TRIGGERS - setup tts, install kokoro, botfather, bootstrap tts-telegram-sync, configure telegram bot, full stack setup.
Generate high-quality voiceover audio with ElevenLabs. Includes word-level timestamps for video sync. Use when: creating demo narration, video voiceover, podcast intros, or any TTS need. Keywords: voiceover, TTS, text to speech, ElevenLabs, narration, audio, timestamps.
Minimal voice cloning TTS smoke test for Model Studio Qwen TTS VC.
Configure TTS voices, speed, timeouts, queue depth, and bot settings. TRIGGERS - configure tts, change voice, tts speed, queue depth, tts timeout, bot config, tune settings, adjust parameters.
Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.
MiMo V2.5 TTS Text-to-Speech. Generate speech using Xiaomi MiMo V2.5 TTS series models. This skill is activated when text needs to be converted to speech, voice messages need to be sent, content needs to be read aloud, or when users request 'speak it out' or 'voice reply'. It supports three modes: preset voice, voice design, and voice cloning, as well as natural language control and director mode. It also supports style tag control for tone, emotion, and dialect, and preset voices support singing.
Use this skill when the user wants to convert a Wang Jianshuo-style WeChat article (article.md) into a narrated short MP4 video — featuring TTS voiceover via Volcano Engine Volcano TTS, scene-specific HyperFrames CSS/GSAP animations, subtle sound effects (SFX), abstract watercolor backgrounds, and end-to-end pipeline rendering to a 1080×1920 portrait MP4 (30-90 seconds). Triggers — "把这篇文章做成视频", "做一个解说视频", "讲解视频", "/wjs-converting-text-to-video".
Text-to-Speech Tool - Supports script parsing, emotion tagging, and post-processing, based on Edge TTS