Found 34 Skills
AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs alternative, natural voice, realistic speech, voice ai
AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Inworld TTS-2 (100+ languages, emotion/non-verbal steering), Inworld TTS 1.5 (ultra-low latency), ElevenLabs (22+ premium voices, 32 languages), Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation, voice transformation, delivery mode control, character voices. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility, gaming NPCs, avatar audio, UGC. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs, eleven labs, natural voice, realistic speech, voice ai, voice changer, inworld, inworld tts, character voice, npc voice
Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.
MiniMax TTS API - Text-to-Speech, Voice Cloning, Voice Design
Convert text into speech with Kokoro or Noiz, including simple and timeline-aligned modes.
Vox single-entry voice orchestration skill. Handles environment checks, CLI installation, on-demand model downloads, ASR transcription, voice cloning, pipeline execution, and task troubleshooting through natural language. Use when users describe only the goal without providing specific commands.
Generate (TTS), Transcribe (STT), and Clone voices using Google's GenAI and Cloud Speech SDKs. Supports Gemini-TTS, Chirp 3, and Instant Custom Voice.
Voice cloning workflows with Alibaba Cloud Model Studio Qwen TTS VC models. Use when creating cloned voices from sample audio and synthesizing text with cloned timbre.
Make generated speech feel companion-like with fillers, emotional tuning, and preset speaking styles.
Two-host podcast video for any URL or free-form topic — 1 minute, 4 acts × ~15s, native multi-shot dialogue, optional voice cloning for Host A. Use when the user asks to "make a podcast", "podcast about [thing]", "podcast review of [url]", "two-host explainer", "interview-style clip", "two people talking on camera", "I/me and X talk about Y", or "interview with [persona] about [topic]". Native audio is the deliverable; captions are skipped by default because podcast dialogue mistranscribes domain terms.
Chat with any real person or fictional character in their own voice by automatically finding their speech online, extracting a clean reference sample, and generating audio replies. Use when the user says "I want to chat with xxx" (我想跟xxx聊天), "play xxx and talk to me" (你来扮演xxx跟我说话), "have xxx explain this article to me" (让xxx给我讲讲这篇文章), or similar.
Expert skill for Voicebox, the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI.