Loading...
Loading...
Found 51 Skills
MiniMax TTS API - Text-to-Speech, Voice Cloning, Voice Design
Inworld TTS API. Covers voice cloning, audio markups, timestamps. Keywords: text-to-speech, visemes.
AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.
This skill converts text to high-quality audio files using ElevenLabs API. Use this skill when users request text-to-speech generation, audio narration, or voice synthesis with customizable voice parameters (stability, similarity boost) and voice presets (rachel, adam, bella, elli, josh, arnold, ava).
Generate realistic audio from text using ElevenLabs Text-to-Speech API. Use when the user needs to convert text to speech, create voiceovers, generate narration, or produce audio content. Triggers include "generate audio", "text to speech", "TTS", "voiceover", "narration", "ElevenLabs", "audio from text", "read this text aloud"
Complete ElevenLabs AI audio platform: text-to-speech (TTS), speech-to-text (STT/Scribe), voice cloning, voice design, sound effects, music generation, dubbing, voice changer, voice isolator, and conversational voice agents. Use when working with audio generation, voice synthesis, transcription, audio processing, or building voice-enabled applications. Triggers: generate speech, clone voice, transcribe audio, create sound effects, compose music, dub video, change voice, isolate vocals, build voice agent, ElevenLabs API/SDK/CLI/MCP.
Use for Azure AI: Search, Speech, OpenAI, Document Intelligence. Helps with search, vector/hybrid search, speech-to-text, text-to-speech, transcription, OCR. USE FOR: AI Search, query search, vector search, hybrid search, semantic search, speech-to-text, text-to-speech, transcribe, OCR, convert text to speech. DO NOT USE FOR: Function apps/Functions (use azure-functions), databases (azure-postgres/azure-kusto), general Azure resources.
ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis
Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Higgs Audio, VibeVoice (podcasts). Capabilities: text-to-speech, voice cloning, multi-speaker dialogue, podcast generation, expressive speech. Use for: voiceovers, audiobooks, podcasts, accessibility, video narration, IVR, voice assistants. Triggers: text to speech, tts, voice generation, ai voice, speech synthesis, voice over, generate speech, ai narrator, voice cloning, text to audio, elevenlabs alternative, voice ai, ai voiceover, speech generator, natural voice
Expert skill for implementing text-to-speech with Kokoro TTS. Covers voice synthesis, audio generation, performance optimization, and secure handling of generated audio for JARVIS voice assistant.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.