Loading...
Loading...
Found 118 Skills
Text-to-speech and speech-to-text using fal.ai audio models. Use when the user requests "Convert text to speech", "Transcribe audio", "Generate voice", "Speech to text", "TTS", "STT", or similar audio tasks.
Expert skill for Voicebox — the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI
Translate and dub videos from one language to another, replacing the original audio with TTS while keeping the video intact.
Generate speech audio from text using HeyGen's Starfish TTS model. Use when: (1) Generating standalone speech audio files from text, (2) Converting text to speech with voice selection, speed, and pitch control, (3) Creating audio for voiceovers, narration, or podcasts, (4) Working with HeyGen's /v1/audio endpoints, (5) Listing available TTS voices by language or gender.
Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.
Use when creating cloned voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from reference audio and then reusing the returned voice_id in later TTS calls.
Use when the user wants to generate speech, voiceover, or text-to-audio. Converts text to AI voice via Giggle.pro TTS API. Triggers: generate speech, text-to-speech, TTS, voiceover, read this text aloud, synthesize speech.
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support
Build LiveKit Agent backends in TypeScript or JavaScript. Use this skill when creating voice AI agents, voice assistants, or any realtime AI application using LiveKit's Node.js Agents SDK (@livekit/agents-js). Covers AgentSession, Agent class, function tools with zod, STT/LLM/TTS models, turn detection, and realtime models.
Text-to-Speech Tool - Supports script parsing, emotion tagging, and post-processing, based on Edge TTS
Build LiveKit Agent backends in Python. Use this skill when creating voice AI agents, voice assistants, or any realtime AI application using LiveKit's Python Agents SDK (livekit-agents). Covers AgentSession, Agent class, function tools, STT/LLM/TTS models, turn detection, and multi-agent workflows.
Use this skill to create podcast episodes, interviews, dialogues, and conversation-style audio. Triggers: "create podcast", "podcast episode", "interview audio", "dialogue", "conversation", "two hosts discussing", "audio show", "radio show", "audio drama" Orchestrates: script generation, multi-speaker TTS, intro/outro music, and audio assembly.