Search Results: text-to-speech

Found 89 Skills

hyperframes-media

Asset preprocessing for HyperFrames compositions — text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays (u2net). Use when generating voiceover from text, transcribing speech for captions, removing the background from a video or image to use as a transparent overlay, choosing a TTS voice or whisper model, or chaining these (TTS → transcribe → captions). Each command downloads its own model on first run.

🇺🇸|EnglishTranslated

75.0k

AI & Machine Learningskill-zero/s

ai-voice-cloning

AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs alternative, natural voice, realistic speech, voice ai

🇺🇸|EnglishTranslated

828

AI & Machine Learningskill-zero/s

ai-podcast-creation

Create AI-powered podcasts with text-to-speech, music, and audio editing. Tools: Kokoro TTS, DIA TTS, Chatterbox, AI music generation, media merger. Capabilities: multi-voice conversations, background music, intro/outro, full episodes. Use for: podcast production, audiobooks, voice content, audio newsletters. Triggers: podcast, ai podcast, text to speech podcast, audio content, voice over, ai audiobook, multi voice, conversation ai, notebooklm alternative, audio generation, podcast automation, ai narrator, voice content, audio newsletter, podcast maker

🇺🇸|EnglishTranslated

688

AI & Machine Learningmicrosoft/agent-skills

podcast-generation

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningnotedit/happy-skills

tts-skill

MiniMax TTS API - Text-to-Speech, Voice Cloning, Voice Design

🇨🇳|ChineseTranslated

1 scripts/Checked

AI & Machine Learningglebis/claude-skills

elevenlabs-tts

This skill converts text to high-quality audio files using ElevenLabs API. Use this skill when users request text-to-speech generation, audio narration, or voice synthesis with customizable voice parameters (stability, similarity boost) and voice presets (rachel, adam, bella, elli, josh, arnold, ava).

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningqodex-ai/ai-agent-skills

voice-ai-integration

Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice assistants, voice-controlled applications, audio interfaces, or hands-free AI systems.

🇺🇸|EnglishTranslated

4 scripts/Checked

AI & Machine Learningsanjay3290/ai-skills

google-tts

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningboomsystel-code/openclaw-...

audio-cog

AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.

🇺🇸|EnglishTranslated

AI & Machine Learningdeepgram/skills

api

Deepgram API reference for speech-to-text, text-to-speech, voice agents, audio intelligence, and account management. Use whenever building with Deepgram APIs — REST or WebSocket. Covers authentication, all endpoints, query parameters, request/response schemas, and WebSocket message formats. Reference files are organized by domain: listen (STT), speak (TTS), agent (voice agents), read (text/audio intelligence), models, projects, auth, and self-hosted.

🇺🇸|EnglishTranslated

AI & Machine Learningsteipete/clawdis

sag

ElevenLabs text-to-speech with mac-style say UX.

🇺🇸|EnglishTranslated

AI & Machine Learningmassgen/massgen

audio-generation

Guide to audio generation and understanding in MassGen. Covers text-to-speech, music, sound effects, and audio understanding across ElevenLabs and OpenAI backends.

🇺🇸|EnglishTranslated