Found 72 Skills
Generate realistic audio from text using the ElevenLabs Text-to-Speech API. Use when the user needs to convert text to speech, create voiceovers, generate narration, or produce audio content. Triggers include "generate audio", "text to speech", "TTS", "voiceover", "narration", "ElevenLabs", "audio from text", and "read this text aloud".
Text-to-speech using edge-tts or macOS `say`. Use when the user says "speak", "say", or "read aloud", or wants text spoken.
Complete ElevenLabs AI audio platform: text-to-speech (TTS), speech-to-text (STT/Scribe), voice cloning, voice design, sound effects, music generation, dubbing, voice changer, voice isolator, and conversational voice agents. Use when working with audio generation, voice synthesis, transcription, audio processing, or building voice-enabled applications. Triggers: generate speech, clone voice, transcribe audio, create sound effects, compose music, dub video, change voice, isolate vocals, build voice agent, ElevenLabs API/SDK/CLI/MCP.
Text-to-Speech using Doubao (Volcano Engine) API. Use when converting text to natural-sounding speech, generating audio files from text, listing available TTS voices, or synthesizing speech with customizable speed/volume parameters.
Text-to-speech synthesis with ElevenLabs and system voices.
Use for Azure AI: Search, Speech, OpenAI, Document Intelligence. Helps with search, vector/hybrid search, speech-to-text, text-to-speech, transcription, OCR. USE FOR: AI Search, query search, vector search, hybrid search, semantic search, speech-to-text, text-to-speech, transcribe, OCR, convert text to speech. DO NOT USE FOR: Function apps/Functions (use azure-functions), databases (azure-postgres/azure-kusto), general Azure resources.
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
Generate images, videos, audio, and 3D models via RunningHub API (170+ endpoints) and run any RunningHub AI Application (custom ComfyUI workflow) by webappId. Covers text-to-image, image-to-video, text-to-speech, music generation, 3D modeling, image upscaling, AI apps, and more.
Access paid services (verification, search, AI models, images, audio, browser automation) for AI agents via Sapiom. Use when building agents that need to verify phone/email, search the web, call AI models, generate images, convert text-to-speech, or automate browsers — without setting up vendor accounts.
Search and integrate Fal AI models from the fal.ai platform. Use when the user wants to (1) search for models on the Fal AI platform, (2) get detailed information about a specific Fal AI model, (3) integrate a Fal AI model into the project, or (4) explore available AI models on fal.ai, or when the user mentions "fal.ai", "图像生成" (image generation), "AI video model", "text to image", or "text-to-speech".
OpenAI API via curl. Use this skill for GPT chat completions, DALL-E image generation, Whisper audio transcription, embeddings, and text-to-speech.
Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).