Found 196 Skills
Use this skill when the user wants to convert a Wang Jianshuo-style WeChat article (article.md) into a short narrated MP4 video. The pipeline covers TTS voiceover via Volcano Engine TTS, scene-specific HyperFrames CSS/GSAP animations, subtle sound effects (SFX), abstract watercolor backgrounds, and end-to-end rendering to a 1080×1920 portrait MP4 (30-90 seconds). Triggers: "把这篇文章做成视频" ("turn this article into a video"), "做一个解说视频" ("make a narrated video"), "讲解视频" ("explainer video"), "/wjs-converting-text-to-video".
Run provider-agnostic live voice conversations with VAD, silence boundaries, wake-word gating, STT, and TTS through the AgentOS speech runtime.
Text-to-speech tool based on Edge TTS. Supports script parsing, emotion tagging, and post-processing.
Expert in voice synthesis, TTS, voice cloning, podcast production, speech processing, and voice UI design via ElevenLabs integration. Specializes in vocal clarity, loudness standards (LUFS), de-essing, dialogue mixing, and voice transformation. Activate on 'TTS', 'text-to-speech', 'voice clone', 'voice synthesis', 'ElevenLabs', 'podcast', 'voice recording', 'speech-to-speech', 'voice UI', 'audiobook', 'dialogue'. NOT for spatial audio (use sound-engineer), music production (use DAW tools), game audio middleware (use sound-engineer), sound effects generation (use sound-engineer with ElevenLabs SFX), or live concert audio.
Use the Gemini API (Nano Banana image generation, Veo video generation, Gemini TTS speech synthesis, and audio understanding) to build end-to-end multimodal media workflows, with code templates covering both generation and understanding.
Apply cognitive science and HCI research to design decisions. Use when you need the scientific 'why' behind usability, explaining user behavior, understanding perception/memory/attention limits, evaluating cognitive load, assessing mental model alignment, predicting performance with Fitts's/Hick's Law, or grounding interface decisions in research rather than opinion.
Generate audio replies using TTS. Trigger with "read it to me [URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken response. Also responds to "speak", "say it", "voice reply".
Health check for TTS and Telegram bot subsystems. TRIGGERS - tts health, kokoro status, telegram bot check, tts diagnostics.
Generate voice messages using local Qwen3-TTS (offline, Apple Silicon). Convert text to speech with customizable voices, emotions, and speed. Use when user asks for voice reply, audio, or TTS.
Use when the user wants to generate speech, voiceover, or text-to-audio. Converts text to AI voice via Giggle.pro TTS API. Triggers: generate speech, text-to-speech, TTS, voiceover, read this text aloud, synthesize speech.
Convert text to speech (TTS). Powered by the VolcEngine Doubao Text-to-Speech API, it supports streaming synthesis, multiple voices, adjustable speech rate, pitch, and loudness, Markdown syntax filtering, and reading LaTeX formulas aloud. Use this skill when users need to convert text to speech or to generate read-aloud audio, dubbing, narration, or announcements, or when they mention terms like 'text-to-speech', 'TTS', 'speech synthesis', 'reading aloud', or 'dubbing'.
Generate multi-person talking head podcast videos from scratch using AI — character creation, TTS, avatar animation, and video stitching. Use when the user wants to create a podcast, talking head video, or multi-speaker conversation video.