Loading...
Loading...
Found 27 Skills
基于 FFmpeg 的音频处理技能,提供最实用的日常音频处理命令
Convert audio/video to text using Whisper, with support for word-level timestamps. Use this when users need speech-to-text conversion, audio-to-text transcription, video-to-text extraction, subtitle generation, transcribe audio, speech to text, generate subtitles, or speech recognition.
Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.
Use this skill to create single-voice audio content like audiobooks, voiceovers, narrations, jingles, and audio ads. Triggers: "create audiobook", "generate voiceover", "narration", "audio ad", "radio ad", "jingle", "brand audio", "sonic logo", "text to audio", "read this aloud", "audio guide", "meditation audio", "soundscape" Orchestrates: narration/TTS, background music, and audio assembly. NOTE: For conversations/dialogues, use podcast-producer instead.
Expert skill for Voicebox — the open-source local voice cloning and TTS studio built with Tauri, React, and FastAPI
Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Generate audio visualization videos using each::sense AI. Create waveforms, spectrum analyzers, particle effects, 3D visualizations, and beat-synced animations from audio files.
ML-powered Karaoke app in Rust using Bevy, WhisperX, and Demucs for stem separation, lyrics transcription, and pitch scoring.
Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for JARVIS voice assistant.
Expert in web audio, audio processing, and interactive sound design
Local speech-to-text with the Whisper CLI (no API key).