Search Results: audio-processing

Found 48 Skills

ffmpeg-audio-processing

An FFmpeg-based audio processing skill that provides the most practical commands for everyday audio processing

Marketing & Growth | vivy-yi/xiaohongshu-skill...

audio-processing

Use when processing audio for Xiaohongshu content, editing voiceovers, improving sound quality, creating podcasts, or producing audio-based posts

AI & Machine Learning | eachlabs/skills

eachlabs-voice-audio

Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.

AI & Machine Learning | michaelboeding/skills

audio-producer-agent

Use this skill to create single-voice audio content like audiobooks, voiceovers, narrations, jingles, and audio ads.
Triggers: "create audiobook", "generate voiceover", "narration", "audio ad", "radio ad", "jingle", "brand audio", "sonic logo", "text to audio", "read this aloud", "audio guide", "meditation audio", "soundscape".
Orchestrates: narration/TTS, background music, and audio assembly.
NOTE: For conversations/dialogues, use podcast-producer instead.

AI & Machine Learning | bytedance/agentkit-sample...

byted-las-asr-pro

ASR (Automatic Speech Recognition): enhanced speech-to-text built on the Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API.
Choose this skill when:
- Input is a video file (mp4/mov/mkv); the audio track is extracted automatically
- Audio needs denoising before recognition
- File exceeds 512MB or 5 hours (no size limit)
- Audio source is a TOS internal path (tos://bucket/key)
- Need structured JSON output with timestamped utterances and metadata
- Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering
Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.

1 script/Checked

AI & Machine Learning | openai/skills

transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

1 script/Checked

AI & Machine Learning | steipete/clawdis

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).

Tools & Utilities | michaelboeding/skills

media-utils

Internal utility skill for media assembly operations. NOT called directly by users. Used by producer skills (video-producer, podcast-producer, audio-producer, social-producer) to stitch, mix, and assemble final media outputs.

7 scripts/Checked

AI & Machine Learning | yonatangross/orchestkit

multimodal-llm

Vision, audio, and multimodal LLM integration patterns. Use when processing images, transcribing audio, generating speech, or building multimodal AI pipelines.

Backend Development | runwayml/skills

integrate-audio

Help users integrate Runway audio APIs (TTS, sound effects, voice isolation, dubbing).

AI & Machine Learning | martinholovsky/claude-ski...

speech-to-text

Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for JARVIS voice assistant.

AI & Machine Learning | infquest/vibe-ops-plugin

audio-transcribe

Convert audio/video to text using Whisper, with support for word-level timestamps. Use this when users need speech-to-text conversion, audio-to-text transcription, video-to-text extraction, subtitle generation, transcribe audio, speech to text, generate subtitles, or speech recognition.

1 script/Checked