Loading...
Loading...
Found 48 Skills
Speaker diarization — identifies and tracks who is speaking at each moment in an audio stream, using provider-delegated labels or local offline clustering.
FFmpeg video and audio processing patterns. Use when transcoding video/audio, extracting clips, adding filters, merging media, creating thumbnails, or batch processing media files.
Text-to-Speech Tool - Supports script parsing, emotion tagging, and post-processing, based on Edge TTS
Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.
Music generation queueing, retrieval, and completion endpoints via Venice.ai. Suited for jingles, background loops, and prototype scoring.
Transcribe audio/video files to text using Whisper via OpenKBS AI proxy. Supports MP4, MP3, WAV, OGG, MKV and other ffmpeg-compatible formats. Splits large files into chunks automatically.
Assemble final video from generated clips, audio, and assets using FFmpeg or Remotion. Handles concatenation, audio mixing, transitions, titles, and export. Use when combining multiple production outputs into a final deliverable.
Use when asked to normalize audio volume, match loudness, or apply peak/RMS normalization to audio files.
Polishes raw Suno audio by processing per-stem WAVs (vocals, backing_vocals, drums, bass, guitar, keyboard, strings, brass, woodwinds, percussion, synth, other) with targeted cleanup, EQ, and compression, then remixing into a polished stereo WAV ready for mastering. Use after audio import and before mastering.
Audio spectrograms/features (mel, chroma, MFCC) via CLI.
Analyze audio recording quality - echo detection, loudness, speech intelligibility, SNR, spectral analysis. Use when the user wants to check a recording's quality, detect echo or duplication in audio files, measure speech clarity, compare original vs processed audio, diagnose why a recording sounds bad, or analyze audio tracks from Blackbox or any call recording app. Triggers on audio quality, recording analysis, echo detection, check recording, sound quality, analyze audio, speech quality, PESQ, STOI, loudness, SNR, audio diagnostics, recording sounds bad, echo in recording, audio duplication.
Transcribe audio with StepFun's stepaudio-2.5-asr — an SSE endpoint (NOT /v1/audio/transcriptions) with 32K context, ~85-101x RTF on long audio, and a single-call ceiling around 30 minutes (no client-side chunking). Use when transcribing Chinese / English audio with StepFun, when long-form recordings (5-30 min) need to land in one request, when migrating from step-asr / step-asr-1.1, or when hitting the misleading `model stepaudio-2.5-asr not supported` error (which actually means wrong endpoint). Triggers on 阶跃 ASR, StepFun ASR, stepaudio-2.5-asr, 转录, 语音识别, 长音频转写, 语音转文字. For TTS with the sibling stepaudio-2.5-tts model, use the stepfun-tts skill instead.