Search Results: speech-to-text

Found 70 Skills

Tools & Utilities | conversiontools/agent-ski...

conversiontools

Convert files between 140+ formats using the ConversionTools MCP server. Use when the user needs to convert documents (Word, PDF, Excel, PowerPoint), data formats (JSON, CSV, XML, YAML, Parquet), images (PNG, JPG, WebP, AVIF, HEIC, JXL, SVG), audio (MP3, WAV, FLAC), video (MOV, MKV, AVI to MP4), e-books (EPUB, MOBI, AZW), OCR text extraction, AI-powered data extraction, AI text-to-speech (TTS), AI speech-to-text transcription (STT), subtitle conversion (SRT, VTT, ASS), or website screenshots.

AI & Machine Learning | framersai/agentos-skills

streaming-stt-whisper

Chunked sliding-window streaming speech-to-text via OpenAI Whisper HTTP API — compatible with local Faster-Whisper, Groq, and OpenRouter endpoints.

AI & Machine Learning | inference-sh-skills/skill...

elevenlabs-stt

ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe

AI & Machine Learning | openrouterteam/skills

openrouter-stt

Transcribe speech to text using OpenRouter's speech-to-text API. Use when the user asks to transcribe audio, convert speech to text, extract a transcript from a recording or meeting, caption a video's audio, or mentions STT, speech-to-text, ASR, or transcription.

Tools & Utilities | feiskyer/video-skills

transcribe-video

Extract a transcript or subtitles from a local video file. Use this skill whenever the user asks to transcribe a video, extract speech-to-text, get subtitles, or wants a text version of what's said in a video. Also trigger on "提取字幕" (extract subtitles), "视频转文字" (video to text), "语音转文字" (speech to text), "transcribe", "extract audio text", or when the user references getting a script/transcript from any video file (mp4, mkv, mov, avi, webm). This skill is for LOCAL video files only. For YouTube or other online URLs, use the download-video skill first to get the file, then transcribe it.

1 script | Checked

AI & Machine Learning | vanman2024/ai-dev-marketp...

stt-integration

ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription.

5 scripts | Attention

AI & Machine Learning | tdimino/claude-code-minoa...

parakeet

Local speech-to-text via Handy app (push-to-talk) and NeMo CLI scripts. Parakeet V3: 25 languages, auto-detection, ~30x realtime on M4 Max, 6% WER. This skill should be used when transcribing audio files or dictating voice input.

3 scripts | Attention

Frontend Development | syncfusion/blazor-ui-comp...

syncfusion-blazor-speech-to-text

Implement speech-to-text voice input in Blazor applications using Syncfusion SpeechToText component. ALWAYS use this when users need voice input, speech recognition, audio transcription, or implementing the SpeechToText component in Blazor. Trigger for Syncfusion.Blazor.Inputs, microphone input, voice-to-text conversion, language support, transcript binding, listening states, error handling, browser speech API, or any speech recognition requirements.

AI & Machine Learning | framersai/agentos-skills

whisper-transcribe

Transcribe audio and video files to text using OpenAI Whisper or compatible speech-to-text APIs.

AI & Machine Learning | framersai/agentos-skills

vosk

Fully offline speech-to-text via the Vosk library — streaming recognition, 16 kHz PCM, no network required after model download.

AI & Machine Learning | araa47/ez-voice

ez-stt

Local speech-to-text with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

1 script | Checked

AI & Machine Learning | inference-sh/skills

speech-to-text

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation, multi-language, timestamps. Use for: meeting transcription, subtitles, podcast transcripts, voice notes. Triggers: speech to text, transcription, whisper, audio to text, transcribe audio, voice to text, stt, automatic transcription, subtitles generation, transcribe meeting, audio transcription, whisper ai
