Search Results: speech-to-text

Found 70 Skills

AI & Machine Learningmicrosoft/agent-skills

azure-ai-transcription-py

Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization. Triggers: "transcription", "speech to text", "Azure AI Transcription", "TranscriptionClient".

🇺🇸|EnglishTranslated

AI & Machine Learningeachlabs/skills

eachlabs-voice-audio

Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.

🇺🇸|EnglishTranslated

AI & Machine Learningdaymade/claude-code-skill...

transcript-fixer

Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.

🇺🇸|EnglishTranslated

51 scripts/Checked

AI & Machine Learninginfquest/vibe-ops-plugin

audio-transcribe

Convert audio/video to text using Whisper, with support for word-level timestamps. Use this when users need speech-to-text conversion, audio-to-text transcription, video-to-text extraction, subtitle generation, transcribe audio, speech to text, generate subtitles, or speech recognition.

🇨🇳|ChineseTranslated

1 scripts/Checked

AI & Machine Learningframersai/agentos-skills

streaming-stt-deepgram

Real-time streaming speech-to-text via Deepgram WebSocket API — sub-300 ms latency, Nova-2 model, speaker diarization, auto-reconnect.

🇺🇸|EnglishTranslated

AI & Machine Learningassemblyai/assemblyai-ski...

assemblyai

Use when implementing speech-to-text, audio transcription, real-time streaming STT, audio intelligence features, or voice AI using AssemblyAI APIs or SDKs. Use when user mentions AssemblyAI, voice agents, transcription, speaker diarization, PII redaction of audio, LLM Gateway for audio understanding, or applying LLMs to transcripts. Also use when building voice agents with LiveKit or Pipecat that need speech-to-text, or when the user is working with any audio/video processing pipeline that could benefit from transcription, even if they don't mention AssemblyAI by name.

🇺🇸|EnglishTranslated

AI & Machine Learningpostplusai/postplus-skill...

audio-transcription

Transcribe local or remote audio into durable text and timestamp artifacts using hosted Whisper models. Use this when the job is speech-to-text from audio files and you need request/response persistence, optional timestamps, and subtitle-ready outputs.

🇺🇸|EnglishTranslated

11 scripts/Attention

Document Processingrookie-ricardo/erduo-skil...

transcript-polisher

Refine speech transcription texts (interviews, speeches, podcasts, meetings) into more readable article paragraphs. Trigger this skill when users mention terms like "subtitle refinement", "transcript polish", "subtitle polishing", "organize video subtitles into articles", "interview text organization", processing interview records, transcription text optimization, speech-to-text organization, or when they need to organize long dialogue/speech texts into readable articles. It is suitable for organizing transcription texts of solo speeches or multi-person conversations, requiring the retention of original sentences and words, and rejecting high-level generalization. This skill should also be triggered even if users only say "help me organize this text" and attach obviously colloquial text.

🇨🇳|ChineseTranslated

AI & Machine Learningelevenlabs/skills

speech-engine

Add real-time voice conversations to a custom LLM, OpenClaw, or similar agent runtime with ElevenLabs Speech Engine. Use when building Speech Engine servers, WebSocket handlers, WebRTC browser clients, conversation token endpoints, interruption-aware streaming responses, or voice-enabled chat agents that connect a developer-owned LLM to ElevenLabs speech-to-text and text-to-speech.

🇺🇸|EnglishTranslated

AI & Machine Learningdaymade/claude-code-skill...

asr-transcribe-to-text

Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or OpenAI-compatible endpoint). Extracts audio from video, sends to configurable ASR endpoint, outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录, 语音转文字, 录音转文字, or has a meeting recording, lecture, interview, or screen recording to transcribe.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningdeepgram/skills

api

Deepgram API reference for speech-to-text, text-to-speech, voice agents, audio intelligence, and account management. Use whenever building with Deepgram APIs — REST or WebSocket. Covers authentication, all endpoints, query parameters, request/response schemas, and WebSocket message formats. Reference files are organized by domain: listen (STT), speak (TTS), agent (voice agents), read (text/audio intelligence), models, projects, auth, and self-hosted.

🇺🇸|EnglishTranslated

AI & Machine Learningcharleswiltgen/axiom

axiom-ios-ml

Use when deploying ANY machine learning model on-device, converting models to CoreML, compressing models, or implementing speech-to-text. Covers CoreML conversion, MLTensor, model compression (quantization/palettization/pruning), stateful models, KV-cache, multi-function models, async prediction, SpeechAnalyzer, SpeechTranscriber.

🇺🇸|EnglishTranslated