Search Results: speech-recognition

Found 17 Skills

Mobile Developmentdpearson2699/swift-ios-sk...

speech-recognition

Transcribe speech to text using the Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing pre-recorded audio files, configuring on-device vs server-based recognition, handling authorization flows, or adopting the new SpeechAnalyzer API (iOS 26+) for modern async/await speech-to-text.

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

digital-health-clinical-asr-setup

Stage 1 of Clinical ASR Flywheel. Use when bootstrapping a cycle: NVCF+MW disclosure, NVIDIA_API_KEY check, deps install, TTS+ASR smoke test.

🇺🇸|EnglishTranslated

Frontend Developmentsyncfusion/angular-ui-com...

syncfusion-angular-speech-to-text

Implement the Syncfusion Angular SpeechToText component. Use this skill for real-time speech-to-text conversion with text transcripts, custom button appearance and tooltips, recognition event handling, multiple language support with localization and RTL, error handling, and security best practices for microphone access and data transmission.

🇺🇸|EnglishTranslated

AI & Machine Learningbytedance/agentkit-sample...

byted-las-asr-pro

ASR (Automatic Speech Recognition) — enhanced speech-to-text built on Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API. Choose this skill when: - Input is a video file (mp4/mov/mkv) — auto-extracts audio track - Audio needs denoising before recognition - File exceeds 512MB or 5 hours (no size limit) - Audio source is a TOS internal path (tos://bucket/key) - Need structured JSON output with timestamped utterances and metadata - Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningtavus-engineering/tavus-s...

tavus-cvi-persona

Configure Tavus CVI personas with custom LLMs, TTS engines, perception, and turn-taking. Use when customizing AI behavior, bringing your own LLM, configuring voice/TTS, enabling vision with Raven, or tuning conversation flow with Sparrow.

🇺🇸|EnglishTranslated

AI & Machine Learningqodex-ai/ai-agent-skills

voice-ai-integration

Build voice-enabled AI applications with speech recognition, text-to-speech, and voice-based interactions. Supports multiple voice providers and real-time processing. Use when creating voice assistants, voice-controlled applications, audio interfaces, or hands-free AI systems.

🇺🇸|EnglishTranslated

4 scripts/Checked

AI & Machine Learningzainhas/togetherai-skills

together-audio

Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, MiniMax with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningbytedance/agentkit-sample...

byted-voice-to-text

Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).

🇨🇳|ChineseTranslated

5 scripts/Attention

Frontend Developmentdaffy0208/ai-dev-standard...

voice-interface-builder

Expert in building voice interfaces, speech recognition, and text-to-speech systems

🇺🇸|EnglishTranslated

AI & Machine Learningjrusso1020/video-understa...

video-understand

Video understanding and transcription with intelligent multi-provider fallback. Use when: (1) Transcribing video or audio content, (2) Understanding video content including visual elements and scenes, (3) Analyzing YouTube videos by URL, (4) Extracting information from local video files, (5) Getting timestamps, summaries, or answering questions about video content. Automatically selects the best available provider based on configured API keys - prefers full video understanding (Gemini/OpenRouter) over ASR-only providers. Supports model selection per provider.

🇺🇸|EnglishTranslated

3 scripts/Attention

Tools & Utilitiessugarforever/01coder-agen...

subtitle-correction

Correct subtitle files (.srt) generated from speech recognition. Use when the user uploads subtitle files and asks to correct, fix, or proofread subtitles, especially for technical content like programming tutorials, AI/ML courses, or any content with domain-specific terminology. Supports Chinese and English subtitles with intelligent error detection and correction while preserving exact timeline information.

🇺🇸|EnglishTranslated

1 scripts/Checked