Search Results: asr

Found 34 Skills

AI & Machine Learningcinience/alicloud-skills

aliyun-qwen-asr

Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.

🇺🇸|EnglishTranslated

1 scripts/Checked

Document Processingwdkns/wdkns-skills

subtitle-refine

Professional-level refinement and verification for Chinese SRT subtitles for launch. Used to clean ASR-based raw subtitles into a publishable version, only performing subtitle-level cleaning and correction without formal rewriting, summarization, or expansion; meanwhile, strictly maintaining synchronization with the original audio, splitting entries only within the original subtitle time range when necessary, outputting a complete clean SRT, and then using the accompanying verification script for final rule checks and timeline review. Suitable for tasks such as documentaries, interviews, oral broadcasts, screen recordings that require correcting recognition errors, deleting meaningless filler words, adding pause spaces, limiting single-entry word count, and avoiding accidental deletion of meaningful subtitles.

🇨🇳|ChineseTranslated

1 scripts/Attention

Testing & QAcinience/alicloud-skills

alicloud-ai-audio-asr-test

Minimal non-realtime ASR smoke test for Model Studio Qwen ASR.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

nemotron-speech

Routes NVIDIA Nemotron Speech (Riva) NIM tasks — deploys, runs, and tests ASR, TTS, and NMT NIMs on build.nvidia.com or self-hosted.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningopenrouterteam/skills

openrouter-stt

Transcribe speech to text using OpenRouter's speech-to-text API. Use when the user asks to transcribe audio, convert speech to text, extract a transcript from a recording or meeting, caption a video's audio, or mentions STT, speech-to-text, ASR, or transcription.

🇺🇸|EnglishTranslated

AI & Machine Learningdaymade/claude-code-skill...

stepfun-asr

Transcribe audio with StepFun's stepaudio-2.5-asr — an SSE endpoint (NOT /v1/audio/transcriptions) with 32K context, ~85-101x RTF on long audio, and a single-call ceiling around 30 minutes (no client-side chunking). Use when transcribing Chinese / English audio with StepFun, when long-form recordings (5-30 min) need to land in one request, when migrating from step-asr / step-asr-1.1, or when hitting the misleading `model stepaudio-2.5-asr not supported` error (which actually means wrong endpoint). Triggers on 阶跃 ASR, StepFun ASR, stepaudio-2.5-asr, 转录, 语音识别, 长音频转写, 语音转文字. For TTS with the sibling stepaudio-2.5-tts model, use the stepfun-tts skill instead.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningnvidia/skills

digital-health-clinical-asr-setup

Stage 1 of Clinical ASR Flywheel. Use when bootstrapping a cycle: NVCF+MW disclosure, NVIDIA_API_KEY check, deps install, TTS+ASR smoke test.

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

digital-health-clinical-asr-finetune

Stage 4 of the Clinical ASR Flywheel. Use when priority KER is above 0.3 to run stock NeMo SFT on Parakeet TDT v2 and offline cycle N+1 re-eval. NOT for generic word boosting (use /finetune-asr).

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

digital-health-clinical-asr-build

Stage 2 of the Clinical ASR Flywheel. Use when curating clinical terms, tagging IPA, and synthesizing a NeMo manifest. NOT for scoring (use /digital-health-clinical-asr-eval).

🇺🇸|EnglishTranslated

AI & Machine Learningnovitalabs/novita-skills

novita-ai

Novita AI: LLM, Image Generation & Editing, Video Generation, Audio (TTS/ASR), and GPU Cloud. Use this skill whenever the user wants to call Novita AI APIs — chat with LLMs (DeepSeek, Llama, Qwen), generate images (FLUX, Stable Diffusion, Seedream, Hunyuan Image), edit images (remove background, upscale, inpainting, img2img, outpainting, reimagine, merge face, replace background, remove text), generate videos (Kling, Wan, Hunyuan, Minimax Hailuo, Vidu, PixVerse, Seedance), do text-to-speech or speech-to-text (MiniMax TTS, GLM TTS, Fish Audio, ASR, voice cloning), run OpenAI-compatible batch jobs, manage GPU cloud instances and serverless endpoints, or check account balance and billing. Also trigger when the user mentions novita.ai, Novita AI, Novita API key, or wants to use any Novita platform service — even if they just say "generate an image" or "run an LLM" and Novita is available as a provider.

🇺🇸|EnglishTranslated

AI & Machine Learningdaymade/claude-code-skill...

transcript-fixer

Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.

🇺🇸|EnglishTranslated

51 scripts/Checked