Search Results: audio-transcription

Found 24 Skills

ai-multimodal

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

🇺🇸|EnglishTranslated

6 scripts/Attention

AI & Machine Learningaahl/skills

qwen-asr

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningthinkfleetai/thinkfleet-e...

local-whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

🇺🇸|EnglishTranslated

Tools & Utilitiesnateherkai/hyperframes-st...

hyperframes-cli

HyperFrames CLI tool — hyperframes init, lint, preview, render, transcribe, tts, doctor, browser, info, upgrade, compositions, docs, benchmark. Use when scaffolding a project, linting or validating compositions, previewing in the studio, rendering to video, transcribing audio, generating TTS, or troubleshooting the HyperFrames environment.

🇺🇸|EnglishTranslated

AI & Machine Learningbadlogic/pi-skills

transcribe

Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningaia-11-hn-mib/mib-mockint...

gemini-video-understanding

Analyze videos using Google's Gemini API - describe content, answer questions, transcribe audio with visual descriptions, reference timestamps, clip videos, and process YouTube URLs. Supports 9 video formats, multiple models (Gemini 2.5/2.0), and context windows up to 2M tokens (6 hours of video).

🇺🇸|EnglishTranslated

Tools & Utilities0xdarkmatter/claude-mods

markitdown

Convert local documents to Markdown using Microsoft's markitdown CLI. Best for: PDF, Word, Excel, PowerPoint, images (OCR), audio. Can fetch URLs but Jina is faster for web. Triggers on: convert to markdown, read PDF, parse document, extract text from, docx, xlsx, pptx, OCR image, local file.

🇺🇸|EnglishTranslated

AI & Machine Learningkouko/monkey-knowledge-yo...

mk-youtube-audio-transcribe

Transcribe audio to text using local whisper.cpp. Use when user wants to convert audio/video to text, get transcription, or speech-to-text.

🇺🇸|EnglishTranslated

13 scripts/Attention

Tools & Utilitieshyperpuncher/dotagents

chough

Fast ASR CLI tool for transcribing audio/video files. Use when user wants to transcribe audio/video, generate subtitles (VTT), convert speech to text with timestamps (JSON), or optimize transcription for low memory.

🇺🇸|EnglishTranslated

AI & Machine Learningsarvamai/skills

speech-to-text

Transcribe audio to text using Sarvam AI's Saaras model. Handles speech recognition, transcription, and voice interfaces for 23 Indian languages. Supports 5 output modes, auto language detection, WebSocket streaming, and batch diarization. Use when converting speech to text or building voice-enabled apps.

🇺🇸|EnglishTranslated

Data Processingnaohainezha/skill

video-reader

Read, watch, and listen to video/audio files. Extract key frames to "see" videos, extract audio to "hear" them via Whisper transcription. Use when a user sends a video/audio and asks about its content, what's in it, what someone said, etc.

🇺🇸|EnglishTranslated

AI & Machine Learningalphaonedev/openclaw-grap...

openai-whisper-api

OpenAI Whisper API: audio transcription, translation, structured output, large file handling

🇺🇸|EnglishTranslated