Search Results: media-processing

Found 40 Skills

AI & Machine Learning · binhmuc/autobot-review

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (in place of Claude's default vision capabilities; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.

7 scripts/Attention

Data Processing · barefootford/buttercut

analyze-video

Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates a visual transcript with periodic visual descriptions of the video clip. Use when all files already have audio transcripts (transcript) but do not yet have visual transcripts (visual_transcript).

1 scripts/Checked

DevOps & Cloud Services · cinience/alicloud-skills

alicloud-media-mps

Manage Alibaba Cloud ApsaraVideo for Media Processing (MPS/MTS) resources and workflows via OpenAPI/SDK. Use for media ingest and metadata tasks, transcoding/snapshot jobs, pipeline/template/workflow operations, and MPS job troubleshooting.

1 scripts/Checked

Tools & Utilities · agntswrm/agent-media

image-extend

Extends an image canvas by adding padding on all sides with a solid background color. Use when you need to add borders, margins, or expand the canvas area around an image.

AI & Machine Learning · civitai/civitai

civitai-orchestration

Query and explore Civitai Orchestration workflows, jobs, and results. Use for analyzing image/video generation jobs, viewing job results, searching by workflow ID, job ID, user, or date range.

1 scripts/Attention

Tools & Utilities · gupsammy/claudest

extract-audio

This skill should be used when the user asks to "extract audio", "get the mp3", "strip audio from video", "rip audio", "save audio from video", "convert to audio", "get the soundtrack", "pull the audio track", "save as mp3", "export audio", or "separate audio from video".

Backend Development · miles990/claude-software-...

content-platforms

CMS, blogging platforms, and content management patterns

1 scripts/Checked

Tools & Utilities · agntswrm/agent-media

agent-media

Agent-first media toolkit for image, video, and audio processing. Use when you need to resize, convert, generate images, remove backgrounds, extract audio, transcribe speech, or generate videos. All commands return deterministic JSON output.

Tools & Utilities · liang121/video-summarizer

video-summarizer

Download videos from 1800+ platforms (YouTube, Bilibili, Twitter/X, TikTok, Vimeo, Instagram, etc.) and generate a complete resource package with video, audio, subtitles, and an AI summary. Actions: summarize, download, transcribe, extract video content. Platforms: youtube.com, bilibili.com, twitter.com, x.com, tiktok.com, vimeo.com, instagram.com, twitch.tv. Outputs: MP4 video, MP3 audio, VTT subtitles with timestamps, TXT transcript, MD AI summary. Auto-installs uv, yt-dlp, and ffmpeg. Python dependencies are managed by uv.

2 scripts/Attention

AI & Machine Learning · mnvsk97/eyeroll

watch-video

Analyze videos, screen recordings, and screenshots to generate structured, actionable notes for coding agents. Supports Loom, YouTube, and local files. Extracts visual context, on-screen text, and audio narration. Use when someone shares a video and you need to understand what it shows.

Backend Development · anthropics/knowledge-work...

zoom-rtms

Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.

AI & Machine Learning · erichowens/some_claude_sk...

wedding-immortalist

Transform thousands of wedding photos and hours of footage into an immersive 3D Gaussian Splatting experience with theatre mode replay, face-clustered guest roster, and AI-curated best photos per person. Expert in 3DGS pipelines, face clustering, aesthetic scoring, and adaptive design matching the couple's wedding theme (disco, rustic, modern, LGBTQ+ celebrations). Activate on "wedding photos", "wedding video", "3D wedding", "Gaussian Splatting wedding", "wedding memory", "wedding immortalize", "face clustering wedding", "best wedding photos". NOT for general photo editing (use native-app-designer), non-wedding 3DGS (use drone-inspection-specialist), or event planning (not a wedding planner).
