Loading...
Loading...
Found 937 Skills
Read, watch, and listen to video/audio files. Use Gemini for native video understanding, or extract key frames + Whisper transcription as fallback. Use when a user sends a video/audio and asks about its content, what's in it, what someone said, etc.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
HTTP Networking with Dio, Retry & Caching Patterns
Core Python development concepts, idioms, best practices, and language features. Covers Python 3.10+ features, type hints, async/await, and Pythonic patterns. For running scripts, see uv-run. For project setup, see uv-project-management. Use when user mentions Python, type hints, async Python, decorators, context managers, or writing Pythonic code.
Microsoft 365 Agents SDK for .NET. Build multichannel agents for Teams/M365/Copilot Studio with ASP.NET Core hosting, AgentApplication routing, and MSAL-based auth.
Expert blueprint for rhythm games including audio synchronization (BPM conductor, latency compensation with AudioServer.get_time_since_last_mix), note highways (scroll speed, timing windows), judgment systems (Perfect/Great/Good/Bad/Miss), scoring with combo multipliers, input processing (lane-based, hold note detection), and chart/beatmap loading. Based on DDR/osu!/Beat Saber research. Trigger keywords: rhythm_game, audio_sync, timing_judgment, note_highway, combo_system, BPM_conductor, latency_compensation.
Comprehensive Python expertise covering language fundamentals, idiomatic patterns, software design principles, and production best practices. Use when writing, reviewing, debugging, or refactoring Python code. Triggers: Python, .py files, pip, uv, pytest, dataclasses, asyncio, type hints, or any Python library.
Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.
Эксперт podcast production. Используй для создания подкастов, audio editing и distribution.
Minimal video editing smoke test for Model Studio Wan edit models.
Minimal rerank smoke test for Model Studio rerank models.
Use when visual reasoning is needed with Alibaba Cloud Model Studio QVQ models, including step-by-step image reasoning, chart analysis, and visually grounded problem solving.