Loading...
Loading...
Found 937 Skills
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. Provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. The industry standard for Large Language Models (LLMs) and foundation models in science.
Master router for immersive visual experiences combining React Three Fiber, shaders, particles, post-processing, GSAP animation, and audio. Use when building 3D web experiences, visualizers, creative coding projects, interactive installations, or any project requiring multiple visual technologies. Dispatches to 6 specialized domain routers covering 29 total skills.
Use this skill when building AI voice agents with the ElevenLabs Agents Platform. This skill covers the complete platform including agent configuration (system prompts, turn-taking, workflows), voice & language features (multi-voice, pronunciation, speed control), knowledge base (RAG), tools (client/server/MCP/system), SDKs (React, JavaScript, React Native, Swift, Widget), Scribe (real-time STT), WebRTC/WebSocket connections, testing & evaluation, analytics, privacy/compliance (GDPR/HIPAA/SOC 2), cost optimization, CLI workflows ("agents as code"), and DevOps integration. Prevents 17+ common errors including package deprecation, Android audio cutoff, CSP violations, missing dynamic variables, case-sensitive tool names, webhook authentication failures, and WebRTC configuration issues. Provides production-tested templates for React, Next.js, React Native, Swift, and Cloudflare Workers. Token savings: ~73% (22k → 6k tokens). Production tested. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
Download videos, audio, playlists, and channels from YouTube and 1000+ websites using yt-dlp. Supports quality selection, format conversion, subtitle download, playlist filtering, metadata extraction, thumbnail download, and batch operations. Use when downloading YouTube videos in any quality (4K, 8K, HDR), extracting audio as MP3/M4A/FLAC, downloading entire playlists/channels, getting subtitles in multiple languages, converting to specific formats, downloading live streams, archiving content, or batch processing multiple URLs. Optimized for reliability with automatic retries, rate limiting, and error handling.
Complete guide for Google Gemini API using the CORRECT current SDK (@google/genai v1.27+, NOT the deprecated @google/generative-ai). Covers text generation, multimodal inputs (text + images + video + audio + PDFs), function calling, thinking mode, streaming, and system instructions with accurate 2025 model information (Gemini 2.5 Pro/Flash/Flash-Lite with 1M input tokens, NOT 2M). Use when: integrating Gemini API, implementing multimodal AI applications, using thinking mode for complex reasoning, function calling with parallel execution, streaming responses, deploying to Cloudflare Workers, building chat applications, or encountering SDK deprecation warnings, context window errors, model not found errors, function calling failures, or multimodal format errors. Keywords: gemini api, @google/genai, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, multimodal gemini, thinking mode, google ai, genai sdk, function calling gemini, streaming gemini, gemini vision, gemini video, gemini audio, gemini pdf, system instructions, multi-turn chat, DEPRECATED @google/generative-ai, gemini context window, gemini models 2025, gemini 1m tokens, gemini tool use, parallel function calling, compositional function calling
Production-tested setup for AutoAnimate (@formkit/auto-animate) - a zero-config, drop-in animation library that automatically adds smooth transitions when DOM elements are added, removed, or moved. This skill should be used when building UIs that need simple, automatic animations for lists, accordions, toasts, or form validation messages without the complexity of full animation libraries. Use when: Adding smooth animations to dynamic lists, building filter/sort interfaces, creating accordion components, implementing toast notifications, animating form validation messages, needing simple transitions without animation code, working with Vite + React + Tailwind, deploying to Cloudflare Workers Static Assets, or encountering SSR errors with animation libraries. Keywords: auto-animate, @formkit/auto-animate, formkit, zero-config animation, automatic animations, drop-in animation, list animations, accordion animation, toast animation, form validation animation, lightweight animation, 2kb animation, prefers-reduced-motion, accessible animations, vite react animation, cloudflare workers animation, ssr safe animation
Complete guide for OpenAI's traditional/stateless APIs: Chat Completions (GPT-5, GPT-4o), Embeddings, Images (DALL-E 3), Audio (Whisper + TTS), and Moderation. Includes both Node.js SDK and fetch-based approaches for maximum compatibility. Use when: integrating OpenAI APIs, implementing chat completions with GPT-5/GPT-4o, generating text with streaming, using function calling/tools, creating structured outputs with JSON schemas, implementing embeddings for RAG, generating images with DALL-E 3, transcribing audio with Whisper, synthesizing speech with TTS, moderating content, deploying to Cloudflare Workers, or encountering errors like rate limits (429), invalid API keys (401), function calling failures, streaming parse errors, embeddings dimension mismatches, or token limit exceeded. Keywords: openai api, chat completions, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4-turbo, openai sdk, openai streaming, function calling, structured output, json schema, openai embeddings, text-embedding-3, dall-e-3, image generation, whisper api, openai tts, text-to-speech, moderation api, openai fetch, cloudflare workers openai, openai rate limit, openai 429, reasoning_effort, verbosity
Python package for working with DICOM files. It allows you to read, modify, and write DICOM data in a Pythonic way. Essential for medical imaging processing, clinical data extraction, and AI in radiology.
Cross-compile OBS Studio plugins from Linux to Windows using MinGW, CMake presets, and CI/CD workflows. Covers toolchain files, headers-only linking, OBS SDK fetching, and multi-platform artifact packaging. Use when building OBS plugins for Windows from Linux or setting up CI pipelines.
Build OBS Studio plugins for Windows using MSVC or MinGW. Covers Visual Studio setup, .def file exports, Windows linking (ws2_32, comctl32), platform-specific sources, and DLL verification. Use when building OBS plugins natively on Windows or troubleshooting Windows builds.
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
Expertise in Go programming according to the Google Go Style Guide. Use when the user needs to write, refactor, or review Go code for clarity, simplicity, and maintainability. This skill ensures adherence to Google's official Go idioms, formatting, and the "Least Mechanism" principle.