Loading...
Loading...
Found 311 Skills
Convert a YouTube video into infographic slides. Extracts transcript, segments into sections, summarizes, and generates stylized infographic images using Gemini AI. 5 styles: davinci, magazine, comic, geek, chalkboard. Use when user wants slide summaries from YouTube.
Use this skill when the user asks about current events, real-time information, recent news, live scores, financial data, price updates, recent changes, or any question that may require up-to-date web information. This skill first determines if web search is necessary using Gemini, then attempts Google Search Grounding via Gemini, and automatically falls back to Tavily if any failure occurs.
Generate images using Google's Gemini API — hero backgrounds, OG images, placeholder photos, textures, and style-matched variants. Uses free-tier models for drafts, paid for finals. No dependencies beyond Python 3. Trigger with 'generate image', 'gemini image', 'make a hero background', 'create placeholder photo', 'generate OG image', 'AI image', or 'need an image for'.
Use this skill to query your Google NotebookLM notebooks directly from Claude Code for source-grounded, citation-backed answers from Gemini. Browser automation, library management, persistent auth. Drastically reduced hallucinations through document-only responses.
AI image generation and editing using Google Gemini models (Nano Banana). Use when the user asks to generate an image, create an image, edit an image, or references "nano banana", "nanobanana", or "gemini image". Supports text-to-image, image editing, multi-image references, and 1K/2K/4K resolution.
Read, watch, and listen to video/audio files. Use Gemini for native video understanding, or extract key frames + Whisper transcription as fallback. Use when a user sends a video/audio and asks about its content, what's in it, what someone said, etc.
Model Context Protocol (MCP) server development and tool management. Languages: Python, TypeScript. Capabilities: build MCP servers, integrate external APIs, discover/execute MCP tools, manage multi-server configs, design agent-centric tools. Actions: create, build, integrate, discover, execute, configure MCP servers/tools. Keywords: MCP, Model Context Protocol, MCP server, MCP tool, stdio transport, SSE transport, tool discovery, resource provider, prompt template, external API integration, Gemini CLI MCP, Claude MCP, agent tools, tool execution, server config. Use when: building MCP servers, integrating external APIs as MCP tools, discovering available MCP tools, executing MCP capabilities, configuring multi-server setups, designing tools for AI agents.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
AI image generation skill powered by Google Gemini, enabling seamless visual content creation for UI placeholders, documentation, and design assets.
使用 Gemini 图像生成 API 生成或修改图片。支持自定义 API Key 和 Base URL。
Guides the usage of Gemini API on Google Cloud Vertex AI with the Gen AI SDK. Use when the user asks about using Gemini in an enterprise environment or explicitly mentions Vertex AI. Covers SDK usage (Python, JS/TS, Go, Java, C#), capabilities like Live API, tools, multimedia generation, caching, and batch prediction.
Operate OpenWord end-to-end for live adventure sessions. Use when Codex needs to download/install/start OpenWord, guide a human player in the browser, or play autonomously through REST API (create/load game, do_action loop, state/image retrieval), including configuring GEMINI_API_KEY and sharing interesting scenes and choices during play.