Found 644 Skills
Read, watch, and listen to video/audio files. Use Gemini for native video understanding, or fall back to key-frame extraction plus Whisper transcription. Use when a user sends a video or audio file and asks about its content, what's in it, what someone said, etc.
Use this when users explicitly request "generate NSFC schematic diagram/mechanism diagram" or need to convert a proposal's research mechanism, algorithm architecture, or module relationships into "editable + embeddable" diagrams. By default it outputs an editable source file (`.drawio`) plus rendered files (`.pdf`/`.svg`/`.png`); when users explicitly mention the Nano Banana/Gemini image model, it can switch to PNG-only mode. ⚠️ Not applicable when users only want to polish the main text (rewrite the text directly), only want to change the format or size of existing images (use an image-processing skill), or show no clear intent to produce a schematic or mechanism diagram.
Sync skills (symlinks) and MCP settings from Claude to Gemini CLI and Codex CLI
Generate AI images using Gemini or GPT APIs directly. Covers model selection (Gemini for scenes, GPT for transparent icons), the 5-part prompting framework, API calling patterns, multi-turn editing, and quality assurance. Produces photorealistic scenes, icons, illustrations, OG images, and product shots. Use when building websites that need images, creating marketing assets, or generating visual content. Triggers: 'generate image', 'ai image', 'create hero image', 'make an icon', 'generate illustration', 'create og image', 'ai art', 'image generation'.
Generate and edit images using Google's Gemini image generation models (Nano Banana family). Supports style presets, platform-specific sizing (YouTube/slides/blog), variants, image editing via inlineData, reference images for style transfer, and organized output with metadata. The default model is Nano Banana 2 (gemini-3.1-flash-image-preview). The API key is auto-decrypted via SOPS.
Transforms content (URLs, uploaded documents, pasted text, meeting transcripts) into professional visualizations across four output modes. Accepts a mode argument or a keyword trigger in the user message. Mode "diagram" produces an Excalidraw diagram via Excalidraw:create_view. Mode "infographic" generates a Swiss Pulse PNG via the Gemini image-generation API. Mode "visualize" renders an inline Visualizer widget (SVG or HTML) via visualize:show_widget. Mode "publish" ships an interactive Swiss Pulse HTML visual to HeyGenverse via HeyGenverse:create_app and returns a shareable link. Keywords that activate the skill: "diagram it", "excalidraw this", "draw a diagram of this", "nano this", "vis it", "ver it", "hey it", "heygenverse this". Do not use for plain-text summaries, code explanations, prose responses, or generic chat visualizations without a chosen output format.
Generate images from a text prompt using Google's Nano Banana Pro (Gemini 3 Pro Image) with configurable aspect ratio, resolution (1K/2K/4K), and up to 5 reference images. Use when the user wants to generate an image, produce artwork for a blog post, create a product photo, mock up a portrait or cinematic scene, or when they mention "generate an image", "make a picture", "Nano Banana", or "Gemini image".
Decide which CLI worker (Claude, OpenCode, or Gemini) should implement a given task. Routes by task type — large-context to Gemini, mechanical to OpenCode, judgment to Claude. Returns the chosen worker and a short rationale; the caller invokes the worker via scripts/invoke-worker.sh.
CLI and skills for building, evaluating, and deploying AI agents on Google Cloud's Gemini Enterprise Agent Platform using ADK
AI image generation with Google Gemini via MCP - smart model selection, 4K output, aspect ratios, and templates
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when the user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first; use this skill directly with the --input-image parameter.
Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).
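
As a rough illustration of the worker-routing skill listed above (large-context to Gemini, mechanical to OpenCode, judgment to Claude), the decision could be sketched as a simple lookup. This is not the skill's actual code: the names `ROUTES`, `Decision`, and `choose_worker`, the task-type strings, and the fallback to Claude for unrecognized types are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical mapping; the listing only states the three routes, not how
# task types are represented internally.
ROUTES = {
    "large-context": "gemini",
    "mechanical": "opencode",
    "judgment": "claude",
}


@dataclass(frozen=True)
class Decision:
    worker: str      # name of the chosen CLI worker
    rationale: str   # short human-readable reason for the choice


def choose_worker(task_type: str) -> Decision:
    """Return the worker for a task type plus a short rationale."""
    key = task_type.strip().lower()
    if key in ROUTES:
        worker = ROUTES[key]
        return Decision(worker, f"{key} tasks are routed to {worker}")
    # Assumption: unknown task types fall back to the judgment worker.
    return Decision("claude", f"unrecognized task type {key!r}; defaulting to claude")


if __name__ == "__main__":
    for task in ("large-context", "mechanical", "judgment"):
        decision = choose_worker(task)
        print(f"{task}: {decision.worker} ({decision.rationale})")
```

In this sketch the caller would take `decision.worker` and pass it on to whatever invokes the worker; the listing mentions `scripts/invoke-worker.sh` for that step, whose arguments are not described here.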