Loading...
Loading...
Found 39 Skills
Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.
Use when translating captions/captions to another language. Supports bilingual output and context-aware translation. Default uses Claude native, Gemini API optional.
This skill should be used when generating and editing images using the Gemini API (Nano Banana Pro). It applies when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images.
Generate ideas and explore possibilities with AI. Use for creative problem solving, generating alternatives, and expanding on concepts.
Generate or edit images using Google Gemini API via nanobanana. Triggers: "nanobanana", "generate image", "create image", "edit image", "AI drawing", "图片生成", "AI绘图", "图片编辑", "生成图片".
Generate or edit images using Google Gemini API via nanobanana. Use when the user asks to create, generate, edit images with nanobanana, or mentions image generation/editing tasks.
Official skill for integrating Firebase AI Logic (Gemini API) into web applications. Covers setup, multimodal inference, structured output, and security.
Generate images using Google's Gemini API — hero backgrounds, OG images, placeholder photos, textures, and style-matched variants. Uses free-tier models for drafts, paid for finals. No dependencies beyond Python 3. Trigger with 'generate image', 'gemini image', 'make a hero background', 'create placeholder photo', 'generate OG image', 'AI image', or 'need an image for'.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
Generate and edit images using Google's Nano Banana 2 (Gemini 3.1 Flash Image Preview) API. This skill should be used when the user asks to create or modify images, especially when they need fast iteration, explicit aspect-ratio control, or resolution control from 512px to 4K.
Guides the usage of Gemini API on Google Cloud Vertex AI with the Gen AI SDK. Use when the user asks about using Gemini in an enterprise environment or explicitly mentions Vertex AI. Covers SDK usage (Python, JS/TS, Go, Java, C#), capabilities like Live API, tools, multimedia generation, caching, and batch prediction.
Operate OpenWord end-to-end for live adventure sessions. Use when Codex needs to download/install/start OpenWord, guide a human player in the browser, or play autonomously through REST API (create/load game, do_action loop, state/image retrieval), including configuring GEMINI_API_KEY and sharing interesting scenes and choices during play.