Found 86 Skills
Create effective AI image generation prompts for DALL-E, Midjourney, and Stable Diffusion. Generate prompts for various styles and use cases.
Agent-IM Conversation Skill - Create sessions, send messages (such as image/video generation requests) via OpenAPI, and query session progress. Activated when users need to generate images/videos or query current session messages.
Generate on-brand marketing images via Codex's built-in image_generation tool. Trigger when user asks to: (a) create marketing assets (ad / logo / slide / product-mockup / scene / lighting-transform / LinkedIn-or-social carousel) for a specific brand, (b) extract or build a brand profile (DESIGN.md) from URL / Tailwind config / tokens.json / Figma Variables / CSS custom props / description / existing brand asset, (c) maintain on-brand consistency across multiple image jobs for the same brand. Do NOT trigger for: UI code generation, frontend reference imagery (use imagegen-frontend-web instead), video generation, or general image editing without brand context.
Usage guide for the `flux` CLI (@vforsh/flux) to generate and edit images via Black Forest Labs (BFL) FLUX API. Use for setting up the BFL API key, choosing models, generating images, editing with references, inpaint/outpaint, waiting for results, checking credits, using --plain/--json output, and handling common API errors (402/403/429).
External AI API integration with retry logic, rate limiting, content safety detection, and multi-turn conversation support for image generation.
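The retry logic described above typically follows an exponential-backoff-with-jitter pattern. A minimal sketch (the function names here are hypothetical illustrations, not the skill's actual API):

```python
import random
import time


def call_with_retry(request_fn, max_attempts=4, base_delay=1.0):
    """Retry a flaky external-API call with exponential backoff and jitter.

    `request_fn` is a hypothetical zero-argument callable that performs one
    API request and raises an exception on transient failure (e.g. a 429
    rate-limit response).
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted all attempts; surface the error
            # Exponential backoff (1s, 2s, 4s, ...) plus random jitter
            # so concurrent clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

A real integration would narrow the `except` clause to retryable errors only (rate limits, timeouts) and pass non-retryable ones (auth failures, content-safety rejections) straight through.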
Generate images using ModelScope Z-Image models (Z-Image-Turbo, Z-Image, Z-Image-Edit). Use when user asks to generate images, create artwork, or requests image generation functionality. Supports async generation with polling and optional LoRA configurations. IMPORTANT - Model Selection Rule: If the user explicitly mentions "Z-Image-Turbo" in their prompt, use "Tongyi-MAI/Z-Image-Turbo"; if they explicitly mention "Z-Image" (without Turbo), use "Tongyi-MAI/Z-Image"; otherwise, use the default "Tongyi-MAI/Z-Image-Turbo".
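The model-selection rule above is mechanical enough to express directly. A minimal sketch (the function name is illustrative, not part of the skill):

```python
def select_z_image_model(user_prompt: str) -> str:
    """Pick a ModelScope Z-Image model ID per the stated selection rule."""
    text = user_prompt.lower()
    if "z-image-turbo" in text:
        # User explicitly asked for the Turbo variant.
        return "Tongyi-MAI/Z-Image-Turbo"
    if "z-image" in text:
        # "Z-Image" mentioned without "Turbo".
        return "Tongyi-MAI/Z-Image"
    # No explicit model mention: fall back to the default.
    return "Tongyi-MAI/Z-Image-Turbo"
```

Checking for the longer name first matters, since every mention of "Z-Image-Turbo" also contains the substring "Z-Image".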
MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Generate images and videos with Kling O3 — Kling's most powerful model family. Text-to-image, text-to-video, image-to-video, and video-to-video editing. Use when the user requests "Kling", "Kling O3", "Best quality video", "Kling image", "Kling video editing".
Generate images using Minimax image-01, triggered when the user says "Generate images with Minimax".
Build explicit learn/do-not-copy contracts for image and video generation references. Use this when a prompt uses benchmark videos, contact sheets, frames, or product images and you need to state exactly what the model should learn, what identity elements must change, and which references should be excluded from the first test.
Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Image generation skill using Gemini Web. Generates images from text prompts via Google Gemini. Also supports text generation. Use as the image generation backend for other skills like cover-image, xhs-images, article-illustrator.