Found 117 Skills
Use this skill whenever a user asks to generate, create, draw, render, or edit images with GPT Image 2 / gpt-image-2, text-to-image, reference-image editing, inpainting, posters, typography, Chinese text, UI mockups, diagrams, or gallery prompts. Analyze the user's prompt, search the bundled Reference Gallery/craft files for matching design patterns, confer on direction when useful, then call the packaged `gpt-image` CLI or bundled `scripts/generate.py`. Do not write new image-generation code unless explicitly asked to modify this repo.
MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Choose the right fal.ai endpoint for a given task. Modality-organized catalog of production endpoint defaults: text-to-image, image-to-image, text-to-video, image-to-video, and more. Use when the user has not named a specific model, or asks "which model for X", "best endpoint for Y", "what should I use for Z".
Craft high-quality natural-language image prompts for any modern text-to-image or image-edit model that accepts flowing English. Trigger when the user wants help writing, rewriting, improving, or translating an English natural-language image prompt — including "write me an image prompt", "improve this image prompt", "describe this scene for an image model", or "convert these tags into a natural language prompt". Do NOT trigger for requests that are purely about dispatching to an image API, choosing samplers/schedulers, picking LoRAs, or setting up ControlNet — those belong to a runtime skill.
Process and generate multimedia content using the Google Gemini API. Capabilities: analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract data from documents (PDF tables, forms, charts, diagrams, multi-page), and generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Score and compare images using vision LLMs as judges. YAML-defined criteria presets for 11 use cases (text-to-image, photorealism, document OCR, charts, UI, portrait, product, scientific, invoice, alt-text, artistic style). Supports OpenAI, Anthropic, Gemini, Mistral, and OpenRouter as judge providers. Keys auto-decrypted via SOPS + age.
Alicloud OSS AI Content Awareness Skill. Use for enabling and querying OSS semantic search with AI-powered content understanding. Triggers: "OSS AI Content Awareness", "OSS semantic search", "OSS vector search", "search by text", "text-to-image search", "text-to-video search", "OSS MetaQuery", "OSS data index", "OSS AI内容感知", "OSS语义检索", "OSS向量检索", "以文搜图", "以文搜视频", "OSS数据索引"
Generate and edit images using OpenAI's GPT Image v2 via EachLabs. Supports text-to-image (gpt-image-v2-text-to-image) and instruction-based editing (gpt-image-v2-edit). Use when the user specifically asks for GPT Image 2 / OpenAI image generation, or needs high-fidelity photorealism, precise text rendering, or reference-faithful edits.
Build full-stack web applications powered by Google Gemini's Nano Banana & Nano Banana Pro image generation APIs. Use when creating Next.js image generation apps, text-to-image tools, or iterative image editors.
3D-style image generation: 3D characters, product renders, isometric dioramas, 3D icons, 3D text, interior design renders, architectural visualization, 3D scenes, game assets. Use when generating 3D-style 2D images from text descriptions or reference photos (e.g. 3D character design, isometric diorama, 3D product render, interior design visualization, architectural render, 3D app icon, 3D text effect, game asset render).
Generate article cover images with 5 dimensions (type, palette, rendering, text, mood). Supports cinematic (2.35:1), widescreen (16:9), and square (1:1) aspect ratios. Use when the user asks to 'generate cover image', 'create article cover', or 'make cover'.
Generate images using Google's Gemini API. Use when creating images from text prompts, editing existing images, or combining reference images for AI-generated visual content.