Loading...
Loading...
Found 61 Skills
AI image generation with OpenAI, Google, DashScope and Canghe APIs. Supports text-to-image, reference images, aspect ratios. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
Write marketing copy and App Store / Google Play listings (ASO keywords, titles, subtitles, short+long descriptions, feature bullets, release notes), plus screenshot caption sets and text-to-image prompt templates for generating store screenshot backgrounds/promo visuals. Use when asked to: write/refresh app marketing copy, craft app store metadata, brainstorm taglines/value props, produce ad/landing/email copy, or generate prompts for screenshot/creative generation.
Generates image prompts for Seedream 5.0/4.0 (Jimeng AI), and can call the API to generate images and automatically download them to the output/ directory. Workflow: describe your idea → the agent outputs a prompt for review → user confirms → the agent runs generate.py. It covers text-to-image, image editing, multi-image fusion, character consistency, knowledge cards, posters, PPT backgrounds, e-commerce images, avatars, and group/storyboard generation. Activate this tool when the user mentions terms like seedream, jimeng, AI image generation, text-to-image, image-to-image, seedream prompt, prompt keyword, one-click image generation, knowledge card, poster design, e-commerce image, character consistency, or image generation.
Generate web assets including favicons, app icons (PWA), and social media meta images (Open Graph) for Facebook, Twitter, WhatsApp, and LinkedIn. Use when users need icons, favicons, social sharing images, or Open Graph images from logos or text slogans. Handles image resizing, text-to-image generation, and provides proper HTML meta tags.
Generate and edit high-quality images using Gemini 2.5 Flash Image and Gemini 3 Pro Image (Nano Banana). Supports Text-to-Image, Style Transfer, Virtual Try-On, and Character Consistency.
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
AI image generation and editing capabilities, implemented based on Nano Banana (Gemini Image) to support text-to-image, image-to-image, and image editing. Suitable for scenarios such as creative design, marketing materials, social media content, and presentation illustrations. Supports multiple styles, high-resolution output (up to 4K), text rendering, and character consistency preservation.
AI Image Generation and Processing Workflow. Generate images via prompts, supporting text-to-image, image-to-image, batch generation, image hosting management, long image merging, and PPT packaging. The core feature is generating images with one-by-one confirmation to avoid wasting API credits.
Generate/edit images with Nano Banana Pro via grsai.com API. Use for image create/modify requests incl. edits. Supports text-to-image + image-to-image; 1K/2K/4K; use --input-image.
Image generation and editing using Google Gemini's Nano Banana Pro (gemini-3-pro-image-preview) model. Use when user requests: "Generate an image", "Create an image", "Make me a picture", "Draw", "Edit that image", "Change the color", "Remove background", "Add transparency", "Modify this image", "Make it transparent", "Change the style", "Add text to image", or any image creation/manipulation task. Supports text-to-image generation, image editing, multi-turn conversations, and transparency extraction via difference matting technique.
Generate images with Alibaba Qwen-Image-2.0-Pro via inference.sh CLI. Professional text rendering, fine-grained realism, enhanced semantic adherence. Ideal for posters, banners, and text-heavy designs. Triggers: qwen image pro, qwen-image-pro, qwen 2 pro, alibaba image pro, dashscope pro, professional text rendering