Image Processing Skill
Overview
| Capability | Description | Script |
|---|---|---|
| Text-to-image | Generate images from Chinese text descriptions | scripts/text_to_image.py |
| Image-to-image | Edit an existing image | scripts/image_to_image.py |
| Image-to-text | Analyze image content (description, OCR, charts, etc.) | scripts/image_to_text.py |
| Long image stitching | Vertically stitch multiple images into a WeChat long image | scripts/merge_long_image.py |
| Research illustration | Preset hand-drawn style infographics for research reports | scripts/research_image.py |
Configuration
| Configuration Item | Value |
|---|---|
| IMAGE_API_BASE_URL | https://llm.api.zyuncs.com/v1 |
| IMAGE_MODEL | |
| VISION_MODEL | |
Execution Specifications
Scripts save images to the current working directory by default, so follow these rules when executing commands:
- Do NOT use cd to switch to the skill directory before running a command
- Always run in the user's working directory and call the script by its absolute path
- Script path: scripts/ under the skill directory
```bash
# Correct example (replace PYTHON and SKILL_DIR with the actual paths in your environment)
$PYTHON $SKILL_DIR/scripts/text_to_image.py "description" -r 3:4 -o output.png
```
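When the scripts are driven from Python rather than a shell, the same rule (absolute script path, user's working directory) could be sketched as follows. The helper names `build_command` and `run_in_user_dir` and the `/opt/skills/image` path are hypothetical, introduced only for illustration.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_command(skill_dir, script, *args, python=sys.executable):
    """Return an argv list that calls a skill script by its absolute path."""
    script_path = Path(skill_dir).resolve() / "scripts" / script
    return [str(python), str(script_path), *args]


def run_in_user_dir(cmd, user_dir=None):
    """Run the command from the user's working directory, never the skill directory."""
    return subprocess.run(cmd, cwd=user_dir or os.getcwd(), check=True)


# Hypothetical skill location; replace with the real SKILL_DIR.
cmd = build_command(
    "/opt/skills/image", "text_to_image.py",
    "description", "-r", "3:4", "-o", "output.png",
)
```

Passing `cmd` to `run_in_user_dir` then writes `output.png` into the caller's directory, because the process never changes into the skill directory.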
Quick Start
Text-to-image
```bash
$PYTHON $SKILL_DIR/scripts/text_to_image.py "Infographic style, title: AI Technology Trends" -r 16:9
$PYTHON $SKILL_DIR/scripts/text_to_image.py "Vertical poster, product display" -r 3:4 -o poster.png
```
Parameters: -r aspect ratio | size | -o output path
Supported ratios:
,
,
,
,
,
,
,
,
,
Image-to-image
```bash
$PYTHON $SKILL_DIR/scripts/image_to_image.py input.png "edit description" -r 3:4
```
Image-to-text
```bash
$PYTHON $SKILL_DIR/scripts/image_to_text.py image.jpg -m describe
$PYTHON $SKILL_DIR/scripts/image_to_text.py screenshot.png -m ocr
```
Long image stitching
```bash
$PYTHON $SKILL_DIR/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
$PYTHON $SKILL_DIR/scripts/merge_long_image.py -p "*.png" -o long.png --sort name
```
Parameters: -p wildcard pattern | -o output | width | gap | --blend blend | --sort sort
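To make the width and gap parameters concrete, here is a minimal sketch of how a vertical stitch layout could be computed. It is an illustration under assumptions, not the implementation of merge_long_image.py: the function name `plan_layout` is hypothetical, every image is assumed to be scaled to a common width, and seam blending is left out.

```python
def plan_layout(sizes, width, gap=0):
    """Compute vertical placement for images scaled to a common width.

    sizes: list of (width, height) tuples for the source images.
    Returns (canvas_height, [(y_offset, scaled_height), ...]).
    """
    placements = []
    y = 0
    for index, (w, h) in enumerate(sizes):
        scaled_h = round(h * width / w)  # keep the aspect ratio
        if index > 0:
            y += gap  # gap is only inserted between images
        placements.append((y, scaled_h))
        y += scaled_h
    return y, placements


canvas_height, placements = plan_layout([(1000, 500), (500, 500)], width=1000, gap=10)
```

With these inputs the second image is scaled up to 1000×1000 and placed 10 pixels below the first.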
Research illustration
```bash
$PYTHON $SKILL_DIR/scripts/research_image.py -t arch -n "Title" -c "Content" -o output.png
```
Types: arch (architecture diagram) | flowchart | comparison chart | concept diagram
Before Execution: Classify the Request Type (Iron Rule)
After receiving an image generation request, first determine its type, then choose the execution path:
Long Image Recognition Rules
Treat a request as a long image request if the prompt has any of the following features:
| Feature Type | Recognition Keywords/Patterns |
|---|---|
| Explicit Declaration | long image, long image poster, vertical long image, WeChat long image, Infographic, Long Banner |
| Segmented Structure | The prompt contains multiple paragraphs (e.g., "Part 1", "Top", "Middle", "Bottom") |
| Numbered List | Uses numbering like , to segment content |
| Multi-screen Content | Describes 3 or more independent frames/modules |
| Top-to-bottom Layout | Contains descriptions like "from top to bottom" |
Execution Path After Judgment
Identified as long image → Must first read references/long-image-guide.md → Execute according to long image process
Identified as single image → Directly use text_to_image.py to generate
Iron Rule: Once identified as a long image, direct generation is prohibited! You must first load the long image guide and execute according to the guide process.
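The recognition rules can be approximated in code. The sketch below is a rough illustration, not the skill's actual logic: the function names are hypothetical, the keywords are the English renderings from the table (real prompts are in Chinese and would need the Chinese equivalents), and the numbered-list and multi-screen patterns are assumptions because the source does not spell them out.

```python
import re

EXPLICIT_KEYWORDS = (
    "long image", "long image poster", "vertical long image",
    "wechat long image", "infographic", "long banner",
)
# Segment markers taken from the table's examples.
SEGMENT_RE = re.compile(r"\bpart \d+\b|\btop\b|\bmiddle\b|\bbottom\b")
# Assumption: numbered lines look like "1." or "1)".
NUMBERED_RE = re.compile(r"^[ \t]*\d+[.)]", re.MULTILINE)
# Assumption: multi-screen content is stated as "<n> screens/frames/modules".
SCREENS_RE = re.compile(r"\b(\d+)\s+(?:screens?|frames?|modules?)\b")


def long_image_features(prompt):
    """Return the names of the long image features found in the prompt."""
    text = prompt.lower()
    features = []
    if any(keyword in text for keyword in EXPLICIT_KEYWORDS):
        features.append("explicit_declaration")
    if len({m.group(0) for m in SEGMENT_RE.finditer(text)}) >= 2:
        features.append("segmented_structure")
    if len(NUMBERED_RE.findall(prompt)) >= 2:
        features.append("numbered_list")
    if any(int(n) >= 3 for n in SCREENS_RE.findall(text)):
        features.append("multi_screen")
    if "from top to bottom" in text:
        features.append("top_to_bottom")
    return features


def is_long_image_request(prompt):
    """A request counts as a long image if any single feature matches."""
    return bool(long_image_features(prompt))
```

A caller would load references/long-image-guide.md whenever `is_long_image_request` returns true, and fall back to text_to_image.py otherwise.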
Detailed Guides (Load On Demand)
| Scenario | Trigger Condition | Reference Document |
|---|---|---|
| Generate multi-screen long image | Matches any of the long image recognition rules above | references/long-image-guide.md (Must load) |
| Image contains Chinese text | The prompt requires the image to include Chinese titles/text | references/text-rendering-guide.md |
| Create illustrations for PPT/documents | The user provides color requirements or reference documents | references/color-sync-guide.md |
| API interface details | Need to understand underlying implementation | |
| Prompt engineering tips | Need to optimize prompt effects | |
Prompt Key Points
- Prompts must be written in Chinese
- Titles and labels in the image must be in Chinese
- Default aspect ratio is 16:9, adjustable via the -r parameter
- Recommended styles: infographic, data visualization, hand-drawn text, tech illustration
Marketing Material Generation (Product Images/Material Packs/Design Images/Element Disassembly)
When the user mentions keywords such as "product image", "material pack", "marketing material", "detail page", "e-commerce image", "design drawing", "exploded view", "image set", "nine-grid", execute according to the following processes.
Prompt Template Library: references/marketing-templates.md (Must load, contains complete category templates)
Capability Matrix
| Capability | Trigger Keywords | Process | Output |
|---|---|---|---|
| E-commerce detail long image | Detail page, long image, product introduction | Stacked serial image generation → merge stitching | 1 long image |
| Marketing material pack | Material pack, marketing material, multi-size | Disassemble elements → multi-angle multi-scene → zip | 10-15 images + zip package |
| Product design image | Product image, rendering, effect image | Base image → multi-angle/color variants | 3-8 images |
| Element disassembly diagram | Disassembly, exploded view, decomposition, close-up | Overall image → local close-up/function disassembly | 4-8 images |
| Social media image set | Image set, nine-grid, Moments | Unified style → multi-size adaptation | 9 images (1:1) |
| Multi-color/SKU images | Color scheme, multi-color, SKU | Base image → image-to-image color change | N images |
Process 1: E-commerce Detail Long Image
Input: Product name + selling points + style
↓
Step 1: Plan screens (usually 5-8 screens)
- Screen 1: Hero image (product + core selling points)
- Screen 2-N: Expand each selling point (function/material/scene/parameters)
- Last screen: Specification parameter table
↓
Step 2: Stacked serial image generation (must read references/long-image-guide.md)
- Screen 1: Generate base image via text_to_image
- Screen 2-N: Use image_to_image with the previous screen as reference to maintain consistent style
↓
Step 3: Stitch via merge_long_image (use --blend 20 to merge seams)
↓
Step 4: Output long image + original images of each screen
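The serial generation in Steps 2 and 3 can be laid out as a command plan before anything runs. This is a minimal sketch under assumptions: `plan_detail_long_image` and the `screen_NN.png` file names are hypothetical, the 3:4 default ratio is illustrative, and the `-o` flag on image_to_image.py is assumed by analogy with text_to_image.py.

```python
def plan_detail_long_image(screens, ratio="3:4", out="detail_long.png"):
    """Build the ordered script calls for a detail long image.

    screens: list of per-screen descriptions, hero screen first.
    Returns a list of argv-style lists (script name followed by arguments).
    """
    commands = []
    files = []
    for index, description in enumerate(screens, start=1):
        target = f"screen_{index:02d}.png"  # hypothetical naming
        if index == 1:
            # Screen 1: base image generated from text only.
            commands.append(["text_to_image.py", description, "-r", ratio, "-o", target])
        else:
            # Later screens: previous screen is the style reference.
            # Assumption: image_to_image.py accepts -o like text_to_image.py.
            commands.append(
                ["image_to_image.py", files[-1], description, "-r", ratio, "-o", target]
            )
        files.append(target)
    # Final step: stitch all screens and blend the seams.
    commands.append(["merge_long_image.py", *files, "-o", out, "--blend", "20"])
    return commands


plan = plan_detail_long_image(["hero", "material", "spec table"])
```

Because each screen depends on the previous one, these calls run in order rather than concurrently.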
Process 2: Marketing Material Pack (Key!)
Iron Rule: Don't just resize the same image! Disassemble elements, change angles, change scenes!
Input: Product name + list of selling points + style preference
↓
Step 1: Generate base main image (text_to_image, full product view 16:9)
↓
Step 2: Element disassembly (image_to_image × 4-6 images)
- Macro close-up of core selling points (1:1)
- Function exploded view/disassembly diagram (3:4)
- Material/craft details (4:3)
- Full set of accessories (16:9)
↓
Step 3: Scene variants (text_to_image / image_to_image × 3-4 images)
- Daily usage scene (16:9)
- Work usage scene (4:3)
- Unboxing scene (1:1)
- Art silhouette/atmosphere image (21:9)
↓
Step 4: Marketing creativity (text_to_image × 3-4 images)
- Comparison review chart (3:4)
- Data visualization/sound wave chart (16:9)
- Multi-color SKU display (16:9)
- Nine-grid social media images (1:1)
↓
Step 5: Package all into zip + send previews one by one
Concurrency Rule: Generate at most 8 images concurrently per batch; split into multiple batches if there are more. Retry failed images individually.
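One way to honor the concurrency rule is sketched below, using a thread pool from the standard library. The names `chunked` and `run_batches` are hypothetical, and `generate` stands in for whatever callable actually invokes the image script; jobs are assumed to be hashable identifiers such as output file names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8  # limit stated in the concurrency rule


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batches(jobs, generate, max_concurrency=MAX_CONCURRENCY):
    """Run jobs in batches, then retry each failed job once on its own.

    Returns a dict mapping job -> result, with None for jobs that failed twice.
    """
    results = {}
    failed = []
    for batch in chunked(jobs, max_concurrency):
        with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
            futures = {job: pool.submit(generate, job) for job in batch}
        # Leaving the with-block waits for the whole batch to finish.
        for job, future in futures.items():
            try:
                results[job] = future.result()
            except Exception:
                failed.append(job)
    for job in failed:
        # Individual retry, one job at a time.
        try:
            results[job] = generate(job)
        except Exception:
            results[job] = None
    return results
```

Retrying sequentially keeps a single flaky image from triggering a rerun of the whole batch.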
Process 3: Product Design Image
Input: Product name + design requirements
↓
Step 1: Generate base main image via text_to_image (front view of product, 16:9)
↓
Step 2: Generate variants via image_to_image (using the base image as reference)
- 45-degree view
- Side/back view
- Top view
- Different color versions
- Different usage scenes
Process 4: Element Disassembly Diagram
Input: Product image (existing) or product description
↓
Step 1: If there is a product image → disassemble via image_to_image; if no image → first generate the full view via text_to_image
↓
Step 2: Generate element by element (image_to_image)
- Exploded view/decomposition perspective
- Macro close-up of Part 1 + function annotation
- Macro close-up of Part 2 + craft annotation
- Macro close-up of Part 3 + material annotation
↓
Step 3: Optional long image stitching (merge_long_image)
Process 5: Social Media Image Set
Input: Product/theme + platform (Xiaohongshu/Moments/Weibo)
↓
Step 1: Determine quantity and ratio
- Xiaohongshu: 6-9 images, 3:4
- Moments nine-grid: 9 images, 1:1
- Weibo: 4-9 images, 16:9 or 1:1
↓
Step 2: Plan content for each image (refer to nine-grid templates in marketing-templates.md)
↓
Step 3: Define a unified style prefix and generate concurrently
↓
Step 4: Output in numbered order
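Steps 1 to 4 can be sketched as a small planning function. The dictionary below only restates the quantities and ratios from Step 1; the names `PLATFORM_SPECS` and `plan_image_set`, the platform keys, and the choice to default to the first listed ratio are assumptions made for illustration.

```python
PLATFORM_SPECS = {
    "xiaohongshu": {"count": (6, 9), "ratios": ("3:4",)},
    "moments": {"count": (9, 9), "ratios": ("1:1",)},
    "weibo": {"count": (4, 9), "ratios": ("16:9", "1:1")},
}


def plan_image_set(platform, style_prefix, contents):
    """Return one numbered job per image, all sharing the same style prefix."""
    spec = PLATFORM_SPECS[platform.lower()]
    low, high = spec["count"]
    if not low <= len(contents) <= high:
        raise ValueError(
            f"{platform} expects {low}-{high} images, got {len(contents)}"
        )
    ratio = spec["ratios"][0]  # assumption: use the first listed ratio
    return [
        {"index": index, "prompt": f"{style_prefix}, {content}", "ratio": ratio}
        for index, content in enumerate(contents, start=1)
    ]


jobs = plan_image_set("Moments", "flat illustration style", [f"panel {i}" for i in range(1, 10)])
```

Keeping the index in each job makes it straightforward to output the finished images in numbered order even when they are generated concurrently.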
General Specifications
- Prompts must be in Chinese; load references/marketing-templates.md to get templates
- Unified style for the same batch: Define a style prefix and reuse it for all images
- Concurrency ≤8 images, retry failed ones individually
- Naming convention: (e.g., , )
- When delivering: Send previews one by one + zip package (if multiple images)
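The packaging step of delivery can be done with the standard library. The sketch below is one possible approach, with a hypothetical `package_images` helper; it only creates a zip when there is more than one image, matching the rule above, and stores files flat under their base names.

```python
import zipfile
from pathlib import Path


def package_images(paths, zip_path):
    """Zip the given images; return the zip path, or None for a single image."""
    paths = [Path(p) for p in paths]
    if len(paths) < 2:
        return None  # a single image is delivered as-is
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its base name, without directories.
            archive.write(path, arcname=path.name)
    return Path(zip_path)
```

Previews are still sent one by one; the zip is an additional deliverable, not a replacement for them.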
Trigger Keywords
- Generation category: generate image, create image, text-to-image, image-to-image, infographic, data visualization
- Analysis category: analyze image, OCR, recognize text, image-to-text
- Stitching category: long image, WeChat long image, stitch image
- Marketing category: product image, material pack, marketing material, detail page, e-commerce image, design drawing, rendering, effect image
- Disassembly category: disassembly, exploded view, decomposition, close-up, macro
- Image set category: image set, nine-grid, Moments, multi-size, multi-color, SKU
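A simple substring match is enough to illustrate how these trigger keywords could route a request to a category. This is a sketch, not the skill's dispatcher: `TRIGGERS` and `match_categories` are hypothetical names, and the keywords are the English renderings listed above, whereas real requests would be matched against the Chinese terms.

```python
TRIGGERS = {
    "generation": ("generate image", "create image", "text-to-image",
                   "image-to-image", "infographic", "data visualization"),
    "analysis": ("analyze image", "ocr", "recognize text", "image-to-text"),
    "stitching": ("long image", "wechat long image", "stitch image"),
    "marketing": ("product image", "material pack", "marketing material",
                  "detail page", "e-commerce image", "design drawing",
                  "rendering", "effect image"),
    "disassembly": ("disassembly", "exploded view", "decomposition",
                    "close-up", "macro"),
    "image_set": ("image set", "nine-grid", "moments", "multi-size",
                  "multi-color", "sku"),
}


def match_categories(request):
    """Return every category whose keywords appear in the request."""
    text = request.lower()
    return [
        category
        for category, keywords in TRIGGERS.items()
        if any(keyword in text for keyword in keywords)
    ]
```

A request can match several categories at once, so the caller still has to decide which process takes priority.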