Image Processing Skill

Overview

Capability	Description	Script
Text-to-image	Generate images based on Chinese text descriptions	`scripts/text_to_image.py`
Image-to-image	Edit based on existing images	`scripts/image_to_image.py`
Image-to-text	Analyze image content (description, OCR, charts, etc.)	`scripts/image_to_text.py`
Long image stitching	Vertically stitch multiple images into a WeChat long image	`scripts/merge_long_image.py`
Research illustration	Preset hand-drawn style infographics for research reports	`scripts/research_image.py`

Configuration

Configuration file: `config/settings.json`

Configuration Item	Value
IMAGE_API_BASE_URL	`https://llm.api.zyuncs.com/v1`
IMAGE_MODEL	`lyra-flash-9`
VISION_MODEL	`qwen2.5-vl-72b-instruct`
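
A minimal sketch of reading these settings from Python. It assumes `config/settings.json` is a flat JSON object whose top-level keys are the three items in the table; the real file layout is not shown here, and `load_settings` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_settings(skill_dir):
    """Read config/settings.json under the skill directory (assumed flat JSON object)."""
    path = Path(skill_dir) / "config" / "settings.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Usage: `load_settings(skill_dir)["IMAGE_MODEL"]` would return the configured image model name if the file follows the assumed layout.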

Execution Specifications

Images are saved to the current working directory by default when executing commands:

Do NOT use `workdir` to switch to the skill directory for command execution
Always execute in the user's working directory, using the absolute path of the script
Script path: `scripts/` under the skill directory

bash

# Correct Example (replace PYTHON and SKILL_DIR with actual paths in your environment)
$PYTHON $SKILL_DIR/scripts/text_to_image.py "description" -r 3:4 -o output.png
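
If the scripts are launched from Python rather than a shell, the same rule can be sketched as follows: resolve the script to an absolute path and run it with the user's directory as the working directory. `build_command` and `run_in_user_dir` are hypothetical helper names, not part of the skill.

```python
import os
import subprocess
import sys


def build_command(skill_dir, script_name, *args, python=sys.executable):
    """Return an argv list that calls the script by its absolute path."""
    script = os.path.abspath(os.path.join(skill_dir, "scripts", script_name))
    return [python, script, *args]


def run_in_user_dir(command, user_dir):
    """Run the command with cwd set to the user's directory, never the skill directory."""
    return subprocess.run(command, cwd=user_dir, check=True)
```

Because the working directory stays with the user, a relative `-o output.png` lands next to the user's files rather than inside the skill directory.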

Quick Start

Text-to-image

bash

$PYTHON $SKILL_DIR/scripts/text_to_image.py "Infographic style, title: AI Technology Trends" -r 16:9
$PYTHON $SKILL_DIR/scripts/text_to_image.py "Vertical poster, product display" -r 3:4 -o poster.png

Parameters: `-r` Aspect ratio | `-s` Size | `-o` Output path

Supported ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`
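
A caller can check a ratio against this list before invoking the script. The sketch below is illustrative only; `check_ratio` and `orientation` are hypothetical helpers, and the script itself may validate `-r` differently.

```python
SUPPORTED_RATIOS = (
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)


def check_ratio(ratio):
    """Return (width, height) parts of a supported ratio, or raise ValueError."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported ratio {ratio!r}; expected one of {SUPPORTED_RATIOS}")
    width, height = (int(part) for part in ratio.split(":"))
    return width, height


def orientation(ratio):
    """Classify a supported ratio as 'square', 'portrait', or 'landscape'."""
    width, height = check_ratio(ratio)
    if width == height:
        return "square"
    return "portrait" if width < height else "landscape"
```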

Image-to-image

bash

$PYTHON $SKILL_DIR/scripts/image_to_image.py input.png "edit description" -r 3:4

Image-to-text

bash

$PYTHON $SKILL_DIR/scripts/image_to_text.py image.jpg -m describe
$PYTHON $SKILL_DIR/scripts/image_to_text.py screenshot.png -m ocr

Modes: `describe`, `ocr`, `chart`, `fashion`, `product`, `scene`

Long image stitching

bash

$PYTHON $SKILL_DIR/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
$PYTHON $SKILL_DIR/scripts/merge_long_image.py -p "*.png" -o long.png --sort name

Parameters: `-p` Wildcard | `-o` Output | `-w` Width | `-g` Gap | `--blend` Blend | `--sort` Sort
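
How `merge_long_image.py` lays out the canvas is not documented here. As a rough illustration of what a width and gap setting could mean in a vertical stitch, the sketch below scales every image to a common width and stacks them with a fixed gap. The arithmetic and the name `plan_canvas` are assumptions, not the script's actual behavior.

```python
def plan_canvas(sizes, width, gap=0):
    """Given (w, h) sizes, return (canvas_height, y_offsets) for a vertical stack.

    Each image is scaled to `width` keeping its aspect ratio; `gap` pixels
    separate consecutive images (no gap after the last one).
    """
    offsets = []
    y = 0
    for w, h in sizes:
        scaled_height = round(h * width / w)
        offsets.append(y)
        y += scaled_height + gap
    canvas_height = y - gap if sizes else 0
    return canvas_height, offsets
```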

Research illustration

bash

$PYTHON $SKILL_DIR/scripts/research_image.py -t arch -n "Title" -c "Content" -o output.png

Types: `arch` Architecture diagram | `flow` Flowchart | `compare` Comparison chart | `concept` Concept diagram

Pre-execution Must-do: Demand Type Judgment (Iron Rule)

After receiving an image generation request, you must first determine the type before deciding on the execution method:

Long Image Recognition Rules

A request is classified as a long-image request if the prompt contains any of the following features:

Feature Type	Recognition Keywords/Patterns
Explicit Declaration	long image, long image poster, vertical long image, WeChat long image, Infographic, Long Banner
Segmented Structure	The prompt contains multiple paragraphs (e.g., "Part 1", "Top", "Middle", "Bottom")
Numbered List	Uses numbering like `### 1.` , `### 2.` to segment content
Multi-screen Content	Describes 3 or more independent frames/modules
Top-to-bottom Layout	Contains descriptions like "from top to bottom"

Execution Path After Judgment

Identified as long image → Must first read `references/long-image-guide.md` → Execute according to long image process
Identified as single image → Directly use `text_to_image.py` to generate

Iron Rule: Once identified as a long image, direct generation is prohibited! You must first load the long image guide and execute according to the guide process.
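
The recognition rules and the two execution paths can be approximated with simple text checks. This is a heuristic sketch only: it uses the English keywords from the table (real prompts are in Chinese, so an actual check would list the Chinese equivalents), it approximates "3 or more independent frames" by counting distinct "Screen N" mentions, and the function names are hypothetical.

```python
import re

# Taken from the "Explicit Declaration" row; "long image" also covers
# "long image poster", "vertical long image", and "WeChat long image".
EXPLICIT_KEYWORDS = ("long image", "infographic", "long banner")


def matched_long_image_rules(prompt: str) -> list[str]:
    """Return the names of the long-image rules the prompt appears to hit."""
    text = prompt.lower()
    hits = []
    if any(keyword in text for keyword in EXPLICIT_KEYWORDS):
        hits.append("explicit_declaration")
    positions = {w for w in ("top", "middle", "bottom") if re.search(rf"\b{w}\b", text)}
    if re.search(r"\bpart \d+\b", text) or len(positions) >= 2:
        hits.append("segmented_structure")
    if re.search(r"^###\s*\d+\.", prompt, flags=re.MULTILINE):
        hits.append("numbered_list")
    if len(set(re.findall(r"\bscreen (\d+)\b", text))) >= 3:
        hits.append("multi_screen")
    if "from top to bottom" in text:
        hits.append("top_to_bottom")
    return hits


def choose_path(prompt: str) -> str:
    """Any hit means the long image guide must be loaded before generating."""
    return "long_image_guide" if matched_long_image_rules(prompt) else "text_to_image"
```

A single matched rule is enough to select the long image path, which mirrors the "any of the following features" wording above.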

Detailed Guides (Load On Demand)

Scenario	Trigger Condition	Reference Document
Generate multi-screen long image	Hits any of the above long image recognition rules	`references/long-image-guide.md` (Must load)
Image contains Chinese text	The prompt requires the image to include Chinese titles/text	`references/text-rendering-guide.md`
Create illustrations for PPT/documents	The user provides color requirements or reference documents	`references/color-sync-guide.md`
API interface details	Need to understand underlying implementation	`docs/api-reference.md`
Prompt engineering tips	Need to optimize prompt effects	`docs/prompt-guide.md`

Prompt Key Points

Prompts must be written in Chinese
Titles and labels in the image must be in Chinese
Default aspect ratio is 16:9, adjustable via the `-r` parameter
Recommended styles: infographic, data visualization, hand-drawn text, tech illustration

Marketing Material Generation (Product Images/Material Packs/Design Images/Element Disassembly)

When the user mentions keywords such as "product image", "material pack", "marketing material", "detail page", "e-commerce image", "design drawing", "exploded view", "image set", "nine-grid", execute according to the following processes.

Prompt Template Library: `references/marketing-templates.md` (Must load, contains complete category templates)

Capability Matrix

Capability	Trigger Keywords	Process	Output
E-commerce detail long image	Detail page, long image, product introduction	Stacked serial image generation → merge stitching	1 long image
Marketing material pack	Material pack, marketing material, multi-size	Disassemble elements → multi-angle multi-scene → zip	10-15 images + zip package
Product design image	Product image, rendering, effect image	Base image → multi-angle/color variants	3-8 images
Element disassembly diagram	Disassembly, exploded view, decomposition, close-up	Overall image → local close-up/function disassembly	4-8 images
Social media image set	Image set, nine-grid, Moments	Unified style → multi-size adaptation	9 images (1:1)
Multi-color/SKU images	Color scheme, multi-color, SKU	Base image → image-to-image color change	N images

Process 1: E-commerce Detail Long Image

Input: Product name + selling points + style
  ↓
Step 1: Plan screens (usually 5-8 screens)
  - Screen 1: Hero image (product + core selling points)
  - Screen 2-N: Expand each selling point (function/material/scene/parameters)
  - Last screen: Specification parameter table
  ↓
Step 2: Stacked serial image generation (must read references/long-image-guide.md)
  - Screen 1: Generate base image via text_to_image
  - Screen 2-N: Use image_to_image with the previous screen as reference to maintain consistent style
  ↓
Step 3: Stitch via merge_long_image (use --blend 20 to merge seams)
  ↓
Step 4: Output long image + original images of each screen
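
The chained generation in Steps 2 and 3 can be sketched as a list of commands in which each screen after the first takes the previous screen's output as its reference image. Assumptions in this sketch: `image_to_image.py` accepts `-o` like `text_to_image.py` does, the default ratio of 3:4 is arbitrary, and the file names follow the `{type}_{sequence}.png` convention. `plan_detail_long_image` is a hypothetical helper; script names are shown bare and would need the `$PYTHON $SKILL_DIR/scripts/` prefix when run.

```python
def plan_detail_long_image(screen_prompts, ratio="3:4", long_name="detail_long.png"):
    """Return argv lists: one per screen, then the final stitching command."""
    if not screen_prompts:
        raise ValueError("at least one screen prompt is required")
    commands = []
    outputs = []
    for index, prompt in enumerate(screen_prompts, start=1):
        output = f"detail_{index:02d}.png"
        if index == 1:
            # Screen 1: base image generated from text alone.
            command = ["text_to_image.py", prompt, "-r", ratio, "-o", output]
        else:
            # Later screens: previous screen is the style reference (-o is assumed).
            command = ["image_to_image.py", outputs[-1], prompt, "-r", ratio, "-o", output]
        commands.append(command)
        outputs.append(output)
    # Stitch all screens, blending seams as recommended in Step 3.
    commands.append(["merge_long_image.py", *outputs, "-o", long_name, "--blend", "20"])
    return commands
```

The per-screen files are kept alongside the stitched result, which matches Step 4's requirement to deliver both.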

Process 2: Marketing Material Pack (Key!)

Iron Rule: Don't just resize the same image! Disassemble elements, change angles, change scenes!

Input: Product name + list of selling points + style preference
  ↓
Step 1: Generate base main image (text_to_image, full product view 16:9)
  ↓
Step 2: Element disassembly (image_to_image × 4-6 images)
  - Macro close-up of core selling points (1:1)
  - Function exploded view/disassembly diagram (3:4)
  - Material/craft details (4:3)
  - Full set of accessories (16:9)
  ↓
Step 3: Scene variants (text_to_image / image_to_image × 3-4 images)
  - Daily usage scene (16:9)
  - Work usage scene (4:3)
  - Unboxing scene (1:1)
  - Art silhouette/atmosphere image (21:9)
  ↓
Step 4: Marketing creativity (text_to_image × 3-4 images)
  - Comparison review chart (3:4)
  - Data visualization/sound wave chart (16:9)
  - Multi-color SKU display (16:9)
  - Nine-grid social media images (1:1)
  ↓
Step 5: Package all into zip + send previews one by one

Concurrency Rule: Generate at most 8 images concurrently per batch; split larger sets into multiple batches, and retry failed images individually.
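
One way to honor this rule from Python is to submit jobs in batches of 8 and retry failures one at a time afterwards. The sketch uses the standard library's `ThreadPoolExecutor`; `generate` stands in for whatever callable produces one image (an assumption, since the skill only documents command-line scripts), and jobs are assumed to be hashable values such as output names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 8  # limit stated in the concurrency rule


def run_in_batches(jobs, generate, max_concurrent=MAX_CONCURRENT):
    """Run generate(job) for every job; return (results, jobs_that_failed_twice)."""
    results = {}
    failed = []
    for start in range(0, len(jobs), max_concurrent):
        batch = jobs[start:start + max_concurrent]
        # Leaving the with-block waits for the whole batch to finish.
        with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
            futures = {job: pool.submit(generate, job) for job in batch}
        for job, future in futures.items():
            try:
                results[job] = future.result()
            except Exception:
                failed.append(job)
    # Retry failed jobs individually, one at a time.
    still_failed = []
    for job in failed:
        try:
            results[job] = generate(job)
        except Exception:
            still_failed.append(job)
    return results, still_failed
```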

Process 3: Product Design Image

Input: Product name + design requirements
  ↓
Step 1: Generate base main image via text_to_image (front view of product, 16:9)
  ↓
Step 2: Generate variants via image_to_image (using the base image as reference)
  - 45-degree view
  - Side/back view
  - Top view
  - Different color versions
  - Different usage scenes

Process 4: Element Disassembly Diagram

Input: Product image (existing) or product description
  ↓
Step 1: If there is a product image → disassemble via image_to_image; if no image → first generate the full view via text_to_image
  ↓
Step 2: Generate element by element (image_to_image)
  - Exploded view/decomposition perspective
  - Macro close-up of Part 1 + function annotation
  - Macro close-up of Part 2 + craft annotation
  - Macro close-up of Part 3 + material annotation
  ↓
Step 3: Optional long image stitching (merge_long_image)

Process 5: Social Media Image Set

Input: Product/theme + platform (Xiaohongshu/Moments/Weibo)
  ↓
Step 1: Determine quantity and ratio
  - Xiaohongshu: 6-9 images, 3:4
  - Moments nine-grid: 9 images, 1:1
  - Weibo: 4-9 images, 16:9 or 1:1
  ↓
Step 2: Plan content for each image (refer to nine-grid templates in marketing-templates.md)
  ↓
Step 3: Define a unified style prefix and generate concurrently
  ↓
Step 4: Output in numbered order
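
The platform presets from Step 1 and the shared style prefix from Step 3 fit naturally into a small lookup table. In this sketch the counts and ratios come from the list above; picking the first listed ratio as the default, the `set_NN.png` file names, and the helper name `plan_image_set` are assumptions.

```python
IMAGE_SET_PRESETS = {
    "xiaohongshu": {"count": (6, 9), "ratios": ("3:4",)},
    "moments": {"count": (9, 9), "ratios": ("1:1",)},
    "weibo": {"count": (4, 9), "ratios": ("16:9", "1:1")},
}


def plan_image_set(platform, prompts, style_prefix):
    """Return (filename, ratio, full_prompt) tuples in numbered order."""
    preset = IMAGE_SET_PRESETS[platform]
    low, high = preset["count"]
    if not low <= len(prompts) <= high:
        raise ValueError(f"{platform} expects {low}-{high} images, got {len(prompts)}")
    ratio = preset["ratios"][0]  # assumption: default to the first listed ratio
    return [
        (f"set_{index:02d}.png", ratio, f"{style_prefix}{prompt}")
        for index, prompt in enumerate(prompts, start=1)
    ]
```

Prepending the same prefix to every prompt is what keeps the batch visually consistent when the images are generated concurrently.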

General Specifications

Prompts must be in Chinese, load `references/marketing-templates.md` to get templates
Unified style for the same batch: Define a style prefix and reuse it for all images
Concurrency ≤8 images, retry failed ones individually

Naming convention: `{type}_{sequence}.png` (e.g., `detail_01.png`, `scene_gaming.png`)

When delivering: Send previews one by one + zip package (if multiple images)
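
A short sketch of the naming convention and the zip step using only the standard library. The helper names are hypothetical; zero-padding integer sequence numbers to two digits follows the `detail_01.png` example, while string labels such as `gaming` are used as-is.

```python
import zipfile
from pathlib import Path


def output_name(kind, label):
    """Build '{type}_{sequence}.png'; integers are zero-padded to two digits."""
    suffix = f"{label:02d}" if isinstance(label, int) else str(label)
    return f"{kind}_{suffix}.png"


def package_zip(image_paths, zip_path):
    """Write the images into a zip archive, stored flat under their file names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for image_path in image_paths:
            image_path = Path(image_path)
            archive.write(image_path, arcname=image_path.name)
    return zip_path
```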

Trigger Keywords

Generation category: generate image, create image, text-to-image, image-to-image, infographic, data visualization
Analysis category: analyze image, OCR, recognize text, image-to-text
Stitching category: long image, WeChat long image, stitch image
Marketing category: product image, material pack, marketing material, detail page, e-commerce image, design drawing, rendering, effect image
Disassembly category: disassembly, exploded view, decomposition, close-up, macro
Image set category: image set, nine-grid, Moments, multi-size, multi-color, SKU
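
The categories above can be matched with a plain substring lookup. This sketch uses the English keywords as listed (real requests would need the Chinese originals), returns every matching category because the lists overlap, and uses hypothetical names throughout. Substring matching is deliberately naive: a short keyword such as "ocr" would also match inside a longer word.

```python
TRIGGER_KEYWORDS = {
    "generation": ("generate image", "create image", "text-to-image",
                   "image-to-image", "infographic", "data visualization"),
    "analysis": ("analyze image", "ocr", "recognize text", "image-to-text"),
    "stitching": ("long image", "wechat long image", "stitch image"),
    "marketing": ("product image", "material pack", "marketing material",
                  "detail page", "e-commerce image", "design drawing",
                  "rendering", "effect image"),
    "disassembly": ("disassembly", "exploded view", "decomposition",
                    "close-up", "macro"),
    "image_set": ("image set", "nine-grid", "moments", "multi-size",
                  "multi-color", "sku"),
}


def matching_categories(request):
    """Return every category whose keywords appear in the request (case-insensitive)."""
    text = request.lower()
    return [
        category
        for category, keywords in TRIGGER_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
```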
