Multimodal image processing skill, supporting text-to-image, image-to-image, image-to-text, long image stitching, marketing material packs, product design images, element disassembly diagrams, and social media image sets. Triggered when the user mentions keywords such as "draw", "generate image", "draw XX", "image processing", "image-to-image", "OCR", "image recognition", "stitch long image", "infographic", "illustration", "product image", "material pack", "marketing material", "detail page", "e-commerce image", "design drawing", "exploded view", "disassembly", "image set", "nine-grid", etc. Note: If the user requests a video (including illustrations + voiceover), use the video-creator skill instead.
    npx skill4agent add zrt-ai-lab/opencode-skills image-service

| Capability | Description | Script |
|---|---|---|
| Text-to-image | Generate images based on Chinese text descriptions | |
| Image-to-image | Edit based on existing images | |
| Image-to-text | Analyze image content (description, OCR, charts, etc.) | |
| Long image stitching | Vertically stitch multiple images into a WeChat long image | |
| Research illustration | Preset hand-drawn style infographics for research reports | |
config/settings.json

| Configuration Item | Value |
|---|---|
| IMAGE_API_BASE_URL | |
| IMAGE_MODEL | |
| VISION_MODEL | |
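
The three configuration items above could be checked before running any script. The following is a minimal sketch, assuming `config/settings.json` is a flat JSON object keyed by those item names; the source does not show the file's layout, and `load_settings` is a hypothetical helper, not part of the skill.

```python
import json
from pathlib import Path

# Key names come from the configuration table; the flat JSON layout is an assumption.
REQUIRED_KEYS = ("IMAGE_API_BASE_URL", "IMAGE_MODEL", "VISION_MODEL")


def load_settings(path):
    """Return (settings, missing), where missing lists required keys that are absent or empty."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    return data, missing
```

Calling `load_settings("config/settings.json")` and stopping when `missing` is non-empty surfaces an incomplete configuration before an image request is made.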
workdir
scripts/

    # Correct Example (replace PYTHON and SKILL_DIR with actual paths in your environment)
    $PYTHON $SKILL_DIR/scripts/text_to_image.py "description" -r 3:4 -o output.png

    $PYTHON $SKILL_DIR/scripts/text_to_image.py "Infographic style, title: AI Technology Trends" -r 16:9
    $PYTHON $SKILL_DIR/scripts/text_to_image.py "Vertical poster, product display" -r 3:4 -o poster.png

Options: -r, -s, -o

Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

    $PYTHON $SKILL_DIR/scripts/image_to_image.py input.png "edit description" -r 3:4

    $PYTHON $SKILL_DIR/scripts/image_to_text.py image.jpg -m describe
    $PYTHON $SKILL_DIR/scripts/image_to_text.py screenshot.png -m ocr

Modes (-m): describe, ocr, chart, fashion, product, scene

    $PYTHON $SKILL_DIR/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
    $PYTHON $SKILL_DIR/scripts/merge_long_image.py -p "*.png" -o long.png --sort name

Options: -p, -o, -w, -g, --blend, --sort

    $PYTHON $SKILL_DIR/scripts/research_image.py -t arch -n "Title" -c "Content" -o output.png

Types (-t): arch, flow, compare, concept

| Feature Type | Recognition Keywords/Patterns |
|---|---|
| Explicit Declaration | long image, long image poster, vertical long image, WeChat long image, Infographic, Long Banner |
| Segmented Structure | The prompt contains multiple paragraphs (e.g., "Part 1", "Top", "Middle", "Bottom") |
| Numbered List | Uses numbering like |
| Multi-screen Content | Describes 3 or more independent frames/modules |
| Top-to-bottom Layout | Contains descriptions like "from top to bottom" |
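
The recognition rules in this table can be approximated with simple text matching. The sketch below is illustrative only: the keyword list is copied from the table, while the section-marker and numbered-item regular expressions are assumptions (the table's numbering examples did not survive extraction), and the "3 or more independent frames" rule is left out because it needs judgment rather than pattern matching. `long_image_reasons` and `is_long_image` are hypothetical names.

```python
import re

# Keywords from the "Explicit Declaration" row, lowercased for matching.
EXPLICIT_KEYWORDS = (
    "long image", "long image poster", "vertical long image",
    "wechat long image", "infographic", "long banner",
)
# Assumed patterns: the source gives "Part 1", "Top", "Middle", "Bottom" only as examples.
SECTION_MARKER = re.compile(r"\b(part\s*\d+|top|middle|bottom)\b")
# Assumed numbering style ("1." or "1)") at the start of a line.
NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s", re.MULTILINE)


def long_image_reasons(prompt):
    """Return the names of the recognition rules that the prompt hits."""
    text = prompt.lower()
    reasons = []
    if any(keyword in text for keyword in EXPLICIT_KEYWORDS):
        reasons.append("explicit declaration")
    if len(set(SECTION_MARKER.findall(text))) >= 2:
        reasons.append("segmented structure")
    if len(NUMBERED_ITEM.findall(prompt)) >= 2:
        reasons.append("numbered list")
    if "from top to bottom" in text:
        reasons.append("top-to-bottom layout")
    return reasons


def is_long_image(prompt):
    """A prompt counts as a long image when it hits any rule."""
    return bool(long_image_reasons(prompt))
```

Returning the list of matched rules rather than a bare boolean makes it easier to see why a prompt was routed to the long image process.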
Identified as long image → Must first read references/long-image-guide.md → Execute according to long image process
Identified as single image → Directly use text_to_image.py to generate

| Scenario | Trigger Condition | Reference Document |
|---|---|---|
| Generate multi-screen long image | Hits any of the above long image recognition rules | |
| Image contains Chinese text | The prompt requires the image to include Chinese titles/text | |
| Create illustrations for PPT/documents | The user provides color requirements or reference documents | |
| API interface details | Need to understand underlying implementation | |
| Prompt engineering tips | Need to optimize prompt effects | |
-r

references/marketing-templates.md

| Capability | Trigger Keywords | Process | Output |
|---|---|---|---|
| E-commerce detail long image | Detail page, long image, product introduction | Stacked serial image generation → merge stitching | 1 long image |
| Marketing material pack | Material pack, marketing material, multi-size | Disassemble elements → multi-angle multi-scene → zip | 10-15 images + zip package |
| Product design image | Product image, rendering, effect image | Base image → multi-angle/color variants | 3-8 images |
| Element disassembly diagram | Disassembly, exploded view, decomposition, close-up | Overall image → local close-up/function disassembly | 4-8 images |
| Social media image set | Image set, nine-grid, Moments | Unified style → multi-size adaptation | 9 images (1:1) |
| Multi-color/SKU images | Color scheme, multi-color, SKU | Base image → image-to-image color change | N images |
Input: Product name + selling points + style
↓
Step 1: Plan screens (usually 5-8 screens)
- Screen 1: Hero image (product + core selling points)
- Screen 2-N: Expand each selling point (function/material/scene/parameters)
- Last screen: Specification parameter table
↓
Step 2: Stacked serial image generation (must read references/long-image-guide.md)
- Screen 1: Generate base image via text_to_image
- Screen 2-N: Use image_to_image with the previous screen as reference to maintain consistent style
↓
Step 3: Stitch via merge_long_image (use --blend 20 to merge seams)
↓
Step 4: Output long image + original images of each screen

Input: Product name + list of selling points + style preference
↓
Step 1: Generate base main image (text_to_image, full product view 16:9)
↓
Step 2: Element disassembly (image_to_image × 4-6 images)
- Macro close-up of core selling points (1:1)
- Function exploded view/disassembly diagram (3:4)
- Material/craft details (4:3)
- Full set of accessories (16:9)
↓
Step 3: Scene variants (text_to_image / image_to_image × 3-4 images)
- Daily usage scene (16:9)
- Work usage scene (4:3)
- Unboxing scene (1:1)
- Art silhouette/atmosphere image (21:9)
↓
Step 4: Marketing creativity (text_to_image × 3-4 images)
- Comparison review chart (3:4)
- Data visualization/sound wave chart (16:9)
- Multi-color SKU display (16:9)
- Nine-grid social media images (1:1)
↓
Step 5: Package all into zip + send previews one by one

Input: Product name + design requirements
↓
Step 1: Generate base main image via text_to_image (front view of product, 16:9)
↓
Step 2: Generate variants via image_to_image (using the base image as reference)
- 45-degree view
- Side/back view
- Top view
- Different color versions
- Different usage scenes

Input: Product image (existing) or product description
↓
Step 1: If there is a product image → disassemble via image_to_image; if no image → first generate the full view via text_to_image
↓
Step 2: Generate element by element (image_to_image)
- Exploded view/decomposition perspective
- Macro close-up of Part 1 + function annotation
- Macro close-up of Part 2 + craft annotation
- Macro close-up of Part 3 + material annotation
↓
Step 3: Optional long image stitching (merge_long_image)

Input: Product/theme + platform (Xiaohongshu/Moments/Weibo)
↓
Step 1: Determine quantity and ratio
- Xiaohongshu: 6-9 images, 3:4
- Moments nine-grid: 9 images, 1:1
- Weibo: 4-9 images, 16:9 or 1:1
↓
Step 2: Plan content for each image (refer to nine-grid templates in marketing-templates.md)
↓
Step 3: Define a unified style prefix and generate concurrently
↓
Step 4: Output in numbered order

references/marketing-templates.md

File naming: {type}_{sequence}.png (detail_01.png, scene_gaming.png)
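
The platform rules in Step 1 and the `{type}_{sequence}.png` naming pattern can be combined into a small planning helper. This is a sketch under stated assumptions: the counts and ratios are taken from Step 1, the two-digit zero padding is inferred from `detail_01.png`, and `plan_image_set` and its `kind` parameter are hypothetical names rather than part of the skill's scripts.

```python
# Quantity and ratio per platform, as listed in Step 1 of the social media flow.
PLATFORM_RULES = {
    "xiaohongshu": {"min": 6, "max": 9, "ratios": ("3:4",)},
    "moments": {"min": 9, "max": 9, "ratios": ("1:1",)},
    "weibo": {"min": 4, "max": 9, "ratios": ("16:9", "1:1")},
}


def plan_image_set(platform, count, kind="social", ratio=None):
    """Return a list of (filename, ratio) pairs in numbered order."""
    rule = PLATFORM_RULES[platform.lower()]
    if not rule["min"] <= count <= rule["max"]:
        raise ValueError(f"{platform} expects {rule['min']}-{rule['max']} images, got {count}")
    # Fall back to the first listed ratio when none is requested.
    ratio = ratio or rule["ratios"][0]
    if ratio not in rule["ratios"]:
        raise ValueError(f"{platform} does not list ratio {ratio}")
    # Zero-padded sequence number, inferred from the detail_01.png example.
    return [(f"{kind}_{i:02d}.png", ratio) for i in range(1, count + 1)]
```

Each returned pair supplies the `-o` filename and `-r` ratio for one `text_to_image.py` call, so the set is generated and delivered in a predictable order.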