image-generation

Image Generation

Generate and edit images using AI models. The script automatically picks a backend based on which API keys are configured — you don't need to specify a model unless the user explicitly names one.
Supported models (passed via `model` only when the user asks for a specific one):
  • OpenAI: `gpt-image-2`, `gpt-image-1`
  • Gemini Nano Banana: `nano-banana-2`, `nano-banana-pro`, `nano-banana`
  • Seedream (Volcengine Ark): `seedream-5.0-lite`, `seedream-4.5`
  • Qwen (DashScope): `qwen-image-2.0`, `qwen-image-2.0-pro`
  • MiniMax: `image-01`

Usage

Run `scripts/generate.py` with a JSON argument. The path is relative to this skill's `base_dir`.
bash
python <base_dir>/scripts/generate.py '<json_args>'
Set bash timeout to at least 600 seconds, as image generation can take 30–200s per provider, and the script may try multiple providers sequentially.
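
When the script is driven from Python rather than typed into a shell, the same invocation can be sketched as below. This is a minimal sketch and not part of the skill: `build_command` and `run_generate` are hypothetical helper names, `/skills/image-generation` is a placeholder for the real `base_dir`, and the 600-second timeout simply mirrors the recommendation above.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_command(base_dir, args):
    """Build the argv list for scripts/generate.py (hypothetical helper)."""
    script = Path(base_dir) / "scripts" / "generate.py"
    # json.dumps produces the single JSON argument the script expects.
    return [sys.executable, str(script), json.dumps(args, ensure_ascii=False)]


def run_generate(base_dir, args, timeout=600):
    """Run the script and return its stdout.

    subprocess.run raises TimeoutExpired if the call exceeds `timeout` seconds.
    """
    completed = subprocess.run(
        build_command(base_dir, args),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.stdout


# Placeholder base_dir, for illustration only.
cmd = build_command(
    "/skills/image-generation",
    {"prompt": "A corgi astronaut floating in space"},
)
print(cmd[1:])
```

Passing an argument list instead of a shell string sidesteps quoting problems when the prompt itself contains quotes.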

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | | Image description |
| `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL. Multi-image fusion is supported (pass a list) |
| `quality` | string | no | auto | `low` / `medium` / `high` (only some backends honour this) |
| `size` | string | no | auto | `512` / `1K` / `2K` / `3K` / `4K`, or pixel value (`1024x1024`) |
| `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9` (some backends also support extreme ratios like `1:4` / `8:1`) |

Higher `quality` and larger `size` cost more and run slower. Default to omitting both (`auto`) so the model picks a balanced setting. Only raise them when the user explicitly asks for high quality / a poster / print-ready output. For quick previews or chat scenarios prefer `quality=low` + `size=1K`.
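
The guidance on `quality` and `size` can be made concrete with a small argument builder. The sketch below is illustrative only: `build_args` and its `preview` flag are hypothetical names, not something the script provides. It leaves both keys out unless the caller asks for them, and maps a quick-preview request to `quality=low` + `size=1K`.

```python
def build_args(prompt, *, preview=False, quality=None, size=None, aspect_ratio=None):
    """Assemble the JSON arguments, leaving quality/size out unless asked for."""
    args = {"prompt": prompt}
    if preview:
        # Quick previews / chat scenarios: cheapest, fastest settings.
        # In this sketch, preview overrides any explicit quality/size.
        quality, size = "low", "1K"
    optional = (("quality", quality), ("size", size), ("aspect_ratio", aspect_ratio))
    for key, value in optional:
        if value is not None:
            args[key] = value  # keys left out fall back to the script's defaults
    return args


default_args = build_args("A corgi astronaut floating in space")
poster_args = build_args("Concert poster", quality="high", size="4K", aspect_ratio="2:3")
preview_args = build_args("Rough idea for a logo", preview=True)
print(default_args, poster_args, preview_args, sep="\n")
```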

Example — generate

bash
python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut floating in space"}'
With an explicit size and aspect ratio:
bash
python <base_dir>/scripts/generate.py '{"prompt": "Isometric miniature city of Shanghai at sunset", "size": "2K", "aspect_ratio": "16:9"}'

Important: Editing vs Generating

When the user asks to edit, modify, or improve an existing image, pass the original image via `image_url`. Prefer local file paths directly: the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing.

Example — edit (image-to-image)

bash
python <base_dir>/scripts/generate.py '{"prompt": "Add a Santa hat to the dog", "image_url": "/path/to/dog.png"}'
Multi-image fusion — pass a list:
bash
python <base_dir>/scripts/generate.py '{"prompt": "Combine these characters into a group photo", "image_url": ["/path/a.png", "/path/b.png"]}'
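
The editing and fusion calls above differ from plain generation only in `image_url`. A minimal sketch of that rule follows. `with_source_images` is a hypothetical helper, and collapsing a one-element list to a plain string is a choice made here, not documented script behaviour.

```python
def with_source_images(args, images=None):
    """Return a copy of args with image_url set when there is something to edit."""
    request = dict(args)
    if not images:
        return request  # no image_url: the script generates a new image
    if isinstance(images, str):
        request["image_url"] = images
    else:
        paths = list(images)
        # One image is passed as a string, several as a list (multi-image fusion).
        request["image_url"] = paths[0] if len(paths) == 1 else paths
    return request


generate_req = with_source_images({"prompt": "A corgi astronaut floating in space"})
edit_req = with_source_images({"prompt": "Add a Santa hat to the dog"}, "/path/to/dog.png")
fusion_req = with_source_images(
    {"prompt": "Combine these characters into a group photo"},
    ["/path/a.png", "/path/b.png"],
)
print(generate_req, edit_req, fusion_req, sep="\n")
```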

Output

Prints JSON to stdout:
json
{
  "model": "doubao-seedream-5-0-260128",
  "images": [
    {"url": "/path/to/output.png"}
  ]
}
After success, display the image to the user. You can either embed it in markdown (`![description](/path/to/output.png)`) or use the `send` tool.
On error:
json
{
  "error": "error message"
}
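
A caller has to tell the two output shapes apart before showing anything to the user. The following is a minimal sketch that assumes stdout contains exactly one JSON object in one of the two forms shown above. `GenerationError`, `parse_output`, and `to_markdown` are hypothetical names.

```python
import json


class GenerationError(RuntimeError):
    """Raised when the script reports an error instead of images."""


def parse_output(stdout):
    """Return (model, [image paths]) from the script's stdout JSON."""
    result = json.loads(stdout)
    if "error" in result:
        raise GenerationError(result["error"])
    urls = [image["url"] for image in result.get("images", [])]
    return result.get("model"), urls


def to_markdown(url, description="Generated image"):
    """Build a markdown image embed for one output path."""
    return f"![{description}]({url})"


sample = '{"model": "doubao-seedream-5-0-260128", "images": [{"url": "/path/to/output.png"}]}'
model, urls = parse_output(sample)
embeds = [to_markdown(u) for u in urls]
print(model, embeds)
```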

Setup

The script needs at least one of these API keys (set via `env_config` or `config.json`): `OPENAI_API_KEY` / `GEMINI_API_KEY` / `ARK_API_KEY` / `DASHSCOPE_API_KEY` / `MINIMAX_API_KEY` / `LINKAI_API_KEY`.
Each also has an optional `*_API_BASE` for custom endpoints. The script automatically picks the first configured backend and falls back to the next if it fails, so there is no need to specify a model.
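
The pick-first-then-fall-back behaviour can be illustrated with a short sketch. It is not the script's actual implementation. The priority order is assumed to follow the key list above, the backends are stand-in callables, and `configured_backends` and `generate_with_fallback` are hypothetical names.

```python
import os

# Order follows the key list above; the real script's priority is an assumption.
KEY_ORDER = [
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "ARK_API_KEY",
    "DASHSCOPE_API_KEY",
    "MINIMAX_API_KEY",
    "LINKAI_API_KEY",
]


def configured_backends(env=None):
    """List the key names that are set and non-empty, in priority order."""
    env = os.environ if env is None else env
    return [key for key in KEY_ORDER if env.get(key)]


def generate_with_fallback(args, backends, env=None):
    """Try each configured backend in turn; `backends` maps key name -> callable."""
    errors = []
    for key in configured_backends(env):
        try:
            return backends[key](args)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{key}: {exc}")
    return {"error": "; ".join(errors) or "no API key configured"}


def failing_backend(args):
    raise RuntimeError("invalid API key")


def working_backend(args):
    return {"model": "stub-model", "images": [{"url": "/tmp/out.png"}]}


# Placeholder credentials and stub backends, for illustration only.
env = {"OPENAI_API_KEY": "placeholder", "ARK_API_KEY": "placeholder"}
stubs = {"OPENAI_API_KEY": failing_backend, "ARK_API_KEY": working_backend}
result = generate_with_fallback({"prompt": "test"}, stubs, env)
print(result["model"])
```

In this sketch an error object is returned only after every configured backend has failed.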

Error Handling

If the script returns an error after trying all configured backends, do NOT retry with the same parameters: the failure is almost always a configuration issue (wrong API key, unsupported API base). Tell the user to fix it via `env_config`, then retry.

Notes

  • HTTP timeout is 300s — high-resolution generation can take over 200s.
  • Omit `quality` / `size` to let the model pick automatically (`auto`).
  • Input images for editing are auto-compressed to ≤ 4MB / longest edge ≤ 4096px.
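
To give a feel for the longest-edge limit on input images, the sketch below works through only the dimension arithmetic. How the script actually resizes and re-encodes images to stay under 4MB is not described here, so `fit_longest_edge` is a hypothetical helper and the rounding choice is an assumption.

```python
def fit_longest_edge(width, height, max_edge=4096):
    """Scale (width, height) down so the longest edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already within bounds, never upscale
    scale = max_edge / longest
    # Round to whole pixels and keep each side at least 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale))


large = fit_longest_edge(8192, 4096)
small = fit_longest_edge(1024, 768)
print(large, small)  # (4096, 2048) (1024, 768)
```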