image-generation

Image Generation & Editing Skill

Generate and edit images using AI (Google Gemini Nano Banana / Nano Banana Pro, OpenAI GPT Image).

Capabilities:

🎨 Generate: Create new images from text descriptions
✏️ Edit: Modify existing images (add/remove elements, change colors)
🛍️ Product Placement: Put products into scenes
🎭 Style Transfer: Apply artistic styles to photos
🖼️ Composite: Combine multiple images into one

Quick Examples

Users can specify what they want:

User Says	Mode	What Happens
"Generate an image of a sunset"	Generate	Text-to-image, no reference needed
"Create a logo for my coffee shop"	Generate	Text-to-image with text rendering
"Edit this image: add a hat to the cat"	Edit	User provides image, AI modifies it
"Remove the background from this photo"	Edit	User provides image, AI edits it
"Put this product on a kitchen counter"	Product	User provides product + optional scene
"Make this photo look like Van Gogh painted it"	Style	User provides photo, AI applies style
"Combine these photos into a group shot"	Composite	User provides multiple images

Prerequisites

Environment variables must be configured for the APIs to work. At least one API key is required:

- `OPENAI_API_KEY`: for OpenAI GPT Image generation
- `GOOGLE_API_KEY`: for Google Gemini (Nano Banana / Nano Banana Pro)

See the repository README for setup instructions.
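
As a minimal sketch of the key check described above, the snippet below reports which providers are usable given the two environment variables. The function name `available_providers` and the provider labels are illustrative assumptions, not part of the skill's scripts.

```python
import os

# Variable names come from the prerequisites above; the provider labels are illustrative.
KEY_TO_PROVIDER = {
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "google",
}


def available_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        provider
        for key, provider in KEY_TO_PROVIDER.items()
        if env.get(key, "").strip()
    ]
```

Calling `available_providers()` with no argument reads the real environment; passing a plain dict makes the check easy to exercise without touching it. An empty result means neither key is configured.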

Available APIs

OpenAI GPT Image (Recommended for pure generation)

Models:
- `gpt-image-1.5` (state of the art, best quality)
- `gpt-image-1` (great quality, cost-effective)
- `gpt-image-1-mini` (fastest, most affordable)
Best for: High-quality generation, transparency, text rendering, image editing
Sizes: 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait), or `auto`
Quality: low (fast), medium (balanced), high (best), or `auto`
Background: transparent, opaque, or `auto`
Output formats: png (default), jpeg (faster), webp
Compression: 0-100% (for jpeg/webp)
Features:
- Image editing with up to 16 input images
- Transparent backgrounds
- Streaming with partial images
- High input fidelity for preserving faces/logos
- Inpainting with masks
- 32,000 character prompts
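
The option values listed above can be checked before a request is sent. The sketch below is one way to do that; the helper name `check_gpt_image_options` and its parameter names are assumptions for illustration and do not describe how `openai_image.py` validates its arguments.

```python
# Allowed values exactly as listed in this section.
SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
QUALITIES = {"low", "medium", "high", "auto"}
BACKGROUNDS = {"transparent", "opaque", "auto"}
FORMATS = {"png", "jpeg", "webp"}
MAX_PROMPT_CHARS = 32_000


def check_gpt_image_options(prompt, size="auto", quality="auto",
                            background="auto", output_format="png",
                            compression=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append("prompt exceeds 32,000 characters")
    if size not in SIZES:
        problems.append(f"unsupported size: {size}")
    if quality not in QUALITIES:
        problems.append(f"unsupported quality: {quality}")
    if background not in BACKGROUNDS:
        problems.append(f"unsupported background: {background}")
    if output_format not in FORMATS:
        problems.append(f"unsupported output format: {output_format}")
    if compression is not None:
        # Compression is listed for jpeg/webp only.
        if output_format not in {"jpeg", "webp"}:
            problems.append("compression only applies to jpeg or webp")
        elif not 0 <= compression <= 100:
            problems.append("compression must be between 0 and 100")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report every issue to the user in a single message.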

⚠️ Note: DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on May 12, 2026.

Google Gemini Native Image Generation (Recommended for editing)

Nano Banana (`gemini-2.5-flash-image`): Fast, efficient, 1K resolution, up to 3 reference images
Nano Banana Pro (`gemini-3-pro-image-preview`): Professional quality, up to 4K, thinking mode, up to 14 reference images (default)
Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Resolutions (Pro only): 1K, 2K, 4K
Features:
- Image editing (add/remove elements, color changes)
- Product placement and composition
- Style transfer
- Advanced text rendering
- Google Search grounding (Pro only)
- Thinking mode for complex prompts (Pro only)
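
A small sketch of how the aspect-ratio and resolution constraints above might be enforced. The function name `check_gemini_options` is hypothetical, and treating a 2K or 4K request on Nano Banana as an error (rather than silently downgrading it) is an assumption of this example.

```python
FLASH_MODEL = "gemini-2.5-flash-image"      # Nano Banana
PRO_MODEL = "gemini-3-pro-image-preview"    # Nano Banana Pro

ASPECT_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3",
                 "4:5", "5:4", "9:16", "16:9", "21:9")
PRO_RESOLUTIONS = ("1K", "2K", "4K")


def check_gemini_options(model, aspect_ratio="1:1", resolution=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if model not in (FLASH_MODEL, PRO_MODEL):
        problems.append(f"unknown model: {model}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution is not None:
        if resolution not in PRO_RESOLUTIONS:
            problems.append(f"unsupported resolution: {resolution}")
        elif model == FLASH_MODEL and resolution != "1K":
            # Nano Banana is listed as 1K only; 2K and 4K are Pro features.
            problems.append(f"{resolution} requires {PRO_MODEL}")
    return problems
```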

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print the questions in your response; use the tool to create interactive prompts with the options shown.

Q0: Model Selection

"Which image generation model would you like to use?

Google Gemini (Nano Banana Pro) - Up to 4K, 14 reference images, style transfer, thinking mode (Recommended)

OpenAI GPT Image 1.5 - State of the art, transparency, streaming, up to 16 input images

OpenAI GPT Image 1 - Great quality, transparency, image editing

OpenAI GPT Image 1 Mini - Fastest, most affordable"

Wait for response. If user doesn't have a preference, recommend Gemini for editing/reference tasks or GPT Image 1.5 for pure generation.

Q1: Reference

"I'll generate that image for you! First — do you have any reference images?

Product photos to include

Style references

Images to edit

No, generate from scratch"

Wait for response.

Q2: Aspect Ratio

"What aspect ratio?

1:1 (square)

16:9 (landscape/widescreen)

9:16 (portrait/vertical)

4:3 / 3:4 (classic)

Other (2:3, 3:2, 4:5, 5:4, 21:9)

Or specify"

Wait for response.

Q3: Resolution

"What resolution?

1K (fast)

2K (balanced)

4K (highest quality)"

Wait for response.

Q4: Style

"Any style preferences?

Photorealistic

Artistic/painterly

Cartoon/illustration

3D render

Or describe your own"

Wait for response.

Quick Reference

Question	Determines
Reference	Generation vs editing mode
Aspect Ratio	Image dimensions
Resolution	Quality level
Style	Prompt enhancement direction

Parsing:

If user provides reference images → use image editing mode
If user doesn't answer all questions → use sensible defaults and note assumptions
Parse: subject, style, mood, special requirements (colors, text, composition)
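
The parsing rules above can be sketched as a function that fills in defaults and records each assumption so it can be reported back to the user. The default values chosen here (`1:1`, `1K`, `photorealistic`) and the name `resolve_requirements` are assumptions for illustration; the skill does not prescribe specific defaults.

```python
# Hypothetical defaults; pick whatever is sensible for your use case.
DEFAULTS = {
    "aspect_ratio": "1:1",
    "resolution": "1K",
    "style": "photorealistic",
}


def resolve_requirements(answers):
    """Merge user answers with defaults and note every assumption made."""
    references = list(answers.get("reference_images") or [])
    settings = {
        "reference_images": references,
        # Reference images present -> editing mode, otherwise plain generation.
        "mode": "edit" if references else "generate",
    }
    assumptions = []
    for name, default in DEFAULTS.items():
        value = answers.get(name)
        if value:
            settings[name] = value
        else:
            settings[name] = default
            assumptions.append(f"{name} defaulted to {default}")
    return settings, assumptions
```

For example, a user who only answered the aspect-ratio question gets `generate` mode, their chosen ratio, and two recorded assumptions (resolution and style) that can be mentioned when delivering the result.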

Step 2: Craft the Prompt

Transform the user request into an effective image generation prompt:

Be specific: Add details the user might not have mentioned
Describe style: "digital art", "oil painting", "photograph", "3D render"
Include lighting: "soft lighting", "dramatic shadows", "golden hour"
Specify quality: "highly detailed", "8k", "professional"

Example transformation:

User: "a cat in space"
Enhanced: "A majestic orange tabby cat floating in outer space, surrounded by colorful nebulae and distant stars, wearing a small astronaut helmet, digital art style, highly detailed, vibrant colors, cinematic lighting"
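
The transformation above amounts to joining a specific subject with extra details, a style, a quality hint, and lighting. A minimal sketch, assuming a hypothetical helper named `enhance_prompt`; in practice the details come from the requirements gathered in Step 1.

```python
def enhance_prompt(subject, details=(), style=None, quality=None, lighting=None):
    """Join the subject with optional details, style, quality, and lighting."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details if d and d.strip())
    for extra in (style, quality, lighting):
        if extra and extra.strip():
            parts.append(extra.strip())
    return ", ".join(parts)


prompt = enhance_prompt(
    "A majestic orange tabby cat floating in outer space",
    details=[
        "surrounded by colorful nebulae and distant stars",
        "wearing a small astronaut helmet",
    ],
    style="digital art style",
    quality="highly detailed, vibrant colors",
    lighting="cinematic lighting",
)
```

With these inputs, `prompt` reproduces the enhanced "cat in space" prompt shown in the example transformation.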

Step 3: Select the API

Use the model selected by the user in Q0:

Check which API keys are configured in the environment:
- `OPENAI_API_KEY` → GPT Image models available
- `GOOGLE_API_KEY` → Gemini (Nano Banana Pro) available
If the user's selected model isn't available: Inform them and offer alternatives.
Model mapping from Q0:
- "Google Gemini (Nano Banana Pro)" → Use `gemini.py` with `gemini-3-pro-image-preview`
- "OpenAI GPT Image 1.5" → Use `openai_image.py` with `gpt-image-1.5`
- "OpenAI GPT Image 1" → Use `openai_image.py` with `gpt-image-1`
- "OpenAI GPT Image 1 Mini" → Use `openai_image.py` with `gpt-image-1-mini`
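
The mapping and availability check in this step could be expressed as a lookup table. This is only a sketch: `select_backend` and the shape of its return value are assumptions, while the script names, model IDs, and environment variables are taken from the mapping in this step.

```python
# Q0 answer -> (script, model ID, required environment variable)
MODEL_MAP = {
    "Google Gemini (Nano Banana Pro)": ("gemini.py", "gemini-3-pro-image-preview", "GOOGLE_API_KEY"),
    "OpenAI GPT Image 1.5": ("openai_image.py", "gpt-image-1.5", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1": ("openai_image.py", "gpt-image-1", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1 Mini": ("openai_image.py", "gpt-image-1-mini", "OPENAI_API_KEY"),
}


def select_backend(choice, env):
    """Resolve a Q0 answer to a script and model, listing alternatives if the key is missing."""
    script, model, key = MODEL_MAP[choice]
    available = bool(env.get(key))
    alternatives = []
    if not available:
        # Offer every other option whose API key is configured.
        alternatives = [
            name for name, (_, _, other_key) in MODEL_MAP.items()
            if name != choice and env.get(other_key)
        ]
    return {
        "script": script,
        "model": model,
        "available": available,
        "missing_key": None if available else key,
        "alternatives": alternatives,
    }
```

Pass `os.environ` as `env` in real use. When `available` is false, `missing_key` names the variable to tell the user about and `alternatives` lists the choices that would work right now.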

Step 4: Generate the Image

Execute the appropriate script from `${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/`.

For OpenAI GPT Image - Text to Image:

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "your enhanced prompt" \
  --model "gpt-image-1" \
  --size "1024x1024" \
  --quality "high" \
  --output "/path/to/output.png"

For OpenAI GPT Image - With Transparent Background:

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A product icon with no background" \
  --model "gpt-image-1" \
  --background "transparent" \
  --quality "high" \
  --output "/path/to/output.png"

For OpenAI GPT Image - Image Editing (with reference images):

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Add a wizard hat to this cat" \
  --model "gpt-image-1" \
  --image "/path/to/cat.jpg" \
  --input-fidelity "high" \
  --output "/path/to/output.png"

For OpenAI GPT Image - Multiple Reference Images:

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Create a gift basket containing these items" \
  --model "gpt-image-1" \
  --image "/path/to/item1.png" \
  --image "/path/to/item2.png" \
  --image "/path/to/item3.png" \
  --output "/path/to/output.png"

For OpenAI GPT Image - With Mask (Inpainting):

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Replace the pool with a garden" \
  --model "gpt-image-1" \
  --image "/path/to/scene.jpg" \
  --mask "/path/to/mask.png" \
  --output "/path/to/output.png"

For OpenAI GPT Image - Streaming with Partial Images:

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A beautiful sunset over mountains" \
  --model "gpt-image-1" \
  --stream \
  --partial-images 2 \
  --output "/path/to/output.png"

For Google Gemini (Nano Banana Pro) - Text to Image:

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-3-pro-image-preview" \
  --aspect-ratio "1:1" \
  --resolution "2K" \
  --output "/path/to/output.png"

For Google Gemini - With Reference Images (editing, product placement, etc.):

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Add a wizard hat to this cat" \
  --image "/path/to/cat.jpg" \
  --aspect-ratio "1:1" \
  --resolution "2K"

For Google Gemini - Multiple Reference Images (composition, style transfer):

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Place this product on the kitchen counter in this scene" \
  --image "/path/to/product.png" \
  --image "/path/to/kitchen.jpg" \
  --aspect-ratio "16:9" \
  --resolution "2K"

For Google Gemini (Nano Banana - faster, fewer features):

bash

python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-2.5-flash-image" \
  --aspect-ratio "1:1"
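
When these commands are assembled programmatically, building an argument list avoids shell-quoting problems with prompts that contain quotes or spaces. The sketch below only constructs the list; `build_command` is a hypothetical helper, and the flags it emits are the ones used in the commands above.

```python
import os
import posixpath


def build_command(script, prompt, plugin_root=None, images=(), **options):
    """Build an argv list for one of the skill's scripts (does not run it)."""
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", "")
    path = posixpath.join(root, "skills", "image-generation", "scripts", script)
    argv = ["python3", path, "--prompt", prompt]
    for image in images:
        # --image may be repeated, once per reference image.
        argv += ["--image", image]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)              # boolean switch such as --stream
        elif value is not None and value is not False:
            argv += [flag, str(value)]     # valued option such as --resolution 2K
    return argv
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`. For instance, `build_command("openai_image.py", "A sunset", stream=True, partial_images=2, output="/path/to/output.png")` yields the streaming command shown earlier, with the script path resolved from `CLAUDE_PLUGIN_ROOT`.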

Step 5: Deliver the Result

Show the generated image to the user
Provide the enhanced prompt used (so they can iterate)
Offer to:
- Generate variations
- Try a different style
- Use a different API/model
- Refine the prompt

Error Handling

Missing API key: Inform the user which key is needed and how to set it up:

OpenAI: https://platform.openai.com/api-keys
Google: https://aistudio.google.com/apikey

API rate limit: Suggest waiting or trying the other API.

Content policy violation: Rephrase the prompt to be more appropriate.

Generation failed: Retry with simplified prompt or different API.
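
One way to keep these responses consistent is a small lookup from failure type to suggested action. The enum, the function name `suggest_recovery`, and the message wording are illustrative assumptions; how the scripts actually report each failure is not specified here.

```python
from enum import Enum


class Failure(Enum):
    MISSING_KEY = "missing API key"
    RATE_LIMIT = "API rate limit"
    CONTENT_POLICY = "content policy violation"
    GENERATION_FAILED = "generation failed"


# Provider -> (environment variable, page where a key can be created)
PROVIDERS = {
    "openai": ("OPENAI_API_KEY", "https://platform.openai.com/api-keys"),
    "google": ("GOOGLE_API_KEY", "https://aistudio.google.com/apikey"),
}


def suggest_recovery(failure, provider):
    """Return the suggested next step for a failure on the given provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    other = next(name for name in PROVIDERS if name != provider)
    if failure is Failure.MISSING_KEY:
        variable, url = PROVIDERS[provider]
        return f"Set {variable}; create a key at {url}"
    if failure is Failure.RATE_LIMIT:
        return f"Wait and retry, or try the {other} API"
    if failure is Failure.CONTENT_POLICY:
        return "Rephrase the prompt to be more appropriate"
    return f"Retry with a simplified prompt, or try the {other} API"
```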

Reference Image Use Cases

Both OpenAI GPT Image and Google Gemini support reference images for advanced editing:

OpenAI GPT Image: Up to 16 input images, with `input_fidelity: high` for preserving faces/logos
Google Gemini: Nano Banana (up to 3), Nano Banana Pro (up to 14)
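
Because the limits differ per model, it can help to check the number of reference images before choosing one. A minimal sketch using the limits stated above; the helper name `models_accepting` is an assumption.

```python
# Maximum number of reference images per model, as stated above.
REFERENCE_LIMITS = {
    "gpt-image-1.5": 16,
    "gpt-image-1": 16,
    "gpt-image-1-mini": 16,
    "gemini-2.5-flash-image": 3,
    "gemini-3-pro-image-preview": 14,
}


def models_accepting(image_count):
    """Return the model IDs that accept this many reference images."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    return [
        model for model, limit in REFERENCE_LIMITS.items()
        if image_count <= limit
    ]
```

For example, five reference images rule out Nano Banana, and fifteen leave only the GPT Image models.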

Image Editing

"Add a santa hat to this person" + person.jpg
"Remove the background and replace with a beach scene" + product.jpg
"Change the sofa color to blue" + living_room.jpg

Product Placement

"Place this product on a marble kitchen counter" + product.png + kitchen.jpg
"Show this watch on a person's wrist" + watch.png + arm.jpg

Style Transfer

"Transform this photo into Van Gogh's Starry Night style" + photo.jpg
"Make this look like a watercolor painting" + landscape.jpg

Multi-Image Composition

"Create a group photo of these people in an office" + person1.jpg + person2.jpg + person3.jpg
"Combine these elements into a cohesive scene" + element1.png + element2.png + background.jpg

Character Consistency

"Show this character from a different angle" + character.jpg
"Put this person in a superhero costume" + person.jpg

Tip: For best results with reference images, be specific about what you want to preserve vs. change.

Prompt Engineering Tips

For Photorealism

Include "photograph", "DSLR", "35mm film"
Specify camera settings: "shallow depth of field", "bokeh"
Add lighting: "natural light", "studio lighting"

For Artistic Styles

Reference art movements: "impressionist", "art nouveau", "cyberpunk"
Name artist styles: "in the style of Studio Ghibli", "Moebius style"
Specify medium: "watercolor", "oil painting", "pencil sketch"

For Consistency

Use seed values when available
Save successful prompts for reference
Note which API produced best results for similar requests
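
Saving prompts and noting which model worked can be as simple as appending to a JSON Lines file. This is a sketch under assumptions: the file layout, field names, and the helpers `record_prompt` and `load_prompts` are all hypothetical and not part of the skill.

```python
import json
from pathlib import Path


def record_prompt(log_path, prompt, model, notes=""):
    """Append one successful prompt, with the model that produced it, to a JSON Lines file."""
    entry = {"prompt": prompt, "model": model, "notes": notes}
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def load_prompts(log_path):
    """Read back all recorded prompts; a missing file yields an empty list."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

An append-only log keeps each record independent, so a partially written line never corrupts earlier entries.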

API Comparison

Feature	GPT Image 1.5	GPT Image 1	GPT Image 1 Mini	Nano Banana	Nano Banana Pro
Provider	OpenAI	OpenAI	OpenAI	Google	Google
Model ID	gpt-image-1.5	gpt-image-1	gpt-image-1-mini	gemini-2.5-flash-image	gemini-3-pro-image-preview
Best for	State of the art	Quality + value	Speed + cost	Fast generation	Professional assets
Sizes	1024², 1536x1024, 1024x1536, auto	Same	Same	1K only	Up to 4K
Quality options	low, medium, high, auto	Same	Same	N/A	N/A
Aspect ratios	3 + auto	Same	Same	10 options	10 options
Reference images	Up to 16	Up to 16	Up to 16	Up to 3	Up to 14
Image editing	Yes	Yes	Yes	Yes	Yes
Inpainting (mask)	Yes	Yes	Yes	Yes	Yes
Transparent background	Yes	Yes	Yes	No	No
Streaming	Yes	Yes	Yes	No	No
Input fidelity	high/low	high/low	low only	N/A	N/A
Output formats	png, jpeg, webp	Same	Same	png	png
Compression	0-100%	Same	Same	No	No
Text rendering	Excellent	Excellent	Good	Good	Excellent
Thinking mode	No	No	No	No	Yes
Max prompt length	32,000 chars	32,000 chars	32,000 chars	N/A	N/A
Speed	~30-60s	~20-40s	~10-20s	~10-20s	~30-60s

⚠️ DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on May 12, 2026. Use GPT Image models instead.
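
A few rows of the comparison table can be encoded as capability sets to narrow down model choices by required feature. The feature names and the helper `models_with` are illustrative assumptions; the yes/no values are copied from the table.

```python
GPT_FEATURES = {"transparent_background", "streaming", "inpainting"}

# Capabilities per model ID, taken from the comparison table.
CAPABILITIES = {
    "gpt-image-1.5": set(GPT_FEATURES),
    "gpt-image-1": set(GPT_FEATURES),
    "gpt-image-1-mini": set(GPT_FEATURES),
    "gemini-2.5-flash-image": {"inpainting"},
    "gemini-3-pro-image-preview": {"inpainting", "thinking_mode", "up_to_4k"},
}

KNOWN_FEATURES = set().union(*CAPABILITIES.values())


def models_with(*required):
    """Return the model IDs that support every required feature."""
    needed = set(required)
    unknown = needed - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [model for model, caps in CAPABILITIES.items() if needed <= caps]
```

An empty result, such as for streaming combined with thinking mode, signals that no single model covers the request and the user should be asked which requirement matters more.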
