image-generation

Image Generation & Editing Skill

Generate and edit images using AI (Google Gemini Nano Banana / Nano Banana Pro, OpenAI GPT Image).
Capabilities:
  • 🎨 Generate: Create new images from text descriptions
  • ✏️ Edit: Modify existing images (add/remove elements, change colors)
  • 🛍️ Product Placement: Put products into scenes
  • 🎭 Style Transfer: Apply artistic styles to photos
  • 🖼️ Composite: Combine multiple images into one

Quick Examples

Users can specify what they want:
| User says | Mode | What happens |
|---|---|---|
| "Generate an image of a sunset" | Generate | Text-to-image, no reference needed |
| "Create a logo for my coffee shop" | Generate | Text-to-image with text rendering |
| "Edit this image: add a hat to the cat" | Edit | User provides image, AI modifies it |
| "Remove the background from this photo" | Edit | User provides image, AI edits it |
| "Put this product on a kitchen counter" | Product | User provides product + optional scene |
| "Make this photo look like Van Gogh painted it" | Style | User provides photo, AI applies style |
| "Combine these photos into a group shot" | Composite | User provides multiple images |

Prerequisites

Environment variables must be configured for the APIs to work. At least one API key is required:
  • OPENAI_API_KEY
    - For OpenAI GPT Image models (gpt-image-1.5, gpt-image-1, gpt-image-1-mini)
  • GOOGLE_API_KEY
    - For Google Gemini (Nano Banana / Nano Banana Pro)
See the repository README for setup instructions.
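Before asking Q0, you can check which providers are usable by reading the two variables from the environment. This is a minimal Python sketch that assumes the keys are plain environment variables. The names available_providers and require_any_key are hypothetical helpers, not functions from the skill's scripts.

```python
import os
from typing import Mapping

# Environment variables named in the Prerequisites section.
KEY_TO_PROVIDER = {
    "OPENAI_API_KEY": "openai",  # GPT Image models
    "GOOGLE_API_KEY": "google",  # Gemini (Nano Banana / Nano Banana Pro)
}


def available_providers(env: Mapping[str, str] = os.environ) -> set[str]:
    """Providers whose API key is set to a non-blank value."""
    return {
        provider
        for key, provider in KEY_TO_PROVIDER.items()
        if env.get(key, "").strip()
    }


def require_any_key(env: Mapping[str, str] = os.environ) -> set[str]:
    """Fail early, because at least one key is required."""
    providers = available_providers(env)
    if not providers:
        names = " or ".join(KEY_TO_PROVIDER)
        raise RuntimeError(f"No image API key found; set {names}.")
    return providers
```

Passing an explicit mapping instead of os.environ keeps the check easy to test.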

Available APIs

OpenAI GPT Image (Recommended for pure generation)

  • Models:
    • gpt-image-1.5 (state of the art, best quality)
    • gpt-image-1 (great quality, cost-effective)
    • gpt-image-1-mini (fastest, most affordable)
  • Best for: High-quality generation, transparency, text rendering, image editing
  • Sizes: 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait), or auto
  • Quality: low (fast), medium (balanced), high (best), or auto
  • Background: transparent, opaque, or auto
  • Output formats: png (default), jpeg (faster), webp
  • Compression: 0-100% (for jpeg/webp)
  • Features:
    • Image editing with up to 16 input images
    • Transparent backgrounds
    • Streaming with partial images
    • High input fidelity for preserving faces/logos
    • Inpainting with masks
    • 32,000 character prompts
⚠️ Note: DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on May 12, 2026.
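Before calling openai_image.py, it can help to check the chosen options against the values listed above. This is a minimal sketch for catching bad combinations locally. validate_openai_options is a hypothetical helper, not part of the skill's scripts, and it encodes only the limits stated in this section.

```python
from typing import Optional

# Allowed values as listed in the OpenAI GPT Image section.
OPENAI_MODELS = {"gpt-image-1.5", "gpt-image-1", "gpt-image-1-mini"}
OPENAI_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
OPENAI_QUALITIES = {"low", "medium", "high", "auto"}
OPENAI_BACKGROUNDS = {"transparent", "opaque", "auto"}
OPENAI_FORMATS = {"png", "jpeg", "webp"}
MAX_PROMPT_CHARS = 32_000


def validate_openai_options(
    prompt: str,
    model: str = "gpt-image-1.5",
    size: str = "auto",
    quality: str = "auto",
    background: str = "auto",
    output_format: str = "png",
    compression: Optional[int] = None,
) -> list[str]:
    """Return human-readable problems; an empty list means the options look valid."""
    problems = []
    for name, value, allowed in [
        ("model", model, OPENAI_MODELS),
        ("size", size, OPENAI_SIZES),
        ("quality", quality, OPENAI_QUALITIES),
        ("background", background, OPENAI_BACKGROUNDS),
        ("output_format", output_format, OPENAI_FORMATS),
    ]:
        if value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    # Compression is only meaningful for the lossy formats.
    if compression is not None:
        if output_format not in {"jpeg", "webp"}:
            problems.append("compression applies only to jpeg/webp output")
        elif not 0 <= compression <= 100:
            problems.append("compression must be between 0 and 100")
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt is longer than {MAX_PROMPT_CHARS:,} characters")
    return problems
```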

Google Gemini Native Image Generation (Recommended for editing)

  • Nano Banana (gemini-2.5-flash-image): Fast, efficient, 1K resolution, up to 3 reference images
  • Nano Banana Pro (gemini-3-pro-image-preview): Professional quality, up to 4K, thinking mode, up to 14 reference images (default model)
  • Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
  • Resolutions (Pro only): 1K, 2K, 4K
  • Features:
    • Image editing (add/remove elements, color changes)
    • Product placement and composition
    • Style transfer
    • Advanced text rendering
    • Google Search grounding (Pro only)
    • Thinking mode for complex prompts (Pro only)
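The aspect-ratio list and the Pro-only resolution rule can be checked locally before calling gemini.py. This is a sketch only. validate_gemini_options is hypothetical and encodes only the constraints listed in this section.

```python
from typing import Optional

GEMINI_PRO = "gemini-3-pro-image-preview"
GEMINI_MODELS = {"gemini-2.5-flash-image", GEMINI_PRO}
GEMINI_ASPECT_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}
GEMINI_RESOLUTIONS = {"1K", "2K", "4K"}


def validate_gemini_options(
    model: str = GEMINI_PRO,
    aspect_ratio: str = "1:1",
    resolution: Optional[str] = None,
) -> list[str]:
    """Return problems with a Gemini request; an empty list means it looks valid."""
    problems = []
    if model not in GEMINI_MODELS:
        problems.append(f"unknown model {model!r}")
    if aspect_ratio not in GEMINI_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {aspect_ratio!r}")
    if resolution is not None:
        # Resolution is a Nano Banana Pro feature only.
        if model != GEMINI_PRO:
            problems.append("resolution can only be set for Nano Banana Pro")
        elif resolution not in GEMINI_RESOLUTIONS:
            problems.append(f"resolution must be one of {sorted(GEMINI_RESOLUTIONS)}")
    return problems
```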

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the AskUserQuestion tool for each question below.
Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q0: Model Selection
"Which image generation model would you like to use?
  • Google Gemini (Nano Banana Pro) - Up to 4K, 14 reference images, style transfer, thinking mode (Recommended)
  • OpenAI GPT Image 1.5 - State of the art, transparency, streaming, up to 16 input images
  • OpenAI GPT Image 1 - Great quality, transparency, image editing
  • OpenAI GPT Image 1 Mini - Fastest, most affordable"
Wait for response. If the user doesn't have a preference, recommend Gemini for editing/reference tasks or GPT Image 1.5 for pure generation.
Q1: Reference
"I'll generate that image for you! First — do you have any reference images?
  • Product photos to include
  • Style references
  • Images to edit
  • No, generate from scratch"
Wait for response.
Q2: Aspect Ratio (GPT Image models support only 1:1, 3:2, and 2:3 through their three fixed sizes, plus auto)
"What aspect ratio?
  • 1:1 (square)
  • 16:9 (landscape/widescreen)
  • 9:16 (portrait/vertical)
  • 4:3 / 3:4 (classic)
  • Other (2:3, 3:2, 4:5, 5:4, 21:9)
  • Or specify"
Wait for response.
Q3: Resolution (Nano Banana Pro only; skip this question for other models)
"What resolution?
  • 1K (fast)
  • 2K (balanced)
  • 4K (highest quality)"
Wait for response.
Q4: Style
"Any style preferences?
  • Photorealistic
  • Artistic/painterly
  • Cartoon/illustration
  • 3D render
  • Or describe your own"
Wait for response.
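The Q2 options are Gemini's aspect ratios, but GPT Image models accept only three fixed sizes (or auto). If the user picked a GPT Image model and an unsupported ratio, one reasonable approach is to use the size whose ratio is closest. The sketch below compares ratios on a log scale so that landscape and portrait are treated symmetrically. nearest_gpt_image_size is a hypothetical helper. Report any substitution to the user as an assumption.

```python
import math

# GPT Image sizes listed in this skill, with their width/height ratio.
GPT_IMAGE_SIZES = {
    "1024x1024": 1024 / 1024,  # 1:1 square
    "1536x1024": 1536 / 1024,  # 3:2 landscape
    "1024x1536": 1024 / 1536,  # 2:3 portrait
}


def nearest_gpt_image_size(aspect_ratio: str) -> str:
    """Map a "W:H" answer from Q2 to the closest supported GPT Image size."""
    w, h = (float(part) for part in aspect_ratio.split(":"))
    target = math.log(w / h)
    # Distance in log space, so 16:9 and 9:16 are mirror images.
    return min(
        GPT_IMAGE_SIZES,
        key=lambda size: abs(math.log(GPT_IMAGE_SIZES[size]) - target),
    )
```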

Quick Reference

| Question | Determines |
|---|---|
| Reference | Generation vs. editing mode |
| Aspect Ratio | Image dimensions |
| Resolution | Quality level |
| Style | Prompt enhancement direction |
Parsing:
  • If the user provides reference images → use image editing mode
  • If the user doesn't answer all questions → use sensible defaults and note the assumptions
  • Parse: subject, style, mood, special requirements (colors, text, composition)
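The parsing rules can be captured in a small record. It derives the mode from whether reference images are present and logs every default it fills in. This is a sketch only. The values in DEFAULTS (1:1, 2K, photorealistic) are hypothetical choices, because the skill only asks for sensible defaults.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical defaults; the skill does not specify concrete values.
DEFAULTS = {"aspect_ratio": "1:1", "resolution": "2K", "style": "photorealistic"}


@dataclass
class Requirements:
    reference_images: list[str] = field(default_factory=list)
    aspect_ratio: Optional[str] = None
    resolution: Optional[str] = None
    style: Optional[str] = None
    assumptions: list[str] = field(default_factory=list)

    @property
    def mode(self) -> str:
        # Any reference image switches the request to editing mode.
        return "edit" if self.reference_images else "generate"

    def fill_defaults(self) -> "Requirements":
        """Fill unanswered questions and record each assumption made."""
        for name, default in DEFAULTS.items():
            if getattr(self, name) is None:
                setattr(self, name, default)
                self.assumptions.append(f"{name} defaulted to {default}")
        return self
```

The assumptions list can then be reported to the user alongside the result.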

Step 2: Craft the Prompt

Transform the user request into an effective image generation prompt:
  1. Be specific: Add details the user might not have mentioned
  2. Describe style: "digital art", "oil painting", "photograph", "3D render"
  3. Include lighting: "soft lighting", "dramatic shadows", "golden hour"
  4. Specify quality: "highly detailed", "8k", "professional"
Example transformation:
  • User: "a cat in space"
  • Enhanced: "A majestic orange tabby cat floating in outer space, surrounded by colorful nebulae and distant stars, wearing a small astronaut helmet, digital art style, highly detailed, vibrant colors, cinematic lighting"
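The four guidelines above amount to appending descriptors to the subject. This is a minimal sketch that assumes a simple comma-joined template. enhance_prompt is hypothetical, and in practice choosing the descriptors is a judgment call, not code. Passing in the pieces of the cat example reproduces the enhanced prompt shown.

```python
def enhance_prompt(
    subject: str,
    details: list[str],
    style: str,
    finish: list[str],
    lighting: str,
) -> str:
    """Join Step 2's ingredients into one comma-separated prompt.

    Order: subject, extra details, style, quality/finish keywords, lighting.
    Blank fragments are skipped so any ingredient can be left out.
    """
    parts = [subject, *details, style, *finish, lighting]
    return ", ".join(p.strip() for p in parts if p.strip())
```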

Step 3: Select the API

Use the model selected by the user in Q0:
  1. Check which API keys are configured in the environment:
    • OPENAI_API_KEY → GPT Image models available
    • GOOGLE_API_KEY → Gemini (Nano Banana Pro) available
  2. If the user's selected model isn't available, inform them and offer alternatives.
  3. Model mapping from Q0:
    • "Google Gemini (Nano Banana Pro)" → use gemini.py with gemini-3-pro-image-preview
    • "OpenAI GPT Image 1.5" → use openai_image.py with gpt-image-1.5
    • "OpenAI GPT Image 1" → use openai_image.py with gpt-image-1
    • "OpenAI GPT Image 1 Mini" → use openai_image.py with gpt-image-1-mini
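Steps 1 to 3 can be expressed as a lookup plus an availability check. This is a minimal sketch that assumes the Q0 answer arrives as one of the exact option strings. MODEL_MAP and resolve_model are hypothetical; the script names and model IDs come from the mapping above.

```python
from typing import Mapping

# Q0 answer -> (script, model ID, required environment variable).
MODEL_MAP = {
    "Google Gemini (Nano Banana Pro)": ("gemini.py", "gemini-3-pro-image-preview", "GOOGLE_API_KEY"),
    "OpenAI GPT Image 1.5": ("openai_image.py", "gpt-image-1.5", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1": ("openai_image.py", "gpt-image-1", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1 Mini": ("openai_image.py", "gpt-image-1-mini", "OPENAI_API_KEY"),
}


def resolve_model(choice: str, env: Mapping[str, str]) -> tuple[str, str]:
    """Return (script, model ID) for a Q0 answer, or raise listing usable alternatives."""
    script, model, key = MODEL_MAP[choice]
    if env.get(key):
        return script, model
    # Offer every option whose key *is* configured.
    alternatives = [name for name, (_, _, k) in MODEL_MAP.items() if env.get(k)]
    raise LookupError(f"{choice} needs {key}; available instead: {alternatives or 'none'}")
```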

Step 4: Generate the Image

Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/:
For OpenAI GPT Image - Text to Image:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "your enhanced prompt" \
  --model "gpt-image-1" \
  --size "1024x1024" \
  --quality "high" \
  --output "/path/to/output.png"
```
For OpenAI GPT Image - With Transparent Background:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A product icon with no background" \
  --model "gpt-image-1" \
  --background "transparent" \
  --quality "high" \
  --output "/path/to/output.png"
```
For OpenAI GPT Image - Image Editing (with reference images):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Add a wizard hat to this cat" \
  --model "gpt-image-1" \
  --image "/path/to/cat.jpg" \
  --input-fidelity "high" \
  --output "/path/to/output.png"
```
For OpenAI GPT Image - Multiple Reference Images:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Create a gift basket containing these items" \
  --model "gpt-image-1" \
  --image "/path/to/item1.png" \
  --image "/path/to/item2.png" \
  --image "/path/to/item3.png" \
  --output "/path/to/output.png"
```
For OpenAI GPT Image - With Mask (Inpainting):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Replace the pool with a garden" \
  --model "gpt-image-1" \
  --image "/path/to/scene.jpg" \
  --mask "/path/to/mask.png" \
  --output "/path/to/output.png"
```
For OpenAI GPT Image - Streaming with Partial Images:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A beautiful sunset over mountains" \
  --model "gpt-image-1" \
  --stream \
  --partial-images 2 \
  --output "/path/to/output.png"
```
For Google Gemini (Nano Banana Pro) - Text to Image:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-3-pro-image-preview" \
  --aspect-ratio "1:1" \
  --resolution "2K" \
  --output "/path/to/output.png"
```
For Google Gemini - With Reference Images (editing, product placement, etc.):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Add a wizard hat to this cat" \
  --image "/path/to/cat.jpg" \
  --aspect-ratio "1:1" \
  --resolution "2K"
```
For Google Gemini - Multiple Reference Images (composition, style transfer):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Place this product on the kitchen counter in this scene" \
  --image "/path/to/product.png" \
  --image "/path/to/kitchen.jpg" \
  --aspect-ratio "16:9" \
  --resolution "2K"
```
For Google Gemini (Nano Banana - faster, fewer features):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-2.5-flash-image" \
  --aspect-ratio "1:1"
```
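If you assemble the command in code rather than typing it, pass an argument list to subprocess. That avoids shell-quoting problems with prompts that contain quotes. The sketch below assumes the flags shown in the examples above, and falls back to the current directory when CLAUDE_PLUGIN_ROOT is unset. build_command and as_shell are hypothetical helpers.

```python
import os
import shlex
from typing import Optional, Sequence


def build_command(
    script: str,
    prompt: str,
    *,
    model: Optional[str] = None,
    images: Sequence[str] = (),
    options: Optional[dict[str, str]] = None,
    output: Optional[str] = None,
    plugin_root: Optional[str] = None,
) -> list[str]:
    """Build the argv for one of the skill's scripts, mirroring the examples above."""
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    path = os.path.join(root, "skills", "image-generation", "scripts", script)
    argv = ["python3", path, "--prompt", prompt]
    if model:
        argv += ["--model", model]
    for image in images:
        argv += ["--image", image]  # one --image flag per reference image
    for flag, value in (options or {}).items():
        argv += [f"--{flag}", value]  # e.g. {"aspect-ratio": "16:9"}
    if output:
        argv += ["--output", output]
    return argv


def as_shell(argv: list[str]) -> str:
    """Quote argv for display, e.g. when showing the user what was run."""
    return shlex.join(argv)
```

Run the result with subprocess.run(build_command(...), check=True). Boolean flags such as --stream are not covered by options and would need a small extension.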

Step 5: Deliver the Result

  1. Show the generated image to the user
  2. Provide the enhanced prompt used (so they can iterate)
  3. Offer to:
    • Generate variations
    • Try a different style
    • Use a different API/model
    • Refine the prompt

Error Handling

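Missing API key: Tell the user which key is needed (OPENAI_API_KEY or GOOGLE_API_KEY) and point them to the repository README for setup instructions.
API rate limit: Suggest waiting or trying the other API.
Content policy violation: Rephrase the prompt so it complies with the provider's content policy.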
Generation failed: Retry with simplified prompt or different API.
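The rate-limit and generation-failure rules can be combined into a small retry loop. This is a minimal sketch. It assumes each provider is wrapped in a function that takes a prompt, returns an output path, and raises one of two exception types. RateLimited, GenerationFailed, and generate_with_fallback are all hypothetical names; neither script is known to raise them. Missing keys and content-policy rejections stay with the user-facing rules above.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical: raised by a provider wrapper when the API rate-limits."""


class GenerationFailed(Exception):
    """Hypothetical: raised by a provider wrapper for any other failure."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    simplify: Callable[[str], str],
) -> str:
    """Return the first successful result (e.g. an output path).

    Per provider: try the full prompt, then a simplified one on failure.
    A rate limit skips straight to the next provider.
    """
    errors: list[Exception] = []
    for generate in providers:
        for attempt in (prompt, simplify(prompt)):
            try:
                return generate(attempt)
            except RateLimited as exc:
                errors.append(exc)
                break  # don't keep hitting a rate-limited API; try the other one
            except GenerationFailed as exc:
                errors.append(exc)  # fall through to the simplified prompt
    raise GenerationFailed(f"all attempts failed: {errors!r}")
```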

Reference Image Use Cases

Both OpenAI GPT Image and Google Gemini support reference images for advanced editing:
  • OpenAI GPT Image: up to 16 input images, with input_fidelity: high for preserving faces/logos
  • Google Gemini: Nano Banana (up to 3), Nano Banana Pro (up to 14)
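To check a request against these limits before choosing a model, you can keep the counts in one table. This is a sketch only. MAX_REFERENCE_IMAGES and models_accepting are hypothetical names, and the limits are the ones stated in this section.

```python
# Maximum reference/input images per model, as stated in this section.
MAX_REFERENCE_IMAGES = {
    "gpt-image-1.5": 16,
    "gpt-image-1": 16,
    "gpt-image-1-mini": 16,
    "gemini-2.5-flash-image": 3,       # Nano Banana
    "gemini-3-pro-image-preview": 14,  # Nano Banana Pro
}


def models_accepting(n_images: int) -> list[str]:
    """Models that can take n_images reference images, largest limit first."""
    return sorted(
        (m for m, limit in MAX_REFERENCE_IMAGES.items() if n_images <= limit),
        key=lambda m: -MAX_REFERENCE_IMAGES[m],  # stable sort keeps table order on ties
    )
```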

Image Editing

  • "Add a santa hat to this person" + person.jpg
  • "Remove the background and replace with a beach scene" + product.jpg
  • "Change the sofa color to blue" + living_room.jpg

Product Placement

  • "Place this product on a marble kitchen counter" + product.png + kitchen.jpg
  • "Show this watch on a person's wrist" + watch.png + arm.jpg

Style Transfer

  • "Transform this photo into Van Gogh's Starry Night style" + photo.jpg
  • "Make this look like a watercolor painting" + landscape.jpg

Multi-Image Composition

  • "Create a group photo of these people in an office" + person1.jpg + person2.jpg + person3.jpg
  • "Combine these elements into a cohesive scene" + element1.png + element2.png + background.jpg

Character Consistency

  • "Show this character from a different angle" + character.jpg
  • "Put this person in a superhero costume" + person.jpg
Tip: For best results with reference images, be specific about what you want to preserve vs. change.

Prompt Engineering Tips

For Photorealism

  • Include "photograph", "DSLR", "35mm film"
  • Specify camera settings: "shallow depth of field", "bokeh"
  • Add lighting: "natural light", "studio lighting"

For Artistic Styles

  • Reference art movements: "impressionist", "art nouveau", "cyberpunk"
  • Name artist styles: "in the style of Studio Ghibli", "Moebius style"
  • Specify medium: "watercolor", "oil painting", "pencil sketch"

For Consistency

  • Use seed values when available
  • Save successful prompts for reference
  • Note which API produced best results for similar requests
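Saving prompts and noting which API worked best can be as simple as an append-only JSON Lines file. This is a minimal sketch that assumes the user gives each result a 1 to 5 rating. The file layout, the rating field, and the helpers log_prompt and best_model are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_prompt(log_path: Path, prompt: str, model: str, output: str, rating: int) -> None:
    """Append one JSON line per generation so good prompts can be reused."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "output": output,
        "rating": rating,  # hypothetical 1-5 score from the user
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def best_model(log_path: Path) -> Optional[str]:
    """Model with the highest average rating in the log, or None if empty."""
    if not log_path.exists():
        return None
    ratings: dict[str, list[int]] = {}
    for line in log_path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        ratings.setdefault(entry["model"], []).append(entry["rating"])
    if not ratings:
        return None
    return max(ratings, key=lambda m: sum(ratings[m]) / len(ratings[m]))
```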

API Comparison

| Feature | GPT Image 1.5 | GPT Image 1 | GPT Image 1 Mini | Nano Banana | Nano Banana Pro |
|---|---|---|---|---|---|
| Provider | OpenAI | OpenAI | OpenAI | Google | Google |
| Model ID | gpt-image-1.5 | gpt-image-1 | gpt-image-1-mini | gemini-2.5-flash-image | gemini-3-pro-image-preview |
| Best for | State of the art | Quality + value | Speed + cost | Fast generation | Professional assets |
| Sizes | 1024x1024, 1536x1024, 1024x1536, auto | Same | Same | 1K only | Up to 4K |
| Quality options | low, medium, high, auto | Same | Same | N/A | N/A |
| Aspect ratios | 3 + auto | Same | Same | 10 options | 10 options |
| Reference images | Up to 16 | Up to 16 | Up to 16 | Up to 3 | Up to 14 |
| Image editing | Yes | Yes | Yes | Yes | Yes |
| Inpainting (mask) | Yes | Yes | Yes | Yes | Yes |
| Transparent background | Yes | Yes | Yes | No | No |
| Streaming | Yes | Yes | Yes | No | No |
| Input fidelity | high/low | high/low | low only | N/A | N/A |
| Output formats | png, jpeg, webp | Same | Same | png | png |
| Compression | 0-100% | Same | Same | No | No |
| Text rendering | Excellent | Excellent | Good | Good | Excellent |
| Thinking mode | No | No | No | No | Yes |
| Max prompt length | 32,000 chars | 32,000 chars | 32,000 chars | N/A | N/A |
| Speed | ~30-60 s | ~20-40 s | ~10-20 s | ~10-20 s | ~30-60 s |
⚠️ DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on May 12, 2026. Use GPT Image models instead.
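When a request needs a specific feature, the Yes/No rows of the comparison table are enough to narrow the model list. This sketch encodes only a few rows. FEATURES and candidate_models are hypothetical names, and the values are copied from the table.

```python
# Selected Yes/No rows of the API Comparison table, keyed by model ID.
# up_to_4k reflects the "Up to 4K" entry in the Sizes row.
FEATURES = {
    "gpt-image-1.5": dict(transparent=True, streaming=True, thinking=False, up_to_4k=False),
    "gpt-image-1": dict(transparent=True, streaming=True, thinking=False, up_to_4k=False),
    "gpt-image-1-mini": dict(transparent=True, streaming=True, thinking=False, up_to_4k=False),
    "gemini-2.5-flash-image": dict(transparent=False, streaming=False, thinking=False, up_to_4k=False),
    "gemini-3-pro-image-preview": dict(transparent=False, streaming=False, thinking=True, up_to_4k=True),
}
KNOWN_FEATURES = {"transparent", "streaming", "thinking", "up_to_4k"}


def candidate_models(needs: set[str]) -> list[str]:
    """Model IDs (in table order) that have every feature in needs."""
    unknown = needs - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [model for model, has in FEATURES.items() if all(has[n] for n in needs)]
```

An empty result means no single model covers the request, which is a good moment to ask the user which requirement matters most.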