ai-image-generator

AI Image Generator

Generate images using AI APIs (Google Gemini and OpenAI GPT). This skill teaches the prompting patterns and API mechanics for producing professional images directly from Claude Code.
Managed alternative: If you don't want to manage API keys, ImageBot provides a managed image generation service with album templates and brand kit support.

Model Selection

Choose the right model for the job:
| Need | Model | Why |
|---|---|---|
| Scenes / stock photos | Gemini 3.1 Flash Image | Best depth, complexity, environmental context |
| Transparent icons / logos | GPT Image 1.5 | Native RGBA alpha channel (background: "transparent") |
| Text on images | GPT Image 1.5 | 90% accurate text rendering |
| Drafts / iteration | Gemini 2.5 Flash Image | Free tier (~500/day) |
| Final client assets | Gemini 3 Pro Image | Higher detail, better style consistency |
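
The selection table can also be kept as a small lookup so scripts pick a model consistently. This is only a sketch: the key names (`scene`, `transparent_icon`, and so on) and the `pick_model` helper are hypothetical labels invented here, and the model names are copied from the table.

```python
# Hypothetical lookup mirroring the Model Selection table.
# The keys are illustrative labels, not part of any API.
MODEL_FOR_NEED = {
    "scene": "Gemini 3.1 Flash Image",
    "transparent_icon": "GPT Image 1.5",
    "text_on_image": "GPT Image 1.5",
    "draft": "Gemini 2.5 Flash Image",
    "final_asset": "Gemini 3 Pro Image",
}


def pick_model(need: str) -> str:
    """Return the model the table recommends for a given need."""
    if need not in MODEL_FOR_NEED:
        options = ", ".join(sorted(MODEL_FOR_NEED))
        raise ValueError(f"Unknown need {need!r}; expected one of: {options}")
    return MODEL_FOR_NEED[need]


print(pick_model("transparent_icon"))  # GPT Image 1.5
```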

Model IDs

| Model | API ID | Provider |
|---|---|---|
| Gemini 3.1 Flash Image | gemini-3.1-flash-image-preview | Google AI |
| Gemini 3 Pro Image | gemini-3-pro-image-preview | Google AI |
| Gemini 2.5 Flash Image | gemini-2.5-flash-image | Google AI |
| GPT Image 1.5 | gpt-image-1.5 | OpenAI |
Verify model IDs before use — they change frequently:
bash
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models'] if 'image' in m['name'].lower()]"
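
The filter in the one-liner above can be written as a plain function, which is easier to reuse and test. A minimal sketch, assuming the listing has the same shape the one-liner relies on (a `models` array whose entries carry a `name`). The `image_model_ids` helper and the sample data are made up for illustration, and stripping a `models/` prefix is an assumption about how names are returned.

```python
def image_model_ids(listing: dict) -> list:
    """Return the names of models that mention 'image', minus any 'models/' prefix."""
    ids = []
    for model in listing.get("models", []):
        name = model.get("name", "")
        if "image" in name.lower():
            # str.removeprefix needs Python 3.9+; it is a no-op if the prefix is absent.
            ids.append(name.removeprefix("models/"))
    return ids


# Made-up sample in the assumed response shape, not real API output.
sample = {
    "models": [
        {"name": "models/gemini-2.5-flash-image"},
        {"name": "models/example-text-model"},
    ]
}
print(image_model_ids(sample))  # ['gemini-2.5-flash-image']
```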

The 5-Part Prompting Framework

Build prompts in this order for consistent results:

1. Image Type

Set the genre: "A photorealistic photograph", "An isometric illustration", "A flat vector icon"

2. Subject

Who or what, with specific details: "of a warm, approachable Australian woman in her early 30s, smiling naturally"

3. Environment

Setting and spatial relationships: "in a bright modern home with terracotta decor on wooden shelves behind her"

4. Technical Specs

Camera and lighting: "Shot at 85mm f/2.0, natural window light, head and shoulders framing"

5. Constraints

What to exclude: "Photorealistic, no text, no watermarks, no logos"

Example (Good vs Bad)

BAD — keyword soup:
"professional woman, spa, warm lighting, high quality, 4K"

GOOD — narrative direction:
"A professional skin treatment scene in a warm clinical setting.
A practitioner wearing blue medical gloves uses a microneedling pen
on the client's forehead. The client lies on a white treatment bed,
eyes closed, relaxed. Warm golden-hour light from a window to the
left. Terracotta-toned wall visible in the background. Shot at
85mm f/2.0, shallow depth of field. No text, no watermarks."
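
The five parts can be assembled in code so the order stays fixed across prompts. A minimal sketch, assuming the first three parts read as one opening sentence and the last two as separate sentences. `PromptParts` and `build_prompt` are hypothetical names, and the sample fragments are the ones quoted in the framework steps.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    image_type: str    # 1. genre
    subject: str       # 2. who or what
    environment: str   # 3. setting
    technical: str     # 4. camera and lighting
    constraints: str   # 5. what to exclude


def build_prompt(parts: PromptParts) -> str:
    """Join the five parts in framework order into one narrative prompt."""
    scene = " ".join(s.strip() for s in (parts.image_type, parts.subject, parts.environment))
    sentences = [scene, parts.technical, parts.constraints]
    # Normalise so each sentence ends with exactly one full stop.
    return " ".join(s.strip().rstrip(".") + "." for s in sentences)


example = PromptParts(
    image_type="A photorealistic photograph",
    subject="of a warm, approachable Australian woman in her early 30s, smiling naturally",
    environment="in a bright modern home with terracotta decor on wooden shelves behind her",
    technical="Shot at 85mm f/2.0, natural window light, head and shoulders framing",
    constraints="Photorealistic, no text, no watermarks, no logos",
)
print(build_prompt(example))
```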

Workflow

1. Determine Image Need

| Purpose | Aspect Ratio | Model |
|---|---|---|
| Hero banner | 16:9 or 21:9 | Gemini |
| Service card | 4:3 or 3:4 | Gemini |
| Profile / avatar | 1:1 | Gemini |
| Icon / badge | 1:1 | GPT (transparent) |
| OG / social share | 1.91:1 | Gemini |
| Instagram post | 1:1 or 4:5 | Gemini |
| Mobile hero | 9:16 | Gemini |
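
When a layout needs pixel dimensions rather than a ratio, the ratios in the table convert directly. A minimal sketch: the purpose keys and the `dimensions` helper are hypothetical, and where the table offers two ratios the first one listed is used.

```python
# Hypothetical mapping from purpose to the first aspect ratio listed in the table.
ASPECT_FOR_PURPOSE = {
    "hero_banner": "16:9",
    "service_card": "4:3",
    "avatar": "1:1",
    "icon": "1:1",
    "og_share": "1.91:1",
    "instagram_post": "1:1",
    "mobile_hero": "9:16",
}


def dimensions(width: int, ratio: str) -> tuple:
    """Return (width, height) in pixels for a 'W:H' ratio string."""
    w, h = (float(x) for x in ratio.split(":"))
    return width, round(width * h / w)


print(dimensions(1600, ASPECT_FOR_PURPOSE["hero_banner"]))  # (1600, 900)
print(dimensions(1080, ASPECT_FOR_PURPOSE["mobile_hero"]))  # (1080, 1920)
```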

2. Build the Prompt

Use the 5-part framework. Refer to references/prompting-guide.md for detailed photography parameters.

3. Generate via API

Gemini (Python — handles shell escaping correctly)

python
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    print("Set GEMINI_API_KEY environment variable"); sys.exit(1)

model = "gemini-3.1-flash-image-preview"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={GEMINI_API_KEY}"

prompt = """A professional photograph of a modern co-working space in
Newcastle, Australia. Natural light floods through floor-to-ceiling
windows. Three people collaborate at a standing desk — one pointing
at a laptop screen. Exposed brick wall, potted fiddle-leaf fig,
coffee cups on the desk. Shot at 35mm f/4.0, environmental portrait
style. No text, no watermarks, no logos."""

payload = json.dumps({
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "temperature": 0.8
    }
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "User-Agent": "ImageGen/1.0"
})

resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

# Extract image from response
for part in result["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_data = base64.b64decode(part["inlineData"]["data"])
        output_path = "hero-image.png"
        with open(output_path, "wb") as f:
            f.write(img_data)
        print(f"Saved: {output_path} ({len(img_data):,} bytes)")
        break
PYEOF
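
The extraction loop above indexes `candidates[0]` directly, so it raises if a response comes back without candidates or without an image part. A more defensive variant might look like the sketch below. `extract_first_image` is a hypothetical helper, and the fake response is made-up data that only mimics the field names used above.

```python
import base64
from typing import Optional


def extract_first_image(result: dict) -> Optional[bytes]:
    """Return the decoded bytes of the first inlineData part, or None if there is none."""
    for candidate in result.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline and "data" in inline:
                return base64.b64decode(inline["data"])
    return None


# Made-up response in the shape the loop above expects.
fake_response = {
    "candidates": [{
        "content": {"parts": [
            {"text": "Here is your image."},
            {"inlineData": {"mimeType": "image/png",
                            "data": base64.b64encode(b"fake-bytes").decode("ascii")}},
        ]}
    }]
}
print(extract_first_image(fake_response))  # b'fake-bytes'
print(extract_first_image({}))             # None
```

Returning `None` lets the caller decide whether to retry, reword the prompt, or report the failure.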

GPT (Transparent Icons)

python
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    print("Set OPENAI_API_KEY environment variable"); sys.exit(1)

url = "https://api.openai.com/v1/images/generations"

payload = json.dumps({
    "model": "gpt-image-1.5",
    "prompt": "A minimal, clean plumbing wrench icon. Flat design, single consistent stroke weight, modern style. On a transparent background.",
    "n": 1,
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
})

resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

img_data = base64.b64decode(result["data"][0]["b64_json"])
with open("icon-wrench.png", "wb") as f:
    f.write(img_data)
print(f"Saved: icon-wrench.png ({len(img_data):,} bytes)")
PYEOF
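
After saving a transparent icon, it can be worth confirming that the file really declares an alpha channel. The sketch below reads only the PNG header using the standard library. `png_has_alpha` is a hypothetical helper and it checks just the colour type in the IHDR chunk, so palette images that carry transparency in a separate tRNS chunk are not detected.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """True if the PNG header declares an alpha channel (colour type 4 or 6)."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width (4), height (4), bit depth (1), colour type (1).
    colour_type = data[25]
    return colour_type in (4, 6)  # 4 = greyscale + alpha, 6 = RGBA


def fake_header(colour_type: int) -> bytes:
    """Build just enough of a PNG header for a demonstration (not a valid image)."""
    ihdr = struct.pack(">IIBBBBB", 1024, 1024, 8, colour_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr


print(png_has_alpha(fake_header(6)))  # True
print(png_has_alpha(fake_header(2)))  # False
```

For a real file, pass the first bytes of the saved image, for example `open("icon-wrench.png", "rb").read(32)`.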

4. Save and Optimise

Save generated images to .jez/artifacts/ or the user's specified path.
Post-processing (optional):
bash
# Convert to WebP for web use
python3 -c "
from PIL import Image
img = Image.open('hero-image.png')
img.save('hero-image.webp', 'WEBP', quality=85)
print(f'WebP: {img.size[0]}x{img.size[1]}')
"

# Trim whitespace from transparent icons
python3 -c "
from PIL import Image
img = Image.open('icon.png')
trimmed = img.crop(img.getbbox())
trimmed.save('icon-trimmed.png')
"

5. Quality Check (Optional)

Send the generated image back to a vision model for QA:
python
# Send to Gemini Flash for critique
critique_prompt = """Review this image for:
  1. AI artifacts (extra fingers, floating objects, text errors)
  2. Technical accuracy (wrong equipment, unsafe positioning)
  3. Composition issues (awkward cropping, cluttered background)
  4. Style consistency with a professional stock photo
List any issues found, or say 'PASS' if the image is production-ready."""

If issues are found, append them as negative guidance to the original prompt and regenerate.
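
The critique step can be sketched as two small helpers: one that packages the image with the critique prompt, and one that folds any reported issues back into the original prompt. Both function names are hypothetical. The `inlineData` and `mimeType` fields mirror the response fields used earlier, and it is an assumption that the request accepts the same casing. The "Avoid the following issues" wording is also just one possible phrasing.

```python
import base64
from typing import Optional


def build_critique_payload(image_bytes: bytes, critique_prompt: str) -> dict:
    """Package the critique text and the image into one request body (assumed shape)."""
    return {
        "contents": [{
            "parts": [
                {"text": critique_prompt},
                {"inlineData": {
                    "mimeType": "image/png",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }


def revise_prompt(original_prompt: str, critique: str) -> Optional[str]:
    """Return None if the critique is PASS, else the prompt with the issues appended."""
    text = critique.strip()
    if text.strip("'\". ").upper() == "PASS":
        return None
    # Treat each non-empty line as one issue and drop list markers and trailing dots.
    issues = [line.strip(" -*\t").rstrip(".") for line in text.splitlines() if line.strip()]
    return original_prompt.rstrip() + " Avoid the following issues: " + "; ".join(issues) + "."


print(revise_prompt("A co-working scene.", "PASS"))
print(revise_prompt("A co-working scene.", "- extra fingers\n- floating cup"))
```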

Multi-Turn Editing

Gemini supports editing a generated image across conversation turns. The key requirement: preserve thought signatures from model responses.
python
# Turn 1: Generate base image
contents = [{"role": "user", "parts": [{"text": "Scene prompt..."}]}]
# The response includes thoughtSignature on parts; preserve them ALL

# Turn 2: Edit the image
contents = [
    {"role": "user", "parts": [{"text": "Original prompt"}]},
    {"role": "model", "parts": response_parts_with_signatures},  # Keep intact
    {"role": "user", "parts": [{"text": "Edit: change the wall colour to blue. Keep everything else exactly the same."}]}
]

**Edit prompt pattern**: Always specify what to KEEP unchanged, not just what to change. The model treats unlisted elements as free to modify.
GOOD: "Edit this image: keep the people, desk, and window unchanged. Only change: wall colour from terracotta to ocean blue."
BAD: "Now make the wall blue." (Model may change everything else too)
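
The second turn and the edit prompt pattern can be combined in one helper so the keep list is never forgotten. A minimal sketch: `build_edit_contents` is a hypothetical name, and the sample model parts (including the signature value) are placeholders rather than real API output.

```python
def build_edit_contents(original_prompt: str, model_parts: list, keep: list, change: str) -> list:
    """Build the turn-2 contents list, passing the model's parts through untouched."""
    edit_text = (
        "Edit this image: keep " + ", ".join(keep) + " unchanged. "
        "Only change: " + change + "."
    )
    return [
        {"role": "user", "parts": [{"text": original_prompt}]},
        # Same list object as received, so any thoughtSignature fields survive.
        {"role": "model", "parts": model_parts},
        {"role": "user", "parts": [{"text": edit_text}]},
    ]


# Placeholder parts standing in for a real turn-1 response.
model_parts = [{
    "inlineData": {"mimeType": "image/png", "data": "placeholder-base64"},
    "thoughtSignature": "placeholder-signature",
}]
contents = build_edit_contents(
    "Original prompt",
    model_parts,
    keep=["the people", "desk", "window"],
    change="wall colour from terracotta to ocean blue",
)
print(contents[2]["parts"][0]["text"])
```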

API Key Setup

| Provider | Get key at | Env variable |
|---|---|---|
| Google Gemini | aistudio.google.com | GEMINI_API_KEY |
| OpenAI | platform.openai.com | OPENAI_API_KEY |
bash
export GEMINI_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"

Common Mistakes

| Mistake | Fix |
|---|---|
| Using curl for Gemini prompts | Use Python; shell escaping breaks on apostrophes |
| "Beautiful, professional, high quality" | Use concrete specs: "85mm f/1.8, golden hour light" |
| Not specifying what to exclude | Always end with "No text, no watermarks, no logos" |
| Requesting transparent PNG from Gemini | Gemini cannot do transparency; use GPT with background: "transparent" |
| American defaults for AU businesses | Explicitly specify "Australian" + local architecture, vegetation |
| Relying on a remembered or generic model ID | Verify current model IDs; they change frequently |
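
The first row is easy to see in practice. Building the request body with `json.dumps` leaves an apostrophe untouched, while the same text needs awkward quoting before a shell will accept it. A small illustration, in which `gemini_payload` is a hypothetical helper and the prompt is made up:

```python
import json
import shlex


def gemini_payload(prompt: str) -> str:
    """Serialise a prompt into the request body shape used in the Gemini example."""
    return json.dumps({"contents": [{"parts": [{"text": prompt}]}]})


prompt = "A baker's counter with fresh croissants. No text, no watermarks, no logos."
payload = gemini_payload(prompt)

# The apostrophe survives a JSON round trip unchanged.
print(json.loads(payload)["contents"][0]["parts"][0]["text"] == prompt)  # True

# The same word needs this much quoting to pass through a POSIX shell safely.
print(shlex.quote("baker's"))  # 'baker'"'"'s'
```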