imagegen

Image Generation Skill

Generates or edits images for the current project (e.g., website assets, game assets, UI mockups, product mockups, wireframes, logo design, photorealistic images, infographics). Defaults to gpt-image-1.5 and the OpenAI Image API, and prefers the bundled CLI for deterministic, reproducible runs.

When to use

  • Generate a new image (concept art, product shot, cover, website hero)
  • Edit an existing image (inpainting, masked edits, lighting or weather transformations, background replacement, object removal, compositing, transparent background)
  • Batch runs (many prompts, or many variants across prompts)

Decision tree (generate vs edit vs batch)

  • If the user provides an input image (or says “edit/retouch/inpaint/mask/translate/localize/change only X”) → edit
  • Else if the user needs many different prompts/assets → generate-batch
  • Else → generate
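
The decision tree above can be sketched as a small function. This is a minimal illustration, not part of the bundled CLI: the function name, its parameters, and the naive substring keyword match are assumptions made for the example, and "many different prompts" is approximated as more than one distinct prompt.

```python
# Trigger words taken from the decision tree; substring matching is a
# deliberate simplification (hypothetical helper, not part of the skill).
EDIT_KEYWORDS = ("edit", "retouch", "inpaint", "mask", "translate", "localize", "change only")


def choose_mode(request: str, has_input_image: bool = False, distinct_prompts: int = 1) -> str:
    """Return "edit", "generate-batch", or "generate" following the decision tree."""
    text = request.lower()
    # Branch 1: an input image or an edit-style verb means an edit.
    if has_input_image or any(keyword in text for keyword in EDIT_KEYWORDS):
        return "edit"
    # Branch 2: several different prompts/assets means a batch run.
    if distinct_prompts > 1:
        return "generate-batch"
    # Branch 3: everything else is a single generation.
    return "generate"
```

A real classifier would need more care than substring matching (for example, "credit card mockup" contains "edit"), so treat the keyword list as a starting point only.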

Workflow

  1. Decide intent: generate vs edit vs batch (see decision tree above).
  2. Collect inputs up front: prompt(s), exact text (verbatim), constraints/avoid list, and any input image(s)/mask(s). For multi-image edits, label each input by index and role; for edits, list invariants explicitly.
  3. If batch: write a temporary JSONL under tmp/ (one job per line), run once, then delete the JSONL.
  4. Augment prompt into a short labeled spec (structure + constraints) without inventing new creative requirements.
  5. Run the bundled CLI (scripts/image_gen.py) with sensible defaults (see references/cli.md).
  6. For complex edits/generations, inspect outputs (open/view images) and validate: subject, style, composition, text accuracy, and invariants/avoid items.
  7. Iterate: make a single targeted change (prompt or mask), re-run, re-check.
  8. Save/return final outputs and note the final prompt + flags used.

Temp and output conventions

  • Use tmp/imagegen/ for intermediate files (for example JSONL batches); delete when done.
  • Write final artifacts under output/imagegen/ when working in this repo.
  • Use --out or --out-dir to control output paths; keep filenames stable and descriptive.
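
As a rough sketch of the temporary-file convention, the helpers below write one JSON object per line and remove the file afterwards. The helper names and the "batch.jsonl" filename are hypothetical, and the job fields the CLI actually expects are defined in references/cli.md, not here.

```python
import json
from pathlib import Path


def write_batch_jsonl(jobs: list, tmp_dir: Path) -> Path:
    """Write one job per line and return the path of the JSONL file."""
    tmp_dir.mkdir(parents=True, exist_ok=True)
    path = tmp_dir / "batch.jsonl"  # hypothetical filename
    with path.open("w", encoding="utf-8") as handle:
        for job in jobs:
            handle.write(json.dumps(job, ensure_ascii=False) + "\n")
    return path


def cleanup_batch(path: Path) -> None:
    """Delete the temporary JSONL once the single batch run has finished."""
    path.unlink(missing_ok=True)


# Intended flow (the CLI invocation itself is documented in references/cli.md):
#   path = write_batch_jsonl(jobs, Path("tmp/imagegen"))
#   ... run scripts/image_gen.py once against `path` ...
#   cleanup_batch(path)
```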

Dependencies (install if missing)

Prefer uv for dependency management.
Python packages:
uv pip install openai pillow
If uv is unavailable:
python3 -m pip install openai pillow
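
One way to decide whether an install is needed at all is to probe for the modules first. The sketch below is an assumption about how such a check might look (the helper names are hypothetical); it relies only on the standard library, and on the fact that the pillow package is imported as PIL.

```python
import importlib.util
import shutil

# pip package name -> importable module name (Pillow is imported as PIL).
REQUIRED = {"openai": "openai", "pillow": "PIL"}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install_hint(missing: list, have_uv: bool) -> str:
    """Build the install command from the section above, preferring uv."""
    if not missing:
        return ""
    installer = "uv pip install" if have_uv else "python3 -m pip install"
    return f"{installer} {' '.join(missing)}"


def current_install_hint() -> str:
    """Combine both checks for the running environment."""
    return install_hint(missing_packages(), have_uv=shutil.which("uv") is not None)
```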

Environment

  • OPENAI_API_KEY must be set for live API calls.
If the key is missing, give the user these steps:
  1. Create an API key in the OpenAI platform UI: https://platform.openai.com/api-keys
  2. Set OPENAI_API_KEY as an environment variable in their system.
  3. Offer to guide them through setting the environment variable for their OS/shell if needed.
  • Never ask the user to paste the full key in chat. Ask them to set it locally and confirm when ready.
If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
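
A pre-flight check along these lines can confirm that the key is present without ever echoing it. The function name and message wording are hypothetical; the only behavior taken from this section is that the key must be set locally and must not be displayed or requested in chat.

```python
import os


def api_key_status(env=None) -> str:
    """Report whether OPENAI_API_KEY is set, without revealing its value."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return (
            "OPENAI_API_KEY is missing: create a key at "
            "https://platform.openai.com/api-keys and set it locally"
        )
    # Deliberately report presence only; never print or log the key itself.
    return "OPENAI_API_KEY is set"
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.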

Defaults & rules

  • Use gpt-image-1.5 unless the user explicitly asks for gpt-image-1-mini or explicitly prefers a cheaper/faster model.
  • Assume the user wants a new image unless they explicitly ask for an edit.
  • Require OPENAI_API_KEY before any live API call.
  • Use the OpenAI Python SDK (openai package) for all API calls; do not use raw HTTP.
  • If the user requests edits, use client.images.edit(...) and include input images (and mask if provided).
  • Prefer the bundled CLI (scripts/image_gen.py) over writing new one-off scripts.
  • Never modify scripts/image_gen.py. If something is missing, ask the user before doing anything else.
  • If the result isn’t clearly relevant or doesn’t satisfy constraints, iterate with small targeted prompt changes; only ask a question if a missing detail blocks success.
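
For orientation only, here is roughly what the model default and the SDK edit call look like in Python. The bundled CLI remains the preferred path; the helper names below are hypothetical, and the sketch assumes the openai package is installed and OPENAI_API_KEY is set before edit_image is called.

```python
from contextlib import ExitStack

DEFAULT_MODEL = "gpt-image-1.5"
CHEAPER_MODEL = "gpt-image-1-mini"


def pick_model(asked_for_mini: bool = False, prefers_cheaper_or_faster: bool = False) -> str:
    """Apply the default-model rule from the list above."""
    if asked_for_mini or prefers_cheaper_or_faster:
        return CHEAPER_MODEL
    return DEFAULT_MODEL


def edit_image(prompt: str, image_path: str, mask_path=None, model: str = DEFAULT_MODEL):
    """Call client.images.edit with the input image and, if given, a mask."""
    from openai import OpenAI  # imported lazily so pick_model works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with ExitStack() as stack:
        kwargs = {
            "model": model,
            "prompt": prompt,
            "image": stack.enter_context(open(image_path, "rb")),
        }
        if mask_path is not None:
            kwargs["mask"] = stack.enter_context(open(mask_path, "rb"))
        return client.images.edit(**kwargs)
```

Other parameters (size, quality, background, input fidelity) are covered in references/image-api.md and are left out here rather than guessed.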

Prompt augmentation

Reformat user prompts into a structured, production-oriented spec. Only make implicit details explicit; do not invent new requirements.

Use-case taxonomy (exact slugs)

Classify each request into one of these buckets and keep the slug consistent across prompts and references.
Generate:
  • photorealistic-natural — candid/editorial lifestyle scenes with real texture and natural lighting.
  • product-mockup — product/packaging shots, catalog imagery, merch concepts.
  • ui-mockup — app/web interface mockups that look shippable.
  • infographic-diagram — diagrams/infographics with structured layout and text.
  • logo-brand — logo/mark exploration, vector-friendly.
  • illustration-story — comics, children’s book art, narrative scenes.
  • stylized-concept — style-driven concept art, 3D/stylized renders.
  • historical-scene — period-accurate/world-knowledge scenes.
Edit:
  • text-localization — translate/replace in-image text, preserve layout.
  • identity-preserve — try-on, person-in-scene; lock face/body/pose.
  • precise-object-edit — remove/replace a specific element (incl. interior swaps).
  • lighting-weather — time-of-day/season/atmosphere changes only.
  • background-extraction — transparent background / clean cutout.
  • style-transfer — apply reference style while changing subject/scene.
  • compositing — multi-image insert/merge with matched lighting/perspective.
  • sketch-to-render — drawing/line art to photoreal render.
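
Because the slugs must stay exact, it can help to keep them in one place and validate against that list. The constant and function names in this sketch are hypothetical; the slug values are copied from the two lists above.

```python
GENERATE_SLUGS = frozenset({
    "photorealistic-natural", "product-mockup", "ui-mockup", "infographic-diagram",
    "logo-brand", "illustration-story", "stylized-concept", "historical-scene",
})
EDIT_SLUGS = frozenset({
    "text-localization", "identity-preserve", "precise-object-edit", "lighting-weather",
    "background-extraction", "style-transfer", "compositing", "sketch-to-render",
})


def slug_mode(slug: str) -> str:
    """Return "generate" or "edit" for a known slug; reject anything else."""
    if slug in GENERATE_SLUGS:
        return "generate"
    if slug in EDIT_SLUGS:
        return "edit"
    raise ValueError(f"unknown taxonomy slug: {slug!r}")
```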
Quick clarification (augmentation vs invention):
  • If the user says “a hero image for a landing page”, you may add layout/composition constraints that are implied by that use (e.g., “generous negative space on the right for headline text”).
  • Do not introduce new creative elements the user didn’t ask for (e.g., adding a mascot, changing the subject, inventing brand names/logos).
Template (include only relevant lines):
Use case: <taxonomy slug>
Asset type: <where the asset will be used>
Primary request: <user's main prompt>
Scene/background: <environment>
Subject: <main subject>
Style/medium: <photo/illustration/3D/etc>
Composition/framing: <wide/close/top-down; placement>
Lighting/mood: <lighting + mood>
Color palette: <palette notes>
Materials/textures: <surface details>
Quality: <low/medium/high/auto>
Input fidelity (edits): <low/high>
Text (verbatim): "<exact text>"
Constraints: <must keep/must avoid>
Avoid: <negative constraints>
Augmentation rules:
  • Keep it short; add only details the user already implied or provided elsewhere.
  • Always classify the request into a taxonomy slug above and tailor constraints/composition/quality to that bucket. Use the slug to find the matching example in references/sample-prompts.md.
  • If the user gives a broad request (e.g., "Generate images for this website"), use judgment to propose tasteful, context-appropriate assets and map each to a taxonomy slug.
  • For edits, explicitly list invariants ("change only X; keep Y unchanged").
  • If any critical detail is missing and blocks success, ask a question; otherwise proceed.
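
The "include only relevant lines" rule for the template can be sketched as a renderer that skips empty fields and keeps the template's order. The function and keyword names are hypothetical shorthand for the labels in the template above; nothing here invents content the caller did not supply.

```python
# (keyword, label) pairs in the same order as the template.
SPEC_FIELDS = (
    ("use_case", "Use case"),
    ("asset_type", "Asset type"),
    ("primary_request", "Primary request"),
    ("scene", "Scene/background"),
    ("subject", "Subject"),
    ("style", "Style/medium"),
    ("composition", "Composition/framing"),
    ("lighting", "Lighting/mood"),
    ("palette", "Color palette"),
    ("materials", "Materials/textures"),
    ("quality", "Quality"),
    ("input_fidelity", "Input fidelity (edits)"),
    ("text", "Text (verbatim)"),
    ("constraints", "Constraints"),
    ("avoid", "Avoid"),
)


def render_spec(**values: str) -> str:
    """Render only the fields that were provided, in template order."""
    unknown = set(values) - {key for key, _ in SPEC_FIELDS}
    if unknown:
        raise ValueError(f"unknown spec fields: {sorted(unknown)}")
    lines = []
    for key, label in SPEC_FIELDS:
        value = values.get(key)
        if not value:
            continue  # omit lines the user gave no information for
        if key == "text":
            value = f'"{value}"'  # exact text is quoted verbatim
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Rejecting unknown keywords keeps typos from silently dropping a constraint.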

Examples

Generation example (hero image)

Use case: product-mockup
Asset type: landing page hero
Primary request: a minimal hero image of a ceramic coffee mug
Style/medium: clean product photography
Composition/framing: centered product, generous negative space on the right
Lighting/mood: soft studio lighting
Constraints: no logos, no text, no watermark

Edit example (invariants)

Use case: precise-object-edit
Asset type: product photo background replacement
Primary request: replace the background with a warm sunset gradient
Constraints: change only the background; keep the product and its edges unchanged; no text; no watermark

Prompting best practices (short list)

  • Structure prompt as scene -> subject -> details -> constraints.
  • Include intended use (ad, UI mock, infographic) to set the mode and polish level.
  • Use camera/composition language for photorealism.
  • Quote exact text and specify typography + placement.
  • For tricky words, spell them letter-by-letter and require verbatim rendering.
  • For multi-image inputs, reference images by index and describe how to combine them.
  • For edits, repeat invariants every iteration to reduce drift.
  • Iterate with single-change follow-ups.
  • For latency-sensitive runs, start with quality=low; use quality=high for text-heavy or detail-critical outputs.
  • For strict edits (identity/layout lock), consider input_fidelity=high.
  • If results feel “tacky”, add a brief “Avoid:” line (stock-photo vibe; cheesy lens flare; oversaturated neon; harsh bloom; oversharpening; clutter) and specify restraint (“editorial”, “premium”, “subtle”).
More principles: references/prompting.md. Copy/paste specs: references/sample-prompts.md.
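
Two of the practices above (quality by latency versus detail, and spelling tricky words) lend themselves to small helpers. This is a sketch under stated assumptions: the function names are hypothetical, falling back to "auto" when neither condition applies is a choice made for the example, and so is letting detail-critical output win over latency when both apply.

```python
def pick_quality(latency_sensitive: bool = False, text_heavy_or_detail_critical: bool = False) -> str:
    """Choose a quality value; precedence and the "auto" fallback are assumptions."""
    if text_heavy_or_detail_critical:
        return "high"
    if latency_sensitive:
        return "low"
    return "auto"


def pick_input_fidelity(strict_edit: bool = False) -> str:
    """Use high input fidelity for identity/layout-locked edits."""
    return "high" if strict_edit else "low"


def spell_out(word: str) -> str:
    """Spell a tricky word letter by letter, preserving the original case."""
    return "-".join(word)


def verbatim_text_line(text: str, tricky_word: str = "") -> str:
    """Build a quoted text line, optionally with a letter-by-letter spelling."""
    line = f'Text (verbatim): "{text}"'
    if tricky_word:
        line += f" (spell {tricky_word} as {spell_out(tricky_word)})"
    return line
```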
Guidance by asset type

Asset-type templates (website assets, game assets, wireframes, logo) are consolidated in references/sample-prompts.md.

CLI + environment notes

  • CLI commands + examples: references/cli.md
  • API parameter quick reference: references/image-api.md
  • If network approvals / sandbox settings are getting in the way: references/codex-network.md

Reference map

  • references/cli.md: how to run image generation/edits/batches via scripts/image_gen.py (commands, flags, recipes).
  • references/image-api.md: what knobs exist at the API level (parameters, sizes, quality, background, edit-only fields).
  • references/prompting.md: prompting principles (structure, constraints/invariants, iteration patterns).
  • references/sample-prompts.md: copy/paste prompt recipes (generate + edit workflows; examples only).
  • references/codex-network.md: environment/sandbox/network-approval troubleshooting.