gpt-image-2

🎨 GPT Image 2 — Pro Pack on RunComfy

runcomfy.com · Text-to-image · Edit · GitHub

OpenAI GPT Image 2 (ChatGPT Images 2.0) hosted on the RunComfy Model API — no OpenAI key, async REST.

```bash
npx skills add agentspace-so/runcomfy-skills --skill gpt-image-2 -g
```

When to pick this model (vs siblings)

GPT Image 2's distinct strength is directive precision: it follows multi-element prompts, layout cues, and embedded-text instructions more reliably than its peers. Pick it when what's on the canvas matters more than how stylized it looks.

| You want | Use |
| --- | --- |
| Embedded text, logos, signage, multilingual typography | GPT Image 2 ✓ |
| Brand-safe, e-commerce / ad / UI mockup imagery | GPT Image 2 ✓ |
| Iterative refinement that holds composition stable | GPT Image 2 ✓ |
| Heavy stylization, painterly look | Flux 2 |
| Hyperrealistic portrait | Nano Banana Pro |
| Cinematic / aesthetic-first hero shots | Seedream 5 |

If the user explicitly asked for GPT Image 2 / ChatGPT Image 2 / Image 2, route here regardless — don't second-guess the model choice.

Prerequisites

RunComfy CLI: `npm i -g @runcomfy/cli`
RunComfy account: `runcomfy login` opens a browser device-code flow.
CI / containers: set `RUNCOMFY_TOKEN=<token>` instead of `runcomfy login`.

Endpoints + input schema

Two endpoints, same model.

openai/gpt-image-2/text-to-image

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | none | The positive prompt |
| `size` | enum | no | `1024_1024` | `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape); only these three |

openai/gpt-image-2/edit

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | none | Natural-language edit instruction |
| `images` | string[] | yes | none | Up to 10 reference image URLs (publicly fetchable HTTPS) |
| `size` | enum | no | `auto` | `auto` (preserve input ratio), or one of the three fixed sizes above |

`size=auto` on edit preserves the input aspect ratio and is strongly recommended unless the edit explicitly changes framing.
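
Because the two endpoints accept slightly different `size` values, a wrapper script can check the value before submitting and so avoid a round trip that ends in a schema error. A minimal sketch based only on the two tables above; the function name `is_valid_size` is hypothetical and not part of the CLI:

```shell
# Check a size value client-side before submitting.
# $1 = endpoint ("text-to-image" or "edit"), $2 = size value.
# Returns 0 when the schema tables above list the value, 1 otherwise.
is_valid_size() {
  case "$1:$2" in
    text-to-image:1024_1024|text-to-image:1024_1536|text-to-image:1536_1024)
      return 0 ;;
    edit:auto|edit:1024_1024|edit:1024_1536|edit:1536_1024)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage (not executed here):
#   is_valid_size edit "$size" || { echo "unsupported size: $size" >&2; exit 65; }
```

Note that `auto` is accepted only for `edit`, so the same value can pass for one endpoint and fail for the other.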

How to invoke

Text-to-image:

```bash
runcomfy run openai/gpt-image-2/text-to-image \
  --input '{"prompt": "<user prompt>", "size": "1024_1536"}' \
  --output-dir <absolute/path>
```

Edit (single ref):

```bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "<edit instruction>",
    "images": ["https://..."]
  }' \
  --output-dir <absolute/path>
```

Edit (multi-ref, up to 10):

```bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "compose subject from image 1 into the room from image 2; match the lighting of image 2",
    "images": ["https://...subject.jpg", "https://...room.jpg"]
  }' \
  --output-dir <absolute/path>
```

The CLI submits, polls every 2s until terminal, then downloads any `*.runcomfy.net` / `*.runcomfy.com` URL from the result into `--output-dir`. Stdout is the result JSON. Stderr is progress.

For pipe-friendly usage:

```bash
runcomfy --output json run openai/gpt-image-2/text-to-image \
  --input '{"prompt":"..."}' --no-wait | jq -r .request_id
```

Prompting — what actually works

These are model-specific patterns that empirically improve output quality. Apply to text-to-image and edit alike.

Be explicit on subject + setting + mood. "A close-up of a matte ceramic water bottle on warm linen, soft window light, neutral background" — three concrete directives — beats "nice product photo of a bottle".

Quote embedded text exactly. Keep it short. GPT Image 2 is the strongest text-rendering model in this class, but only when you put the literal characters in quotes. Long blocks of text degrade. For multilingual text, name the script: "Japanese kana", "Cyrillic", "Arabic right-to-left".

Use compositional cues directly. Terms such as "rule of thirds", "close-up", "aerial view", "centered subject", and "shallow depth of field" carry meaning the model has already learned.

Iterate one attribute at a time. When refining, change one thing per iteration (lighting OR background OR pose OR text) and keep the rest of the prompt verbatim. The model holds composition stable across iterations when only one knob moves.

Don't give conflicting instructions. "no text" plus "the word 'AQUA+' on the label" is incoherent: the model will pick one, and you don't control which.

Don't pile up styles. "ukiyo-e + watercolor + 8K + cinematic + minimalist" cancels out. Pick one or two style anchors max.

For the edit endpoint specifically:

State preservation goals. "keep the person's pose and face identity unchanged", "keep the brand mark and typography on the package", "keep the overall framing". The model needs to know what NOT to change.
Use directional language for spatial edits. "Move the headline from top-right to bottom-center", not "reposition the headline".
Multi-ref: number the images in the prompt — "subject from image 1, lighting and background from image 2" — and the model will route the cues correctly.
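
The numbering convention for multi-ref prompts is easy to get out of step with the order of the `images` array. A small sketch of a helper that derives the numbered clause from the same ordered list of roles; `numbered_refs` is a hypothetical name, and the wording of the clause simply mirrors the example above:

```shell
# Build a numbered reference clause for a multi-ref edit prompt.
# Each argument is the role of one image, given in the same order as the
# "images" array, so role N is always paired with "image N".
numbered_refs() {
  n=0
  out=""
  for role in "$@"; do
    n=$((n + 1))
    if [ -n "$out" ]; then
      out="$out, "
    fi
    out="$out$role from image $n"
  done
  printf '%s\n' "$out"
}

numbered_refs "subject" "lighting and background"
# prints: subject from image 1, lighting and background from image 2
```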

Where it shines

| Use case | Why GPT Image 2 |
| --- | --- |
| E-commerce product photography | Reliable text on labels, brand-safe lighting, consistent across SKUs |
| High-conversion ads | Headline + visual integration in one pass |
| Brand asset localization | One source asset → many language variants of the same headline |
| Signage, posters, packaging mock-ups | Text rendering accuracy at multiple scales |
| UI mockups, scientific illustrations | Layout precision and label legibility |

Sample prompts (verified to produce strong results)

Text-to-image — product hero:

A minimal hero product still life: a matte ceramic water bottle on warm linen,
soft window light, the word "AQUA+" in clean sans-serif on the label,
subtle rim highlights, e-commerce ready, 8K detail, neutral background

Text-to-image — multilingual signage:

A small Tokyo café storefront at dusk, warm interior glow,
the sign reads "コーヒー" in bold Japanese kana on a wooden plaque,
shallow depth of field, rule of thirds, cinematic

Edit — background swap with preservation:

Turn the background into a bright minimal white-to-soft-gray studio sweep
with gentle floor shadow; add a large headline in-image that reads
"OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged

Limitations

Only 3 fixed sizes on text-to-image (and the same 3 + `auto` on edit). Extreme aspect ratios are auto-resized to the nearest supported one.
Prompt length is limited to roughly a few thousand tokens. Long blocks of embedded text degrade output.
Edit's multi-image support is "guidance from up to 10 refs", not ControlNet-style stacks. The first image is treated as the primary; the rest provide auxiliary cues.
Photorealism on portraits is not its strongest suit — Nano Banana Pro wins that head-to-head.
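
Since only three fixed sizes exist, a caller that starts from arbitrary target dimensions may prefer to choose the size explicitly rather than rely on the automatic resize. A rough sketch, assuming "nearest" means the smallest absolute difference in width/height ratio; the service's actual rule is not stated here, and `nearest_size` is a hypothetical helper:

```shell
# Pick the supported size whose aspect ratio is closest to width:height.
# Ratios are compared in thousandths so plain integer arithmetic suffices.
# $1 = target width, $2 = target height (positive integers).
nearest_size() {
  w=$1
  h=$2
  r=$((w * 1000 / h))
  best=""
  best_d=""
  # Each candidate is "size:ratio-in-thousandths" (1:1, 2:3, 3:2).
  for cand in 1024_1024:1000 1024_1536:666 1536_1024:1500; do
    size=${cand%%:*}
    ratio=${cand##*:}
    d=$((r - ratio))
    if [ "$d" -lt 0 ]; then
      d=$((0 - d))
    fi
    if [ -z "$best_d" ] || [ "$d" -lt "$best_d" ]; then
      best=$size
      best_d=$d
    fi
  done
  printf '%s\n' "$best"
}

# Usage (not executed here):
#   size=$(nearest_size 1920 1080)
```

On a tie the earlier candidate wins, so the square size is preferred when a ratio sits exactly between two options.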

Exit codes

The `runcomfy` CLI uses sysexits-style codes:

| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch (e.g. `size: "2048_2048"` would 422) |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
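
Of these codes, only 75 is marked retryable, so a wrapper can retry on that code alone and pass every other status straight through. A minimal sketch under that assumption; the function name `retry_on_75` and the `RETRY_DELAY` variable are hypothetical, and the fixed delay is an arbitrary choice rather than documented CLI behavior:

```shell
# Run a command, retrying only when it exits with 75 (timeout / 429).
# $1 = maximum number of attempts; remaining arguments = the command.
# RETRY_DELAY (seconds between attempts) defaults to 2.
retry_on_75() {
  max=$1
  shift
  attempt=1
  while :; do
    rc=0
    "$@" || rc=$?
    # Stop on success, on any non-retryable code, or when attempts run out.
    if [ "$rc" -ne 75 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    sleep "${RETRY_DELAY:-2}"
    attempt=$((attempt + 1))
  done
}

# Usage (not executed here):
#   retry_on_75 3 runcomfy run openai/gpt-image-2/text-to-image \
#     --input '{"prompt":"..."}' --output-dir "$PWD/out"
```

Codes 64, 65 and 77 indicate problems that a retry will not fix (arguments, input, sign-in), which is why they are returned immediately.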

How it works

The skill invokes `runcomfy run openai/gpt-image-2/<endpoint>` with a JSON body matching the schema above.
The CLI POSTs to `https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/<endpoint>` with the user's bearer token.
The Model API returns a `request_id`; the CLI polls `GET .../requests/<id>/status` every 2 seconds.
On terminal status, the CLI fetches `GET .../requests/<id>/result` and downloads any URL whose host ends with `.runcomfy.net` or `.runcomfy.com` into `--output-dir`. Other URLs are listed but not fetched.
`Ctrl-C` while polling sends `POST .../requests/<id>/cancel` so you don't get billed for GPU you stopped.
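
The download rule in the fourth step is a host-suffix check. The sketch below illustrates that rule in plain shell; it is an illustration of the described behavior, not the CLI's actual implementation, and `is_runcomfy_download` is a hypothetical name:

```shell
# Return 0 when the URL's host ends with .runcomfy.net or .runcomfy.com.
is_runcomfy_download() {
  host=${1#*://}          # drop the scheme
  host=${host%%[/?#]*}    # drop path, query and fragment
  host=${host##*@}        # drop any userinfo before the host
  host=${host%%:*}        # drop an optional port
  case "$host" in
    *.runcomfy.net|*.runcomfy.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   is_runcomfy_download "$url" && echo "would download" || echo "listed only"
```

Matching on the leading dot matters: a host such as `evilruncomfy.net` ends with `runcomfy.net` but not with `.runcomfy.net`, so it is rejected.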

What this skill is not

Not a direct OpenAI API client. Not a capability grant — depends on a working RunComfy account. Not multi-tenant.

Security & Privacy

Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
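
The input-boundary point covers what the CLI does with the JSON; the calling shell still parses whatever is written between the quotes of `--input`, so a prompt containing quotes can break hand-written JSON. One way around that is to let `jq` build the body. A minimal sketch, assuming `jq` is installed (it is already used above to extract `request_id`); the function name `build_t2i_input` is hypothetical:

```shell
# Build the text-to-image input JSON with jq so that quotes, backslashes
# and newlines inside the prompt are escaped as JSON string content.
# $1 = prompt, $2 = size (optional; falls back to the schema default).
build_t2i_input() {
  jq -cn --arg prompt "$1" --arg size "${2:-1024_1024}" \
    '{prompt: $prompt, size: $size}'
}

# Usage (not executed here):
#   runcomfy run openai/gpt-image-2/text-to-image \
#     --input "$(build_t2i_input 'the sign reads "OPEN STUDIO"' 1024_1536)" \
#     --output-dir "$PWD/out"
```

Passing the prompt through `--arg` keeps it a plain string value, so embedded quotes in a prompt (which the prompting guidance encourages for literal text) never terminate the JSON early.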
