gpt-image-edit

🎨 GPT Image Edit — Pro Pack on RunComfy

The OpenAI GPT Image 2 /edit endpoint (ChatGPT Images 2.0 image-to-image) on the RunComfy Model API. Strongest in its class at preserving identity through targeted edits and rewriting embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic).
bash
npx skills add agentspace-so/runcomfy-skills --skill gpt-image-edit -g

When to pick this model (vs siblings)

| You want | Use |
| --- | --- |
| Edit multilingual / embedded text in image | GPT Image Edit |
| Identity preservation through translated headline variants | GPT Image Edit |
| Layout-precise edit (move headline, swap CTA, etc.) | GPT Image Edit |
| Up to 10 reference images | GPT Image Edit |
| Batch up to 20 images consistently | Nano Banana Edit |
| Single-shot precise local edit, source-fidelity-first | Flux Kontext |
| Generate from scratch with GPT Image 2 | sibling gpt-image-2 skill |
| Batch SKU galleries with stable identity | Nano Banana Edit |

Prerequisites

  1. RunComfy CLI: npm i -g @runcomfy/cli
  2. RunComfy account: runcomfy login opens a browser device-code flow.
  3. CI / containers: set RUNCOMFY_TOKEN=<token> instead of running runcomfy login.

Endpoints + input schema

openai/gpt-image-2/edit

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | Edit instruction. Lead with preservation, end with the change. |
| images | string[] | yes | | Up to 10 publicly-fetchable HTTPS URLs. First is primary; rest are auxiliary. |
| size | enum | no | auto | auto (preserve input), 1024_1024 (1:1), 1024_1536 (2:3 portrait), 1536_1024 (3:2 landscape). |

size=auto preserves the input ratio; it is strongly recommended unless the edit explicitly changes framing.
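
A local pre-flight check can catch schema mistakes before a request is sent. The following is a minimal Python sketch based only on the schema table above; the function name validate_edit_input is hypothetical, and treating an empty images list as invalid is an assumption (the table only says the field is required). The server remains the source of truth.

```python
from urllib.parse import urlparse

# Values copied from the schema table; nothing here queries the API.
ALLOWED_SIZES = {"auto", "1024_1024", "1024_1536", "1536_1024"}
MAX_IMAGES = 10


def validate_edit_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")

    images = payload.get("images")
    if not isinstance(images, list) or not images:
        # Assumption: at least one image is needed for an edit.
        problems.append("images must be a non-empty list of URLs")
    else:
        if len(images) > MAX_IMAGES:
            problems.append(f"images accepts at most {MAX_IMAGES} URLs")
        for i, url in enumerate(images, start=1):
            if not isinstance(url, str) or urlparse(url).scheme != "https":
                problems.append(f"image {i} is not an HTTPS URL")

    size = payload.get("size", "auto")  # "auto" is the documented default
    if not isinstance(size, str) or size not in ALLOWED_SIZES:
        problems.append(f"size must be one of {sorted(ALLOWED_SIZES)}")

    return problems
```

This check cannot tell whether a URL is actually publicly fetchable; it only rules out values the schema would reject outright.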
字段类型是否必填默认值说明
prompt
字符串编辑指令。先说明需要保留的内容,再描述修改需求
images
字符串数组最多10个可公开访问的HTTPS URL。第一个为主要参考图,其余为辅助参考图
size
枚举值
auto
auto
(保留原图比例)、
1024_1024
(1:1)、
1024_1536
(2:3竖屏)、
1536_1024
(3:2横屏)
size=auto
会保留原图比例——除非编辑需求明确要改变画幅,否则强烈推荐使用该值。

How to invoke

Single-ref preservation edit:
bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Keep the person'\''s face, pose, and brand mark unchanged. Replace the background with a soft warm-grey studio sweep and a gentle floor shadow.",
    "images": ["https://.../portrait.jpg"]
  }' \
  --output-dir <absolute/path>
Multilingual text rewrite (preserve everything except the headline):
bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight as before.",
    "images": ["https://.../poster-en.jpg"]
  }' \
  --output-dir <absolute/path>
Multi-ref composition:
bash
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity (face, pose, clothing) unchanged.",
    "images": ["https://.../subject.jpg", "https://.../room.jpg"]
  }' \
  --output-dir <absolute/path>
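
The shell examples above have to escape the apostrophe in "person's" as '\'' because the JSON sits inside single quotes. When calling the CLI from a script, one way to avoid manual quoting is to build the JSON with a serializer and pass it as a single argv element. The following is a minimal Python sketch; build_edit_command and run_edit are hypothetical helper names, and only the command, --input, and --output-dir come from the examples above.

```python
import json
import subprocess


def build_edit_command(prompt: str, images: list[str], output_dir: str,
                       size: str = "auto") -> list[str]:
    """Build the argv list for `runcomfy run openai/gpt-image-2/edit`."""
    payload = {"prompt": prompt, "images": images, "size": size}
    return [
        "runcomfy", "run", "openai/gpt-image-2/edit",
        # ensure_ascii=False keeps non-Latin headline text readable in logs.
        "--input", json.dumps(payload, ensure_ascii=False),
        "--output-dir", output_dir,
    ]


def run_edit(prompt: str, images: list[str], output_dir: str,
             size: str = "auto") -> int:
    """Run the CLI and return its exit code."""
    # An argv list is passed without a shell, so quotes and apostrophes
    # in the prompt need no escaping.
    cmd = build_edit_command(prompt, images, output_dir, size)
    return subprocess.run(cmd).returncode
```

Calling run_edit requires the RunComfy CLI to be installed and signed in; build_edit_command can be inspected on its own without either.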

Prompting — what actually works

Lead with preservation goals. Always: "Keep [face / pose / clothing / brand / framing] unchanged." Then state the change. The model honors what's stated up front.
Multilingual text: quote the characters, name the script. "the headline reads \"コーヒー\" in bold Japanese kana", "the label says \"АРОМА\" in Cyrillic, white on black", "the right-margin caption reads \"تخفيض\" in Arabic right-to-left". Don't paraphrase; quote.
Directional language for spatial edits. Concrete spatial scopes work: "move the headline from top-right to bottom-center", "remove the leftmost object only", "replace the watermark in the bottom-right corner".
Multi-ref numbering. When passing multiple images, refer to them by number: "subject from image 1, lighting from image 2, color palette from image 3". The model routes cues correctly.
Use size: "auto" to preserve input ratio. Only override when the edit explicitly changes framing (e.g. cropping a 16:9 to 1:1).
Anti-patterns:
  • Long compound edit instructions ("change A and B and C and D") → drift increases per added scope.
  • Missing preservation goals → model subtly rewrites the face / brand / framing.
  • Paraphrasing in-image text instead of quoting it → text comes out different.
  • Asking for size outside the 3 fixed values + auto → 422.
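
The ordering advice above (preservation first, then the change, with in-image text quoted verbatim and the script named) can be enforced with a small prompt builder. This is a minimal Python sketch; build_edit_prompt and headline_change are hypothetical helpers, and the exact sentence templates are an assumption modeled on the example prompts in this document.

```python
def build_edit_prompt(preserve: list[str], change: str) -> str:
    """Put the preservation goals first, then the single requested change."""
    if not preserve:
        # Missing preservation goals is listed above as an anti-pattern.
        raise ValueError("name at least one thing to keep unchanged")
    keep = f"Keep the {', '.join(preserve)} unchanged."
    return f"{keep} {change.strip()}"


def headline_change(text: str, script: str) -> str:
    """Quote the exact characters and name the script instead of paraphrasing."""
    return (
        "Replace only the in-image headline. "
        f'The new headline reads "{text}" in {script}.'
    )


prompt = build_edit_prompt(
    ["photograph", "layout", "brand mark"],
    headline_change("コーヒー", "bold Japanese kana"),
)
```

Keeping one change per call also lines up with the advice against long compound instructions: several edits become several passes, each with its own preservation list.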

Where it shines

| Use case | Why GPT Image Edit |
| --- | --- |
| Multilingual ad localization | One source asset → many language variants of the same headline |
| Brand-safe headline / CTA swaps | Layout precision + preservation language hold the rest stable |
| Multi-ref composition (subject from one, scene from another) | Numbered refs route cues correctly |
| Layout-precise repositioning | Directional language ("top-right to bottom-center") honored |
| Identity preservation across signage edits | Strongest in class for face / brand preservation through targeted edits |

Sample prompts (verified to produce strong results)

Background swap with full preservation (page example):
Turn the background into a bright minimal white-to-soft-gray studio
sweep with gentle floor shadow; add a large headline in-image that
reads "OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged
Multilingual variant:
Keep the photograph, layout, lighting, and brand mark exactly as in the
input. Replace only the in-image headline.
The new headline reads "コーヒー" in bold Japanese kana, same position
and font weight as before.
Multi-ref composition:
Compose subject from image 1 into the kitchen from image 2.
Match the warm window light and color palette of image 2.
Keep subject identity (face, pose, clothing) from image 1 unchanged.

Limitations

  • size: 3 fixed values + auto; anything else 422s.
  • images: up to 10; first is primary, rest are auxiliary cues.
  • Long compound prompts drift — split into multiple passes when needed.
  • For batch consistency across many SKU images, Nano Banana Edit (up to 20) is better.
  • Photorealism on portraits — Nano Banana Pro wins head-to-head.

Exit codes

| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
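
Since only exit code 75 is marked retryable, a wrapper script can retry on that code and stop on everything else. The following is a minimal Python sketch; run_with_retry is a hypothetical helper, and the attempt count and exponential backoff delays are assumptions, not documented CLI behavior.

```python
import time

# Meanings copied from the exit-code table.
EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}


def run_with_retry(run_once, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call run_once() until it returns a non-retryable exit code.

    run_once is any zero-argument callable that returns a CLI exit code.
    Returns (exit_code, meaning) for the last attempt.
    """
    code = None
    for attempt in range(max_attempts):
        code = run_once()
        if code not in RETRYABLE:
            break
        if attempt < max_attempts - 1:
            # Assumed policy: double the wait after each retryable failure.
            sleep(base_delay * 2 ** attempt)
    return code, EXIT_MEANINGS.get(code, "unknown exit code")
```

Passing the runner as a callable keeps the retry logic testable without invoking the CLI; in practice run_once would wrap a subprocess call that returns the process's exit status.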

How it works

The skill invokes runcomfy run openai/gpt-image-2/edit with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/edit, polls the request, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

  • Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
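
To make the download whitelist and size cap above concrete, here is a minimal Python sketch of how such checks might look. It is not the CLI's actual implementation: the function names are hypothetical, and requiring the https scheme on download URLs is an assumption beyond what the list states.

```python
from urllib.parse import urlparse

# Host suffixes from the download whitelist; the leading dot prevents
# look-alike hosts such as "evilruncomfy.net" from matching.
ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")
MAX_DOWNLOAD_BYTES = 2 * 1024 ** 3  # 2 GiB


def is_allowed_download(url: str) -> bool:
    """True if the URL's host is a subdomain of an allowed RunComfy domain."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Assumption: only HTTPS downloads are accepted.
    return parsed.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)


def exceeds_size_cap(bytes_so_far: int) -> bool:
    """True once a single download has grown past the 2 GiB cap."""
    return bytes_so_far > MAX_DOWNLOAD_BYTES
```

Matching on the parsed hostname rather than on the raw URL string matters here: a URL such as https://runcomfy.net.evil.example/out.png contains the allowed name but resolves to a different host.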