image-generation-enhanced

Image Generation

Use this skill when an agent needs to generate or edit images.

Prerequisites

OpenRouter API key — Image generation models are accessed via OpenRouter. You need an OpenRouter API key to use this skill.

This skill is about getting better images, not about any single tool. If the environment already provides a native image tool, SDK, or wrapper, use that. If the agent has an OpenRouter API key but has not been given another image tool, minibanana is a good lightweight place to start.

This skill is intentionally biased toward image quality rather than minimal prompt length. The main idea is simple:

Do not throw keyword soup at the model. Direct the image like a creative director.

What this skill is for

Text-to-image generation from scratch
Reference-guided image generation
Image editing and compositing
Product shots, posters, key art, concept art, social creatives, mockups
Images that require deliberate control over composition, lighting, materials, camera, or text rendering

Core principles

Prompt quality is usually the biggest lever. Better prompting often matters more than tiny model changes.
Lead with intent. Start with a strong verb such as `create`, `render`, `photograph`, `illustrate`, `design`, `edit`, `transform`, `replace`, or `remove`.
Be specific about the visible result. Subject, scene, composition, lighting, style, and materials should be explicit.
Prefer positive framing. Describe what should appear, not only what should be excluded.
Use references deliberately. Assign each reference image a role.
Iterate surgically. After each attempt, change the fewest prompt parts necessary.
Use structured prompting for complex jobs. Hybrid prompts with JSON often work very well for multi-element scenes and precise edits.

Tooling options

Use whatever interface the environment already gives you:

an existing image tool
a direct SDK or HTTP client
a local wrapper already available in the repo
minibanana as a lightweight fallback when the agent has an OpenRouter API key but no other image tool

If you do use minibanana, first inspect the current CLI behavior in the environment:

bash

minibanana --help

Minimal example:

bash

minibanana --prompt "A friendly whale" --model "bytedance-seed/seedream-4.5" --out image.png

Practical defaults when using minibanana

Prefer PNG for edits, graphics, typography, diagrams, and anything sensitive to JPEG artifacts.
Prefer JPEG when file size matters more than lossless output.
Put any non-trivial prompt in a file and pass it with e.g. `--prompt @prompt.md`.
Use repeated image inputs when you need references, and assign each one a role in the prompt.
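
As a rough sketch of these defaults, the command line can be assembled programmatically before it is run. The helper name `build_minibanana_command` and the file names are hypothetical. The flags are the ones this document mentions (`--prompt`, `--model`, `--out`, `--in`), and the sketch assumes that `--in` may be repeated once per reference image, so confirm that with `minibanana --help` first.

```python
import shlex
from typing import Sequence


def build_minibanana_command(
    prompt_file: str,
    model: str,
    out_path: str,
    reference_images: Sequence[str] = (),
) -> list[str]:
    """Assemble an argv list for minibanana without running it."""
    argv = [
        "minibanana",
        "--prompt", f"@{prompt_file}",  # prompt kept in a file, as recommended above
        "--model", model,
        "--out", out_path,              # .png by default for edits and typography
    ]
    for image in reference_images:
        # Assumption: one --in flag per reference image.
        argv += ["--in", image]
    return argv


command = build_minibanana_command(
    "prompt.md",
    "bytedance-seed/seedream-4.5",
    "image.png",
    ["layout.png", "fabric.png"],  # hypothetical reference files
)
print(shlex.join(command))
```

Building the argv as a list keeps quoting problems out of the prompt path and makes the command easy to log for reproducibility.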

The working loop

Always use this loop:

Classify the task
- text-to-image
- reference-guided generation
- edit / inpaint / remove / replace
- typography-heavy graphic
- diagram / infographic
Choose the aspect ratio from the job
- `1:1` for avatars, tiles, concept squares
- `4:5` for social posts and product ads
- `9:16` for stories, reels, wallpapers
- `16:9` for cinematic scenes, thumbnails, hero images
- `21:9` for banners and panorama-style scenes
Choose a model that fits the job
- use the user's requested model if they named one
- otherwise prefer a model family suited to the task
Write the first prompt with strong direction
- narrative prompt for simple scenes
- hybrid narrative + JSON for complex scenes
- editing prompt with explicit invariants for image edits
Generate and inspect the image
- if the model produces multiple output images, review every single output before deciding which one(s) to use — do not just pick the first image
- verify subject
- verify composition
- verify lighting
- verify materials/textures
- verify text rendering if present
- verify whether the image actually matches the intended use case
Revise surgically
- if composition is wrong, change composition language first
- if mood is wrong, change lighting and palette first
- if anatomy is wrong, simplify pose and framing
- if text is wrong, shorten it and make it more explicit

Model selection heuristics

Use the model the user explicitly asks for unless there is a strong reason not to. When choosing yourself, use simple task-based heuristics.

Good current starting points

bytedance-seed/seedream-4.5
- strong visual aesthetics
- improved editing consistency
- good portrait refinement
- stronger small-text rendering than many image models
google/gemini-3-pro-image-preview
- strong multi-image reasoning and blending
- strong text rendering
- strong identity preservation across multiple subjects
- useful for storyboards, composites, and complex scene design
google/gemini-3.1-flash-image-preview
- strong quality-to-speed tradeoff
- good for iterative editing and quick refinement
- supports extended aspect ratios on OpenRouter
openai/gpt-5-image
- strong instruction following
- strong text rendering
- strong detailed editing
sourceful/riverflow-v2-pro and sourceful/riverflow-v2-fast
- strong text rendering and graphics-oriented work
- useful when Sourceful-specific font inputs or super-resolution references are relevant

Model choice by task

Fast iteration / concept exploration: a faster image model is usually enough
High-fidelity editorial / concept art: favor models known for aesthetics and composition quality
Precise edits / multi-image composition: favor models known for multimodal reasoning
Typography-heavy posters / ads / infographics: favor models with strong text rendering

Prompt format: choose the right shape

Use a plain narrative prompt when

the scene is simple
you want fast ideation
you are exploring style directions loosely
the image does not depend on many separate constraints

Use a hybrid prompt when

the request is high stakes
multiple subjects or layers matter
references have distinct roles
there is product, brand, or material specificity
composition must be tightly controlled
text rendering matters
you expect to iterate and patch only specific parts later

Best default for serious work:

one or two natural-language sentences that state the visual goal clearly
a structured JSON block that encodes the exact specification

This often works better than raw JSON alone because the natural-language lead establishes the overall intent, while the JSON gives the model a stable structure.
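
A minimal sketch of that hybrid shape, assuming the prompt is assembled in Python before being written to a prompt file. The helper name `build_hybrid_prompt` and the sample specification are illustrative only.

```python
import json


def build_hybrid_prompt(lead: str, spec: dict) -> str:
    """Join a natural-language lead with a JSON specification block."""
    body = json.dumps(spec, indent=2, ensure_ascii=False)
    return f"{lead.strip()}\n\n{body}"


spec = {
    "task": "text-to-image",
    "goal": "hero image of a brushed aluminum water bottle",
    "composition": {"framing": "medium shot", "camera_angle": "eye level"},
    "technical_preferences": {"aspect_ratio": "16:9"},
}
prompt = build_hybrid_prompt(
    "Create a clean, premium product photograph with one clear focal point.",
    spec,
)
print(prompt)
```

Keeping the specification as a dictionary makes later surgical revisions simple: change one key, regenerate the prompt, and leave the rest untouched.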

Prompt anatomy

A strong image prompt usually covers these elements in roughly this order:

Operation — create, render, photograph, illustrate, edit, replace, remove, transform
Primary subject — who or what the image is really about
Action / pose / state — what the subject is doing
Environment / context — where the scene takes place
Composition — shot type, framing, angle, focal point, depth layers
Lighting — source, direction, softness, contrast, time of day, atmosphere
Style / medium — editorial photo, fantasy realism, watercolor, 3D render, film still
Materials / textures — leather, moss-covered stone, polished chrome, tweed, fog, wet pavement
Color palette / grading — warm neutrals, muted teal, rich contrast, desaturated sci-fi
Text instructions — exact text, font feel, placement, line count, language
Constraints — aspect ratio, realism level, keep background, preserve identity, no motion blur
Negative prompt — only targeted artifact suppression, not a giant trash list

The simplest good formula

For text-to-image without references:

text

[Verb] + [Subject] + [Action] + [Location/context] + [Composition] + [Lighting] + [Style] + [Materials/details] + [Output/use-case]

Example:

text

Create a high-end editorial fashion portrait of a confident model wearing a tailored brown dress, sleek boots, and a structured handbag, standing in a statuesque pose against a seamless deep cherry-red studio backdrop. Medium-full shot, center-framed, photographed at a low three-quarter angle with soft cinematic key light and subtle rim light. Shot like a luxury fashion magazine campaign on medium-format analog film with pronounced grain, rich color, and realistic fabric texture.
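
The formula can also be treated as an ordered record, which makes it harder to forget a slot. The sketch below is one possible encoding; the class name `ScenePrompt` and the teapot scene are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ScenePrompt:
    """Slots of the text-to-image formula, in the order given above."""
    verb: str
    subject: str
    action: str = ""
    location: str = ""
    composition: str = ""
    lighting: str = ""
    style: str = ""
    materials: str = ""
    use_case: str = ""

    def render(self) -> str:
        # The first four slots form the opening sentence.
        head = " ".join(
            part for part in (self.verb, self.subject, self.action, self.location) if part
        )
        # Each remaining non-empty slot becomes its own sentence.
        tail = [getattr(self, f.name) for f in fields(self)[4:]]
        sentences = [head] + [text for text in tail if text]
        return " ".join(s.rstrip(".") + "." for s in sentences)


scene = ScenePrompt(
    verb="Photograph",
    subject="a ceramic teapot",
    action="pouring tea",
    location="on a walnut table",
    composition="Close-up at eye level",
    lighting="Soft window light from the left",
    style="Editorial still life",
)
print(scene.render())
```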

Prompt like a creative director

Do not stop at nouns. Control the scene deliberately.

1. Direct the composition

Use terms such as:

extreme close-up
close-up
medium shot
medium-full shot
wide shot
aerial view
top-down
low angle
high angle
over-the-shoulder
symmetrical framing
centered composition
rule-of-thirds placement
foreground / midground / background layers

2. Direct the camera and lens

Useful language:

`24mm wide-angle` for environmental scale
`35mm` for natural cinematic framing
`50mm` for neutral human perspective
`85mm portrait lens` for flattering compression
`macro lens` for product and texture detail
`shallow depth of field (f/1.8)` for subject separation
`deep focus` for detailed environments

3. Direct the lighting

Useful language:

golden hour backlight
overcast daylight
soft studio softbox lighting
three-point lighting
harsh chiaroscuro lighting
volumetric god rays
fog diffusion
neon edge lighting
candlelit interior
wet-surface reflections

4. Direct materiality

This is one of the most underused quality levers. Name surfaces and imperfections.

Examples:

weathered, cracked, moss-covered stone
navy blue tweed with visible weave
brushed aluminum with subtle micro-scratches
rain-slick asphalt reflecting signage
matte ceramic with fine glaze variation
worn leather with creases and patina

5. Direct the color story

Examples:

muted teal and amber cinematic grading
warm monochrome neutrals
deep emerald and gold accents
pastel candy palette with soft bloom
high-contrast black-and-white with silver halation

JSON prompting for high-control jobs

For complex or high-fidelity prompts, use a structured block. This is especially useful for:

complex scenes with multiple subjects
product shots with precise materials and layout
image edits with clear preserve/change rules
typography-heavy compositions
iterative workflows where you need to patch one section later

Hybrid JSON template

Create a polished, high-fidelity image that feels intentional, cinematic, and production-ready.

```json
{
  "task": "text-to-image",
  "goal": "one-sentence visual objective",
  "subject": {
    "primary": "main subject",
    "secondary": ["supporting elements"]
  },
  "scene": {
    "location": "where it takes place",
    "time": "time of day",
    "weather": "weather or atmosphere",
    "story_moment": "what moment is being depicted"
  },
  "composition": {
    "framing": "wide shot / portrait / macro / etc.",
    "camera_angle": "low angle / eye level / aerial / etc.",
    "focus": "what the eye should land on first",
    "depth_layers": ["foreground", "midground", "background"]
  },
  "lighting": {
    "primary": "main light source",
    "secondary": ["secondary light sources"],
    "mood": "desired emotional tone"
  },
  "style": {
    "genre": "photorealistic / fantasy realism / 3D / watercolor / etc.",
    "visual_aesthetic": ["keywords"],
    "rendering": {
      "detail_level": "high",
      "sharpness": "high",
      "dynamic_range": "wide"
    }
  },
  "materials_and_textures": {
    "subject": "surface/material notes",
    "environment": "environment texture notes"
  },
  "color_palette": {
    "dominant": ["main colors"],
    "accents": ["accent colors"]
  },
  "text_rendering": {
    "enabled": false,
    "text": [],
    "placement": "",
    "style": ""
  },
  "technical_preferences": {
    "aspect_ratio": "16:9",
    "lens": "35mm",
    "depth_of_field": "moderate",
    "realism": "high"
  },
  "negative_prompt": ["blurry image", "flat lighting", "text or watermark"]
}
```

Guidance for structured prompting

Put the most important constraints first.
Keep the hierarchy clean.
Avoid contradictory instructions like `minimalist` plus `highly cluttered`.
Keep negative prompts targeted and relevant.
If the prompt becomes too long and starts failing, shorten it to the essential visual hierarchy.

Reference-image prompting

Use `--in` for any reference images.

Best practice: assign each input image a role

When more than one reference image is used, explicitly define what each image contributes.

Good pattern:

text

Use reference image 1 for the composition and camera angle.
Use reference image 2 for the fabric texture and color palette.
Use reference image 3 for the product silhouette only.
Do not copy the background from references 2 or 3.
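
The role-assignment pattern above is regular enough to generate. The following sketch assumes references are numbered in the order they are attached; the helper name `reference_role_lines` is hypothetical.

```python
def reference_role_lines(roles: list[str], do_not_copy: str = "") -> str:
    """Emit one explicit role sentence per reference image, numbered from 1."""
    lines = [
        f"Use reference image {index} for {role}."
        for index, role in enumerate(roles, start=1)
    ]
    if do_not_copy:
        # Optional exclusion rule, stated once after the roles.
        lines.append(f"Do not copy {do_not_copy}.")
    return "\n".join(lines)


roles_block = reference_role_lines(
    [
        "the composition and camera angle",
        "the fabric texture and color palette",
        "the product silhouette only",
    ],
    do_not_copy="the background from references 2 or 3",
)
print(roles_block)
```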

Strong multimodal formula

text

[Reference images] + [role of each reference] + [what must remain unchanged] + [new scenario] + [style and quality target]

Example:

text

Using the first attached image for the room layout and camera position, and the second attached image for the velvet texture and olive-green color, create a high-end interior design render of a reading chair in a sunlit minimalist living room. Keep the composition and scale consistent with the layout reference, but redesign the chair into a premium sculptural form with realistic stitching, soft shadows, and polished oak legs.

Rules for reference use

Do not say only `use these references`; specify what to borrow from each.
State what not to copy when necessary.
For identity-sensitive edits, say what must remain unchanged: face, pose, camera angle, outfit color, logo placement, product geometry, etc.
If references conflict, declare the priority order.

Editing prompts

Editing prompts are different from fresh generation prompts.

The golden rule for edits

Be explicit about what changes and what stays exactly the same.

Weak:

text

Make this image better.

Better:

text

Remove the man from the background while keeping the street, perspective, lighting, shadows, storefront signage, and camera position unchanged.

Editing formula

text

[Edit verb] + [specific target] + [exact change] + [what to preserve] + [quality/style target]

Examples:

text

Replace the cloudy sky with a dramatic golden-hour sunset while keeping the building, camera angle, reflections, and street activity unchanged.

text

Transform this product photo into a premium studio advertisement while preserving the exact bottle shape, label wording, brand colors, and front-facing composition.

text

Remove the table lamp from the nightstand. Keep the bed, wall art, shadows, color palette, and image framing identical.
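
One way to make the preserve list impossible to forget is to build edit prompts through a helper that refuses to run without it. This is a sketch under that assumption; `build_edit_prompt` and `oxford_join` are hypothetical names, and the sentence template is only one of several phrasings that fit the formula.

```python
from typing import Sequence


def oxford_join(items: Sequence[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def build_edit_prompt(
    verb: str,
    target: str,
    change: str,
    preserve: Sequence[str],
    quality: str = "",
) -> str:
    """Fill the editing formula; an empty preserve list is rejected."""
    if not preserve:
        raise ValueError("state what must stay unchanged")
    prompt = f"{verb} {target} {change} while keeping {oxford_join(preserve)} unchanged."
    if quality:
        prompt += f" {quality.rstrip('.')}."
    return prompt


edit_prompt = build_edit_prompt(
    "Replace",
    "the cloudy sky",
    "with a dramatic golden-hour sunset",
    ["the building", "camera angle", "reflections", "street activity"],
)
print(edit_prompt)
```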

Editing tips

Name the object to change clearly.
State preserve rules in plain language.
For local edits, mention nearby context so the patch blends naturally.
For realism, tell the model to preserve matching shadows, reflections, perspective, and grain.

Text rendering and localization

Text in images is much better than it used to be, but it still benefits from very explicit prompting.

Rules for good text rendering

Put exact text in quotes.
Keep text short whenever possible.
State the number of lines.
Specify the placement.
Specify the font feel or a recognizable font style.
Describe the graphic design context so the text feels integrated.

Example:

text

Create a premium skincare advertisement. On the right side of the frame, render three lines of text with exact spelling: "GLOW" on the first line in an elegant flowing brush-script style, "10% OFF" on the second line in a bold block sans-serif style, and "Your First Order" on the third line in a thin minimalist geometric sans-serif. Keep the product jar large and centered-left, with warm studio lighting and clean luxury packaging aesthetics.
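
The text rules (quoted copy, stated line count, placement, font feel) can be bundled into a single clause generator. The sketch below is illustrative: the function name `text_rendering_clause` is hypothetical, and the five-line cap is an arbitrary assumption that simply encodes the advice to keep text short.

```python
ORDINALS = ["first", "second", "third", "fourth", "fifth"]


def text_rendering_clause(lines: list[tuple[str, str]], placement: str) -> str:
    """Build a text instruction from (exact text, font feel) pairs."""
    if not 1 <= len(lines) <= len(ORDINALS):
        # Assumed cap: more than five lines usually means the copy is too long.
        raise ValueError("use between one and five lines of text")
    noun = "line" if len(lines) == 1 else "lines"
    parts = [
        f'"{text}" on the {ORDINALS[i]} line in {style}'
        for i, (text, style) in enumerate(lines)
    ]
    return (
        f"{placement}, render {len(lines)} {noun} of text with exact spelling: "
        + "; ".join(parts)
        + "."
    )


clause = text_rendering_clause(
    [
        ("GLOW", "an elegant flowing brush-script style"),
        ("10% OFF", "a bold block sans-serif style"),
    ],
    placement="On the right side of the frame",
)
print(clause)
```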

Localization prompt pattern

text

Render the poster text in Korean and Arabic with correct script rendering and natural layout. Keep the brand hierarchy identical, with the Korean text as the main headline and the Arabic text as the supporting line.

Typography tips

Quote the exact words.
Keep copy shorter than you think.
State the visual hierarchy: headline, subhead, caption, badge, CTA.
For posters, specify whether the text is printed on a physical object, floating in layout space, or cut out of the background.
If text keeps failing, shorten it and simplify the layout.

Five prompt frameworks

1. Text-to-image without references

Use when starting from nothing.

Formula:

text

[Subject] + [Action] + [Location/context] + [Composition] + [Lighting] + [Style]

2. Reference-guided generation

Use when you need consistency or blended inspiration.

Formula:

text

[Attached references] + [role of each] + [new scenario] + [what to preserve] + [style target]

3. Conversational image editing

Use when you already have a base image and want to change part of it.

Formula:

text

[Edit action] + [what changes] + [what stays the same]

4. Style transfer / composition transfer

Use when one image supplies content and another supplies the look.

Formula:

text

[Base image content] + [style source] + [what must remain recognizable]

5. Text-first design prompts

Use for posters, ads, label mockups, and graphics.

Formula:

text

[Design type] + [layout] + [exact text in quotes] + [font/style guidance] + [placement] + [background/subject]

Advanced prompting tactics

Put non-negotiables early

The first part of the prompt should contain the things that must be right even if the model ignores the rest.

Separate must-haves from nice-to-haves

A useful structure is:

Must-have: subject, composition, lighting, text, preserved identity
Nice-to-have: extra atmosphere, tiny props, secondary story details

If quality drops, cut nice-to-haves first.
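
A small sketch of that split, assuming the prompt is kept as two lists so optional details can be dropped round by round while must-haves stay first. The function name `assemble_prompt` and the `keep_nice` parameter are hypothetical.

```python
def assemble_prompt(must_haves: list[str], nice_to_haves: list[str], keep_nice: int) -> str:
    """Put must-haves first, then at most `keep_nice` optional details."""
    kept = nice_to_haves[: max(keep_nice, 0)]
    return " ".join(must_haves + kept)


must = [
    "Photograph a lone hiker on a ridge at sunrise.",
    "Wide shot, subject on the left third.",
]
nice = [
    "A distant eagle circles above.",
    "Thin frost on the grass.",
]

first_try = assemble_prompt(must, nice, keep_nice=2)
# If quality drops, retry with fewer optional details before touching must-haves.
second_try = assemble_prompt(must, nice, keep_nice=0)
print(second_try)
```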

Avoid contradiction density

The more conflicting adjectives you pile in, the more generic the result becomes.

Examples of bad pairings:

minimalist + crowded with detail everywhere
documentary realism + stylized anime cel shading
soft diffused fog + razor-sharp hard sunlight everywhere
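
Pairings like these can be caught with a crude pre-flight check before spending a generation. The sketch below uses plain substring matching against a short list derived from the examples above; the list, its keywords, and the name `find_contradictions` are illustrative assumptions, and substring matching will miss paraphrases.

```python
# Keyword pairs distilled from the bad pairings listed above (not exhaustive).
CONFLICTING_PAIRS = [
    ("minimalist", "crowded"),
    ("minimalist", "cluttered"),
    ("documentary realism", "anime"),
    ("soft diffused fog", "hard sunlight"),
]


def find_contradictions(prompt: str) -> list[tuple[str, str]]:
    """Return every known conflicting pair where both terms appear in the prompt."""
    text = prompt.lower()
    return [(a, b) for a, b in CONFLICTING_PAIRS if a in text and b in text]


print(find_contradictions("A minimalist desk, crowded with detail everywhere"))
```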

Add a narrative moment

A picture gets more interesting when it captures a moment, not just an object.

Examples:

the instant before the temple mechanism awakens
mid-step entering the rain
just after the champagne cork pops
the second before sunrise breaks through the clouds

Use materiality to escape the generic look

When an image looks synthetic or cheap, the fix is often not `make it realistic`. The fix is naming the materials, imperfections, surfaces, and light behavior.

Use negative prompts sparingly

Good negative prompts remove common artifacts. Bad negative prompts become a giant bag of unrelated anxieties.

Good:

text

negative_prompt: ["blurry image", "distorted hands", "flat lighting", "text or watermark"]

Less good:

text

negative_prompt: [dozens of unrelated items that are unlikely to appear anyway]

Symptom -> fix guide

Problem: composition is wrong

Fix with:

shot type
camera angle
focal subject
subject placement
lens choice
depth layers

Problem: image looks flat or cheap

Fix with:

stronger primary light source
secondary rim or bounce light
material and surface detail
realistic shadows/reflections
color grading direction

Problem: too busy / no focal point

Fix with:

one primary subject
simpler background
explicit focus target
fewer secondary objects

Problem: anatomy or hands are bad

Fix with:

simpler pose
fewer visible fingers/hands if hands are not essential
medium shot instead of extreme close-up of hands
natural action with clear limb positions

Problem: text is misspelled or ugly

Fix with:

shorten copy
put exact text in quotes
define line breaks
specify poster/ad layout
choose a model with stronger text rendering

Problem: references are ignored

Fix with:

reduce the number of references
assign one explicit role per reference
state what must be preserved from the base image
declare priority if references conflict

Problem: image feels generic

Fix with:

add a story moment
add material specificity
add camera/lens choices
add atmosphere/weather/time of day
replace vague style labels with concrete visual direction

Example high-control prompt

This example shows the general style of prompt that often produces strong results for complex scenes.

Create a cinematic fantasy-realist image with a clear focal point, believable lighting, and rich environmental detail.

```json
{
  "title": "Bioluminescent Jungle Temple at Dawn",
  "style": {
    "genre": "fantasy realism",
    "visual_aesthetic": ["cinematic", "ultra-detailed", "atmospheric", "mythic"]
  },
  "scene": {
    "location": "ancient jungle temple courtyard",
    "environment": "dense tropical rainforest",
    "time_of_day": "early dawn",
    "weather": "light mist after rainfall"
  },
  "composition": {
    "camera_angle": "low three-quarter angle",
    "framing": "wide shot",
    "focus": "central altar and lone explorer",
    "depth_layers": [
      "foreground wet roots and glowing fungi",
      "midground broken steps and altar",
      "background towering ruins and canopy"
    ]
  },
  "lighting": {
    "primary_light_source": "soft golden dawn light through the canopy",
    "secondary_light_sources": [
      "warm lantern glow",
      "subtle bioluminescent cyan glow"
    ],
    "mood": "mysterious, sacred, awe-inspiring"
  },
  "technical_preferences": {
    "aspect_ratio": "16:9",
    "lens": "24mm cinematic wide-angle",
    "realism": "high"
  },
  "negative_prompt": [
    "cartoon style",
    "low detail",
    "flat lighting",
    "text or watermark"
  ]
}
```

Final quality checklist

Before you stop, make sure the image has the following:

a clear focal point
composition that matches the use case
lighting that supports the intended mood
believable materials and textures
no obvious artifacting or accidental clutter
text that is spelled correctly if text is present
aspect ratio that suits the output channel
prompt file saved for reproducibility if the task is important

Minimal agent playbook

When the user asks for an image:

infer the deliverable type and aspect ratio
choose a suitable image model
write the prompt in a file if the request is non-trivial
use structured prompting for complex scenes or edits
generate once
if the model produces multiple output images, review every single output before deciding which one(s) to use — do not just pick the first image
inspect what is wrong
revise only the necessary sections
prefer prompt improvement over random model hopping

IMPORTANT: State-of-the-art image generation models are expensive. Check with the user before revising images.

The fastest path to better image output is usually clearer direction, cleaner hierarchy, and more deliberate visual language.
