image-generation-enhanced
Image Generation
Use this skill when an agent needs to generate or edit images.
Prerequisites
- OpenRouter API key — Image generation models are accessed via OpenRouter. You need an OpenRouter API key to use this skill.
This skill is about getting better images, not about any single tool. If the environment already provides a native image tool, SDK, or wrapper, use that. If the agent has an OpenRouter API key but has not been given another image tool, minibanana is a good lightweight place to start.
This skill is intentionally biased toward image quality rather than minimal prompt length. The main idea is simple:
Do not throw keyword soup at the model. Direct the image like a creative director.
What this skill is for
- Text-to-image generation from scratch
- Reference-guided image generation
- Image editing and compositing
- Product shots, posters, key art, concept art, social creatives, mockups
- Images that require deliberate control over composition, lighting, materials, camera, or text rendering
Core principles
- Prompt quality is usually the biggest lever. Better prompting often matters more than tiny model changes.
- Lead with intent. Start with a strong verb such as create, render, photograph, illustrate, design, edit, transform, replace, or remove.
- Be specific about the visible result. Subject, scene, composition, lighting, style, and materials should be explicit.
- Prefer positive framing. Describe what should appear, not only what should be excluded.
- Use references deliberately. Assign each reference image a role.
- Iterate surgically. After each attempt, change the fewest prompt parts necessary.
- Use structured prompting for complex jobs. Hybrid prompts with JSON often work very well for multi-element scenes and precise edits.
Tooling options
Use whatever interface the environment already gives you:
- an existing image tool
- a direct SDK or HTTP client
- a local wrapper already available in the repo
- minibanana as a lightweight fallback when the agent has an OpenRouter API key but no other image tool
If you do use minibanana, first inspect the current CLI behavior in the environment:
bash
minibanana --help
Minimal example:
bash
minibanana --prompt "A friendly whale" --model "bytedance-seed/seedream-4.5" --out image.png
Practical defaults when using minibanana:
- Prefer PNG for edits, graphics, typography, diagrams, and anything sensitive to JPEG artifacts.
- Prefer JPEG when file size matters more than lossless output.
- Put any non-trivial prompt in a file and pass it with e.g. --prompt @prompt.md.
- Use repeated image inputs when you need references, and assign each one a role in the prompt.
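When an agent drives the CLI from a script, it can help to assemble the command as an argument list instead of a shell string. The sketch below is only an illustration: it uses the flags mentioned in this document (--prompt, --model, --out, --in), the helper name build_minibanana_argv is hypothetical, and the flag behavior should be confirmed with minibanana --help before relying on it.

```python
def build_minibanana_argv(prompt_file, model, out_path, reference_images=()):
    """Assemble a minibanana command as an argument list (does not run it).

    Assumption: the flags match what `minibanana --help` reports in your
    environment; they are taken from this guide, not from the CLI source.
    """
    argv = [
        "minibanana",
        "--prompt", f"@{prompt_file}",  # non-trivial prompts live in a file
        "--model", model,
        "--out", str(out_path),
    ]
    for image in reference_images:
        # One --in per reference image; roles are assigned in the prompt text.
        argv += ["--in", str(image)]
    return argv


argv = build_minibanana_argv("prompt.md", "bytedance-seed/seedream-4.5", "image.png")
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) once the flags have been verified locally.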
The working loop
Always use this loop:
1. Classify the task
   - text-to-image
   - reference-guided generation
   - edit / inpaint / remove / replace
   - typography-heavy graphic
   - diagram / infographic
2. Choose the aspect ratio from the job
   - 1:1 for avatars, tiles, concept squares
   - 4:5 for social posts and product ads
   - 9:16 for stories, reels, wallpapers
   - 16:9 for cinematic scenes, thumbnails, hero images
   - 21:9 for banners and panorama-style scenes
3. Choose a model that fits the job
   - use the user's requested model if they named one
   - otherwise prefer a model family suited to the task
4. Write the first prompt with strong direction
   - narrative prompt for simple scenes
   - hybrid narrative + JSON for complex scenes
   - editing prompt with explicit invariants for image edits
5. Generate and inspect the image
   - if the model produces multiple output images, review every single output before deciding which one(s) to use; do not just pick the first image
   - verify subject
   - verify composition
   - verify lighting
   - verify materials/textures
   - verify text rendering if present
   - verify whether the image actually matches the intended use case
6. Revise surgically
   - if composition is wrong, change composition language first
   - if mood is wrong, change lighting and palette first
   - if anatomy is wrong, simplify pose and framing
   - if text is wrong, shorten it and make it more explicit
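The aspect-ratio step of the loop above can be sketched as a simple lookup. This is a minimal illustration, not part of any tool: the deliverable labels are paraphrased from the list, and the names ASPECT_RATIOS and choose_aspect_ratio are hypothetical.

```python
# Deliverable type -> aspect ratio, following the working loop above.
ASPECT_RATIOS = {
    "avatar": "1:1", "tile": "1:1", "concept square": "1:1",
    "social post": "4:5", "product ad": "4:5",
    "story": "9:16", "reel": "9:16", "wallpaper": "9:16",
    "cinematic scene": "16:9", "thumbnail": "16:9", "hero image": "16:9",
    "banner": "21:9", "panorama": "21:9",
}


def choose_aspect_ratio(deliverable):
    """Return the suggested ratio, or None when the deliverable is unknown."""
    return ASPECT_RATIOS.get(deliverable.strip().lower())


print(choose_aspect_ratio("Reel"))
```

Returning None for unknown deliverables leaves the decision to the agent or the user instead of guessing a default.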
Model selection heuristics
Use the model the user explicitly asks for unless there is a strong reason not to. When choosing yourself, use simple task-based heuristics.
Good current starting points
- bytedance-seed/seedream-4.5
  - strong visual aesthetics
  - improved editing consistency
  - good portrait refinement
  - stronger small-text rendering than many image models
- google/gemini-3-pro-image-preview
  - strong multi-image reasoning and blending
  - strong text rendering
  - strong identity preservation across multiple subjects
  - useful for storyboards, composites, and complex scene design
- google/gemini-3.1-flash-image-preview
  - strong quality-to-speed tradeoff
  - good for iterative editing and quick refinement
  - supports extended aspect ratios on OpenRouter
- openai/gpt-5-image
  - strong instruction following
  - strong text rendering
  - strong detailed editing
- sourceful/riverflow-v2-pro and sourceful/riverflow-v2-fast
  - strong text rendering and graphics-oriented work
  - useful when Sourceful-specific font inputs or super-resolution references are relevant
Model choice by task
- Fast iteration / concept exploration: a faster image model is usually enough
- High-fidelity editorial / concept art: favor models known for aesthetics and composition quality
- Precise edits / multi-image composition: favor models known for multimodal reasoning
- Typography-heavy posters / ads / infographics: favor models with strong text rendering
Prompt format: choose the right shape
Use a plain narrative prompt when
- the scene is simple
- you want fast ideation
- you are exploring style directions loosely
- the image does not depend on many separate constraints
Use a hybrid prompt when
- the request is high stakes
- multiple subjects or layers matter
- references have distinct roles
- there is product, brand, or material specificity
- composition must be tightly controlled
- text rendering matters
- you expect to iterate and patch only specific parts later
Best default for serious work:
- one or two natural-language sentences that state the visual goal clearly
- a structured JSON block that encodes the exact specification
This often works better than raw JSON alone because the natural-language lead establishes the overall intent, while the JSON gives the model a stable structure.
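A hybrid prompt of this shape can be assembled mechanically: a natural-language lead followed by a fenced JSON block. The sketch below assumes the model accepts the JSON as plain prompt text; build_hybrid_prompt is a hypothetical helper, not an API of any image tool.

```python
import json


def build_hybrid_prompt(lead, spec):
    """Join a natural-language lead with a fenced JSON specification."""
    fence = "`" * 3  # built at runtime so this example nests cleanly in docs
    body = json.dumps(spec, indent=2, ensure_ascii=False)
    return f"{lead.strip()}\n\n{fence}json\n{body}\n{fence}"


prompt = build_hybrid_prompt(
    "Create a polished product shot of a ceramic mug on a walnut desk.",
    {"task": "text-to-image", "technical_preferences": {"aspect_ratio": "4:5"}},
)
print(prompt)
```

Keeping the specification as a dictionary makes later surgical edits easy: change one key, rebuild the prompt, and save it to a file for reproducibility.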
Prompt anatomy
A strong image prompt usually covers these elements in roughly this order:
- Operation — create, render, photograph, illustrate, edit, replace, remove, transform
- Primary subject — who or what the image is really about
- Action / pose / state — what the subject is doing
- Environment / context — where the scene takes place
- Composition — shot type, framing, angle, focal point, depth layers
- Lighting — source, direction, softness, contrast, time of day, atmosphere
- Style / medium — editorial photo, fantasy realism, watercolor, 3D render, film still
- Materials / textures — leather, moss-covered stone, polished chrome, tweed, fog, wet pavement
- Color palette / grading — warm neutrals, muted teal, rich contrast, desaturated sci-fi
- Text instructions — exact text, font feel, placement, line count, language
- Constraints — aspect ratio, realism level, keep background, preserve identity, no motion blur
- Negative prompt — only targeted artifact suppression, not a giant trash list
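The anatomy above also works as a coverage check on a draft before generating. The following is a rough sketch: the key names are hypothetical labels for the twelve elements, and not every image needs all of them (text instructions, for example, only apply when text appears).

```python
# Element names in the order listed above (labels are illustrative).
PROMPT_ELEMENTS = [
    "operation", "primary_subject", "action", "environment",
    "composition", "lighting", "style", "materials",
    "color_palette", "text_instructions", "constraints", "negative_prompt",
]


def missing_elements(draft):
    """List the anatomy elements a draft leaves empty, in anatomy order."""
    return [name for name in PROMPT_ELEMENTS if not draft.get(name, "").strip()]


draft = {
    "operation": "Photograph",
    "primary_subject": "a ceramic pour-over coffee set",
    "lighting": "overcast daylight from a window on the left",
}
print(missing_elements(draft))
```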
The simplest good formula
For text-to-image without references:
text
[Verb] + [Subject] + [Action] + [Location/context] + [Composition] + [Lighting] + [Style] + [Materials/details] + [Output/use-case]
Example:
text
Create a high-end editorial fashion portrait of a confident model wearing a tailored brown dress, sleek boots, and a structured handbag, standing in a statuesque pose against a seamless deep cherry-red studio backdrop. Medium-full shot, center-framed, photographed at a low three-quarter angle with soft cinematic key light and subtle rim light. Shot like a luxury fashion magazine campaign on medium-format analog film with pronounced grain, rich color, and realistic fabric texture.
Prompt like a creative director
Do not stop at nouns. Control the scene deliberately.
1. Direct the composition
Use terms such as:
- extreme close-up
- close-up
- medium shot
- medium-full shot
- wide shot
- aerial view
- top-down
- low angle
- high angle
- over-the-shoulder
- symmetrical framing
- centered composition
- rule-of-thirds placement
- foreground / midground / background layers
2. Direct the camera and lens
Useful language:
- 24mm wide-angle for environmental scale
- 35mm for natural cinematic framing
- 50mm for neutral human perspective
- 85mm portrait lens for flattering compression
- macro lens for product and texture detail
- shallow depth of field (f/1.8) for subject separation
- deep focus for detailed environments
3. Direct the lighting
Useful language:
- golden hour backlight
- overcast daylight
- soft studio softbox lighting
- three-point lighting
- harsh chiaroscuro lighting
- volumetric god rays
- fog diffusion
- neon edge lighting
- candlelit interior
- wet-surface reflections
4. Direct materiality
This is one of the most underused quality levers. Name surfaces and imperfections.
Examples:
- weathered, cracked, moss-covered stone
- navy blue tweed with visible weave
- brushed aluminum with subtle micro-scratches
- rain-slick asphalt reflecting signage
- matte ceramic with fine glaze variation
- worn leather with creases and patina
5. Direct the color story
Examples:
- muted teal and amber cinematic grading
- warm monochrome neutrals
- deep emerald and gold accents
- pastel candy palette with soft bloom
- high-contrast black-and-white with silver halation
JSON prompting for high-control jobs
For complex or high-fidelity prompts, use a structured block. This is especially useful for:
- complex scenes with multiple subjects
- product shots with precise materials and layout
- image edits with clear preserve/change rules
- typography-heavy compositions
- iterative workflows where you need to patch one section later
Hybrid JSON template
md
Create a polished, high-fidelity image that feels intentional, cinematic, and production-ready.
```json
{
"task": "text-to-image",
"goal": "one-sentence visual objective",
"subject": {
"primary": "main subject",
"secondary": ["supporting elements"]
},
"scene": {
"location": "where it takes place",
"time": "time of day",
"weather": "weather or atmosphere",
"story_moment": "what moment is being depicted"
},
"composition": {
"framing": "wide shot / portrait / macro / etc.",
"camera_angle": "low angle / eye level / aerial / etc.",
"focus": "what the eye should land on first",
"depth_layers": ["foreground", "midground", "background"]
},
"lighting": {
"primary": "main light source",
"secondary": ["secondary light sources"],
"mood": "desired emotional tone"
},
"style": {
"genre": "photorealistic / fantasy realism / 3D / watercolor / etc.",
"visual_aesthetic": ["keywords"],
"rendering": {
"detail_level": "high",
"sharpness": "high",
"dynamic_range": "wide"
}
},
"materials_and_textures": {
"subject": "surface/material notes",
"environment": "environment texture notes"
},
"color_palette": {
"dominant": ["main colors"],
"accents": ["accent colors"]
},
"text_rendering": {
"enabled": false,
"text": [],
"placement": "",
"style": ""
},
"technical_preferences": {
"aspect_ratio": "16:9",
"lens": "35mm",
"depth_of_field": "moderate",
"realism": "high"
},
"negative_prompt": ["blurry image", "flat lighting", "text or watermark"]
}
```
Guidance for structured prompting
- Put the most important constraints first.
- Keep the hierarchy clean.
- Avoid contradictory instructions like minimalist plus highly cluttered.
- Keep negative prompts targeted and relevant.
- If the prompt becomes too long and starts failing, shorten it to the essential visual hierarchy.
Reference-image prompting
Use --in for any reference images.
Best practice: assign each input image a role
When more than one reference image is used, explicitly define what each image contributes.
Good pattern:
text
Use reference image 1 for the composition and camera angle.
Use reference image 2 for the fabric texture and color palette.
Use reference image 3 for the product silhouette only.
Do not copy the background from references 2 or 3.
Strong multimodal formula
text
[Reference images] + [role of each reference] + [what must remain unchanged] + [new scenario] + [style and quality target]
Example:
text
Using the first attached image for the room layout and camera position, and the second attached image for the velvet texture and olive-green color, create a high-end interior design render of a reading chair in a sunlit minimalist living room. Keep the composition and scale consistent with the layout reference, but redesign the chair into a premium sculptural form with realistic stitching, soft shadows, and polished oak legs.
Rules for reference use
- Do not say only "use these references"; specify what to borrow from each.
- State what not to copy when necessary.
- For identity-sensitive edits, say what must remain unchanged: face, pose, camera angle, outfit color, logo placement, product geometry, etc.
- If references conflict, declare the priority order.
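The rules above can be enforced when the role sentences are generated from data, so no reference is attached without a stated role. A minimal sketch follows; reference_role_lines is a hypothetical helper and the wording mirrors the pattern shown earlier in this section.

```python
def reference_role_lines(roles, do_not_copy=None):
    """Write one role sentence per reference image, in attachment order."""
    lines = []
    for index, role in enumerate(roles, start=1):
        if not role.strip():
            # A reference without a role is exactly what the rules warn against.
            raise ValueError(f"reference image {index} has no role")
        lines.append(f"Use reference image {index} for {role.strip()}.")
    if do_not_copy:
        lines.append(f"Do not copy {do_not_copy}.")
    return "\n".join(lines)


print(reference_role_lines(
    ["the composition and camera angle", "the fabric texture and color palette"],
    do_not_copy="the background from reference 2",
))
```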
Editing prompts
Editing prompts are different from fresh generation prompts.
The golden rule for edits
Be explicit about what changes and what stays exactly the same.
Weak:
text
Make this image better.
Better:
text
Remove the man from the background while keeping the street, perspective, lighting, shadows, storefront signage, and camera position unchanged.
Editing formula
text
[Edit verb] + [specific target] + [exact change] + [what to preserve] + [quality/style target]
Examples:
text
Replace the cloudy sky with a dramatic golden-hour sunset while keeping the building, camera angle, reflections, and street activity unchanged.
text
Transform this product photo into a premium studio advertisement while preserving the exact bottle shape, label wording, brand colors, and front-facing composition.
text
Remove the table lamp from the nightstand. Keep the bed, wall art, shadows, color palette, and image framing identical.
Editing tips
- Name the object to change clearly.
- State preserve rules in plain language.
- For local edits, mention nearby context so the patch blends naturally.
- For realism, tell the model to preserve matching shadows, reflections, perspective, and grain.
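One way to make sure every edit prompt states its invariants is to build it from the parts of the editing formula and refuse to proceed without a preserve list. This is a sketch under that assumption; build_edit_prompt and join_items are hypothetical names.

```python
def join_items(items):
    """Join items as natural English: 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_edit_prompt(verb, target, change, preserve, quality=""):
    """Compose an edit prompt that always names what must stay unchanged."""
    if not preserve:
        raise ValueError("state what must stay unchanged")
    sentence = f"{verb} {target} {change}".strip()
    sentence += f" while keeping {join_items(preserve)} unchanged."
    if quality:
        sentence += f" {quality}"
    return sentence


print(build_edit_prompt(
    "Replace", "the cloudy sky", "with a dramatic golden-hour sunset",
    ["the building", "camera angle", "reflections"],
))
```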
Text rendering and localization
Text in images is much better than it used to be, but it still benefits from very explicit prompting.
Rules for good text rendering
- Put exact text in quotes.
- Keep text short whenever possible.
- State the number of lines.
- Specify the placement.
- Specify the font feel or a recognizable font style.
- Describe the graphic design context so the text feels integrated.
Example:
text
Create a premium skincare advertisement. On the right side of the frame, render three lines of text with exact spelling: "GLOW" on the first line in an elegant flowing brush-script style, "10% OFF" on the second line in a bold block sans-serif style, and "Your First Order" on the third line in a thin minimalist geometric sans-serif. Keep the product jar large and centered-left, with warm studio lighting and clean luxury packaging aesthetics.
Localization prompt pattern
text
Render the poster text in Korean and Arabic with correct script rendering and natural layout. Keep the brand hierarchy identical, with the Korean text as the main headline and the Arabic text as the supporting line.
Typography tips
- Quote the exact words.
- Keep copy shorter than you think.
- State the visual hierarchy: headline, subhead, caption, badge, CTA.
- For posters, specify whether the text is printed on a physical object, floating in layout space, or cut out of the background.
- If text keeps failing, shorten it and simplify the layout.
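The text-rendering rules (quoted wording, line count, placement, font feel) can be applied consistently by generating the text instruction from structured input. The helper below is a hypothetical sketch; its sentence template is one possible phrasing, not a required format.

```python
def text_instruction(lines, placement):
    """Build a text-rendering instruction from (text, font_style) pairs."""
    if not lines:
        raise ValueError("at least one line of text is required")
    parts = [
        f'line {number}: "{text}" in {style}'  # exact wording stays quoted
        for number, (text, style) in enumerate(lines, start=1)
    ]
    noun = "line" if len(lines) == 1 else "lines"
    return (
        f"{placement}, render {len(lines)} {noun} of text with exact spelling; "
        + "; ".join(parts)
        + "."
    )


print(text_instruction(
    [("GLOW", "an elegant brush-script style"),
     ("10% OFF", "a bold block sans-serif style")],
    "On the right side of the frame",
))
```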
Five prompt frameworks
1. Text-to-image without references
Use when starting from nothing.
Formula:
text
[Subject] + [Action] + [Location/context] + [Composition] + [Lighting] + [Style]
2. Reference-guided generation
Use when you need consistency or blended inspiration.
Formula:
text
[Attached references] + [role of each] + [new scenario] + [what to preserve] + [style target]
3. Conversational image editing
Use when you already have a base image and want to change part of it.
Formula:
text
[Edit action] + [what changes] + [what stays the same]
4. Style transfer / composition transfer
Use when one image supplies content and another supplies the look.
Formula:
text
[Base image content] + [style source] + [what must remain recognizable]
5. Text-first design prompts
Use for posters, ads, label mockups, and graphics.
Formula:
text
[Design type] + [layout] + [exact text in quotes] + [font/style guidance] + [placement] + [background/subject]
Advanced prompting tactics
Put non-negotiables early
The first part of the prompt should contain the things that must be right even if the model ignores the rest.
Separate must-haves from nice-to-haves
A useful structure is:
- Must-have: subject, composition, lighting, text, preserved identity
- Nice-to-have: extra atmosphere, tiny props, secondary story details
If quality drops, cut nice-to-haves first.
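Keeping the two tiers as separate lists makes the "cut nice-to-haves first" rule mechanical. The sketch below yields progressively leaner prompts; fallback_prompts is a hypothetical helper, and it assumes the nice-to-haves are ordered from most to least important so the least important is dropped first.

```python
def fallback_prompts(must_haves, nice_to_haves):
    """Yield the full prompt, then variants with fewer nice-to-haves.

    Must-haves always come first and are never removed.
    """
    must_haves = list(must_haves)
    nice_to_haves = list(nice_to_haves)
    for keep in range(len(nice_to_haves), -1, -1):
        yield " ".join(must_haves + nice_to_haves[:keep])


variants = list(fallback_prompts(
    ["Photograph a lone explorer at a temple altar.", "Wide shot, low angle."],
    ["Light mist hangs in the air.", "A small lantern sits on the steps."],
))
for variant in variants:
    print(variant)
```

Because revisions cost money, the leaner variants are candidates to propose to the user, not prompts to run automatically.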
Avoid contradiction density
The more conflicting adjectives you pile in, the more generic the result becomes.
Examples of bad pairings:
- minimalist + crowded with detail everywhere
- documentary realism + stylized anime cel shading
- soft diffused fog + razor-sharp hard sunlight everywhere
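A crude keyword check can catch the most obvious clashes before a prompt is sent. This is only a sketch: the pair list is a hypothetical starting point derived from the examples above, and plain substring matching will miss paraphrases and can flag deliberate contrasts.

```python
# Term pairs that tend to pull the image in opposite directions (illustrative).
CONFLICTING_PAIRS = [
    ("minimalist", "cluttered"),
    ("minimalist", "crowded"),
    ("documentary realism", "anime"),
    ("soft diffused", "hard sunlight"),
]


def find_conflicts(prompt):
    """Return the known conflicting pairs that both appear in the prompt."""
    text = prompt.lower()
    return [(a, b) for a, b in CONFLICTING_PAIRS if a in text and b in text]


print(find_conflicts("A minimalist studio, highly cluttered with props"))
```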
Add a narrative moment
A picture gets more interesting when it captures a moment, not just an object.
Examples:
- the instant before the temple mechanism awakens
- mid-step entering the rain
- just after the champagne cork pops
- the second before sunrise breaks through the clouds
Use materiality to escape the generic look
When an image looks synthetic or cheap, the fix is often not "make it realistic". The fix is naming the materials, imperfections, surfaces, and light behavior.
Use negative prompts sparingly
Good negative prompts remove common artifacts. Bad negative prompts become a giant bag of unrelated anxieties.
Good:
text
negative_prompt: ["blurry image", "distorted hands", "flat lighting", "text or watermark"]
Less good:
text
negative_prompt: [dozens of unrelated items that are unlikely to appear anyway]
Symptom -> fix guide
Problem: composition is wrong
Fix with:
- shot type
- camera angle
- focal subject
- subject placement
- lens choice
- depth layers
Problem: image looks flat or cheap
Fix with:
- stronger primary light source
- secondary rim or bounce light
- material and surface detail
- realistic shadows/reflections
- color grading direction
Problem: too busy / no focal point
Fix with:
- one primary subject
- simpler background
- explicit focus target
- fewer secondary objects
Problem: anatomy or hands are bad
Fix with:
- simpler pose
- fewer visible fingers/hands if hands are not essential
- medium shot instead of extreme close-up of hands
- natural action with clear limb positions
Problem: text is misspelled or ugly
Fix with:
- shorten copy
- put exact text in quotes
- define line breaks
- specify poster/ad layout
- choose a model with stronger text rendering
Problem: references are ignored
Fix with:
- reduce the number of references
- assign one explicit role per reference
- state what must be preserved from the base image
- declare priority if references conflict
Problem: image feels generic
Fix with:
- add a story moment
- add material specificity
- add camera/lens choices
- add atmosphere/weather/time of day
- replace vague style labels with concrete visual direction
Example high-control prompt
This example shows the general style of prompt that often produces strong results for complex scenes.
md
Create a cinematic fantasy-realist image with a clear focal point, believable lighting, and rich environmental detail.
```json
{
"title": "Bioluminescent Jungle Temple at Dawn",
"style": {
"genre": "fantasy realism",
"visual_aesthetic": ["cinematic", "ultra-detailed", "atmospheric", "mythic"]
},
"scene": {
"location": "ancient jungle temple courtyard",
"environment": "dense tropical rainforest",
"time_of_day": "early dawn",
"weather": "light mist after rainfall"
},
"composition": {
"camera_angle": "low three-quarter angle",
"framing": "wide shot",
"focus": "central altar and lone explorer",
"depth_layers": [
"foreground wet roots and glowing fungi",
"midground broken steps and altar",
"background towering ruins and canopy"
]
},
"lighting": {
"primary_light_source": "soft golden dawn light through the canopy",
"secondary_light_sources": [
"warm lantern glow",
"subtle bioluminescent cyan glow"
],
"mood": "mysterious, sacred, awe-inspiring"
},
"technical_preferences": {
"aspect_ratio": "16:9",
"lens": "24mm cinematic wide-angle",
"realism": "high"
},
"negative_prompt": [
"cartoon style",
"low detail",
"flat lighting",
"text or watermark"
]
}
```
Final quality checklist
Before you stop, make sure the image has the following:
- a clear focal point
- composition that matches the use case
- lighting that supports the intended mood
- believable materials and textures
- no obvious artifacting or accidental clutter
- text that is spelled correctly if text is present
- aspect ratio that suits the output channel
- prompt file saved for reproducibility if the task is important
Minimal agent playbook
When the user asks for an image:
- infer the deliverable type and aspect ratio
- choose a suitable image model
- write the prompt in a file if the request is non-trivial
- use structured prompting for complex scenes or edits
- generate once
- if the model produces multiple output images, review every single output before deciding which one(s) to use — do not just pick the first image
- inspect what is wrong
- revise only the necessary sections
- prefer prompt improvement over random model hopping
IMPORTANT: State-of-the-art image generation models are expensive. Check with the user before revising images.
The fastest path to better image output is usually clearer direction, cleaner hierarchy, and more deliberate visual language.