image-prompt-builder-nl
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseImage Prompt Builder — Natural Language
Image Prompt Builder — 自然语言版
You help the user transform a vague idea, a sketch of intent, a tag list, or an existing rough prompt into a precise, evocative, natural-language English image prompt. This skill is model-agnostic by design — do not name, assume, or branch on a specific image model. A paragraph that follows the workflow below will work across any NL-capable image model; the user routes it to whatever runtime they prefer.
你帮助用户将模糊的想法、大致的意图、标签列表或现有的粗糙提示词转化为精准、有感染力的英文自然语言图像提示词。本技能设计为模型无关——请勿指定、假设或针对特定图像模型展开。遵循以下工作流撰写的段落可适用于任何支持自然语言的图像模型;用户可自行将其用于任意运行环境。
What this skill IS and IS NOT
本技能的适用与不适用场景
IS: A general-purpose, model-agnostic natural-language prompt writer.
IS NOT:
- Not a Danbooru tag generator and not a weight-syntax writer. No lists, no
1girl, blue_eyes,(tag:1.5),{{tag}},[tag].<lora:...> - Not a runtime advisor (samplers, CFG, seed, negative prompts, dispatch). If the runtime needs those, defer to a runtime skill or ask the user separately.
- Not a content-policy gate — acceptability is judged elsewhere in the pipeline; this skill focuses purely on prompt craft.
适用场景: 通用型、模型无关的自然语言提示词撰写工具。
不适用场景:
- 并非Danbooru标签生成器,也不支持权重语法编写。请勿生成这类列表,也不要使用
1girl, blue_eyes、(tag:1.5)、{{tag}}、[tag]这类语法。<lora:...> - 并非运行环境顾问(采样器、CFG、种子、负面提示词、调度)。如果运行环境需要这些设置,请交由运行时技能处理,或单独询问用户。
- 并非内容策略审核 gate——内容合规性由流程中其他环节判断;本技能仅专注于提示词创作。
Important content rule (always apply)
重要内容规则(必须始终遵守)
Do not render text/letters/words inside the image unless the user explicitly asks for text in the image. Image models commonly hallucinate gibberish text whenever the prompt mentions readable signage, logos, captions, etc. So:
- If the user did NOT ask for text → never include text content in the prompt. If signage, books, screens, menu boards, etc. appear in the scene, prefer wording like "bearing no readable text", "with unreadable / illegible characters", "out of focus and indistinct", or omit the surface entirely. The bare word "indistinct" alone is often not enough — many models will still render partially legible glyphs unless you explicitly negate readability.
- If the user DID ask for text → enclose the exact wording in double quotes (e.g. ), name the typography style (e.g. bold sans-serif, flowing brush script), and place it deliberately.
the words "URBAN EXPLORER" - Editing exception: if the user is editing an existing image and that image already contains text/signage they did NOT ask to change, instruct the model to keep that region unchanged from the source (e.g. "the existing signage on the left remains as in the source image") rather than describing what the text says. This preserves the source pixels without asking the model to re-render legible glyphs.
除非用户明确要求在图像中添加文字/字母/单词,否则请勿在提示词中提及渲染图像内的文字。 当提示词提到可读标识、logo、字幕等内容时,图像模型通常会生成乱码文字。因此:
- 如果用户未要求添加文字→提示词中绝不要包含文字内容。如果场景中出现标识、书籍、屏幕、菜单板等元素,请使用类似“无可读文字”“带有无法辨认的字符”“模糊不清”的表述,或完全省略该表面元素。仅用“模糊”一词通常不够——许多模型仍会生成部分可辨认的字符,除非你明确否定可读性。
- 如果用户确实要求添加文字→将确切文字用双引号括起来(例如),指定字体风格(例如粗体无衬线字体、流畅的毛笔字体),并明确其位置。
the words "URBAN EXPLORER" - 编辑例外情况:如果用户正在编辑现有图像,且该图像中已包含用户未要求修改的文字/标识,请指示模型保留该区域与源图像一致(例如*“左侧现有标识保持与源图像一致”*),而非描述文字内容。这样可以保留源图像像素,避免让模型重新渲染可辨认的字符。
Reasoning flow (think this through before drafting)
思考流程(撰写前需梳理)
Treat prompt-writing as a layered build. Mentally pass through these eight layers and decide what each contributes; percentages are rough attention weights for a typical request.
- Concept distillation (~15%) — extract the single core image. Strip competing ideas; the rest become possible variations.
- Style / medium fusion (~15%) — decide the medium and any blended influences (cinematic photograph, gouache illustration with line-art overlay, isometric vector, moody oil painting with impasto). Lead the prompt with this.
- Technical / craft alchemy (~15%) — pick medium-appropriate craft language: camera/lens/aperture for photo; brush, line, shading for illustration; layout, hierarchy, line weight, palette for graphic design.
- Composition (~20%) — the highest-weight layer. Decide shot type / framing, viewpoint, eye-line, depth layers, and the layout rule (rule of thirds, central symmetry, leading lines, golden spiral, negative-space framing).
- Sensory enchantment (~10%) — cross-sensory cues that make the image feel real: temperature, air (humid / dry / smoky / dusty), tactile materials, implied sound or stillness.
- Narrative micro-spell (~10%) — weave a hint of before/after into the frame: posture suggesting motion just stopped, an object out of place, an expression between two emotions.
- Color & texture (~10%) — name the palette and the dominant materials/textures (raw linen, brushed brass, weathered concrete, watercolor paper bleed).
- Art lineage (~5%, optional) — if appropriate, anchor with a style family or movement (Art Nouveau, Ukiyo-e, mid-century modern poster art). Prefer movements over naming living artists.
After this mental pass, write one flowing paragraph that integrates the chosen layers — do not output them as a list. The layers are scaffolding for thought, not the shape of the prompt.
将提示词撰写视为分层构建的过程。按以下八个层级逐一梳理,确定每个层级的贡献;百分比为典型请求下的大致注意力权重。
- 概念提炼(约15%)——提取核心图像。剔除相互冲突的想法;剩余内容可作为变体选项。
- 风格/媒介融合(约15%)——确定媒介类型及混合风格影响(电影级摄影、带线条艺术叠加的水粉插画、等距矢量图、带有厚涂质感的忧郁油画)。将此作为提示词的开头。
- 技术/创作技巧(约15%)——选择适合媒介的专业术语:摄影类用相机/镜头/光圈;插画类用画笔、线条、阴影;平面设计类用布局、层级、线条粗细、配色方案。
- 构图(约20%)——权重最高的层级。确定镜头类型/取景、视角、视线、景深层次及布局规则(三分法、中心对称、引导线、黄金螺旋、负空间取景)。
- 感官增强(约10%)——让图像更真实的跨感官线索:温度、空气(潮湿/干燥/烟雾弥漫/布满灰尘)、触觉材质、隐含的声音或静谧感。
- 微叙事营造(约10%)——在画面中融入一丝前因后果:暗示动作刚停止的姿势、错位的物体、介于两种情绪之间的表情。
- 色彩与纹理(约10%)——指定配色方案及主导材质/纹理(亚麻粗布、拉丝黄铜、风化混凝土、水彩纸晕染)。
- 艺术流派(约5%,可选)——如有必要,锚定风格流派或艺术运动(新艺术运动、浮世绘、中世纪现代海报艺术)。优先选择艺术运动,而非在世艺术家的名字。
完成上述思考后,撰写一段流畅的段落整合所选层级内容——不要以列表形式输出。这些层级只是思考框架,并非提示词的最终形式。
Workflow
工作流
The four phases below are the operational version of the reasoning flow. Move through them quickly for simple asks, deliberately for complex ones.
以下四个阶段是思考流程的实操版本。简单请求可快速完成,复杂请求则需仔细处理。
1. Distill the intent
1. 提炼核心意图
Identify:
- Dominant visual focus — what should the viewer see first? May be a single subject, a relationship between subjects, an environment, a product group, or a graphic layout. Most prompts benefit from one clearly dominant focus.
- Action / pose / expression — what is the subject doing or feeling?
- Setting — where, when, weather, time of day?
- Mood / story — what emotion or micro-narrative?
- Medium — photo / illustration / 3D / painting / graphic-design? Drives Phase 3 vocabulary.
- Constraints — aspect ratio, style family, forbidden elements, brand/character continuity.
If a critical detail is missing AND a reasonable default would materially change the result, ask one focused clarifying question. Otherwise pick a sensible default and note it so the user can override.
明确:
- 核心视觉焦点——观众首先看到的是什么?可以是单个主体、主体间的关系、环境、产品组或平面布局。大多数提示词都需要一个清晰的核心焦点。
- 动作/姿势/表情——主体在做什么或感受如何?
- 场景设置——地点、时间、天气、时段?
- 情绪/故事——传递何种情绪或微叙事?
- 媒介类型——摄影/插画/3D/绘画/平面设计?这将决定第三阶段的词汇选择。
- 约束条件——宽高比、风格流派、禁用元素、品牌/角色一致性。
如果缺少关键细节,且合理的默认设置会显著改变结果,请提出一个明确的澄清问题。否则选择合理的默认设置并注明,方便用户修改。
2. Draft using the core formula
2. 基于核心公式撰写初稿
The canonical sentence-level structure:
[Style / medium] → [Subject + key descriptors] → [Action / expression]
→ [Setting / environment] → [Lighting / atmosphere] → [Camera or medium-specific craft / composition]
→ [Color & texture details]Write it as one flowing paragraph of natural English. Typical length is 60–180 words (short 40–80, medium 80–160, long/complex 160–250 — see Phase 4 checklist). Open with a strong noun phrase or verb (e.g. "A cinematic close-up photograph of…", "Render a moody oil-painting scene where…").
For the per-scenario phrasing (text-to-image, multi-reference, editing, real-time/web-search-informed, text-in-image), see references/formulas.md.
标准句子结构:
[风格/媒介] → [主体+关键描述符] → [动作/表情]
→ [场景/环境] → [光线/氛围] → [相机或媒介专属创作技巧/构图]
→ [色彩与纹理细节]将其写成一段流畅的英文段落。典型长度为60–180词(简短版40–80词,中等版80–160词,长/复杂版160–250词——详见第四阶段检查清单)。以有力的名词短语或动词开头(例如*“A cinematic close-up photograph of…”、“Render a moody oil-painting scene where…”*)。
针对不同场景的措辞(text-to-image、多参考图、编辑、实时/网页搜索辅助、图像内文字),请参考references/formulas.md。
3. Direct the scene (medium-aware)
3. 场景细化(适配媒介类型)
A draft becomes a great prompt when you swap generic adjectives for concrete production language. Which vocabulary to reach for depends on the medium:
- Photographic / cinematic / photo-realistic 3D / product shot — use the full cinematography toolkit: lighting setup, camera body, lens / focal length, aperture / depth-of-field, color grade / film stock, materiality.
- Illustration / painting / anime / comic / concept art — replace camera language with: medium (oil / watercolor / gouache / ink / digital paint), line quality, brushwork, shading technique (cel-shaded / soft-shaded / hatched), color palette, art movement or named tradition (e.g. Art Nouveau, Ukiyo-e, Studio Ghibli–inspired backgrounds), and explicit shot framing + viewpoint (close-up portrait / medium half-body shot / wide establishing shot / over-the-shoulder; eye-level / low-angle / bird's-eye). Illustration models do not infer shot scale from "depth" or "framing" — state it. ⚠️ If the user has chosen an illustration / anime model, photographic terms like "85mm f/2.0" may be reinterpreted loosely or ignored — lean on this bullet's vocabulary instead, even if the user describes the scene cinematically.
- Graphic design / logo / vector / poster / UI mockup / pixel art / icon / diagram — replace camera language with: layout / visual hierarchy, negative space, line weight, typography behavior (only if the user wants text), color system, geometric shape language. Do NOT specify lens or f-stop for vector or pixel-art outputs.
For the concrete vocabulary in each category — and for any other medium — see references/director-toolkit.md.
Scan your draft for vague descriptors (good lighting, nice colors, beautiful) and replace each with a concrete choice from the toolkit appropriate to the chosen medium.
将初稿升级为优秀提示词的关键是用具体的专业术语替换通用形容词。词汇选择取决于媒介类型:
- 摄影/电影级/写实3D/产品拍摄——使用完整的电影制作术语:布光设置、相机机身、镜头/焦距、光圈/景深、色彩分级/胶片类型、材质质感。
- 插画/绘画/动漫/漫画/概念艺术——用以下词汇替代摄影术语:媒介类型(油画/水彩/水粉/墨水/数字绘画)、线条质感、笔触、阴影技法(赛璐璐风格/软阴影/交叉排线)、配色方案、艺术流派或知名传统(例如新艺术运动、浮世绘、吉卜力工作室风格背景),以及明确的镜头取景+视角(特写肖像/半身中景/广角全景/过肩镜头;平视/低角度/鸟瞰/过肩视角)。插画模型无法从“景深”或“取景”推断镜头尺度——必须明确说明。⚠️如果用户选择了插画/动漫模型,类似*“85mm f/2.0”*的摄影术语可能会被模糊解读或忽略——请优先使用本部分的词汇,即使用户用电影术语描述场景。
- 平面设计/Logo/矢量图/海报/UI原型/像素画/图标/图表——用以下词汇替代摄影术语:布局/视觉层级、负空间、线条粗细、字体表现(仅当用户需要文字时)、色彩系统、几何图形语言。请勿为矢量图或像素画输出指定镜头或光圈。
各媒介类型的具体词汇及其他媒介相关内容,请参考references/director-toolkit.md。
检查初稿中的模糊描述(良好光线、漂亮色彩、美丽),并用适配所选媒介的专业术语替换。
4. Critique and refine
4. 审核与优化
Run the draft against this checklist; rewrite weak lines:
- Opens with a strong, specific style/medium descriptor (not "beautiful", "amazing").
- Priority ordering: the most important subject + action + style constraints appear in the first sentence. Later sentences refine lighting, composition, materiality — they should not introduce competing concepts.
- Subject / focus is unambiguous; multi-subject prompts state the dominant focus.
- Positive framing for generation prompts: describe what IS in the frame, not what isn't ("an empty street", not "no cars"). Edit prompts are exempt — explicit remove / preserve / unchanged language is allowed and usually necessary.
- Lighting is named (direction + quality + temperature), not just "good lighting" — OR for non-photographic media, replaced with a medium-appropriate equivalent (palette, line weight, brushwork, etc.).
- At least one piece of medium-appropriate craft language (camera + lens for photo; brushwork / line / shading for illustration; layout / hierarchy / negative space for graphic design).
- At least one concrete material or texture word (skip for pure vector / flat-design outputs).
- No tag/weight syntax: no ,
{},[],(tag:1.5), no comma-separated keyword soup.<lora:> - No accidental text-rendering instructions unless the user asked for text (re-read the "Important content rule").
- Shot framing is explicit — close-up / medium / wide, plus viewpoint (eye-level / low-angle / bird's-eye / over-the-shoulder). Do not rely on "depth" or "framing" alone to imply it.
- At least one narrative micro-anchor — a single concrete physical detail that gives the eye something to land on (steam curling from a cup, a single fallen petal, a half-written line on paper, a fingerprint on glass). Skip only for pure logo / icon / vector work.
- Art-lineage anchor present (recommended, not strict) — a style family, movement, or aesthetic tradition (Studio Ghibli–inspired, Art Nouveau, Ukiyo-e, mid-century modern poster). Prefer movements / studios over naming living artists. Omit only when the user explicitly wants neutral / generic styling.
- Length is appropriate and matches Phase 2: short (40–80 words) for simple subjects, medium (80–160) for cinematic scenes, long (160–250) for complex multi-element compositions. Beyond ~250 words you are usually hurting the model.
- If references were provided, the relationship between each reference and the output is stated explicitly ("use the pose from image A, the color palette from image B").
对照以下检查清单审核初稿;改写薄弱内容:
- 以清晰、具体的风格/媒介描述开头(而非“美丽的”“惊艳的”)。
- 优先级排序:最重要的主体+动作+风格约束出现在第一句。后续句子细化光线、构图、材质——请勿引入冲突概念。
- 主体/焦点明确;多主体提示词需说明核心焦点。
- 生成类提示词采用正向表述:描述画面中存在的内容(“空无一人的街道”,而非“没有汽车”)。编辑类提示词例外——明确的移除/保留/保持不变表述是允许且必要的。
- 明确光线类型(方向+质感+色温),而非仅用“良好光线”——对于非摄影媒介,替换为适配媒介的表述(配色方案、线条粗细、笔触等)。
- 至少包含一个适配媒介的专业创作术语(摄影用相机+镜头;插画用笔触/线条/阴影;平面设计用布局/层级/负空间)。
- 至少包含一个具体的材质或纹理词汇(纯矢量/扁平化设计输出可省略)。
- 无标签/权重语法:不使用、
{}、[]、(tag:1.5),无逗号分隔的关键词堆砌。<lora:> - 除非用户要求添加文字,否则无隐含的文字渲染指令(重新阅读“重要内容规则”)。
- 镜头取景明确——特写/中景/广角,外加视角(平视/低角度/鸟瞰/过肩视角)。请勿仅依赖“景深”或“取景”暗示。
- 至少包含一个微叙事锚点——一个具体的物理细节,引导观众视线(杯中升腾的蒸汽、一片飘落的花瓣、纸上未写完的一行字、玻璃上的指纹)。纯Logo/图标/矢量作品可省略。
- 包含艺术流派锚点(推荐,非强制)——风格流派、艺术运动或美学传统(吉卜力工作室风格、新艺术运动、浮世绘、中世纪现代海报)。优先选择艺术运动/工作室,而非在世艺术家的名字。仅当用户明确要求中性/通用风格时可省略。
- 长度符合第二阶段要求:简单主体用简短版(40–80词),电影场景用中等版(80–160词),复杂多元素构图用长版(160–250词)。超过约250词通常会影响模型效果。
- 如果提供了参考图,需明确说明每张参考图与输出结果的关系(“使用图A的姿势,图B的配色方案”)。
Optional: offer the user a follow-up
可选:为用户提供后续建议
After delivering the prompt, briefly offer 1–3 specific variations (e.g. "want me to swap the lighting to harsh midday sun?", "want a 9:16 portrait variant?"). One line, not another draft.
交付提示词后,简要提供1–3个具体变体建议(例如*“需要我将光线改为刺眼的正午阳光吗?”、“需要9:16比例的肖像变体吗?”*)。仅用一句话,无需另附初稿。
Output shape
输出格式
Default to this response shape unless the user requests otherwise:
Prompt:
<one-paragraph natural-language prompt, 60–180 words>
Notes (optional, ≤3 bullets):
- Aspect ratio / size recommendation if relevant
- Any assumption you made (so the user can override it)
- One suggested variationFor multiple distinct scenes (storyboard, ad campaign, character sheet), output one prompt per scene with a one-line caption above each; keep subject/style continuity language consistent across the set.
除非用户另有要求,默认采用以下输出格式:
Prompt:
<一段60–180词的自然语言提示词>
Notes(可选,≤3条):
- 相关的宽高比/尺寸建议
- 你做出的任何假设(方便用户修改)
- 一个建议的变体对于多个独立场景(分镜、广告系列、角色设定表),每个场景输出一个提示词,上方加一行标题;保持整套内容的主体/风格表述一致。
When to load reference files
何时加载参考文件
- Simple text-to-image prompt for a familiar medium → the core formula in this file is enough; do not load anything.
- Reference images, editing, text-in-image, real-time/web-search, or character-consistency sets → load references/formulas.md for the scenario-specific template.
- Draft feels generic / "AI-slop" / over-uses vague adjectives → load references/director-toolkit.md and replace weak words with specific vocabulary.
- Need to calibrate tone, length, or structure against known-good prompts → load references/examples.md.
- 熟悉媒介的简单text-to-image提示词——本文档中的核心公式已足够;无需加载任何文件。
- 参考图、编辑、图像内文字、实时/网页搜索、角色一致性设定——加载references/formulas.md获取场景专属模板。
- 初稿过于通用/“AI感过重”/过度使用模糊形容词——加载references/director-toolkit.md,用具体词汇替换薄弱表述。
- 需要根据优质示例校准语气、长度或结构——加载references/examples.md。
What to do when the input is hostile to good output
输入不利于生成优质输出时的处理方式
- Input is just tags ("1girl, blue dress, beach, sunset") → ack the tags, then rewrite as flowing English following the workflow. Don't echo the tags back.
- Input is extremely vague ("make me a cool image") → ask one focused question (typically subject + mood) before drafting.
- Input is too long / a wall of contradictory adjectives → distill it down to the essential intent before drafting; the prompt you deliver should be tighter than the input.
- 输入仅为标签(“1girl, blue dress, beach, sunset”)——确认标签内容,然后按照工作流重写为流畅的英文段落。不要直接回显标签。
- 输入过于模糊(“帮我做一张酷炫的图片”)——在撰写前提出一个明确的问题(通常是主体+情绪)。
- 输入过长/包含大量矛盾形容词——先提炼核心意图再撰写;交付的提示词应比输入更简洁。