prompt-optimizer
Prompt Optimizer
Optimize prompts for agents, system/developer instructions, and reusable prompt templates.
Treat prompt work as an eval-driven workflow, not wordsmithing.
Load only the references you need:
| Task | Read |
|---|---|
| Create a new agent prompt | |
| Refine an existing prompt | |
| Port a prompt between model families | |
| Diagnose repeated prompt failures | |
| Explain the provenance behind this workflow | |
Step 1: Define the prompt contract
- Determine whether the task is:
- creating a new prompt
- refining an existing prompt
- porting a prompt between model families
- debugging prompt failures
- Capture the contract before rewriting anything:
- target model family and snapshot if known
- prompt surface: `system`, `developer`, `user`, tool descriptions, examples, schemas
- task objective and non-goals
- inputs, context, and tools available to the agent
- required output shape
- success criteria
- known failures
- hard constraints: latency, verbosity, safety, budget, tool use, style
- If the user does not provide success criteria or examples, build a small eval set before editing the prompt.
- If the real bottleneck is model choice, missing retrieval, weak tool schemas, or a missing eval harness, say so. Do not keep rewriting prompt text when the failure is elsewhere.
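The contract fields listed above can be pinned down as a small record before any prompt text is touched. This is a minimal sketch; the class and field names are illustrative, not part of the workflow:

```python
from dataclasses import dataclass, field

@dataclass
class PromptContract:
    """Everything Step 1 asks you to capture before editing the prompt."""
    objective: str
    model_family: str = "unknown"      # target family, plus snapshot if known
    surfaces: tuple = ("system",)      # e.g. ("system", "developer", "user")
    output_shape: str = ""             # required output format or schema
    success_criteria: list = field(default_factory=list)
    known_failures: list = field(default_factory=list)
    hard_constraints: dict = field(default_factory=dict)  # latency, budget, style...

    def ready_to_edit(self) -> bool:
        # Per Step 1: without success criteria, build an eval set first.
        return bool(self.success_criteria)

contract = PromptContract(
    objective="Summarize support tickets into a triage label",
    success_criteria=["correct label on 9/10 seed cases"],
)
```

A contract with no success criteria fails `ready_to_edit()`, which is exactly the signal to stop and build an eval set first.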
Step 2: Choose the model strategy
Read `references/model-family-notes.md`.
- If the target family is known, optimize specifically for that family.
- If the target family is unknown, write:
- a portable base prompt
- short adapter notes for the likely target families
- Do not pretend one prompt is universal when the behavior clearly depends on model family.
- Pin model snapshots when the surrounding system supports it.
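The base-plus-adapters idea above can be sketched as plain data. The family names and notes here are placeholders, not real model families:

```python
# A portable base prompt plus short per-family adapter notes,
# applied only once the target family becomes known.
BASE_PROMPT = (
    "You are a triage assistant. Classify each ticket and reply "
    'with a single JSON object: {"label": ..., "reason": ...}.'
)

ADAPTER_NOTES = {
    # Hypothetical family names; real notes belong in references/model-family-notes.md.
    "family_a": "Prefers explicit XML-style section tags; repeat the schema last.",
    "family_b": "Keep the system prompt short; move examples into the user turn.",
}

def prompt_for(family: str) -> str:
    """Return the portable base prompt, with the adapter note appended when one exists."""
    note = ADAPTER_NOTES.get(family)
    return BASE_PROMPT if note is None else f"{BASE_PROMPT}\n\n# Adapter: {note}"
```

An unknown family falls back to the portable base prompt unchanged, which keeps the single-source-of-truth property: adapters extend the base, they never fork it.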
Step 3: Shape the prompt deliberately
Read `references/core-patterns.md`.
- Separate durable behavior from task-local context:
- stable policy and behavioral defaults belong in `system` or `developer`
- variable inputs, retrieved context, and task instances belong in templated user-facing sections
- Keep one authoritative instruction per behavior:
- if a rule appears in more than one layer, choose one owner for it
- stable cross-task rules belong in `system` or `developer`
- examples should teach format, edge-case handling, or tool behavior, not restate the whole policy
- user payloads should carry task-local facts, not durable policy
- Use markers only when they reduce ambiguity:
- use markdown headings or XML-style tags to separate instructions, context, examples, tool rules, and output contracts
- keep tag names descriptive and consistent
- do not wrap every sentence in markup
- Make the prompt easy to execute:
- put one high-value behavior per bullet or line when the task is fragile
- prefer positive instructions over "do not do X" lists
- place tool-use rules, escalation boundaries, and stop conditions in explicit sections
- keep persona light unless it changes behavior in a useful way
- use the shortest wording that preserves the intended behavioral constraint
- cut motivational filler, repeated reminders, and examples that do not improve evals
- for long-context prompts, place evidence before the final query and keep the actual ask in a clear terminal section
- keep instructions, evidence, and schemas in distinct blocks so the model does not have to infer what is policy versus data
- Treat examples as first-class prompt assets:
- start simple before adding examples
- add examples only when they improve format control, edge-case handling, or tool behavior
- keep examples structurally consistent
- prefer positive demonstrations over anti-pattern-only demonstrations
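One way to keep policy, evidence, and the final ask in distinct blocks, as the layering rules above describe, is a templated message builder. The tag names are examples, not a required vocabulary:

```python
def build_messages(task: str, evidence: list[str]) -> list[dict]:
    """Durable policy in system; task-local facts and the final ask in user."""
    system = (
        "<policy>\n"
        "Answer only from the provided evidence. "
        "If the evidence is insufficient, say so and stop.\n"
        "</policy>"
    )
    # Long-context rule: evidence before the final query,
    # with the actual ask in a clear terminal section.
    user = (
        "<evidence>\n" + "\n".join(evidence) + "\n</evidence>\n"
        "<task>\n" + task + "\n</task>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

msgs = build_messages(
    "Which region had the outage?",
    ["us-east-1 reported elevated errors at 09:14"],
)
```

The model never has to infer what is policy versus data: policy lives only in the system message, and the user message carries only task-local facts and the terminal ask.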
Step 4: Run the meta optimization loop
Read `references/meta-optimization-loop.md`.
- Start with the current prompt or a simple first draft.
- Score it on a representative slice:
- at least one happy-path case
- at least one failure replay
- at least one ambiguous case
- at least one edge case
- at least one "should refuse", "should ask", or "should defer" case when relevant
- Turn failures into explicit criticisms:
- identify what the prompt under-specified, over-specified, or contradicted
- write critiques as actionable edits, not vague complaints
- Generate a small beam of candidate prompts:
- one minimal-diff repair
- one structure-first rewrite
- one example- or tool-rule-centered variant when that is the likely bottleneck
- one provider-specific adapter when cross-model behavior is the issue
- Compare candidates on the same eval slice.
- Keep the best candidate and log what changed and why.
- Preserve the evidence for each round:
- prompt version
- eval case
- model output
- failure reason
- relevant scores
- Test the winner on a holdout slice before finalizing.
- Stop when scores plateau, edits oscillate, cost rises without quality gain, or the remaining issue is outside prompt control.
Keep edits minimal and causal. Record what you removed as well as what you added. If you change everything at once, you learn nothing about what actually helped.
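One round of the loop above can be sketched as a score-and-select harness. `run_model` and `score` are stand-ins for whatever actually calls the model and grades its output; the lambdas in the usage example exist only to make the sketch runnable:

```python
def optimize(candidates, eval_slice, run_model, score, log):
    """Score each candidate prompt on the same eval slice and keep the best.

    candidates: the small beam of prompt variants (minimal-diff repair,
    structure-first rewrite, example-centered variant, ...).
    run_model(prompt, case) -> output; score(output, case) -> float.
    One round of the meta loop; callers rerun it until scores plateau
    or edits oscillate.
    """
    best_prompt, best_total = None, float("-inf")
    for prompt in candidates:
        total = 0.0
        for case in eval_slice:
            output = run_model(prompt, case)
            case_score = score(output, case)
            total += case_score
            # Preserve the evidence for each round.
            log.append({"prompt": prompt, "case": case,
                        "output": output, "score": case_score})
        if total > best_total:
            best_prompt, best_total = prompt, total
    return best_prompt, best_total

log = []
best, total = optimize(
    candidates=["say OK", "say ok loudly"],
    eval_slice=["case1", "case2"],
    run_model=lambda prompt, case: prompt.upper(),  # stand-in for a real model call
    score=lambda output, case: 1.0 if "OK" in output else 0.0,
    log=log,
)
```

Because every (prompt, case, output, score) tuple is appended to `log`, each round leaves the evidence trail the step requires, and ties resolve toward the earlier (smaller-diff) candidate.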
Step 5: Produce a reusable deliverable
Return:
- `Target`
- `Success Criteria`
- `Optimized Prompt`
- `Adapter Notes`
- `Eval Set`
- `Optimization Log`
- `Residual Risks`
If the user supplied an existing prompt, include a concise diff-style explanation of the biggest behavioral changes.
Step 6: Guard against common failure modes
Read `references/transformed-examples.md` when the task is ambiguous or the first draft is weak.
Do not:
- optimize wording before defining the eval target
- mix instructions, examples, and raw context without boundaries
- keep the same rule in multiple layers unless there is a proven reason
- let stable rules drift into the user payload just because the current prompt template makes it convenient
- ask reasoning models to reveal chain-of-thought just because the task is hard
- keep contradictory legacy instructions in the same prompt
- overfit to one or two examples
- keep examples that do not improve measured behavior
- solve tool-use failures only in the system prompt when the real problem is the tool description or schema
- add markers everywhere and mistake structure for clarity
- use a bloated persona as a substitute for concrete behavior rules
Output standard
The final prompt package should be reusable by another engineer without rediscovering:
- what the prompt is for
- which model family it targets
- how success is measured
- what changed during optimization
- which risks remain open