prompt-optimizer
Prompt Optimizer
Optimize prompts for agents, system/developer instructions, and reusable prompt templates.
Treat prompt work as an eval-driven workflow, not wordsmithing.
Load only the references you need:
| Task | Read |
|---|---|
| Create a new agent prompt | |
| Refine an existing prompt | |
| Port a prompt between model families | |
| Diagnose repeated prompt failures | |
| Explain the provenance behind this workflow | |
Step 1: Define the prompt contract
- Determine whether the task is:
- creating a new prompt
- refining an existing prompt
- porting a prompt between model families
- debugging prompt failures
- Capture the contract before rewriting anything:
- target model family and snapshot if known
- prompt surface: `system`, `developer`, `user`, tool descriptions, examples, schemas
- task objective and non-goals
- inputs, context, and tools available to the agent
- required output shape
- success criteria
- known failures
- hard constraints: latency, verbosity, safety, budget, tool use, style
- If the user does not provide success criteria or examples, build a small eval set before editing the prompt.
- If the real bottleneck is model choice, missing retrieval, weak tool schemas, or a missing eval harness, say so. Do not keep rewriting prompt text when the failure is elsewhere.
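The contract fields listed above can be pinned down as a small record before any prompt text is touched. This is a minimal sketch; the class and field names are illustrative, not part of the workflow:

```python
from dataclasses import dataclass, field

@dataclass
class PromptContract:
    """Everything Step 1 asks you to capture before editing the prompt."""
    objective: str
    model_family: str = "unknown"      # target family, plus snapshot if known
    surfaces: tuple = ("system",)      # e.g. ("system", "developer", "user")
    output_shape: str = ""             # required output format or schema
    success_criteria: list = field(default_factory=list)
    known_failures: list = field(default_factory=list)
    hard_constraints: dict = field(default_factory=dict)  # latency, budget, style...

    def ready_to_edit(self) -> bool:
        # Per Step 1: without success criteria, build an eval set first.
        return bool(self.success_criteria)

contract = PromptContract(
    objective="Summarize support tickets into a triage label",
    success_criteria=["correct label on 9/10 seed cases"],
)
```

A contract with no success criteria fails `ready_to_edit()`, which is exactly the signal to stop and build an eval set first.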
Step 2: Choose the model strategy
Read `references/model-family-notes.md`.
- If the target family is known, optimize specifically for that family.
- If the target family is unknown, write:
- a portable base prompt
- short adapter notes for the likely target families
- Do not pretend one prompt is universal when the behavior clearly depends on model family.
- Pin model snapshots when the surrounding system supports it.
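The base-plus-adapters idea above can be sketched as plain data. The family names and notes here are placeholders, not real model families:

```python
# A portable base prompt plus short per-family adapter notes,
# applied only once the target family becomes known.
BASE_PROMPT = (
    "You are a triage assistant. Classify each ticket and reply "
    'with a single JSON object: {"label": ..., "reason": ...}.'
)

ADAPTER_NOTES = {
    # Hypothetical family names; real notes belong in references/model-family-notes.md.
    "family_a": "Prefers explicit XML-style section tags; repeat the schema last.",
    "family_b": "Keep the system prompt short; move examples into the user turn.",
}

def prompt_for(family: str) -> str:
    """Return the portable base prompt, with the adapter note appended when one exists."""
    note = ADAPTER_NOTES.get(family)
    return BASE_PROMPT if note is None else f"{BASE_PROMPT}\n\n# Adapter: {note}"
```

An unknown family falls back to the portable base prompt unchanged, which keeps the single-source-of-truth property: adapters extend the base, they never fork it.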
Step 3: Shape the prompt deliberately
Read `references/core-patterns.md`.
- Separate durable behavior from task-local context:
- stable policy and behavioral defaults belong in `system` or `developer`
- variable inputs, retrieved context, and task instances belong in templated user-facing sections
- Keep one authoritative instruction per behavior:
- if a rule appears in more than one layer, choose one owner for it
- stable cross-task rules belong in `system` or `developer`
- examples should teach format, edge-case handling, or tool behavior, not restate the whole policy
- user payloads should carry task-local facts, not durable policy
- Use markers only when they reduce ambiguity:
- use markdown headings or XML-style tags to separate instructions, context, examples, tool rules, and output contracts
- keep tag names descriptive and consistent
- do not wrap every sentence in markup
- Make the prompt easy to execute:
- put one high-value behavior per bullet or line when the task is fragile
- prefer positive instructions over "do not do X" lists
- place tool-use rules, escalation boundaries, and stop conditions in explicit sections
- keep persona light unless it changes behavior in a useful way
- use the shortest wording that preserves the intended behavioral constraint
- cut motivational filler, repeated reminders, and examples that do not improve evals
- for long-context prompts, place evidence before the final query and keep the actual ask in a clear terminal section
- keep instructions, evidence, and schemas in distinct blocks so the model does not have to infer what is policy versus data
- Treat examples as first-class prompt assets:
- start simple before adding examples
- add examples only when they improve format control, edge-case handling, or tool behavior
- keep examples structurally consistent
- prefer positive demonstrations over anti-pattern-only demonstrations
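One way to keep policy, evidence, and the final ask in distinct blocks, as the layering rules above describe, is a templated message builder. The tag names are examples, not a required vocabulary:

```python
def build_messages(task: str, evidence: list[str]) -> list[dict]:
    """Durable policy in system; task-local facts and the final ask in user."""
    system = (
        "<policy>\n"
        "Answer only from the provided evidence. "
        "If the evidence is insufficient, say so and stop.\n"
        "</policy>"
    )
    # Long-context rule: evidence before the final query,
    # with the actual ask in a clear terminal section.
    user = (
        "<evidence>\n" + "\n".join(evidence) + "\n</evidence>\n"
        "<task>\n" + task + "\n</task>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

msgs = build_messages(
    "Which region had the outage?",
    ["us-east-1 reported elevated errors at 09:14"],
)
```

The model never has to infer what is policy versus data: policy lives only in the system message, and the user message carries only task-local facts and the terminal ask.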
Step 4: Run the meta optimization loop
Read `references/meta-optimization-loop.md`.
- Start with the current prompt or a simple first draft.
- Score it on a representative slice:
- at least one happy-path case
- at least one failure replay
- at least one ambiguous case
- at least one edge case
- at least one "should refuse", "should ask", or "should defer" case when relevant
- Turn failures into explicit criticisms:
- identify what the prompt under-specified, over-specified, or contradicted
- write critiques as actionable edits, not vague complaints
- Generate a small beam of candidate prompts:
- one minimal-diff repair
- one structure-first rewrite
- one example- or tool-rule-centered variant when that is the likely bottleneck
- one provider-specific adapter when cross-model behavior is the issue
- Compare candidates on the same eval slice.
- Keep the best candidate and log what changed and why.
- Preserve the evidence for each round:
- prompt version
- eval case
- model output
- failure reason
- relevant scores
- Test the winner on a holdout slice before finalizing.
- Stop when scores plateau, edits oscillate, cost rises without quality gain, or the remaining issue is outside prompt control.
Keep edits minimal and causal. Record what you removed as well as what you added. If you change everything at once, you learn nothing about what actually helped.
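One round of the loop above can be sketched as a score-and-select harness. `run_model` and `score` are stand-ins for whatever actually calls the model and grades its output; the lambdas in the usage example exist only to make the sketch runnable:

```python
def optimize(candidates, eval_slice, run_model, score, log):
    """Score each candidate prompt on the same eval slice and keep the best.

    candidates: the small beam of prompt variants (minimal-diff repair,
    structure-first rewrite, example-centered variant, ...).
    run_model(prompt, case) -> output; score(output, case) -> float.
    One round of the meta loop; callers rerun it until scores plateau
    or edits oscillate.
    """
    best_prompt, best_total = None, float("-inf")
    for prompt in candidates:
        total = 0.0
        for case in eval_slice:
            output = run_model(prompt, case)
            case_score = score(output, case)
            total += case_score
            # Preserve the evidence for each round.
            log.append({"prompt": prompt, "case": case,
                        "output": output, "score": case_score})
        if total > best_total:
            best_prompt, best_total = prompt, total
    return best_prompt, best_total

log = []
best, total = optimize(
    candidates=["say OK", "say ok loudly"],
    eval_slice=["case1", "case2"],
    run_model=lambda prompt, case: prompt.upper(),  # stand-in for a real model call
    score=lambda output, case: 1.0 if "OK" in output else 0.0,
    log=log,
)
```

Because every (prompt, case, output, score) tuple is appended to `log`, each round leaves the evidence trail the step requires, and ties resolve toward the earlier (smaller-diff) candidate.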
Step 5: Produce a reusable deliverable
Return:
- `Target`
- `Success Criteria`
- `Optimized Prompt`
- `Adapter Notes`
- `Eval Set`
- `Optimization Log`
- `Residual Risks`
If the user supplied an existing prompt, include a concise diff-style explanation of the biggest behavioral changes.
Step 6: Guard against common failure modes
Read `references/transformed-examples.md` when the task is ambiguous or the first draft is weak.
Do not:
- optimize wording before defining the eval target
- mix instructions, examples, and raw context without boundaries
- keep the same rule in multiple layers unless there is a proven reason
- let stable rules drift into the user payload just because the current prompt template makes it convenient
- ask reasoning models to reveal chain-of-thought just because the task is hard
- keep contradictory legacy instructions in the same prompt
- overfit to one or two examples
- keep examples that do not improve measured behavior
- solve tool-use failures only in the system prompt when the real problem is the tool description or schema
- add markers everywhere and mistake structure for clarity
- use a bloated persona as a substitute for concrete behavior rules
Output standard
The final prompt package should be reusable by another engineer without rediscovering:
- what the prompt is for
- which model family it targets
- how success is measured
- what changed during optimization
- which risks remain open