writing-system-prompts
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseWriting LLM System Prompts
编写LLM系统提示词
A well-built system prompt steers behavior across the whole conversation and — if structured for KV-cache reuse — costs a fraction of an uncached prefix on providers that offer prompt caching (Anthropic, OpenAI, Bedrock, Vertex, etc.).
一个构建良好的系统提示词可以引导整个对话的行为,并且如果按照KV-cache复用的结构设计,在支持提示词缓存的服务商(如Anthropic、OpenAI、Bedrock、Vertex等)处,其成本仅为未缓存前缀的一小部分。
Rules
规则
-
Put the role first, in one sentence. A single descriptive sentence in the system field (not the user turn) anchors tone and vocabulary, e.g.Plain prose at the top — do not wrap in markup.
You are a helpful coding assistant specializing in Python. -
Order static-first, dynamic-last. Cache hits require a byte-identical prefix. Sequence: role → tools/policies → long reference docs / few-shot examples → cache breakpoint → dynamic context → user turn. Why: one changed token before the breakpoint drops you from cached-read pricing (~10% of input) to full input cost.
-
Mark cache breakpoints explicitly when the provider supports it. Most caching is opt-in (e.g. Anthropic, Bedrock cachePoint). Place a breakpoint at the end of each stable block to reuse. Cache writes typically cost more than base input; reads a fraction of it. TTLs vary (commonly ~5 min, sometimes longer tiers). Only cache prefixes you'll reuse within the TTL — sparse traffic pays the write penalty repeatedly.
cache_controlpython# Anthropic example — adapt to your provider's syntax system=[ {"type": "text", "text": ROLE_AND_POLICIES}, {"type": "text", "text": LONG_REFERENCE_DOC, "cache_control": {"type": "ephemeral"}}, ] -
Never put churning content at the prefix. Current time, request ID, user ID, or conversation summary at the top defeats caching for the whole prompt. Push them into the user turn or a post-breakpoint segment.
-
Use structural delimiters for distinct sections. Wrap longform inputs and behavioral directives in clear delimiters — XML tags (,
<documents><document>…</document></documents>), markdown headers, or named JSON fields. Why: delimited sections survive long contexts better than walls of prose, and several model families (notably Claude) are explicitly trained on XML structure.<default_to_action>…</default_to_action> -
Place longform data above instructions in the user turn. For 20k+ token inputs, docs at the top, question at the bottom — measured up to 30% quality lift on multi-document tasks across major models.
-
Write literal, scoped instructions. Modern instruction-tuned models interpret prompts increasingly literally and won't silently generalize. If a rule applies broadly, say so:Avoid ALL-CAPS shouting (
Apply this formatting to every section, not just the first one.,CRITICAL) — it causes over-triggering on recent Claude models and adds little on others; use normal imperative voice.MUST -
Tune verbosity, tone, tool use, and subagent spawning only when the default is wrong. Examples:/
Provide concise, focused responses. Skip non-essential context./Use a warm, collaborative tone.Override defaults only after observing a problem — don't pre-empt.Spawn multiple subagents in the same turn when fanning out across items. Do not spawn a subagent for work you can complete in a single response.
-
角色前置,用一句话表述 在system字段(而非用户轮次)中用一句描述性语句锚定语气和词汇,例如。顶部使用普通文本——不要用标记语言包裹。
You are a helpful coding assistant specializing in Python. -
静态内容在前,动态内容在后 缓存命中需要字节完全一致的前缀。顺序为:角色 → 工具/策略 → 长参考文档/少样本示例 → 缓存断点 → 动态上下文 → 用户轮次。原因:断点前哪怕一个标记发生变化,都会让你从缓存读取定价(约为输入成本的10%)变为全额输入成本。
-
当服务商支持时,明确标记缓存断点 大多数缓存是可选启用的(例如Anthropic的、Bedrock的cachePoint)。在每个稳定块的末尾设置断点以实现复用。缓存写入的成本通常高于基础输入成本;而读取成本仅为其一小部分。TTL(生存时间)各不相同(通常约5分钟,有时有更长层级)。仅缓存你会在TTL内复用的前缀——低流量场景会反复支付写入成本的罚金。
cache_controlpython# Anthropic示例——适配你使用的服务商语法 system=[ {"type": "text", "text": ROLE_AND_POLICIES}, {"type": "text", "text": LONG_REFERENCE_DOC, "cache_control": {"type": "ephemeral"}}, ] -
切勿将频繁变化的内容放在前缀位置 当前时间、请求ID、用户ID或对话摘要放在顶部会导致整个提示词无法缓存。将它们移至用户轮次或断点后的段落中。
-
为不同部分使用结构化分隔符 用清晰的分隔符包裹长篇输入和行为指令——XML标签(、
<documents><document>…</document></documents>)、Markdown标题或命名JSON字段。原因:分隔后的部分在长上下文中比大段文本更易保留,并且多个模型系列(尤其是Claude)接受过XML结构的专门训练。<default_to_action>…</default_to_action> -
在用户轮次中,将长篇数据放在指令上方 对于20k+标记的输入,文档放在顶部,问题放在底部——在多文档任务中,主流模型的质量提升最高可达30%。
-
编写字面化、有范围的指令 现代指令微调模型对提示词的解读越来越字面化,不会默默进行泛化。如果规则适用范围较广,请明确说明:避免使用全大写的强调词(如
Apply this formatting to every section, not just the first one.、CRITICAL)——这会导致新版Claude模型过度触发,对其他模型也作用甚微;使用正常的祈使语气即可。MUST -
仅在默认表现不佳时调整冗长程度、语气、工具使用和子Agent生成 示例:/
Provide concise, focused responses. Skip non-essential context./Use a warm, collaborative tone.仅在观察到问题后再覆盖默认设置——不要预先设置。Spawn multiple subagents in the same turn when fanning out across items. Do not spawn a subagent for work you can complete in a single response.
Verification procedure
验证流程
After drafting or editing, check each:
- Role check — One-sentence role at the very top of the system field?
- Cache-order check — List segments top-to-bottom. Is every segment before the breakpoint stable across requests? Move per-request values below the breakpoint or into the user turn.
- Breakpoint check — Is the breakpoint on the last static block? Is the cached prefix above the provider's minimum (e.g. 1024 tokens for Anthropic Opus/Sonnet, 2048 for Haiku, 1024 for OpenAI)? Below the minimum, breakpoints are silently ignored.
- Structure check — Longform docs wrapped in delimiters? Behavioral blocks in named tags or sections?
- Literalism check — If an instruction says "format the output" but means "format every section," rewrite with explicit scope.
起草或编辑完成后,逐一检查以下项:
- 角色检查 —— system字段最顶部是否有一句式的角色描述?
- 缓存顺序检查 —— 从上到下列出各个段落。断点前的每个段落是否在所有请求中保持稳定?将每个请求独有的值移至断点下方或用户轮次中。
- 断点检查 —— 断点是否设置在最后一个静态块的末尾?缓存前缀是否超过服务商的最低要求(例如Anthropic Opus/Sonnet为1024标记,Haiku为2048标记,OpenAI为1024标记)?低于最低要求的话,断点会被静默忽略。
- 结构检查 —— 长篇文档是否用分隔符包裹?行为模块是否放在命名标签或段落中?
- 字面化检查 —— 如果指令说“格式化输出”但实际意思是“格式化每个部分”,请重写为明确的范围表述。
Common mistakes to watch for
需要注意的常见错误
- Timestamp at the top. in the role line invalidates the cache every request. Move to the user turn.
Current date: {{today}} - Caching a tiny prefix. A short system prompt below the provider's minimum — breakpoint silently ignored and you still pay the write surcharge. Either grow the cached block or drop the breakpoint.
- Conversation history before the breakpoint. History grows every request, so anything after it can never be cached. Put the breakpoint before the rolling history.
- Copy-pasted legacy shouting. Causes over-triggering on recent models. Rewrite as plain imperatives.
CRITICAL: YOU MUST… - Role in the user message. in
You are a…instead of the system field weakens steering and wastes a cacheable slot.messages[0]
- 顶部放置时间戳 角色行中的会导致每次请求都使缓存失效。移至用户轮次中。
Current date: {{today}} - 缓存极小的前缀 短于服务商最低要求的系统提示词——断点会被静默忽略,而你仍需支付写入附加费。要么扩展缓存块,要么移除断点。
- 断点前放置对话历史 历史记录会随每次请求增长,因此断点后的任何内容都无法被缓存。将断点放在滚动历史记录之前。
- 复制粘贴遗留的式强调 会导致新版模型过度触发。重写为普通祈使句。
CRITICAL: YOU MUST… - 角色放在用户消息中 在中写
messages[0]而非system字段会削弱引导效果,并浪费一个可缓存的位置。You are a…