prompt-writer

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Prompt Writer

提示词编写工具

Write and improve production-grade prompts for any LLM, grounded in official documentation, peer-reviewed research, and patterns from production systems like Claude Code, Cursor, Manus, and Devin.
为任意大语言模型(LLM)编写并优化生产级提示词,参考依据包括官方文档、同行评审研究,以及Claude Code、Cursor、Manus、Devin等生产系统的实践模式。

Step 0: Determine Mode

步骤0:确定工作模式

Before starting, identify the task:
User intentModeWhat to do
"Write a prompt for..."WriteFollow Steps 1-5 below
"Improve/fix/optimize this prompt"ImproveSee Improve Mode below, then
references/prompt-improvement.md
for full workflow
"Review this prompt"ReviewRun Step 4 (Review and Harden) on their prompt
"Write a tool description"ToolJump to the Tool Descriptions section in Step 3
Current as of March 2026. Model guidance reflects Claude 4.6, GPT-5.4, Gemini 3, Llama 4, DeepSeek R1/V3. Verify against official docs for newer releases.
开始前,先明确任务类型:
用户意图模式操作内容
「为...编写提示词」编写模式遵循下方步骤1-5
「优化/修复/改进这个提示词」优化模式先参考下方的优化模式概述,再查看
references/prompt-improvement.md
获取完整流程
「审核我的提示词」审核模式对用户提供的提示词执行步骤4(审核与强化)
「编写工具描述」工具模式直接跳转至步骤3中的工具描述部分
本文档截至2026年3月更新。模型适配指南基于Claude 4.6、GPT 5.4、Gemini 3、Llama 4、DeepSeek R1/V3编写。若使用更新版本模型,请以官方文档为准。

Improve Mode (Quick Summary)

优化模式(快速概述)

When the user has an existing prompt that isn't working:
  1. Diagnose — run through the 6-dimension defect checklist (specification, input, structure, context, performance, maintainability)
  2. Quick fixes — apply the principled instructions checklist (remove politeness, add delimiters, specify audience/format, reframe negatives)
  3. Meta-prompt — give an LLM the prompt + 3-5 failure examples and ask it to rewrite
  4. Rewrite — apply patterns: add specificity, add role+audience, add output schema, add contrastive examples
  5. Evaluate — compare before/after on test cases, check for regressions
For the full workflow with detailed patterns and examples, see
references/prompt-improvement.md
.
当用户提供的现有提示词效果不佳时:
  1. 诊断 — 从6个维度检查缺陷(规格、输入、结构、上下文、性能、可维护性)
  2. 快速修复 — 应用标准化优化清单(移除客套表述、添加分隔符、明确受众/格式、重构负面表述)
  3. 元提示词优化 — 将原提示词+3-5个失败案例输入LLM,要求其重写
  4. 重写优化 — 应用模式:增加细节、明确角色+受众、定义输出 schema、添加对比示例
  5. 评估 — 在测试用例上对比优化前后效果,检查是否出现性能退化
如需包含详细模式与示例的完整流程,请查看
references/prompt-improvement.md

Boundaries

适用边界

This skill covers writing and improving prompts. It does not cover:
  • Prompt injection red-teaming (use security-specific tools)
  • Full RAG pipeline architecture (prompt is one piece; retrieval design is separate)
  • Fine-tuning or training data curation
本工具专注于提示词的编写与优化,不涵盖以下内容:
  • 提示词注入红队测试(请使用安全专用工具)
  • 完整RAG pipeline架构设计(提示词仅为其中一部分,检索设计为独立环节)
  • 模型微调或训练数据整理

Step 1: Understand the Context

步骤1:理解上下文信息

Before writing, gather:
  • Target model — Which model family? (See Model Decision Matrix below)
  • Use case — System prompt, task prompt, agent prompt, tool description?
  • Tool access — What tools/functions will the agent have?
  • Audience — Who interacts with this? Technical users, consumers, internal?
  • Constraints — Length, tone, safety requirements, domain boundaries?
  • Failure modes — What should the agent refuse or avoid?
If the user hasn't specified these, ask before writing. A prompt without context is a prompt that will fail.
编写提示词前,先收集以下信息:
  • 目标模型 — 属于哪个模型家族?(参考下方的模型决策矩阵)
  • 使用场景 — 是系统提示词、任务提示词、Agent提示词还是工具描述?
  • 工具权限 — Agent将拥有哪些工具/函数调用权限?
  • 受众群体 — 谁会使用这个提示词对应的服务?技术用户、普通消费者还是内部人员?
  • 约束条件 — 长度限制、语气要求、安全规范、领域边界?
  • 失败场景 — Agent需要拒绝哪些请求或避免哪些行为?
如果用户未明确上述信息,请先询问再开始编写。缺乏上下文的提示词必然无法达到预期效果。

Step 2: Choose the Right Structure

步骤2:选择合适的结构

Universal structure adapted from Anthropic, OpenAI, and Google's official guides:
1. Role & Identity        — Who is this agent?
2. Core Instructions      — What are the behavioral rules?
3. Tool/Capability Guide  — How to use available tools
4. Reasoning Guidance     — When and how to think step-by-step
5. Output Format          — What shape should responses take?
6. Examples               — 2-5 demonstrations of desired behavior
7. Context/Reference      — Domain knowledge, loaded last
8. Final Reminders        — Closing behavioral anchors
Not every prompt needs all sections. A simple chatbot needs 1, 2, 5, 6. A complex agent needs all eight. A task prompt may need only 2, 5, 6.
以下通用结构改编自Anthropic、OpenAI和Google的官方指南:
1. Role & Identity        — Who is this agent?
2. Core Instructions      — What are the behavioral rules?
3. Tool/Capability Guide  — How to use available tools
4. Reasoning Guidance     — When and how to think step-by-step
5. Output Format          — What shape should responses take?
6. Examples               — 2-5 demonstrations of desired behavior
7. Context/Reference      — Domain knowledge, loaded last
8. Final Reminders        — Closing behavioral anchors
并非所有提示词都需要包含全部模块。简单聊天机器人只需1、2、5、6模块;复杂Agent需要全部8个模块;任务提示词可能仅需2、5、6模块。

Step 3: Write Each Section

步骤3:编写各模块内容

Role & Identity

角色与身份

One to three sentences. Anchors all subsequent behavior.
You are a senior backend engineer specializing in Python and Django.
You help developers debug production issues by analyzing logs,
tracing errors, and suggesting minimal fixes.
Audience framing amplifies this — stating who the output is for improves relevance by up to 100% in benchmarks.
用1-3句话明确,为后续所有行为提供定位锚点。
You are a senior backend engineer specializing in Python and Django.
You help developers debug production issues by analyzing logs,
tracing errors, and suggesting minimal fixes.
明确受众定位可进一步提升相关性——基准测试显示,明确输出受众可使效果提升高达100%。

Core Instructions

核心指令

Structure with markdown headers or XML tags. Key principles:
Explain the why, not just the what. Models generalize better from motivation than bare commands. Instead of "Never use ellipses", write "Avoid ellipses because the TTS engine cannot pronounce them."
Use positive framing. "Write in complete sentences" outperforms "Don't use fragments." Tell the agent what TO do. Research confirms positive instructions actively boost desired token probabilities, while negatives only weakly suppress.
Be specific enough to verify. "Use 2-space indentation" is testable. "Format code properly" is not.
Resolve contradictions. Models waste reasoning tokens reconciling conflicts. Add explicit priority: "If rule A and rule B conflict, rule A takes precedence."
Match language intensity to model — this is critical:
Model FamilyGuidance Style
Claude 4.xConversational. Explain reasoning. Avoid CAPS/MUST/NEVER — causes overtriggering. Use "prefer", "because".
GPT-5.xDirect but not aggressive. Naturally thorough — don't over-prompt for thoroughness.
GPT-4.1More literal than predecessors. A single clarifying sentence corrects behavior.
Gemini 3Short, direct specs only. Logic over persuasion. No flowery language.
DeepSeek R1NO system prompt. Put everything in user message. No few-shot. No "think step by step."
MistralHierarchical markdown sections. Explicit step-by-step helps.
CohereMinimal preamble. Pass docs via API, not prompt. RAG-first.
Qwen/QwQStandard. For QwQ: non-greedy sampling required (temp=0.6, top_p=0.95).
GrokVerbose, detailed system prompts work best. XML/MD both fine.
NovaCAPS DO/MUST/DO NOT encouraged (opposite of Claude!). Numbered steps.
Smaller/MiniMost literal. Critical rules FIRST. Numbered steps. Full execution order.
For full model-specific guidance, see
references/model-specific-tuning.md
.
使用Markdown标题或XML标签结构化呈现。关键原则:
解释原因,而非仅说明要求。模型从动机中学习的泛化能力优于生硬指令。例如,不要写「禁止使用省略号」,而应写「避免使用省略号,因为TTS引擎无法识别发音」。
使用正向表述。「使用完整句子」的效果优于「不要使用片段化表述」。告诉模型要做什么,而非不要做什么。研究证实,正向指令能有效提升目标token的概率,而负向指令仅能微弱抑制非目标token。
明确到可验证的程度。「使用2空格缩进」是可测试的要求,而「规范格式化代码」则不是。
解决规则冲突。模型会浪费推理token来调和冲突规则。需添加明确优先级:「若规则A与规则B冲突,以规则A为准」。
语言强度匹配目标模型——这一点至关重要
模型家族指导风格
Claude 4.x对话式风格,解释推理过程。避免使用大写字母/MUST/NEVER——会导致过度触发。使用「建议」「因为」等表述。
GPT 5.x直接但不生硬。模型本身已具备全面性——无需过度强调「详尽性」。
GPT 4.1比前代模型更字面化。一句澄清即可修正行为。
Gemini 3仅保留简短、直接的规格说明。逻辑优先,无需修饰性语言。
DeepSeek R1禁止使用系统提示词。所有内容放入用户消息中。不使用少样本示例,不使用「逐步思考」。
Mistral采用分层Markdown章节结构。明确的步骤引导有助于提升效果。
Cohere最小化前置说明。通过API传递文档,而非放入提示词。优先使用RAG。
Qwen/QwQ标准风格。对于QwQ:需使用非贪婪采样(temp=0.6,top_p=0.95)。
Grok冗长、详细的系统提示词效果最佳。XML或Markdown格式均可。
Nova鼓励使用大写字母的DO/MUST/DO NOT(与Claude完全相反!)。使用编号步骤。
小型/迷你模型最字面化。关键规则放在最前面。使用编号步骤,明确完整执行顺序。
如需完整的模型专属指导,请查看
references/model-specific-tuning.md

The Prompting Inversion (Critical Finding)

提示词反转效应(关键发现)

Techniques effective for mid-tier models can actively harm frontier models. As capability increases, simpler prompts work better. Constraints that prevent common-sense errors in weaker models induce hyper-literalism in stronger ones.
Rule: always calibrate prompt complexity to model capability. When in doubt, start simple.
对中端模型有效的技术可能严重损害前沿模型的性能。随着模型能力提升,更简洁的提示词效果更好。在弱模型中用于避免常识错误的约束,会导致强模型出现过度字面化的问题。
规则:始终根据模型能力调整提示词复杂度。如有疑问,从简洁版本开始。

Tool Descriptions

工具描述

Tool descriptions are prompts — 97.1% of tool descriptions in production have quality defects. Each tool needs a 3-4 sentence contract:
  1. What it does and what it returns
  2. When to use it (activation criteria)
  3. When NOT to use it
  4. Key parameter guidance and limitations
search_codebase: Searches the repository for code matching a query.
Returns matching lines with file paths and line numbers. Use when
you need to find function definitions, usage patterns, or specific
snippets. Do not use for reading entire files — use read_file instead.
Query supports regex. Results are limited to 50 matches.
The six essentials (from MCP tool description research, 2026):
  • Clear purpose with return data specified
  • Activation criteria (when to use)
  • Documented limitations and failure modes
  • Parameter intent, not just data types
  • 3-4+ sentences proportional to complexity
  • Examples showing both successful and failing cases
工具描述本质也是提示词——生产环境中97.1%的工具描述存在质量缺陷。每个工具需要3-4句话的契约式描述:
  1. 工具功能与返回结果
  2. 使用场景(触发条件)
  3. 禁用场景
  4. 关键参数指导与限制
search_codebase: Searches the repository for code matching a query.
Returns matching lines with file paths and line numbers. Use when
you need to find function definitions, usage patterns, or specific
snippets. Do not use for reading entire files — use read_file instead.
Query supports regex. Results are limited to 50 matches.
MCP工具描述研究(2026)总结的6个核心要素:
  • 明确功能与返回数据
  • 触发条件(何时使用)
  • 文档化的限制与失败场景
  • 参数意图(而非仅数据类型)
  • 3-4句话,长度与复杂度匹配
  • 包含成功与失败案例的示例

Reasoning Guidance

推理指导

Match reasoning instructions to the model:
  • Reasoning models (Claude extended thinking, GPT o-series, DeepSeek R1, Gemini with thinking, QwQ): Already reason internally. "Think step by step" provides marginal gains at 35-600% more latency. Use "verify your answer against [criteria]" instead.
  • Non-reasoning models (GPT-4.1, Claude Haiku, Mistral Small): "Think step by step" provides ~50% improvement on math/symbolic tasks. Worth the cost.
  • Simple tasks on any model: Skip reasoning entirely. CoT can hurt performance on straightforward tasks.
The Think Tool pattern (+54% on complex policy tasks): Give agents a no-op tool called
think
— a callable scratchpad between tool calls. Different from extended thinking. Include domain-specific examples showing HOW to use it.
Chain-of-Draft (cost-efficient alternative to CoT): When reasoning is needed but cost/latency matters, limit each reasoning step to ~5 words. Matches CoT accuracy at 7.6% of the tokens: "Think step by step, but keep only a minimum draft for each step, 5 words at most."
For agents, three reminders consistently improve performance (+20%):
  1. Persistence: "Continue working until the task is fully resolved."
  2. Tool use: "Use tools to gather information rather than guessing."
  3. Planning: "Plan your approach before taking action, and reflect on results after."
推理指令需匹配模型特性:
  • 推理型模型(Claude extended thinking、GPT o-series、DeepSeek R1、带thinking的Gemini、QwQ):已具备内部推理能力。「逐步思考」仅能带来边际收益,但会增加35-600%的延迟。建议使用「对照[标准]验证你的答案」替代。
  • 非推理型模型(GPT-4.1、Claude Haiku、Mistral Small):「逐步思考」可使数学/符号任务的性能提升约50%,值得付出延迟成本。
  • 任意模型的简单任务:完全跳过推理指导。思维链(CoT)会损害简单任务的性能。
思考工具模式(复杂策略任务性能提升54%):为Agent提供一个名为
think
的空操作工具——作为工具调用之间的可调用草稿本。与extended thinking不同,需包含领域特定示例说明如何使用该工具。
草稿链(Chain-of-Draft)(CoT的成本高效替代方案):当需要推理但关注成本/延迟时,将每个推理步骤限制在约5个词。在保持CoT准确率的同时,仅需7.6%的token消耗:「逐步思考,但每个步骤仅保留最少草稿内容,最多5个词。」
对于Agent,以下3个提醒可稳定提升性能20%:
  1. 持续性:「持续工作直至任务完全解决。」
  2. 工具使用:「使用工具收集信息,而非猜测。」
  3. 规划性:「采取行动前先规划方案,行动后反思结果。」

Output Format

输出格式

Be explicit. Without format guidance, models add prose, markdown fences, or unnecessary structure.
The CTCO pattern (from OpenAI's GPT-5.4 guide — most reliable anti-hallucination structure):
  1. Context — Who is the model? What is the background state?
  2. Task — The single, atomic action required
  3. Constraints — Negative constraints and scope limits
  4. Output — Exact format specification (JSON schema, sections, length)
Respond with a JSON object containing:
- "summary": one-sentence description
- "severity": "low" | "medium" | "high" | "critical"
- "fix": the recommended code change as a unified diff
明确输出格式要求。若无格式指导,模型会添加冗余的 prose、Markdown围栏或不必要的结构。
CTCO模式(来自OpenAI的GPT 5.4指南——最可靠的抗幻觉结构):
  1. Context(上下文)——模型的定位是什么?背景状态如何?
  2. Task(任务)——单一、原子化的行动要求
  3. Constraints(约束)——负向约束与范围限制
  4. Output(输出)——精确的格式规范(JSON schema、章节、长度)
Respond with a JSON object containing:
- "summary": one-sentence description
- "severity": "low" | "medium" | "high" | "critical"
- "fix": the recommended code change as a unified diff

Examples (Few-Shot)

示例(少样本)

Examples are the highest-impact technique — up to 90% accuracy improvement.
The 2-5 rule: Major gains after 2; diminishing returns after 5. More can paradoxically degrade performance (the "few-shot dilemma" — Gemma 7B dropped from 77.9% to 39.9%).
Quality over quantity: TF-IDF-based example selection outperforms random sampling by 1%+ while avoiding degradation. Choose examples relevant to the actual use case.
Contrastive examples dramatically improve reasoning: showing both a wrong approach (with why it's wrong) and the correct approach. GSM8K: 35.9% → 88.8% with GPT-4 using this pattern.
Order matters: Alternate positive and negative examples to avoid bias.
Format by model:
  • Claude:
    <examples><example><user>...</user><assistant>...</assistant></example></examples>
  • GPT/others: Markdown headers or numbered examples
  • DeepSeek R1: NO examples (degrades reasoning)
示例是效果提升最显著的技术——可使准确率提升高达90%。
2-5规则:2个示例后即可获得显著收益;超过5个后收益递减。过多示例反而会降低性能(「少样本困境」——Gemma 7B的准确率从77.9%降至39.9%)。
质量优先于数量:基于TF-IDF选择的示例比随机采样的示例性能提升1%以上,且避免性能退化。选择与实际使用场景相关的示例。
对比示例可大幅提升推理能力:同时展示错误做法(说明错误原因)与正确做法。在GSM8K数据集上,GPT-4使用该模式后准确率从35.9%提升至88.8%。
顺序影响:交替展示正向与负向示例,避免偏见。
按模型选择格式
  • Claude:使用
    <examples><example><user>...</user><assistant>...</assistant></example></examples>
    格式
  • GPT/其他模型:使用Markdown标题或编号示例
  • DeepSeek R1:禁止使用示例(会降低推理能力)

Context/Reference

上下文/参考资料

Place long-form reference material AFTER instructions and examples. Queries placed after long context improve quality by up to 30%.
Lost-in-the-middle: Models attend more to the beginning and end of prompts (architectural, not training artifact). Place highest-priority content at the start and end. Never bury critical instructions in the middle.
For Claude, use XML tags:
<context>
,
<documents>
,
<reference_material>
.
将长篇参考资料放在指令与示例之后。查询放在长上下文之后可使效果提升高达30%。
中间信息丢失问题:模型对提示词开头与结尾的内容关注度更高(这是架构特性,而非训练 artifact)。将优先级最高的内容放在开头与结尾。切勿将关键指令埋在中间。
对于Claude,使用XML标签:
<context>
<documents>
<reference_material>

Final Reminders

最终提醒

Close with 1-3 critical behavioral anchors. Models weight content at the beginning and end more heavily.
用1-3个关键行为锚点收尾。模型对开头与结尾内容的权重更高。

Step 4: Review and Harden

步骤4:审核与强化

Structural quality:
  • Sections clearly delineated (XML tags or markdown headers)
  • No contradictory instructions
  • Examples match the stated rules
  • Tool descriptions are complete contracts (what, when, when-not, limitations)
Language quality:
  • Positive framing (what to do, not what to avoid)
  • Motivation provided for non-obvious rules (the "why")
  • Language intensity matches target model
  • Specificity is testable ("2-space indent" not "format nicely")
Robustness:
  • Edge cases addressed or delegated ("if unclear, ask the user")
  • Failure modes handled ("if the tool errors, retry once then report")
  • No sensitive data in the prompt (credentials, API keys, PII)
  • Prompt injection considerations (trust boundaries, input validation)
Efficiency:
  • No redundant sections
  • Tool descriptions concise (3-4 sentences, not paragraphs)
  • Examples minimal but sufficient (2-5, not 10+)
  • Static content first, dynamic last (enables caching — up to 90% cost reduction)
See
references/security-patterns.md
for defense patterns.
结构质量检查
  • 模块划分清晰(使用XML标签或Markdown标题)
  • 无冲突指令
  • 示例与规则一致
  • 工具描述为完整契约(功能、使用场景、禁用场景、限制)
语言质量检查
  • 正向表述(说明要做什么,而非不要做什么)
  • 非明显规则需说明原因(「因为...」)
  • 语言强度匹配目标模型
  • 明确到可验证的程度(「2空格缩进」而非「规范格式化」)
鲁棒性检查
  • 已处理边缘情况或明确委托(「若有疑问,询问用户」)
  • 已处理失败场景(「若工具调用出错,重试1次后报告」)
  • 提示词中无敏感数据(凭证、API密钥、个人可识别信息)
  • 考虑提示词注入风险(信任边界、输入验证)
效率检查
  • 无冗余模块
  • 工具描述简洁(3-4句话,而非段落)
  • 示例精简但足够(2-5个,而非10+)
  • 静态内容在前,动态内容在后(支持缓存——可降低高达90%的成本)
如需防御模式,请查看
references/security-patterns.md

Step 5: Test and Iterate

步骤5:测试与迭代

  1. Try adversarial inputs — edge cases, ambiguous requests, rule-breaking attempts
  2. Check failure modes — graceful or catastrophic?
  3. Run 5x on same input — check consistency
  4. Iterate one variable at a time — changing model AND prompt simultaneously makes attribution impossible
For systematic prompt improvement, see
references/prompt-improvement.md
.

  1. 尝试对抗性输入——边缘情况、模糊请求、违反规则的尝试
  2. 检查失败场景——是优雅降级还是灾难性失败?
  3. 同一输入测试5次——检查结果一致性
  4. 每次仅迭代一个变量——同时更改模型与提示词会导致无法归因性能变化
如需系统化的提示词优化方法,请查看
references/prompt-improvement.md

Model Decision Matrix

模型决策矩阵

Quick reference for choosing prompting strategy by model:
ModelSystem PromptDelimiter StyleThinkingKey Differentiator
Claude 4.xSeparate roleXML tagsExtended thinkingSoft language, no CAPS
GPT-5.xSystem + instructionsMarkdowno-series reasoningCTCO pattern, reasoning_effort
GPT-4.1SystemMarkdownManual CoTUltra-literal, sandwich method
Gemini 3System instructionHierarchy/outlinethinking_levelTemp must stay 1.0, concise
Llama 4Special tokensSpecial tokensNoneExact template required
DeepSeek R1NO system promptPlain text in user msgAuto <think>Zero-shot only, no examples
DeepSeek V3YesStandardNoneUse for speed, R1 for depth
MistralPrepended to userMarkdown sectionsNonePrefix injection, JSON dual-instruct
CoherePreamble (optional)Documents APINoneRAG-first, auto-citations
Qwen/QwQapply_chat_templateChatMLenable_thinkingNon-greedy required for QwQ
Grok 3Separate (verbose)XML or Markdownreasoning_effortReal-time X data, cache-stable
NovaSeparateMarkdownNoneCAPS encouraged, temp=0 for tools
For detailed per-model guidance:
references/model-specific-tuning.md

按模型选择提示词策略的快速参考:
模型系统提示词分隔符风格推理设置核心差异
Claude 4.x独立角色设置XML标签Extended thinking温和语言,禁用大写强调
GPT 5.x系统提示+指令Markdowno-series推理CTCO模式,reasoning_effort参数
GPT 4.1系统提示Markdown手动CoT超字面化,三明治方法
Gemini 3系统指令层级/大纲thinking_level参数温度必须保持1.0,内容简洁
Llama 4特殊令牌特殊令牌需要精确模板
DeepSeek R1禁止使用系统提示词用户消息中使用纯文本自动<think>仅支持零样本,无示例
DeepSeek V3支持系统提示词标准风格追求速度选V3,追求深度选R1
Mistral前置到用户消息Markdown章节前缀注入,JSON双指令
Cohere可选前言文档APIRAG优先,自动引用
Qwen/QwQ使用apply_chat_templateChatMLenable_thinkingQwQ需使用非贪婪采样
Grok 3独立设置(支持冗长内容)XML或Markdownreasoning_effort参数实时X数据,缓存稳定
Nova独立设置Markdown鼓励使用大写强调,工具调用温度设为0
如需详细的单模型指导,请查看
references/model-specific-tuning.md

Quick Reference: Prompt Patterns

快速参考:提示词模式

Agent System Prompt Template

Agent系统提示词模板

markdown
undefined
markdown
undefined

Role

Role

[1-3 sentences: who, what domain, what tone]
[1-3 sentences: who, what domain, what tone]

Instructions

Instructions

[Behavioral rules, organized by category] [Each rule with motivation: "because..."]
[Behavioral rules, organized by category] [Each rule with motivation: "because..."]

Tools

Tools

[For each tool: what it does, when to use, when not to use, limitations]
[For each tool: what it does, when to use, when not to use, limitations]

Output Format

Output Format

[Explicit format specification]
[Explicit format specification]

Examples

Examples

[2-5 diverse, relevant demonstrations]
[2-5 diverse, relevant demonstrations]

Context

Context

[Reference material, loaded last]
[Reference material, loaded last]

Reminders

Reminders

[1-3 critical behavioral anchors]
undefined
[1-3 critical behavioral anchors]
undefined

Common Prompt Recipes

常见提示词模板

ScenarioKey SectionsNotes
Simple chatbotRole, Instructions, Format, ExamplesSkip tools and reasoning
Coding agentAll 8 sectionsAdd persistence + planning reminders, think tool
Data analystRole, Instructions, Tools, FormatEmphasize structured output
Content writerRole, Instructions, Examples, FormatHeavy on examples and tone
Customer supportRole, Instructions, Examples, ContextLoad FAQ as context
Multi-agent orchestratorRole, Instructions, Tools, ReasoningDefine delegation rules, effort budgets
Task prompt (one-shot)Instructions, Format, ExamplesNo role needed, be specific
Prompt improvementSee
references/prompt-improvement.md
Diagnose → fix → evaluate

场景核心模块注意事项
简单聊天机器人角色、指令、格式、示例跳过工具与推理模块
编码Agent全部8个模块添加持续性+规划提醒,使用思考工具
数据分析师角色、指令、工具、格式强调结构化输出
内容创作者角色、指令、示例、格式重点关注示例与语气
客户支持角色、指令、示例、上下文将FAQ作为上下文加载
多Agent编排器角色、指令、工具、推理定义委托规则、资源预算
一次性任务提示词指令、格式、示例无需角色模块,内容需具体
提示词优化查看
references/prompt-improvement.md
诊断→修复→评估

Multimodal Prompting

多模态提示词

For models that support vision (Claude, GPT-4o+, Gemini, Llama 4, Grok):
Image placement matters:
  • Gemini: image BEFORE text (official recommendation)
  • Claude/GPT: image position is flexible, but place relevant images near the text that references them
  • Multiple images: reference each by name or position ("In the first image...")
Key patterns:
  • Be explicit about what to do with the image: "transcribe", "describe", "extract table data", "compare these two screenshots"
  • For OCR/extraction: specify exact output format (JSON, table, list)
  • For analysis: provide evaluation criteria, not open-ended "what do you see?"
  • For diagrams/charts: ask for specific data points, not general description
Token costs vary significantly:
  • Gemini: 258 tokens per image tile (≤384px), video at 263 tokens/sec
  • Claude: ~1,600 tokens for a typical image
  • Budget accordingly — 10 images can consume 16K+ tokens

对于支持视觉的模型(Claude、GPT-4o+、Gemini、Llama 4、Grok):
图片位置影响
  • Gemini:图片放在文本之前(官方推荐)
  • Claude/GPT:图片位置灵活,但需将相关图片放在引用文本附近
  • 多张图片:通过名称或位置引用(「在第一张图片中...」)
关键模式
  • 明确说明对图片的操作:「转录」「描述」「提取表格数据」「对比这两张截图」
  • OCR/提取任务:指定精确输出格式(JSON、表格、列表)
  • 分析任务:提供评估标准,而非开放式的「你看到了什么?」
  • 图表/图形:要求提取具体数据点,而非泛泛描述
Token成本差异显著
  • Gemini:每个图片块(≤384px)消耗258 tokens,视频每秒消耗263 tokens
  • Claude:单张典型图片消耗约1600 tokens
  • 合理预算——10张图片可消耗16000+ tokens

Structured Output (JSON Mode)

结构化输出(JSON模式)

Each model handles structured output differently:
ModelHow to enableKey requirement
ClaudeRequest JSON in instructions + use XML output tagsSpecify exact schema in prompt
GPT
response_format: {type: "json_object"}
or Structured Outputs with JSON Schema
Must mention "JSON" in prompt
Gemini
responseMimeType: "application/json"
+
responseSchema
Define
required
arrays explicitly (avoids nulls)
DeepSeek
response_format: {type: "json_object"}
Include "json" in prompt + set adequate
max_tokens
Mistral
response_format: {type: "json_object"}
Must ALSO instruct JSON in prompt text (dual requirement)
LlamaInstruct in prompt; validate outputNo native JSON mode — always validate server-side
CohereInstruct in promptUse document grounding for factual JSON
QwenInstruct in promptUse
apply_chat_template
GrokPydantic/JSON Schema via Structured OutputsUse schema definitions
NovaInstruct in promptSet
max_tokens
high enough to avoid truncation
Universal best practice: Always provide the exact JSON schema with field types, not just "return JSON". Show one complete example of the expected output.

各模型处理结构化输出的方式不同:
模型启用方式核心要求
Claude指令中要求JSON输出+使用XML输出标签提示词中指定精确schema
GPT
response_format: {type: "json_object"}
或带JSON Schema的结构化输出
提示词中必须提及「JSON」
Gemini
responseMimeType: "application/json"
+
responseSchema
显式定义
required
数组(避免null值)
DeepSeek
response_format: {type: "json_object"}
提示词中包含「json」+设置足够的
max_tokens
Mistral
response_format: {type: "json_object"}
必须同时在提示词文本中要求JSON输出(双重要求)
Llama提示词中要求;需验证输出无原生JSON模式——必须在服务端验证输出
Cohere提示词中要求使用文档 grounding 生成事实性JSON
Qwen提示词中要求使用
apply_chat_template
Grok通过结构化输出功能传入Pydantic/JSON Schema使用schema定义
Nova提示词中要求设置足够的
max_tokens
避免截断
通用最佳实践:始终提供包含字段类型的精确JSON schema,而非仅要求「返回JSON」。同时展示一个完整的预期输出示例。

Before/After Examples

优化前后示例

Vague → Specific (any model)

模糊→具体(任意模型)

Before: "Summarize this article" After: "Summarize this article in 3 bullet points for a product manager. Each bullet: one sentence, starts with an action verb. Focus on decisions needed, not background."
优化前:「总结这篇文章」 优化后:「为产品经理总结这篇文章,输出3个要点。每个要点为1句话,以动作动词开头。重点关注需要做出的决策,而非背景信息。」

Generic → Model-Tuned (Claude 4.x)

通用→模型适配(Claude 4.x)

Before: "You MUST ALWAYS follow these CRITICAL rules: NEVER use markdown. ALWAYS respond in JSON." After: "Respond in JSON format because the output feeds directly into our parser. Avoid markdown formatting since the parser doesn't handle it. Here's the expected schema: {example}"
优化前:「你必须始终遵守这些关键规则:禁止使用Markdown。始终返回JSON。」 优化后:「请返回JSON格式输出,因为输出将直接传入我们的解析器。避免使用Markdown格式,因为解析器不支持。以下是预期的schema:{示例}」

Overloaded → Decomposed (any model)

过载→拆分(任意模型)

Before: "Read this codebase, find all security vulnerabilities, fix them, write tests, update the docs, and create a PR." After (prompt 1 of 3): "Scan this codebase for security vulnerabilities. For each finding, output: file path, line number, vulnerability type (OWASP category), severity (high/medium/low), and a one-line fix description. Output as JSON array."
优化前:「阅读这个代码库,找出所有安全漏洞,修复它们,编写测试,更新文档,并创建PR。」 优化后(第1/3个提示词):「扫描这个代码库以查找安全漏洞。对于每个发现,输出:文件路径、行号、漏洞类型(OWASP分类)、严重程度(高/中/低)、一行修复描述。以JSON数组格式输出。」

Missing Context → Complete (DeepSeek R1)

缺失上下文→完整(DeepSeek R1)

Before (system prompt): "You are a code reviewer." After (user message — no system prompt): "Task: Review this Python function for bugs and performance issues. Requirements: Focus on correctness first, then performance. Flag any edge cases that would cause exceptions. Output format: For each issue: {line, issue_type, severity, fix}"
优化前(系统提示词):「你是代码审核员。」 优化后(用户消息——无系统提示词): 「任务:审核这个Python函数的bug与性能问题。 要求:优先关注正确性,其次是性能。标记任何会导致异常的边缘情况。 输出格式:每个问题包含:{行号、问题类型、严重程度、修复方案}」

No Examples → With Contrastive Example (GPT-4.1)

无示例→带对比示例(GPT 4.1)

Before: "Classify customer feedback as positive, negative, or neutral." After: "Classify customer feedback as positive, negative, or neutral.
INCORRECT approach: 'The product works but the delivery was late' → positive (wrong: mixed sentiment defaults to negative when a complaint is present) CORRECT approach: 'The product works but the delivery was late' → negative (delivery complaint outweighs neutral product statement)
Classify: {input}"

优化前:「将客户反馈分类为正面、负面或中性。」 优化后:「将客户反馈分类为正面、负面或中性。
错误示例:'产品能用但发货延迟' → 正面(错误:混合情感中存在投诉时,默认归类为负面) 正确示例:'产品能用但发货延迟' → 负面(发货投诉的权重超过产品的中性评价)
请分类:{输入}」

Context Engineering (The Current Paradigm)

上下文工程(当前主流范式)

Prompt engineering is now context engineering — curating the minimal high-signal token set the model needs. Every instruction should earn its place in the context window.
Four strategies (Anthropic):
  1. Compaction — Summarize and reinitiate near context limits
  2. Structured note-taking — Agents write persistent notes outside the context window
  3. Sub-agent architectures — Specialized agents return 1,000-2,000 token summaries
  4. Hybrid retrieval — Combine upfront loading with just-in-time exploration
Context rot degrades quality in long conversations:
  • Poisoning: errors/hallucinations repeatedly referenced
  • Distraction: irrelevant content drowning signal
  • Confusion: superfluous information degrading quality
  • Clash: conflicting instructions from different sources
Prompt caching changes how prompts should be structured:
  • Static content first (role, instructions, tool definitions) — cacheable
  • Dynamic content last (user query, conversation) — not cached
  • Up to 90% cost reduction, 85% latency reduction
  • Break-even at 1.4+ cache reads per prefix

提示词工程现已演变为上下文工程——整理模型所需的最小高信号token集合。每个指令都应在上下文窗口中占据合理位置。
四种策略(来自Anthropic):
  1. 压缩 — 接近上下文限制时,总结并重新初始化对话
  2. 结构化笔记 — Agent在上下文窗口外编写持久化笔记
  3. 子Agent架构 — 专用Agent返回1000-2000 token的摘要
  4. 混合检索 — 结合预加载与即时探索
上下文退化会降低长对话的质量:
  • 污染:错误/幻觉被反复引用
  • 干扰:无关内容淹没有效信号
  • 混淆:冗余信息降低质量
  • 冲突:不同来源的指令相互矛盾
提示词缓存改变了提示词的结构设计方式:
  • 静态内容在前(角色、指令、工具定义)——可缓存
  • 动态内容在后(用户查询、对话历史)——不可缓存
  • 最多可降低90%的成本,减少85%的延迟
  • 当每个前缀的缓存读取次数≥1.4时,即可实现收支平衡

Anti-Patterns

反模式

Anti-PatternWhy It FailsFix
ALL-CAPS MUST/NEVERClaude 4.x overtriggers; GPT-5 wastes tokens (except Nova)Explain the why; use CAPS only for Nova
"Be thorough"Modern models are already thorough; causes overengineeringSpecify exact scope and boundaries
Contradictory rulesModels attempt both, degrading qualityAdd explicit priority ordering
20+ examplesBurns context; paradoxically degrades performanceUse 2-5 diverse, TF-IDF-selected examples
Vague instructionsNot testable, not actionableMake each rule verifiable
Negative-only framing"Don't X" is weaker than "Do Y"Reframe as positive instructions
Same prompt across modelsEach family responds differently (Inversion effect)Tune per model; start simple for frontier
No examples at allRemoves highest-impact techniqueAdd 2+ demonstrations (except DeepSeek R1)
Monolithic wall of textHard to parse, sections blurUse headers, XML tags, or delimiters
CoT on reasoning models35-600% more latency, marginal gainsUse self-verification instead
One-sentence tool descriptions97% of tool descriptions have defectsUse 3-4 sentence contracts with limitations
Critical info in the middleLost-in-the-middle: models attend to start/endPut critical content at beginning and end

反模式失败原因修复方案
全大写MUST/NEVERClaude 4.x会过度触发;GPT 5会浪费token(Nova除外)说明原因;仅在Nova模型中使用大写强调
「请务必详尽」现代模型本身已具备详尽性;会导致过度设计明确精确的范围与边界
冲突规则模型会尝试同时满足所有规则,降低质量添加明确的优先级排序
20+示例占用上下文;反而会降低性能使用2-5个多样化、基于TF-IDF选择的示例
模糊指令不可测试、不可执行使每个规则可验证
仅负向表述「不要做X」的效果弱于「要做Y」重构为正向指令
跨模型使用相同提示词每个模型家族的响应方式不同(反转效应)按模型调整提示词;前沿模型从简洁版本开始
无任何示例放弃了效果提升最显著的技术添加2+示例(DeepSeek R1除外)
大段无结构文本难以解析,模块边界模糊使用标题、XML标签或分隔符
推理型模型使用CoT延迟增加35-600%,仅获边际收益使用自验证替代
单句工具描述97%的工具描述存在缺陷使用3-4句话的契约式描述,包含限制
关键信息放在中间中间信息丢失:模型更关注开头与结尾将关键内容放在开头与结尾

Security Considerations

安全考量

For agents with tool access or external data:
  1. Separate trusted/untrusted content — XML tags or delimiters to mark system vs user input
  2. Least privilege — only grant tools the agent actually needs
  3. Input validation guidance — instruct the agent to validate before acting
  4. Spotlighting — mark untrusted data with delimiters/encoding (reduces attacks from >50% to <2%)
  5. Graceful refusal — define what the agent should decline and how
See
references/security-patterns.md
for the full defense hierarchy and architectural patterns.

对于拥有工具调用权限或访问外部数据的Agent:
  1. 区分可信/不可信内容 — 使用XML标签或分隔符标记系统输入与用户输入
  2. 最小权限原则 — 仅授予Agent实际需要的工具权限
  3. 输入验证指导 — 指导Agent在行动前验证输入
  4. 高亮标记 — 使用分隔符/编码标记不可信数据(可将攻击成功率从>50%降至<2%)
  5. 优雅拒绝 — 定义Agent应拒绝的请求类型与方式
如需完整的防御层级与架构模式,请查看
references/security-patterns.md

Additional Resources

附加资源

  • references/templates.md
    — Ready-to-use prompt templates: coding agent, chatbot, data extraction, content writer, orchestrator, per-model skeletons (Claude/GPT/Gemini/DeepSeek/Llama/Cohere/Nova), meta-prompts for rewriting/diagnosing/optimizing, tool description template
  • references/model-specific-tuning.md
    — Per-model guidance for all 10 families: language patterns, structural preferences, tool calling, anti-patterns, migration tips
  • references/prompt-improvement.md
    — Systematic workflow for improving existing prompts: 6-dimension defect taxonomy, diagnostic checklist, rewriting patterns, meta-prompting, evaluation
  • references/security-patterns.md
    — Defense hierarchy, spotlighting, OWASP Top 10 for LLMs, architectural patterns, trust boundaries
  • references/research-evidence.md
    — Key papers with measured improvements: technique rankings, prompting inversion, CoT findings, few-shot dilemma, context engineering, automated optimization
  • references/production-prompt-anatomy.md
    — Structural analysis of Claude Code, Cursor, Manus, Devin, v0: universal patterns, conditional assembly, three eras, caching architecture
  • references/templates.md
    — 即用型提示词模板:编码Agent、聊天机器人、数据提取、内容创作、编排器、各模型骨架(Claude/GPT/Gemini/DeepSeek/Llama/Cohere/Nova)、重写/诊断/优化用元提示词、工具描述模板
  • references/model-specific-tuning.md
    — 10大模型家族的单模型指导:语言模式、结构偏好、工具调用、反模式、迁移技巧
  • references/prompt-improvement.md
    — 系统化的提示词优化流程:6维度缺陷分类、诊断清单、重写模式、元提示词、评估方法
  • references/security-patterns.md
    — 防御层级、高亮标记、LLM的OWASP Top 10、架构模式、信任边界
  • references/research-evidence.md
    — 关键研究论文与实测效果:技术排名、提示词反转效应、CoT发现、少样本困境、上下文工程、自动化优化
  • references/production-prompt-anatomy.md
    — 生产级提示词结构分析:Claude Code、Cursor、Manus、Devin、v0的通用模式、条件组装、三个时代、缓存架构

Sources

参考来源

  • Anthropic: Prompt Engineering Docs, Context Engineering Blog, Claude 4 Best Practices, Think Tool
  • OpenAI: Platform Docs, GPT-4.1/5.x Prompting Guides, Harness Engineering
  • Google: Gemini API Prompting Strategies, Gemini 3 Developer Guide
  • Meta: Llama 4 Model Cards, Prompting Guide
  • DeepSeek: API Docs, Reasoning Model Guide
  • Mistral: Prompting Capabilities, Function Calling
  • Cohere: Crafting Effective Prompts, RAG Guide
  • Qwen: Official Docs, QwQ Guide
  • xAI: Grok Docs, Function Calling
  • Amazon: Nova Prompting Best Practices
  • Research: "Principled Instructions" (2024), "The Prompt Report" (2024), "Chain of Draft" (2025), "Prompting Inversion" (2025), "Few-Shot Dilemma" (2025), "CoT Faithfulness" (Anthropic 2025), "MCP Tool Description Smells" (2026), MASS (ICLR 2026)
  • Production: Claude Code, Cursor, Manus AI, Devin, v0, GitHub Copilot system prompt analysis
  • Anthropic:提示词工程文档、上下文工程博客、Claude 4最佳实践、思考工具
  • OpenAI:平台文档、GPT-4.1/5.x提示词指南、Harness工程
  • Google:Gemini API提示词策略、Gemini 3开发者指南
  • Meta:Llama 4模型卡片、提示词指南
  • DeepSeek:API文档、推理模型指南
  • Mistral:提示词能力、函数调用
  • Cohere:编写有效提示词、RAG指南
  • Qwen:官方文档、QwQ指南
  • xAI:Grok文档、函数调用
  • Amazon:Nova提示词最佳实践
  • 研究:《Principled Instructions》(2024)、《The Prompt Report》(2024)、《Chain of Draft》(2025)、《Prompting Inversion》(2025)、《Few-Shot Dilemma》(2025)、《CoT Faithfulness》(Anthropic 2025)、《MCP Tool Description Smells》(2026)、MASS(ICLR 2026)
  • 生产系统:Claude Code、Cursor、Manus AI、Devin、v0、GitHub Copilot系统提示词分析