prompt-writer

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Prompt Writer

提示词编写工具

Write and improve production-grade prompts for any LLM, grounded in official documentation, peer-reviewed research, and patterns from production systems like Claude Code, Cursor, Manus, and Devin.

为任意大语言模型（LLM）编写并优化生产级提示词，参考依据包括官方文档、同行评审研究，以及Claude Code、Cursor、Manus、Devin等生产系统的实践模式。

Step 0: Determine Mode

步骤0：确定工作模式

Before starting, identify the task:

User intent	Mode	What to do
"Write a prompt for..."	Write	Follow Steps 1-5 below
"Improve/fix/optimize this prompt"	Improve	See Improve Mode below, then `references/prompt-improvement.md` for full workflow
"Review this prompt"	Review	Run Step 4 (Review and Harden) on their prompt
"Write a tool description"	Tool	Jump to the Tool Descriptions section in Step 3

Current as of March 2026. Model guidance reflects Claude 4.6, GPT-5.4, Gemini 3, Llama 4, DeepSeek R1/V3. Verify against official docs for newer releases.

开始前，先明确任务类型：

用户意图	模式	操作内容
「为...编写提示词」	编写模式	遵循下方步骤1-5
「优化/修复/改进这个提示词」	优化模式	先参考下方的优化模式概述，再查看 `references/prompt-improvement.md` 获取完整流程
「审核我的提示词」	审核模式	对用户提供的提示词执行步骤4（审核与强化）
「编写工具描述」	工具模式	直接跳转至步骤3中的工具描述部分

本文档截至2026年3月更新。模型适配指南基于Claude 4.6、GPT 5.4、Gemini 3、Llama 4、DeepSeek R1/V3编写。若使用更新版本模型，请以官方文档为准。

Improve Mode (Quick Summary)

优化模式（快速概述）

When the user has an existing prompt that isn't working:

Diagnose — run through the 6-dimension defect checklist (specification, input, structure, context, performance, maintainability)
Quick fixes — apply the principled instructions checklist (remove politeness, add delimiters, specify audience/format, reframe negatives)
Meta-prompt — give an LLM the prompt + 3-5 failure examples and ask it to rewrite
Rewrite — apply patterns: add specificity, add role+audience, add output schema, add contrastive examples
Evaluate — compare before/after on test cases, check for regressions

For the full workflow with detailed patterns and examples, see

references/prompt-improvement.md

当用户提供的现有提示词效果不佳时：

诊断 — 从6个维度检查缺陷（规格、输入、结构、上下文、性能、可维护性）
快速修复 — 应用标准化优化清单（移除客套表述、添加分隔符、明确受众/格式、重构负面表述）
元提示词优化 — 将原提示词+3-5个失败案例输入LLM，要求其重写
重写优化 — 应用模式：增加细节、明确角色+受众、定义输出 schema、添加对比示例
评估 — 在测试用例上对比优化前后效果，检查是否出现性能退化

如需包含详细模式与示例的完整流程，请查看

references/prompt-improvement.md

。

Boundaries

适用边界

This skill covers writing and improving prompts. It does not cover:

Prompt injection red-teaming (use security-specific tools)
Full RAG pipeline architecture (prompt is one piece; retrieval design is separate)
Fine-tuning or training data curation

本工具专注于提示词的编写与优化，不涵盖以下内容：

提示词注入红队测试（请使用安全专用工具）
完整RAG pipeline架构设计（提示词仅为其中一部分，检索设计为独立环节）
模型微调或训练数据整理

Step 1: Understand the Context

步骤1：理解上下文信息

Before writing, gather:

Target model — Which model family? (See Model Decision Matrix below)
Use case — System prompt, task prompt, agent prompt, tool description?
Tool access — What tools/functions will the agent have?
Audience — Who interacts with this? Technical users, consumers, internal?
Constraints — Length, tone, safety requirements, domain boundaries?
Failure modes — What should the agent refuse or avoid?

If the user hasn't specified these, ask before writing. A prompt without context is a prompt that will fail.

编写提示词前，先收集以下信息：

目标模型 — 属于哪个模型家族？（参考下方的模型决策矩阵）
使用场景 — 是系统提示词、任务提示词、Agent提示词还是工具描述？
工具权限 — Agent将拥有哪些工具/函数调用权限？
受众群体 — 谁会使用这个提示词对应的服务？技术用户、普通消费者还是内部人员？
约束条件 — 长度限制、语气要求、安全规范、领域边界？
失败场景 — Agent需要拒绝哪些请求或避免哪些行为？

如果用户未明确上述信息，请先询问再开始编写。缺乏上下文的提示词必然无法达到预期效果。

Step 2: Choose the Right Structure

步骤2：选择合适的结构

Universal structure adapted from Anthropic, OpenAI, and Google's official guides:

1. Role & Identity        — Who is this agent?
2. Core Instructions      — What are the behavioral rules?
3. Tool/Capability Guide  — How to use available tools
4. Reasoning Guidance     — When and how to think step-by-step
5. Output Format          — What shape should responses take?
6. Examples               — 2-5 demonstrations of desired behavior
7. Context/Reference      — Domain knowledge, loaded last
8. Final Reminders        — Closing behavioral anchors

Not every prompt needs all sections. A simple chatbot needs 1, 2, 5, 6. A complex agent needs all eight. A task prompt may need only 2, 5, 6.

以下通用结构改编自Anthropic、OpenAI和Google的官方指南：

1. Role & Identity        — Who is this agent?
2. Core Instructions      — What are the behavioral rules?
3. Tool/Capability Guide  — How to use available tools
4. Reasoning Guidance     — When and how to think step-by-step
5. Output Format          — What shape should responses take?
6. Examples               — 2-5 demonstrations of desired behavior
7. Context/Reference      — Domain knowledge, loaded last
8. Final Reminders        — Closing behavioral anchors

并非所有提示词都需要包含全部模块。简单聊天机器人只需1、2、5、6模块；复杂Agent需要全部8个模块；任务提示词可能仅需2、5、6模块。

Step 3: Write Each Section

步骤3：编写各模块内容

Role & Identity

角色与身份

One to three sentences. Anchors all subsequent behavior.

You are a senior backend engineer specializing in Python and Django.
You help developers debug production issues by analyzing logs,
tracing errors, and suggesting minimal fixes.

Audience framing amplifies this — stating who the output is for improves relevance by up to 100% in benchmarks.

用1-3句话明确，为后续所有行为提供定位锚点。

You are a senior backend engineer specializing in Python and Django.
You help developers debug production issues by analyzing logs,
tracing errors, and suggesting minimal fixes.

明确受众定位可进一步提升相关性——基准测试显示，明确输出受众可使效果提升高达100%。

Core Instructions

核心指令

Structure with markdown headers or XML tags. Key principles:

Explain the why, not just the what. Models generalize better from motivation than bare commands. Instead of "Never use ellipses", write "Avoid ellipses because the TTS engine cannot pronounce them."

Use positive framing. "Write in complete sentences" outperforms "Don't use fragments." Tell the agent what TO do. Research confirms positive instructions actively boost desired token probabilities, while negatives only weakly suppress.

Be specific enough to verify. "Use 2-space indentation" is testable. "Format code properly" is not.

Resolve contradictions. Models waste reasoning tokens reconciling conflicts. Add explicit priority: "If rule A and rule B conflict, rule A takes precedence."

Match language intensity to model — this is critical:

Model Family	Guidance Style
Claude 4.x	Conversational. Explain reasoning. Avoid CAPS/MUST/NEVER — causes overtriggering. Use "prefer", "because".
GPT-5.x	Direct but not aggressive. Naturally thorough — don't over-prompt for thoroughness.
GPT-4.1	More literal than predecessors. A single clarifying sentence corrects behavior.
Gemini 3	Short, direct specs only. Logic over persuasion. No flowery language.
DeepSeek R1	NO system prompt. Put everything in user message. No few-shot. No "think step by step."
Mistral	Hierarchical markdown sections. Explicit step-by-step helps.
Cohere	Minimal preamble. Pass docs via API, not prompt. RAG-first.
Qwen/QwQ	Standard. For QwQ: non-greedy sampling required (temp=0.6, top_p=0.95).
Grok	Verbose, detailed system prompts work best. XML/MD both fine.
Nova	CAPS DO/MUST/DO NOT encouraged (opposite of Claude!). Numbered steps.
Smaller/Mini	Most literal. Critical rules FIRST. Numbered steps. Full execution order.

For full model-specific guidance, see

references/model-specific-tuning.md

使用Markdown标题或XML标签结构化呈现。关键原则：

解释原因，而非仅说明要求。模型从动机中学习的泛化能力优于生硬指令。例如，不要写「禁止使用省略号」，而应写「避免使用省略号，因为TTS引擎无法识别发音」。

使用正向表述。「使用完整句子」的效果优于「不要使用片段化表述」。告诉模型要做什么，而非不要做什么。研究证实，正向指令能有效提升目标token的概率，而负向指令仅能微弱抑制非目标token。

明确到可验证的程度。「使用2空格缩进」是可测试的要求，而「规范格式化代码」则不是。

解决规则冲突。模型会浪费推理token来调和冲突规则。需添加明确优先级：「若规则A与规则B冲突，以规则A为准」。

语言强度匹配目标模型——这一点至关重要：

模型家族	指导风格
Claude 4.x	对话式风格，解释推理过程。避免使用大写字母/MUST/NEVER——会导致过度触发。使用「建议」「因为」等表述。
GPT 5.x	直接但不生硬。模型本身已具备全面性——无需过度强调「详尽性」。
GPT 4.1	比前代模型更字面化。一句澄清即可修正行为。
Gemini 3	仅保留简短、直接的规格说明。逻辑优先，无需修饰性语言。
DeepSeek R1	禁止使用系统提示词。所有内容放入用户消息中。不使用少样本示例，不使用「逐步思考」。
Mistral	采用分层Markdown章节结构。明确的步骤引导有助于提升效果。
Cohere	最小化前置说明。通过API传递文档，而非放入提示词。优先使用RAG。
Qwen/QwQ	标准风格。对于QwQ：需使用非贪婪采样（temp=0.6，top_p=0.95）。
Grok	冗长、详细的系统提示词效果最佳。XML或Markdown格式均可。
Nova	鼓励使用大写字母的DO/MUST/DO NOT（与Claude完全相反！）。使用编号步骤。
小型/迷你模型	最字面化。关键规则放在最前面。使用编号步骤，明确完整执行顺序。

如需完整的模型专属指导，请查看

references/model-specific-tuning.md

。

The Prompting Inversion (Critical Finding)

提示词反转效应（关键发现）

Techniques effective for mid-tier models can actively harm frontier models. As capability increases, simpler prompts work better. Constraints that prevent common-sense errors in weaker models induce hyper-literalism in stronger ones.

Rule: always calibrate prompt complexity to model capability. When in doubt, start simple.

对中端模型有效的技术可能严重损害前沿模型的性能。随着模型能力提升，更简洁的提示词效果更好。在弱模型中用于避免常识错误的约束，会导致强模型出现过度字面化的问题。

规则：始终根据模型能力调整提示词复杂度。如有疑问，从简洁版本开始。

Tool Descriptions

工具描述

Tool descriptions are prompts — 97.1% of tool descriptions in production have quality defects. Each tool needs a 3-4 sentence contract:

What it does and what it returns
When to use it (activation criteria)
When NOT to use it
Key parameter guidance and limitations

search_codebase: Searches the repository for code matching a query.
Returns matching lines with file paths and line numbers. Use when
you need to find function definitions, usage patterns, or specific
snippets. Do not use for reading entire files — use read_file instead.
Query supports regex. Results are limited to 50 matches.

The six essentials (from MCP tool description research, 2026):

Clear purpose with return data specified
Activation criteria (when to use)
Documented limitations and failure modes
Parameter intent, not just data types
3-4+ sentences proportional to complexity
Examples showing both successful and failing cases

工具描述本质也是提示词——生产环境中97.1%的工具描述存在质量缺陷。每个工具需要3-4句话的契约式描述：

工具功能与返回结果
使用场景（触发条件）
禁用场景
关键参数指导与限制

search_codebase: Searches the repository for code matching a query.
Returns matching lines with file paths and line numbers. Use when
you need to find function definitions, usage patterns, or specific
snippets. Do not use for reading entire files — use read_file instead.
Query supports regex. Results are limited to 50 matches.

MCP工具描述研究（2026）总结的6个核心要素：

明确功能与返回数据
触发条件（何时使用）
文档化的限制与失败场景
参数意图（而非仅数据类型）
3-4句话，长度与复杂度匹配
包含成功与失败案例的示例

Reasoning Guidance

推理指导

Match reasoning instructions to the model:

Reasoning models (Claude extended thinking, GPT o-series, DeepSeek R1, Gemini with thinking, QwQ): Already reason internally. "Think step by step" provides marginal gains at 35-600% more latency. Use "verify your answer against [criteria]" instead.
Non-reasoning models (GPT-4.1, Claude Haiku, Mistral Small): "Think step by step" provides ~50% improvement on math/symbolic tasks. Worth the cost.
Simple tasks on any model: Skip reasoning entirely. CoT can hurt performance on straightforward tasks.

The Think Tool pattern (+54% on complex policy tasks): Give agents a no-op tool called

think

— a callable scratchpad between tool calls. Different from extended thinking. Include domain-specific examples showing HOW to use it.

Chain-of-Draft (cost-efficient alternative to CoT): When reasoning is needed but cost/latency matters, limit each reasoning step to ~5 words. Matches CoT accuracy at 7.6% of the tokens: "Think step by step, but keep only a minimum draft for each step, 5 words at most."

For agents, three reminders consistently improve performance (+20%):

Persistence: "Continue working until the task is fully resolved."
Tool use: "Use tools to gather information rather than guessing."
Planning: "Plan your approach before taking action, and reflect on results after."

推理指令需匹配模型特性：

推理型模型（Claude extended thinking、GPT o-series、DeepSeek R1、带thinking的Gemini、QwQ）：已具备内部推理能力。「逐步思考」仅能带来边际收益，但会增加35-600%的延迟。建议使用「对照[标准]验证你的答案」替代。
非推理型模型（GPT-4.1、Claude Haiku、Mistral Small）：「逐步思考」可使数学/符号任务的性能提升约50%，值得付出延迟成本。
任意模型的简单任务：完全跳过推理指导。思维链（CoT）会损害简单任务的性能。

思考工具模式（复杂策略任务性能提升54%）：为Agent提供一个名为

think

的空操作工具——作为工具调用之间的可调用草稿本。与extended thinking不同，需包含领域特定示例说明如何使用该工具。

草稿链（Chain-of-Draft）（CoT的成本高效替代方案）：当需要推理但关注成本/延迟时，将每个推理步骤限制在约5个词。在保持CoT准确率的同时，仅需7.6%的token消耗：「逐步思考，但每个步骤仅保留最少草稿内容，最多5个词。」

对于Agent，以下3个提醒可稳定提升性能20%：

持续性：「持续工作直至任务完全解决。」
工具使用：「使用工具收集信息，而非猜测。」
规划性：「采取行动前先规划方案，行动后反思结果。」

Output Format

输出格式

Be explicit. Without format guidance, models add prose, markdown fences, or unnecessary structure.

The CTCO pattern (from OpenAI's GPT-5.4 guide — most reliable anti-hallucination structure):

Context — Who is the model? What is the background state?
Task — The single, atomic action required
Constraints — Negative constraints and scope limits
Output — Exact format specification (JSON schema, sections, length)

Respond with a JSON object containing:
- "summary": one-sentence description
- "severity": "low" | "medium" | "high" | "critical"
- "fix": the recommended code change as a unified diff

明确输出格式要求。若无格式指导，模型会添加冗余的 prose、Markdown围栏或不必要的结构。

CTCO模式（来自OpenAI的GPT 5.4指南——最可靠的抗幻觉结构）：

Context（上下文）——模型的定位是什么？背景状态如何？
Task（任务）——单一、原子化的行动要求
Constraints（约束）——负向约束与范围限制
Output（输出）——精确的格式规范（JSON schema、章节、长度）

Respond with a JSON object containing:
- "summary": one-sentence description
- "severity": "low" | "medium" | "high" | "critical"
- "fix": the recommended code change as a unified diff

Examples (Few-Shot)

示例（少样本）

Examples are the highest-impact technique — up to 90% accuracy improvement.

The 2-5 rule: Major gains after 2; diminishing returns after 5. More can paradoxically degrade performance (the "few-shot dilemma" — Gemma 7B dropped from 77.9% to 39.9%).

Quality over quantity: TF-IDF-based example selection outperforms random sampling by 1%+ while avoiding degradation. Choose examples relevant to the actual use case.

Contrastive examples dramatically improve reasoning: showing both a wrong approach (with why it's wrong) and the correct approach. GSM8K: 35.9% → 88.8% with GPT-4 using this pattern.

Order matters: Alternate positive and negative examples to avoid bias.

Format by model:

Claude:

<examples><example><user>...</user><assistant>...</assistant></example></examples>

GPT/others: Markdown headers or numbered examples
DeepSeek R1: NO examples (degrades reasoning)

示例是效果提升最显著的技术——可使准确率提升高达90%。

2-5规则：2个示例后即可获得显著收益；超过5个后收益递减。过多示例反而会降低性能（「少样本困境」——Gemma 7B的准确率从77.9%降至39.9%）。

质量优先于数量：基于TF-IDF选择的示例比随机采样的示例性能提升1%以上，且避免性能退化。选择与实际使用场景相关的示例。

对比示例可大幅提升推理能力：同时展示错误做法（说明错误原因）与正确做法。在GSM8K数据集上，GPT-4使用该模式后准确率从35.9%提升至88.8%。

顺序影响：交替展示正向与负向示例，避免偏见。

按模型选择格式：

Claude：使用

<examples><example><user>...</user><assistant>...</assistant></example></examples>

格式

GPT/其他模型：使用Markdown标题或编号示例
DeepSeek R1：禁止使用示例（会降低推理能力）

Context/Reference

上下文/参考资料

Place long-form reference material AFTER instructions and examples. Queries placed after long context improve quality by up to 30%.

Lost-in-the-middle: Models attend more to the beginning and end of prompts (architectural, not training artifact). Place highest-priority content at the start and end. Never bury critical instructions in the middle.

For Claude, use XML tags:

<context>

<documents>

<reference_material>

将长篇参考资料放在指令与示例之后。查询放在长上下文之后可使效果提升高达30%。

中间信息丢失问题：模型对提示词开头与结尾的内容关注度更高（这是架构特性，而非训练 artifact）。将优先级最高的内容放在开头与结尾。切勿将关键指令埋在中间。

对于Claude，使用XML标签：

<context>

、

<documents>

、

<reference_material>

。

Final Reminders

最终提醒

Close with 1-3 critical behavioral anchors. Models weight content at the beginning and end more heavily.

用1-3个关键行为锚点收尾。模型对开头与结尾内容的权重更高。

Step 4: Review and Harden

步骤4：审核与强化

Structural quality:

Sections clearly delineated (XML tags or markdown headers)
No contradictory instructions
Examples match the stated rules
Tool descriptions are complete contracts (what, when, when-not, limitations)

Language quality:

Positive framing (what to do, not what to avoid)
Motivation provided for non-obvious rules (the "why")
Language intensity matches target model
Specificity is testable ("2-space indent" not "format nicely")

Robustness:

Edge cases addressed or delegated ("if unclear, ask the user")
Failure modes handled ("if the tool errors, retry once then report")
No sensitive data in the prompt (credentials, API keys, PII)
Prompt injection considerations (trust boundaries, input validation)

Efficiency:

No redundant sections
Tool descriptions concise (3-4 sentences, not paragraphs)
Examples minimal but sufficient (2-5, not 10+)
Static content first, dynamic last (enables caching — up to 90% cost reduction)

See

references/security-patterns.md

for defense patterns.

结构质量检查：

模块划分清晰（使用XML标签或Markdown标题）
无冲突指令
示例与规则一致
工具描述为完整契约（功能、使用场景、禁用场景、限制）

语言质量检查：

正向表述（说明要做什么，而非不要做什么）
非明显规则需说明原因（「因为...」）
语言强度匹配目标模型
明确到可验证的程度（「2空格缩进」而非「规范格式化」）

鲁棒性检查：

已处理边缘情况或明确委托（「若有疑问，询问用户」）
已处理失败场景（「若工具调用出错，重试1次后报告」）
提示词中无敏感数据（凭证、API密钥、个人可识别信息）
考虑提示词注入风险（信任边界、输入验证）

效率检查：

无冗余模块
工具描述简洁（3-4句话，而非段落）
示例精简但足够（2-5个，而非10+）
静态内容在前，动态内容在后（支持缓存——可降低高达90%的成本）

如需防御模式，请查看

references/security-patterns.md

。

Step 5: Test and Iterate

步骤5：测试与迭代

Try adversarial inputs — edge cases, ambiguous requests, rule-breaking attempts
Check failure modes — graceful or catastrophic?
Run 5x on same input — check consistency
Iterate one variable at a time — changing model AND prompt simultaneously makes attribution impossible

For systematic prompt improvement, see

references/prompt-improvement.md

尝试对抗性输入——边缘情况、模糊请求、违反规则的尝试
检查失败场景——是优雅降级还是灾难性失败？
同一输入测试5次——检查结果一致性
每次仅迭代一个变量——同时更改模型与提示词会导致无法归因性能变化

如需系统化的提示词优化方法，请查看

references/prompt-improvement.md

。

Model Decision Matrix

模型决策矩阵

Quick reference for choosing prompting strategy by model:

Model	System Prompt	Delimiter Style	Thinking	Key Differentiator
Claude 4.x	Separate role	XML tags	Extended thinking	Soft language, no CAPS
GPT-5.x	System + instructions	Markdown	o-series reasoning	CTCO pattern, reasoning_effort
GPT-4.1	System	Markdown	Manual CoT	Ultra-literal, sandwich method
Gemini 3	System instruction	Hierarchy/outline	thinking_level	Temp must stay 1.0, concise
Llama 4	Special tokens	Special tokens	None	Exact template required
DeepSeek R1	NO system prompt	Plain text in user msg	Auto <think>	Zero-shot only, no examples
DeepSeek V3	Yes	Standard	None	Use for speed, R1 for depth
Mistral	Prepended to user	Markdown sections	None	Prefix injection, JSON dual-instruct
Cohere	Preamble (optional)	Documents API	None	RAG-first, auto-citations
Qwen/QwQ	apply_chat_template	ChatML	enable_thinking	Non-greedy required for QwQ
Grok 3	Separate (verbose)	XML or Markdown	reasoning_effort	Real-time X data, cache-stable
Nova	Separate	Markdown	None	CAPS encouraged, temp=0 for tools

For detailed per-model guidance:

references/model-specific-tuning.md

按模型选择提示词策略的快速参考：

模型	系统提示词	分隔符风格	推理设置	核心差异
Claude 4.x	独立角色设置	XML标签	Extended thinking	温和语言，禁用大写强调
GPT 5.x	系统提示+指令	Markdown	o-series推理	CTCO模式，reasoning_effort参数
GPT 4.1	系统提示	Markdown	手动CoT	超字面化，三明治方法
Gemini 3	系统指令	层级/大纲	thinking_level参数	温度必须保持1.0，内容简洁
Llama 4	特殊令牌	特殊令牌	无	需要精确模板
DeepSeek R1	禁止使用系统提示词	用户消息中使用纯文本	自动<think>	仅支持零样本，无示例
DeepSeek V3	支持系统提示词	标准风格	无	追求速度选V3，追求深度选R1
Mistral	前置到用户消息	Markdown章节	无	前缀注入，JSON双指令
Cohere	可选前言	文档API	无	RAG优先，自动引用
Qwen/QwQ	使用apply_chat_template	ChatML	enable_thinking	QwQ需使用非贪婪采样
Grok 3	独立设置（支持冗长内容）	XML或Markdown	reasoning_effort参数	实时X数据，缓存稳定
Nova	独立设置	Markdown	无	鼓励使用大写强调，工具调用温度设为0

如需详细的单模型指导，请查看

references/model-specific-tuning.md

Quick Reference: Prompt Patterns

快速参考：提示词模式

Agent System Prompt Template

Agent系统提示词模板

markdown

undefined

markdown

undefined

Role

[1-3 sentences: who, what domain, what tone]

Instructions

[Behavioral rules, organized by category] [Each rule with motivation: "because..."]

Tools

[For each tool: what it does, when to use, when not to use, limitations]

Output Format

[Explicit format specification]

Examples

[2-5 diverse, relevant demonstrations]

Context

[Reference material, loaded last]

Reminders

[1-3 critical behavioral anchors]

undefined

[1-3 critical behavioral anchors]

undefined

Common Prompt Recipes

常见提示词模板

Scenario	Key Sections	Notes
Simple chatbot	Role, Instructions, Format, Examples	Skip tools and reasoning
Coding agent	All 8 sections	Add persistence + planning reminders, think tool
Data analyst	Role, Instructions, Tools, Format	Emphasize structured output
Content writer	Role, Instructions, Examples, Format	Heavy on examples and tone
Customer support	Role, Instructions, Examples, Context	Load FAQ as context
Multi-agent orchestrator	Role, Instructions, Tools, Reasoning	Define delegation rules, effort budgets
Task prompt (one-shot)	Instructions, Format, Examples	No role needed, be specific
Prompt improvement	See `references/prompt-improvement.md`	Diagnose → fix → evaluate

场景	核心模块	注意事项
简单聊天机器人	角色、指令、格式、示例	跳过工具与推理模块
编码Agent	全部8个模块	添加持续性+规划提醒，使用思考工具
数据分析师	角色、指令、工具、格式	强调结构化输出
内容创作者	角色、指令、示例、格式	重点关注示例与语气
客户支持	角色、指令、示例、上下文	将FAQ作为上下文加载
多Agent编排器	角色、指令、工具、推理	定义委托规则、资源预算
一次性任务提示词	指令、格式、示例	无需角色模块，内容需具体
提示词优化	查看 `references/prompt-improvement.md`	诊断→修复→评估

Multimodal Prompting

多模态提示词

For models that support vision (Claude, GPT-4o+, Gemini, Llama 4, Grok):

Image placement matters:

Gemini: image BEFORE text (official recommendation)
Claude/GPT: image position is flexible, but place relevant images near the text that references them
Multiple images: reference each by name or position ("In the first image...")

Key patterns:

Be explicit about what to do with the image: "transcribe", "describe", "extract table data", "compare these two screenshots"
For OCR/extraction: specify exact output format (JSON, table, list)
For analysis: provide evaluation criteria, not open-ended "what do you see?"
For diagrams/charts: ask for specific data points, not general description

Token costs vary significantly:

Gemini: 258 tokens per image tile (≤384px), video at 263 tokens/sec
Claude: ~1,600 tokens for a typical image
Budget accordingly — 10 images can consume 16K+ tokens

对于支持视觉的模型（Claude、GPT-4o+、Gemini、Llama 4、Grok）：

图片位置影响：

Gemini：图片放在文本之前（官方推荐）
Claude/GPT：图片位置灵活，但需将相关图片放在引用文本附近
多张图片：通过名称或位置引用（「在第一张图片中...」）

关键模式：

明确说明对图片的操作：「转录」「描述」「提取表格数据」「对比这两张截图」
OCR/提取任务：指定精确输出格式（JSON、表格、列表）
分析任务：提供评估标准，而非开放式的「你看到了什么？」
图表/图形：要求提取具体数据点，而非泛泛描述

Token成本差异显著：

Gemini：每个图片块（≤384px）消耗258 tokens，视频每秒消耗263 tokens
Claude：单张典型图片消耗约1600 tokens
合理预算——10张图片可消耗16000+ tokens

Structured Output (JSON Mode)

结构化输出（JSON模式）

Each model handles structured output differently:

Model	How to enable	Key requirement
Claude	Request JSON in instructions + use XML output tags	Specify exact schema in prompt
GPT	`response_format: {type: "json_object"}` or Structured Outputs with JSON Schema	Must mention "JSON" in prompt
Gemini	`responseMimeType: "application/json"` + `responseSchema`	Define `required` arrays explicitly (avoids nulls)
DeepSeek	`response_format: {type: "json_object"}`	Include "json" in prompt + set adequate `max_tokens`
Mistral	`response_format: {type: "json_object"}`	Must ALSO instruct JSON in prompt text (dual requirement)
Llama	Instruct in prompt; validate output	No native JSON mode — always validate server-side
Cohere	Instruct in prompt	Use document grounding for factual JSON
Qwen	Instruct in prompt	Use `apply_chat_template`
Grok	Pydantic/JSON Schema via Structured Outputs	Use schema definitions
Nova	Instruct in prompt	Set `max_tokens` high enough to avoid truncation

Universal best practice: Always provide the exact JSON schema with field types, not just "return JSON". Show one complete example of the expected output.

各模型处理结构化输出的方式不同：

模型	启用方式	核心要求
Claude	指令中要求JSON输出+使用XML输出标签	提示词中指定精确schema
GPT	`response_format: {type: "json_object"}` 或带JSON Schema的结构化输出	提示词中必须提及「JSON」
Gemini	`responseMimeType: "application/json"` + `responseSchema`	显式定义 `required` 数组（避免null值）
DeepSeek	`response_format: {type: "json_object"}`	提示词中包含「json」+设置足够的 `max_tokens`
Mistral	`response_format: {type: "json_object"}`	必须同时在提示词文本中要求JSON输出（双重要求）
Llama	提示词中要求；需验证输出	无原生JSON模式——必须在服务端验证输出
Cohere	提示词中要求	使用文档 grounding 生成事实性JSON
Qwen	提示词中要求	使用 `apply_chat_template`
Grok	通过结构化输出功能传入Pydantic/JSON Schema	使用schema定义
Nova	提示词中要求	设置足够的 `max_tokens` 避免截断

通用最佳实践：始终提供包含字段类型的精确JSON schema，而非仅要求「返回JSON」。同时展示一个完整的预期输出示例。

Before/After Examples

优化前后示例

Vague → Specific (any model)

模糊→具体（任意模型）

Before: "Summarize this article" After: "Summarize this article in 3 bullet points for a product manager. Each bullet: one sentence, starts with an action verb. Focus on decisions needed, not background."

优化前：「总结这篇文章」 优化后：「为产品经理总结这篇文章，输出3个要点。每个要点为1句话，以动作动词开头。重点关注需要做出的决策，而非背景信息。」

Generic → Model-Tuned (Claude 4.x)

通用→模型适配（Claude 4.x）

Before: "You MUST ALWAYS follow these CRITICAL rules: NEVER use markdown. ALWAYS respond in JSON." After: "Respond in JSON format because the output feeds directly into our parser. Avoid markdown formatting since the parser doesn't handle it. Here's the expected schema: {example}"

优化前：「你必须始终遵守这些关键规则：禁止使用Markdown。始终返回JSON。」 优化后：「请返回JSON格式输出，因为输出将直接传入我们的解析器。避免使用Markdown格式，因为解析器不支持。以下是预期的schema：{示例}」

Overloaded → Decomposed (any model)

过载→拆分（任意模型）

Before: "Read this codebase, find all security vulnerabilities, fix them, write tests, update the docs, and create a PR." After (prompt 1 of 3): "Scan this codebase for security vulnerabilities. For each finding, output: file path, line number, vulnerability type (OWASP category), severity (high/medium/low), and a one-line fix description. Output as JSON array."

优化前：「阅读这个代码库，找出所有安全漏洞，修复它们，编写测试，更新文档，并创建PR。」 优化后（第1/3个提示词）：「扫描这个代码库以查找安全漏洞。对于每个发现，输出：文件路径、行号、漏洞类型（OWASP分类）、严重程度（高/中/低）、一行修复描述。以JSON数组格式输出。」

Missing Context → Complete (DeepSeek R1)

缺失上下文→完整（DeepSeek R1）

Before (system prompt): "You are a code reviewer." After (user message — no system prompt): "Task: Review this Python function for bugs and performance issues. Requirements: Focus on correctness first, then performance. Flag any edge cases that would cause exceptions. Output format: For each issue: {line, issue_type, severity, fix}"

优化前（系统提示词）：「你是代码审核员。」 优化后（用户消息——无系统提示词）：「任务：审核这个Python函数的bug与性能问题。要求：优先关注正确性，其次是性能。标记任何会导致异常的边缘情况。输出格式：每个问题包含：{行号、问题类型、严重程度、修复方案}」

No Examples → With Contrastive Example (GPT-4.1)

无示例→带对比示例（GPT 4.1）

Before: "Classify customer feedback as positive, negative, or neutral." After: "Classify customer feedback as positive, negative, or neutral.

INCORRECT approach: 'The product works but the delivery was late' → positive (wrong: mixed sentiment defaults to negative when a complaint is present) CORRECT approach: 'The product works but the delivery was late' → negative (delivery complaint outweighs neutral product statement)

Classify: {input}"

优化前：「将客户反馈分类为正面、负面或中性。」 优化后：「将客户反馈分类为正面、负面或中性。

错误示例：'产品能用但发货延迟' → 正面（错误：混合情感中存在投诉时，默认归类为负面）正确示例：'产品能用但发货延迟' → 负面（发货投诉的权重超过产品的中性评价）

请分类：{输入}」

Context Engineering (The Current Paradigm)

上下文工程（当前主流范式）

Prompt engineering is now context engineering — curating the minimal high-signal token set the model needs. Every instruction should earn its place in the context window.

Four strategies (Anthropic):

Compaction — Summarize and reinitiate near context limits
Structured note-taking — Agents write persistent notes outside the context window
Sub-agent architectures — Specialized agents return 1,000-2,000 token summaries
Hybrid retrieval — Combine upfront loading with just-in-time exploration

Context rot degrades quality in long conversations:

Poisoning: errors/hallucinations repeatedly referenced
Distraction: irrelevant content drowning signal
Confusion: superfluous information degrading quality
Clash: conflicting instructions from different sources

Prompt caching changes how prompts should be structured:

Static content first (role, instructions, tool definitions) — cacheable
Dynamic content last (user query, conversation) — not cached
Up to 90% cost reduction, 85% latency reduction
Break-even at 1.4+ cache reads per prefix

提示词工程现已演变为上下文工程——整理模型所需的最小高信号token集合。每个指令都应在上下文窗口中占据合理位置。

四种策略（来自Anthropic）：

压缩 — 接近上下文限制时，总结并重新初始化对话
结构化笔记 — Agent在上下文窗口外编写持久化笔记
子Agent架构 — 专用Agent返回1000-2000 token的摘要
混合检索 — 结合预加载与即时探索

上下文退化会降低长对话的质量：

污染：错误/幻觉被反复引用
干扰：无关内容淹没有效信号
混淆：冗余信息降低质量
冲突：不同来源的指令相互矛盾

提示词缓存改变了提示词的结构设计方式：

静态内容在前（角色、指令、工具定义）——可缓存
动态内容在后（用户查询、对话历史）——不可缓存
最多可降低90%的成本，减少85%的延迟
当每个前缀的缓存读取次数≥1.4时，即可实现收支平衡

Anti-Patterns

反模式

Anti-Pattern	Why It Fails	Fix
ALL-CAPS MUST/NEVER	Claude 4.x overtriggers; GPT-5 wastes tokens (except Nova)	Explain the why; use CAPS only for Nova
"Be thorough"	Modern models are already thorough; causes overengineering	Specify exact scope and boundaries
Contradictory rules	Models attempt both, degrading quality	Add explicit priority ordering
20+ examples	Burns context; paradoxically degrades performance	Use 2-5 diverse, TF-IDF-selected examples
Vague instructions	Not testable, not actionable	Make each rule verifiable
Negative-only framing	"Don't X" is weaker than "Do Y"	Reframe as positive instructions
Same prompt across models	Each family responds differently (Inversion effect)	Tune per model; start simple for frontier
No examples at all	Removes highest-impact technique	Add 2+ demonstrations (except DeepSeek R1)
Monolithic wall of text	Hard to parse, sections blur	Use headers, XML tags, or delimiters
CoT on reasoning models	35-600% more latency, marginal gains	Use self-verification instead
One-sentence tool descriptions	97% of tool descriptions have defects	Use 3-4 sentence contracts with limitations
Critical info in the middle	Lost-in-the-middle: models attend to start/end	Put critical content at beginning and end

反模式	失败原因	修复方案
全大写MUST/NEVER	Claude 4.x会过度触发；GPT 5会浪费token（Nova除外）	说明原因；仅在Nova模型中使用大写强调
「请务必详尽」	现代模型本身已具备详尽性；会导致过度设计	明确精确的范围与边界
冲突规则	模型会尝试同时满足所有规则，降低质量	添加明确的优先级排序
20+示例	占用上下文；反而会降低性能	使用2-5个多样化、基于TF-IDF选择的示例
模糊指令	不可测试、不可执行	使每个规则可验证
仅负向表述	「不要做X」的效果弱于「要做Y」	重构为正向指令
跨模型使用相同提示词	每个模型家族的响应方式不同（反转效应）	按模型调整提示词；前沿模型从简洁版本开始
无任何示例	放弃了效果提升最显著的技术	添加2+示例（DeepSeek R1除外）
大段无结构文本	难以解析，模块边界模糊	使用标题、XML标签或分隔符
推理型模型使用CoT	延迟增加35-600%，仅获边际收益	使用自验证替代
单句工具描述	97%的工具描述存在缺陷	使用3-4句话的契约式描述，包含限制
关键信息放在中间	中间信息丢失：模型更关注开头与结尾	将关键内容放在开头与结尾

Security Considerations

安全考量

For agents with tool access or external data:

Separate trusted/untrusted content — XML tags or delimiters to mark system vs user input
Least privilege — only grant tools the agent actually needs
Input validation guidance — instruct the agent to validate before acting
Spotlighting — mark untrusted data with delimiters/encoding (reduces attacks from >50% to <2%)
Graceful refusal — define what the agent should decline and how

See

references/security-patterns.md

for the full defense hierarchy and architectural patterns.

对于拥有工具调用权限或访问外部数据的Agent：

区分可信/不可信内容 — 使用XML标签或分隔符标记系统输入与用户输入
最小权限原则 — 仅授予Agent实际需要的工具权限
输入验证指导 — 指导Agent在行动前验证输入
高亮标记 — 使用分隔符/编码标记不可信数据（可将攻击成功率从>50%降至<2%）
优雅拒绝 — 定义Agent应拒绝的请求类型与方式

如需完整的防御层级与架构模式，请查看

references/security-patterns.md

。

Additional Resources

附加资源

references/templates.md
— Ready-to-use prompt templates: coding agent, chatbot, data extraction, content writer, orchestrator, per-model skeletons (Claude/GPT/Gemini/DeepSeek/Llama/Cohere/Nova), meta-prompts for rewriting/diagnosing/optimizing, tool description template
references/model-specific-tuning.md
— Per-model guidance for all 10 families: language patterns, structural preferences, tool calling, anti-patterns, migration tips
references/prompt-improvement.md
— Systematic workflow for improving existing prompts: 6-dimension defect taxonomy, diagnostic checklist, rewriting patterns, meta-prompting, evaluation
references/security-patterns.md
— Defense hierarchy, spotlighting, OWASP Top 10 for LLMs, architectural patterns, trust boundaries
references/research-evidence.md
— Key papers with measured improvements: technique rankings, prompting inversion, CoT findings, few-shot dilemma, context engineering, automated optimization
references/production-prompt-anatomy.md
— Structural analysis of Claude Code, Cursor, Manus, Devin, v0: universal patterns, conditional assembly, three eras, caching architecture

references/templates.md
— 即用型提示词模板：编码Agent、聊天机器人、数据提取、内容创作、编排器、各模型骨架（Claude/GPT/Gemini/DeepSeek/Llama/Cohere/Nova）、重写/诊断/优化用元提示词、工具描述模板
references/model-specific-tuning.md
— 10大模型家族的单模型指导：语言模式、结构偏好、工具调用、反模式、迁移技巧
references/prompt-improvement.md
— 系统化的提示词优化流程：6维度缺陷分类、诊断清单、重写模式、元提示词、评估方法
references/security-patterns.md
— 防御层级、高亮标记、LLM的OWASP Top 10、架构模式、信任边界
references/research-evidence.md
— 关键研究论文与实测效果：技术排名、提示词反转效应、CoT发现、少样本困境、上下文工程、自动化优化
references/production-prompt-anatomy.md
— 生产级提示词结构分析：Claude Code、Cursor、Manus、Devin、v0的通用模式、条件组装、三个时代、缓存架构

Sources

参考来源

Anthropic: Prompt Engineering Docs, Context Engineering Blog, Claude 4 Best Practices, Think Tool
OpenAI: Platform Docs, GPT-4.1/5.x Prompting Guides, Harness Engineering
Google: Gemini API Prompting Strategies, Gemini 3 Developer Guide
Meta: Llama 4 Model Cards, Prompting Guide
DeepSeek: API Docs, Reasoning Model Guide
Mistral: Prompting Capabilities, Function Calling
Cohere: Crafting Effective Prompts, RAG Guide
Qwen: Official Docs, QwQ Guide
xAI: Grok Docs, Function Calling
Amazon: Nova Prompting Best Practices
Research: "Principled Instructions" (2024), "The Prompt Report" (2024), "Chain of Draft" (2025), "Prompting Inversion" (2025), "Few-Shot Dilemma" (2025), "CoT Faithfulness" (Anthropic 2025), "MCP Tool Description Smells" (2026), MASS (ICLR 2026)
Production: Claude Code, Cursor, Manus AI, Devin, v0, GitHub Copilot system prompt analysis

Anthropic：提示词工程文档、上下文工程博客、Claude 4最佳实践、思考工具
OpenAI：平台文档、GPT-4.1/5.x提示词指南、Harness工程
Google：Gemini API提示词策略、Gemini 3开发者指南
Meta：Llama 4模型卡片、提示词指南
DeepSeek：API文档、推理模型指南
Mistral：提示词能力、函数调用
Cohere：编写有效提示词、RAG指南
Qwen：官方文档、QwQ指南
xAI：Grok文档、函数调用
Amazon：Nova提示词最佳实践
研究：《Principled Instructions》（2024）、《The Prompt Report》（2024）、《Chain of Draft》（2025）、《Prompting Inversion》（2025）、《Few-Shot Dilemma》（2025）、《CoT Faithfulness》（Anthropic 2025）、《MCP Tool Description Smells》（2026）、MASS（ICLR 2026）
生产系统：Claude Code、Cursor、Manus AI、Devin、v0、GitHub Copilot系统提示词分析