Schema-First Prompting
Design patterns for LLM structured output: what belongs in the schema, what belongs in the prompt, and how to avoid duplication.
Most important rule: be brief, clean, elegant, and internally consistent.
Core principles
- The schema is the spec. The Pydantic model is the structural contract. The prompt carries intent, tone, and constraints the schema cannot encode. Never duplicate between them.
- One model per shape. Optional fields that only apply to some shapes are a smell — use separate models or a discriminated union.
- Only model what the LLM must produce. If a value is fixed once you know the variant or slot, derive it in code. The LLM must not be asked to output something the pipeline already knows.
- Separate models by layer. LLM extraction shape, public API request/response, and persistence/DB rows have different fields and invariants. One "god model" for all layers leaks fields across boundaries and breaks when any layer changes.
- Reasoning comes first. Because autoregressive models generate tokens sequentially, a chain-of-thought field must be the first field in the model — placing it after the target data defeats the purpose entirely.
- No contradictions. If the model says "exactly 6", the prompt must not say "4-6". Contradictions create hidden bugs, worse than either version being slightly imperfect on its own.
- Start minimal. Begin with the simplest schema that could work. Add reasoning fields, submodels, and constraints only when the output proves they are needed. Complexity should be earned by failure, not anticipated by speculation.
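Several of these principles fit in one small sketch. Assuming a hypothetical contact-extraction pipeline (the names `ContactExtraction`, `ContactRow`, and `to_row` are illustrative, not from this document): the LLM-facing shape and the persistence row stay separate, reasoning comes first, and derived values never appear in the LLM-facing model.

```python
from pydantic import BaseModel, Field


class ContactExtraction(BaseModel):
    """What the LLM must author -- nothing more."""

    reasoning: str = Field(description="Brief justification, generated first.")
    name: str
    email: str


class ContactRow(BaseModel):
    """Persistence layer: adds fields the pipeline derives, not the LLM."""

    id: int  # assigned by the database, never requested from the LLM
    name: str
    email: str


def to_row(extraction: ContactExtraction, row_id: int) -> ContactRow:
    # Derived in code after validation; the schemas stay per-layer.
    return ContactRow(id=row_id, name=extraction.name, email=extraction.email)
```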
Why Pydantic for LLM output
LLM completions are untrusted input. Same reasons Pydantic helps for HTTP or config: coercion to the right types, clear failures on bad data, and a JSON Schema that providers can attach to structured-output or tool/function parameters so the model is steered toward a shape you can parse.
Do not assume the model returns clean JSON strings. Raw text may include markdown fences or leading prose. Extract JSON (or use the provider's native structured output) before `model_validate` / `model_validate_json`. Sanitizers and validators are complementary: pre-parse cleanup vs. post-parse rules.
Design for how LLMs actually work
A schema is an interface to a language model. Design it around what models are good and bad at, not what looks clean in an ORM.
Ask for decisions, not estimates
LLMs are poor at producing absolute numeric values — durations in milliseconds, pixel coordinates, precise word counts. They are much better at relative and categorical decisions: which word does an effect start on, which of three severity levels applies, which item comes first. When you need a numeric output, reframe it as a choice (bins, ranks, named anchors) wherever possible. If you must ask for a number, keep the range small and well-defined in the field description.
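A minimal sketch of this reframing, under an assumed transition-speed use case (the names `EffectTiming` and `SPEED_MS` are illustrative): the LLM picks a named bin, and code maps the bin to the absolute number the pipeline needs.

```python
from typing import Literal

from pydantic import BaseModel, Field


class EffectTiming(BaseModel):
    # A categorical decision the model is good at...
    speed: Literal["instant", "fast", "normal", "slow"] = Field(
        description="Relative pacing of the transition."
    )


# ...mapped to the absolute value in code, not asked of the LLM.
SPEED_MS = {"instant": 0, "fast": 150, "normal": 300, "slow": 600}


def duration_ms(timing: EffectTiming) -> int:
    return SPEED_MS[timing.speed]
```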
Scope the context per step
Dumping an entire manuscript into one call and asking for a complex nested output is a recipe for degraded quality in the tail. Break large pipelines into focused steps, each with its own schema, where the input is scoped to what that step needs. Use prompt caching to avoid re-sending shared context (style guides, instructions) on every call, but restart the generation context for each step so the model's attention is fresh. This is not just a cost optimization — it directly improves output quality on later fields.
Ordering fields for generation quality
Autoregressive models commit to tokens left-to-right, top-to-bottom. This means field order in your schema is not cosmetic:
- Reasoning / chain-of-thought fields first — before the target data they inform.
- High-level decisions before details — a `tone` or `strategy` field before the `body_text` it should shape.
- Independent fields before dependent ones — if field B's quality depends on field A, A must come first in the schema.
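A sketch of reasoning-first ordering (field names are illustrative). In Pydantic v2, declaration order is preserved in `model_fields` and in the generated JSON schema, so the model emits `reasoning` and `tone` before the `body_text` they shape:

```python
from pydantic import BaseModel, Field


class SectionDraft(BaseModel):
    reasoning: str = Field(description="Think through the section before writing.")
    tone: str = Field(description="High-level decision that shapes the body.")
    body_text: str = Field(description="The section body, consistent with tone.")


# Declaration order is the order the provider sees in the schema.
field_order = list(SectionDraft.model_fields)
```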
Match the model's grain
If a task is easy for the model, do not add reasoning fields, intermediate steps, or scaffolding "just in case." Extra fields cost tokens and can reduce quality by forcing the model to fill structure it does not need. Conversely, if a task is hard (multi-entity extraction, long-range consistency), invest in reasoning fields and break the work into steps. The right amount of structure depends on observed difficulty, not on how important the task feels.
Models
The type is the data
If a value is fixed once you know the variant or slot, do not put it in the model. Derive it in code (mapping, `match`, small helper). The LLM must not be asked to output something the pipeline already knows.
One model per shape
Each distinct output shape gets its own model. Optional fields that only apply to some shapes are a smell — use separate models or a discriminated union (see below).
Avoid models that are only one string
A `BaseModel` with a single `str` field adds nesting and schema noise without adding structure. Prefer a plain field on the parent with `Field(description=...)`.
Bad — wrapper adds nothing
```python
class ClosingSummary(BaseModel):
    text: str = Field(description="2-3 sentences.")

class Report(BaseModel):
    closing: ClosingSummary
```
Good
```python
class Report(BaseModel):
    closing: str = Field(description="Closing summary: 2-3 sentences.")
```
Keep a dedicated model when there are **at least two** meaningful fields, or when you are grouping a stable sub-object at a known serialization boundary that will genuinely grow. "Will grow" means there is a concrete next field on the roadmap — not a hypothetical one.
How you discriminate (pick one pattern)
1. Fixed slots (no redundant `type` inside)
When the JSON has named slots, the key tells you the shape. Do not add `kind`/`action` inside each child if it only repeats what the key already says.
```python
class OutlineSection(BaseModel):
    title: str = Field(description="Section heading.")
    bullets: list[str] = Field(default_factory=list, max_length=8)

class DocumentPlan(BaseModel):
    outline: OutlineSection
    summary: str = Field(description="Closing summary: 2-3 sentences.")
```
2. Tagged union (one field, many possible shapes)
When a single value must be one of several shapes (e.g. a list of heterogeneous steps), JSON has no slot name — you need a discriminator for deserialization, not for documentation fluff.
```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field

class SearchStep(BaseModel):
    kind: Literal["search"] = "search"
    query: str

class AnswerStep(BaseModel):
    kind: Literal["answer"] = "answer"
    text: str

Step = Annotated[
    Union[SearchStep, AnswerStep],
    Field(discriminator="kind"),
]

class Plan(BaseModel):
    steps: list[Step]
```
Here `kind` is not redundant with the class name: it is the wire format tag the model must emit so the union parses. Do not also mirror it as a second field (`action`, `step_type`) saying the same thing.
3. Conditional schema (branch omits a subtree)
If a branch should not generate a subtree at all, use a different root model — not `risk_section: RiskAssessment | None` plus prompt prose saying "omit when…".
```python
class RiskAssessment(BaseModel):
    summary: str
    severity: Literal["low", "medium", "high"]

class PlanWithRisks(BaseModel):
    outline: OutlineSection
    summary: str = Field(description="Closing summary.")
    risk_section: RiskAssessment

class PlanWithoutRisks(BaseModel):
    outline: OutlineSection
    summary: str = Field(description="Closing summary.")
    # risk_section does not exist on this model at all
```
Select `PlanWithRisks` vs `PlanWithoutRisks` before the LLM call from application context.
Only model what the LLM must produce
| Include | Exclude |
|---|---|
| Text, labels, lists the model must author | Values derived from variant/slot |
| Structure the model must choose | Defaults your code will apply |
| Fields downstream truly consumes from validated output | "Helper" fields you merge in after validation |
Known vs. unknown structure
- Known: nested models, `max_length`/`max_items`, enums or `Literal` where the LLM must pick from a closed set. However, some providers' strict modes forbid validation keywords — see Provider-specific strict mode.
- Unknown: `dict[str, Any]`, loose `list[dict[str, Any]]` — use only where the content is genuinely open-ended. Similarly, `additionalProperties` must be set to `false` or strictly typed in strict mode, as empty dictionary annotations will cause immediate failure.
A known concept should not be `dict[str, Any]`.
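A sketch contrasting the two (a hypothetical support-ticket model; names are illustrative): the closed-set concept is a `Literal`, the bounded list carries `max_length`, and only genuinely open-ended metadata stays loose.

```python
from typing import Any, Literal

from pydantic import BaseModel, Field


class Ticket(BaseModel):
    # Known structure: a closed set the LLM must pick from.
    category: Literal["bug", "feature", "question"]
    # Known structure with a bound (strip the bound for strict-mode providers).
    tags: list[str] = Field(default_factory=list, max_length=5)
    # Unknown structure: reserved for genuinely open-ended content.
    raw_metadata: dict[str, Any] = Field(default_factory=dict)
```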
Field hygiene
- Mutable defaults: `default_factory=list`, never `[]`.
- `Field(description=...)` guides the model; avoid internal jargon.
- Drop fields nothing produces; drop "legacy" aliases.
- Do not create unnecessary submodels. Every nested model should earn its keep by adding real structure, clearer validation, or a stable reusable concept.
Names should be reasonable
Choose names that are short, specific, and easy to read in both code and JSON Schema.
- Prefer names that describe the actual concept, not the implementation accident.
- Avoid vague names like `data`, `info`, `payload`, `value`, `misc`, or `type2`.
- Avoid ornamental naming: if `BannerCopy` says it, do not name a field `banner_copy_text_value`.
- Keep sibling fields parallel: if one field is `quote_text`, the related field should probably be `quote_source`, not `quoteAttributionLine`.
- Rename awkward names early. Small schema names spread everywhere: prompts, validators, logs, tests, and downstream code.
Base classes
Extract a shared base only when several shared fields justify it. One duplicated field across two models is often clearer than a `_Base` with a single line.
Closed sets and uncertainty
For Enum or Literal fields, include an explicit escape hatch (`OTHER`, `UNKNOWN`, …) when the model must be able to say "none of the above" without lying. That is part of the schema contract, not a prompt hack.
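A minimal sketch (the field and label set are illustrative):

```python
from typing import Literal

from pydantic import BaseModel, Field


class LanguageGuess(BaseModel):
    # UNKNOWN is part of the contract: the model can decline to classify
    # instead of inventing a close-enough label.
    language: Literal["en", "de", "fr", "UNKNOWN"] = Field(
        description="Pick UNKNOWN when no option clearly applies."
    )
```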
Nullable vs empty string
Use `field: str | None = None` (or `Optional[str] = Field(default=None)`) when missing is different from present but empty. If you use `""` for both, validation cannot tell them apart. Be aware that under strict constrained decoding, omitted keys are not allowed. Strict mode APIs require all fields to be marked as required, using nullable types for optional values. Ensure your generated JSON schema represents optional fields explicitly as a union of types (e.g., `["string", "null"]`) while keeping the key itself in the required array.
Bounded "extra" attributes
Open-ended `dict[str, str]` invites huge blobs. Prefer a list of small objects (or `(index, key, value)` tuples) with `max_items` in the model (stripped during strict-mode sanitization), plus a short description of the cap. Same idea for arbitrary key-value pairs: structure + limit, not an unbounded map.
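A sketch of the bounded alternative (model names illustrative):

```python
from pydantic import BaseModel, Field


class ExtraAttribute(BaseModel):
    key: str
    value: str


class Product(BaseModel):
    name: str
    # Structure plus a cap, instead of an unbounded dict[str, str].
    extras: list[ExtraAttribute] = Field(
        default_factory=list,
        max_length=10,
        description="At most 10 additional attributes.",
    )
```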
dict[str, str](index, key, value)max_items开放的会导致生成巨大的blob。推荐使用小对象列表(或元组),并在模型中设置****(在严格模式清理阶段会被移除),同时添加简短的上限说明。对任意键值对的处理也是同理:结构+限制,而非无边界的映射。
dict[str, str](index, key, value)max_itemsEntity relationships
If extraction output references other entities, model IDs and relationship fields explicitly (`parent_id`, `friend_ids: list[int]`), not only free-text names. Downstream code should not parse prose for links.
parent_idfriend_ids: list[int]如果提取输出需要引用其他实体,请显式为ID和关系字段建模(、),不要仅使用自由文本名称。下游代码不应该从文本中解析链接关系。
parent_idfriend_ids: list[int]Optional reasoning fields
A dedicated reasoning / chain-of-thought field on a sub-object can improve quality for that step. It costs tokens; use sparingly, not on every leaf. Do not duplicate the same instruction in the prompt if the field description already states how to reason. Crucially, the reasoning field must be the very first field defined in the model — the model will have already committed to its answer before it begins reasoning if it comes after the target field.
Multimodal spatial coordinates
When extracting spatial data (like bounding boxes or coordinates) from images using vision models, enforce a normalized coordinate space. Your field description should explicitly mandate a consistent format like `[y_min, x_min, y_max, x_max]` and specify that coordinate values must be normalized (e.g., scaled from 0 to 1000 for every image) rather than relying on absolute pixel values, which models struggle to predict accurately.
Result-or-error in the schema
When you want a structured failure (validation message, "could not extract") without throwing out of the LLM layer, a small wrapper model (`result: T | None`, `error: bool`, `message: str | None`) keeps control flow inside one response type. Use only where that pattern fits your API; it is not mandatory.
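One way to sketch it with a generic Pydantic model (all names illustrative):

```python
from typing import Generic, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Extraction(BaseModel, Generic[T]):
    """One response type carrying either a result or a structured failure."""

    result: Optional[T] = None
    error: bool = False
    message: Optional[str] = None


class Invoice(BaseModel):
    total: float


ok = Extraction[Invoice](result=Invoice(total=12.5))
failed = Extraction[Invoice](error=True, message="could not extract")
```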
Long rules for one field
单个字段的长规则
If a single field has long or subtle extraction rules, put them in `Field(description=...)` for that field so they travel with the schema. That is still one place (the field), not the global prompt repeating every key.
Prompts
The schema is the spec
The JSON schema (from `model_json_schema()` or the provider's equivalent) is the structural contract. When the API supports tool/function parameters or structured output tied to that schema, put shape there and keep the user/system messages to task + context + behavior — not a prose duplicate of every field.
If native structured outputs or tools are unsupported and you must inject the schema into the prompt context, avoid injecting raw JSON Schema (`model_json_schema()` output), as it is highly token-inefficient. Instead, use type-definitions (which look like TypeScript interfaces or Pydantic pseudo-code), as this lossless compression technique can reduce token usage by up to 60% while being clearer to the model's attention mechanism.
The prompt should not restate:
- Field names, types, nesting, required vs optional.
- Defaults already in the model.
- Lists of keys the schema already enumerates.
Keep in the prompt:
- Intent: tone, audience, what "good" looks like.
- Constraints the schema cannot encode: "at most 50 words", "no proper nouns from the input", "do not repeat the previous section."
- Inputs: pasted documents, `{variables}`.
- Conditional blocks via template variables, not by asking the model to "ignore" a section.
Template variables for branches
```python
USER_PROMPT = """You are extracting a plan.
{extra_instructions}
Source text:
{source}
"""
```
Caller sets `extra_instructions=""` or `extra_instructions=RISKS_BLOCK` when using `PlanWithRisks`.
The model never sees instructions for a branch that is not in the schema.
One source of truth
| What | Where |
|---|---|
| Shape, types, limits | Pydantic model |
| Fixed values from variant/slot | Your code after validation |
| Rhetorical / quality rules | Prompt |
| "Which schema today?" | Caller (conditional model + prompt) |
If something appears in both the schema description and the prompt, delete the duplicate.
When you review a prompt/model pair, check three things explicitly:
- Names match: the prompt uses the same field and concept names as the schema.
- Constraints match: counts, limits, optionality, and branch behavior are identical.
- Responsibility matches: the prompt asks only for what the schema expects, and the schema models only what the LLM must produce.
Provider-specific strict mode
Not all providers handle JSON Schema validation keywords the same way. Know what your target supports before relying on field-level constraints.
OpenAI (`strict=True`): The underlying parser explicitly forbids `maxLength`, `maxItems`, `minimum`, `maximum`, and similar validation keywords. Sending a Pydantic model with `Field(ge=0, le=150)` results in an immediate 400 error. Implement a schema sanitizer that strips these constraints from the JSON Schema before sending it to the API, while keeping the unmodified Pydantic model for post-generation Python-side validation.
Anthropic (tool use): Tool input schemas accept standard JSON Schema validation keywords including `maxLength`, `minLength`, `pattern`, `minimum`, `maximum`, `minItems`, and `maxItems`. No sanitization is needed for these constraints — Pydantic models with `Field(ge=0, le=150)` or `Field(max_length=500)` work as-is when passed as tool parameter schemas. However, the model may still occasionally violate soft constraints, so keep Python-side validation as a safety net.
General strategy: Write your Pydantic models with full validation (`max_length`, `ge`, `le`, `max_items`, etc.). Then apply a provider-specific sanitizer only where required. This gives you one authoritative model with the tightest constraints, and a thin adapter layer per provider.
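A sketch of such a sanitizer, operating on the plain JSON-schema dict (the keyword set below is an assumption; check your provider's current documentation for the exact list it rejects):

```python
FORBIDDEN_KEYWORDS = {
    "maxLength", "minLength", "pattern",
    "minimum", "maximum", "minItems", "maxItems",
}


def sanitize_schema(node):
    """Recursively drop validation keywords a strict-mode parser rejects."""
    if isinstance(node, dict):
        return {
            key: sanitize_schema(value)
            for key, value in node.items()
            if key not in FORBIDDEN_KEYWORDS
        }
    if isinstance(node, list):
        return [sanitize_schema(item) for item in node]
    return node
```

The original schema (and the Pydantic model it came from) stays untouched, so `model_validate()` still enforces every constraint after generation.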
Sanitizers (pre-validation)
Run on parsed JSON (dict) from the model before `model_validate()`, once fences/prose are stripped if you are not using native structured output.
Do: coerce `None` → `""`, list → joined string where needed, strip overlong strings, `pop()` keys that are not on the model (LLM hallucinated extras).
Don't: re-implement defaults or `Literal` enforcement the validator already applies; keep dead branches for old shapes.
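A minimal pre-validation sanitizer (the `KNOWN_KEYS` set is illustrative; in practice derive it from `Model.model_fields`):

```python
KNOWN_KEYS = {"title", "bullets"}  # illustrative; derive from Model.model_fields


def sanitize_payload(data: dict) -> dict:
    cleaned = dict(data)
    # pop() keys the schema does not define (LLM-hallucinated extras).
    for key in list(cleaned):
        if key not in KNOWN_KEYS:
            cleaned.pop(key)
    # Coerce None -> "" where the model expects a plain string.
    if "title" in cleaned and cleaned["title"] is None:
        cleaned["title"] = ""
    return cleaned
```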
Validation feedback loop (re-asking)
When `model_validate()` fails due to hallucinations or missed constraints, do not simply drop the data. Catch the Pydantic `ValidationError` and feed the exact error message (e.g., `"Value error, Name must contain a space"`) back to the LLM in a new user prompt, commanding it to self-correct its previous output. Libraries like Instructor automate this retry loop by catching validation errors and returning them directly to the model alongside the original completion payload.
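A hand-rolled sketch of the loop (the `complete` callable stands in for your LLM client, and all names are illustrative):

```python
from pydantic import BaseModel, ValidationError, field_validator


class Person(BaseModel):
    name: str

    @field_validator("name")
    @classmethod
    def must_contain_space(cls, value: str) -> str:
        if " " not in value:
            raise ValueError("Name must contain a space")
        return value


def extract_with_retry(complete, max_attempts: int = 3) -> Person:
    prompt = "Extract the person as JSON."
    for _ in range(max_attempts):
        raw = complete(prompt)  # returns a parsed dict in this sketch
        try:
            return Person.model_validate(raw)
        except ValidationError as exc:
            # Feed the exact failure back so the model can self-correct.
            first = exc.errors()[0]["msg"]
            prompt = f"Your previous output was invalid ({first}). Fix it and answer again."
    raise RuntimeError("no valid output after retries")
```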
Prompts in production
Prompts are artifacts, not immortal strings in code.
- Templates: Separate fixed wording from runtime data (variables, branches). Data and structure are not one concatenated blob.
- Versioning: Track changes in source control. When behavior shifts, you need a diff and a rollback story.
- Evaluation: Keep a small golden set of inputs with expected or acceptable outputs; rerun when the model or prompt changes. Subjective tasks still need criteria (length, must-include fields, forbidden patterns).
- Observation: Log latency, token use, and validation failures per prompt/version so regressions surface before users report them.
General hygiene
- Prefer modern builtins: `dict[str, Any]`, `list[str]`.
- Skip `__all__` unless the package is a published API.
- No unused imports, dead constants, or commented-out blocks.
- Elegant code is usually the simplest code that keeps the contract precise. Prefer clarity over cleverness, and compactness over repetition.