model-first-reasoning

Model-First Reasoning (MFR)

A rigorous methodology that REQUIRES constructing an explicit problem MODEL before any reasoning or implementation. The model becomes a frozen contract that governs all downstream work.
Based on Kumar & Rana (2025), "Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling" (arXiv:2512.14474)

Why MFR Works

Hallucination is not merely the generation of false statements—it is a symptom of reasoning performed without a clearly defined model of the problem space.
Reasoning does not create structure; it operates on structure. When that structure is implicit or unstable, reasoning becomes unreliable. MFR provides "soft symbolic grounding"—enough structure to stabilize reasoning without imposing rigid formalism.

Core Principle

Phase 1 produces the MODEL. Phase 2 reasons/implements ONLY within the model.
This prevents the common failure mode where reasoning introduces ad-hoc decisions, missing constraints, or invented behavior not grounded in the problem definition.

Non-Negotiable Rules

  1. Phase 1 (Model) produces NO code and no solution steps, only the formal model
  2. Phase 2 (Implement) may NOT introduce new entities, state, actions, or constraints
  3. If you need something not in the model: output exactly `MODEL INCOMPLETE` plus what to add, then STOP
  4. No invented APIs or dependencies. If something is not provided, either ask (record it under unknowns) or create a stub clearly marked `STUB`
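
Rules 3 and 4 might look roughly like this in Python. This is a minimal sketch, not part of the method itself: `ModelIncomplete`, `require`, and `lookup_price` are hypothetical names, and the model is assumed to be a dict whose sections are lists of items with a `name` key.

```python
class ModelIncomplete(Exception):
    """Raised when Phase 2 needs something the frozen model does not define."""


def require(model: dict, section: str, name: str) -> dict:
    """Return the named item from a model section, or stop with MODEL INCOMPLETE."""
    for item in model.get(section, []):
        if item.get("name") == name:
            return item
    # Rule 3: say what is missing, then stop instead of improvising.
    raise ModelIncomplete(f"MODEL INCOMPLETE: add '{name}' to '{section}'")


def lookup_price(sku: str) -> float:
    """STUB: no pricing API was provided, so this is a clearly marked placeholder."""
    # Rule 4: a stub fails loudly rather than pretending a dependency exists.
    raise NotImplementedError("STUB: lookup_price has no provided implementation")
```

Calling `require(model, "actions", "removeItem")` on a model that only defines `addItem` raises `ModelIncomplete` with a message starting `MODEL INCOMPLETE`, which is the signal to go back to Phase 1.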

The Model as Contract

After creating the model, run a MODEL AUDIT before coding:

Audit Checks

| Check | Description |
| --- | --- |
| Coverage | Every user requirement is represented in exactly one of: a constraint, the goal/acceptance criteria, or an action precondition/effect |
| Operability | Every operation your plan would require is present as an action |
| Consistency | Constraints don't contradict each other; action effects don't violate invariants |
| Testability | Every constraint has ≥1 test oracle |

If any audit check fails, revise the model (still Phase 1) until it passes.

Freeze Rule

Once the audit passes, treat the model as a read-only source of truth.
If you later discover missing information during implementation:
  1. Emit a `MODEL PATCH` (minimal change)
  2. Restart Phase 2 from scratch using the updated model
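
The freeze-and-patch cycle can be sketched in Python as below. The patch shape used here (a dict mapping a section name to the items to append) is an assumption made for illustration; the document does not define a `MODEL PATCH` format. `freeze` and `apply_patch` are hypothetical helper names.

```python
import copy
from types import MappingProxyType


def freeze(model: dict) -> MappingProxyType:
    """Return a read-only view over a private deep copy of the model.

    Note: the view only blocks top-level reassignment; nested lists are
    still mutable, so treat this as a guard rail rather than a guarantee.
    """
    return MappingProxyType(copy.deepcopy(model))


def apply_patch(frozen: MappingProxyType, patch: dict) -> MappingProxyType:
    """Build a NEW frozen model with the patch's items appended per section."""
    updated = copy.deepcopy(dict(frozen))
    for section, additions in patch.items():
        updated.setdefault(section, []).extend(additions)
    # The old frozen model is left untouched; Phase 2 restarts from the new one.
    return freeze(updated)
```

Because `apply_patch` returns a new object instead of editing the old one, any Phase 2 work done against the previous model is visibly stale, which matches the rule to restart Phase 2 from scratch.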

Validation

After creating the model, write it to `model.json` and run the validator:

```bash
python scripts/validate-model.py model.json
```

Exit codes:
  • 0 = Valid, ready for Phase 2
  • 1 = Invalid structure (fix and retry)
  • 2 = Valid but has unknowns (STOP after Phase 1)
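
The contents of `scripts/validate-model.py` are not shown here, so the following is only a guess at the kind of check that would produce these three exit codes. `REQUIRED_KEYS`, `validate`, and `validate_text` are hypothetical names; the key list is taken from the structured format in the Output Format section, and the real validator may check more or less than this.

```python
import json

# Assumed top-level keys, copied from the structured model format.
REQUIRED_KEYS = (
    "deliverable", "entities", "state_variables", "actions", "constraints",
    "initial_state", "goal", "assumptions", "unknowns",
    "requirement_trace", "test_oracles",
)


def validate(model: object) -> int:
    """Map a parsed model to an exit code: 0 valid, 1 invalid, 2 has unknowns."""
    if not isinstance(model, dict):
        return 1
    if any(key not in model for key in REQUIRED_KEYS):
        return 1
    if model["unknowns"]:
        return 2  # structurally fine, but Phase 2 must not start yet
    return 0


def validate_text(text: str) -> int:
    """Validate raw file contents; unparseable JSON counts as invalid structure."""
    try:
        return validate(json.loads(text))
    except json.JSONDecodeError:
        return 1
```

A command-line wrapper would read the file named on the command line and pass the result of `validate_text` to `sys.exit`.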

Output Format

Phase 1: MODEL

The model may be expressed in natural language, semi-structured text, or JSON. Flexibility improves compliance—what matters is that the representation is explicit, inspectable, and stable.
For code generation tasks, the structured format below is recommended. Use MODEL_TEMPLATE.json as a reference:
```json
{
  "deliverable": {
    "description": "What we're building",
    "files_expected": ["path/to/file.ts", ...]
  },
  "entities": [
    {"name": "EntityName", "description": "...", "properties": [...]}
  ],
  "state_variables": [
    {"name": "varName", "type": "...", "initial": "...", "description": "..."}
  ],
  "actions": [
    {
      "name": "actionName",
      "description": "...",
      "preconditions": ["..."],
      "effects": ["..."],
      "parameters": [...]
    }
  ],
  "constraints": [
    {"id": "C1", "statement": "...", "type": "invariant|precondition|postcondition"}
  ],
  "initial_state": ["description of starting conditions"],
  "goal": ["acceptance criteria"],
  "assumptions": ["things we assume to be true"],
  "unknowns": ["questions that must be answered before proceeding"],
  "requirement_trace": [
    {
      "requirement": "<verbatim from user>",
      "represented_as": "goal|constraint|action",
      "ref": "C1|action_name|goal_item"
    }
  ],
  "test_oracles": [
    {"id": "T1", "maps_to": ["C1"], "description": "how to verify constraint"}
  ]
}
```

Critical: If `unknowns` is non-empty, STOP after Phase 1. Do not implement until unknowns are resolved.

Phase 1.5: MODEL AUDIT

Return:

```json
{
  "audit_pass": true|false,
  "issues": [
    {"type": "coverage|operability|consistency|testability", "detail": "..."}
  ]
}
```

If `audit_pass` is false, STOP and return to Phase 1 to revise the model.
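
Three of the four audit checks can be partly mechanized. The sketch below assumes the structured JSON model format and takes two extra inputs that are not part of the model: the list of verbatim user requirements and the names of the operations the plan expects to use. The `audit` function is a hypothetical helper, and consistency is deliberately left out because it needs judgment about what the statements mean.

```python
def audit(model: dict, requirements: list[str], planned_operations: list[str]) -> dict:
    """Run coverage, operability, and testability checks; return the audit record."""
    issues = []
    actions = {a["name"] for a in model.get("actions", [])}
    constraints = {c["id"] for c in model.get("constraints", [])}
    targets = {
        "action": actions,
        "constraint": constraints,
        "goal": set(model.get("goal", [])),
    }

    # Coverage: each requirement is traced, and the trace points at something real.
    traced = {t["requirement"]: t for t in model.get("requirement_trace", [])}
    for req in requirements:
        entry = traced.get(req)
        if entry is None:
            issues.append({"type": "coverage", "detail": f"untraced requirement: {req}"})
        elif entry["ref"] not in targets.get(entry["represented_as"], set()):
            issues.append({"type": "coverage", "detail": f"dangling ref: {entry['ref']}"})

    # Operability: every operation the plan needs exists as an action.
    for op in planned_operations:
        if op not in actions:
            issues.append({"type": "operability", "detail": f"no action named {op}"})

    # Testability: every constraint is mapped by at least one test oracle.
    tested = {cid for o in model.get("test_oracles", []) for cid in o["maps_to"]}
    for cid in sorted(constraints - tested):
        issues.append({"type": "testability", "detail": f"no oracle for {cid}"})

    return {"audit_pass": not issues, "issues": issues}
```

An empty `issues` list yields `audit_pass: true`; a reviewer would still need to confirm consistency by hand before freezing the model.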

Phase 2: IMPLEMENTATION

Using ONLY the frozen model:

A) PLAN

Numbered steps where each step must be an instance of a defined action:
Step 1: [action_name]
  - Preconditions check: [list which preconditions are satisfied]
  - Effects applied: [what state changes]
  - Constraints check: [C1, C2, ...]

B) CODE

Create all files in `deliverable.files_expected`:

| Model Element | Code Translation |
| --- | --- |
| entities / state_variables | Types, interfaces, data models |
| actions | Functions/modules with validation + explicit failure modes |
| constraints | Runtime checks, defensive parsing, invariants |

C) TESTS

Implement all `test_oracles`. Every constraint must be covered by ≥1 test.

D) VERIFICATION MAP

For each constraint, document:
  • Where it is enforced in code (file:line)
  • Which tests cover it
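
One possible way to keep the verification map honest is to store it as data and check it for gaps. The record shape below is an assumption, since the document only says what to document and not how; `VerificationEntry` and `unverified` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationEntry:
    """One row of the verification map for a single constraint."""
    constraint_id: str
    enforced_at: list[str] = field(default_factory=list)  # "file:line" strings
    tests: list[str] = field(default_factory=list)        # names of covering tests


def unverified(constraint_ids: list[str], entries: list[VerificationEntry]) -> list[str]:
    """Return constraint ids that lack an enforcement location or a covering test."""
    by_id = {entry.constraint_id: entry for entry in entries}
    gaps = []
    for cid in constraint_ids:
        entry = by_id.get(cid)
        if entry is None or not entry.enforced_at or not entry.tests:
            gaps.append(cid)
    return gaps
```

Phase 2 would not count as finished until `unverified` returns an empty list for the model's constraint ids.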

When to Use MFR

MFR is most valuable for:
  • Complex state machines — where transitions must be valid
  • Business logic with invariants — rules that must never be violated
  • Data transformations — where input/output contracts matter
  • Multi-step workflows — with dependencies between steps
  • Safety-critical features — where bugs have high cost
  • Collaborative specifications — where the model serves as documentation
When NOT to use: Simple, single-step tasks where the overhead of explicit modeling exceeds its benefit.

Relationship to Other Reasoning Strategies

MFR is complementary, not competing:
  • With Chain-of-Thought: Use CoT within Phase 2 for step-by-step reasoning over the model
  • With ReAct: Treat the model as persistent state that actions operate on
  • With planning agents: The model provides the domain specification that planners reason over
MFR provides a foundational layer that improves robustness of any reasoning strategy in constraint-heavy domains.

Example Workflow

User: Build a shopping cart that enforces: max 10 items, no duplicate SKUs,
      total can't exceed $1000

You: [PHASE 1: MODEL]
{
  "entities": [
    {"name": "CartItem", "properties": ["sku", "price", "quantity"]}
  ],
  "constraints": [
    {"id": "C1", "statement": "cart.items.length <= 10", "type": "invariant"},
    {"id": "C2", "statement": "all SKUs in cart are unique", "type": "invariant"},
    {"id": "C3", "statement": "sum(items.price * items.quantity) <= 1000", "type": "invariant"}
  ],
  "actions": [
    {
      "name": "addItem",
      "preconditions": ["cart.items.length < 10", "sku not in cart", "new total <= 1000"],
      "effects": ["item added to cart"]
    }
  ],
  ...
}

[PHASE 1.5: MODEL AUDIT]
{
  "audit_pass": true,
  "issues": []
}

[PHASE 2: IMPLEMENTATION]
// Now implementing strictly from the model...
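
Phase 2 for the cart model above might continue along these lines. This is a sketch in Python under stated assumptions, not the implementation the example elides: prices are assumed to be dollar amounts held as `Decimal`, the 10-item limit is read as a limit on cart entries (matching `cart.items.length`), and `ConstraintViolation` is a hypothetical exception name.

```python
from dataclasses import dataclass, field
from decimal import Decimal

MAX_ITEMS = 10                # C1
MAX_TOTAL = Decimal("1000")   # C3


class ConstraintViolation(Exception):
    """Explicit failure mode for addItem; the message names the constraint."""


@dataclass(frozen=True)
class CartItem:
    # Entity from the model: sku, price, quantity.
    sku: str
    price: Decimal
    quantity: int


@dataclass
class Cart:
    items: list[CartItem] = field(default_factory=list)

    def total(self) -> Decimal:
        """Sum of price * quantity over all items (the quantity C3 bounds)."""
        return sum((i.price * i.quantity for i in self.items), Decimal("0"))

    def add_item(self, item: CartItem) -> None:
        """Model action addItem: check each precondition, then apply the effect."""
        if len(self.items) >= MAX_ITEMS:
            raise ConstraintViolation("C1: cart already holds 10 items")
        if any(existing.sku == item.sku for existing in self.items):
            raise ConstraintViolation(f"C2: duplicate SKU {item.sku}")
        if self.total() + item.price * item.quantity > MAX_TOTAL:
            raise ConstraintViolation("C3: total would exceed 1000")
        self.items.append(item)  # effect: item added to cart
```

The sketch does not reject negative prices or zero quantities, because the model defines no such constraint. Adding one during Phase 2 would break the rule against introducing new constraints, so it would call for a model patch instead.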

Remember

The model is not overhead—it IS the specification. Most failures in complex reasoning are representational, not inferential: the reasoning was fine, but it operated on an incomplete or unstable understanding of the problem.
By externalizing the model, we make assumptions inspectable, constraints enforceable, and errors diagnosable. The model becomes the contract between intent and implementation.
Model first. Then reason. Never invert this.