proof-checker

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Proof Checker: Rigorous Mathematical Verification & Fixing

证明检查器：严谨的数学验证与修正

Systematically verify a mathematical proof via cross-model adversarial review, fix identified gaps, re-review until convergence, and generate a detailed audit report with proof-obligation accounting.

通过跨模型对抗性审核系统地验证数学证明，修复识别出的漏洞，重复审核直至收敛，并生成包含证明义务核算的详细审核报告。

Context: $ARGUMENTS

上下文：$ARGUMENTS

Constants

常量

MAX_REVIEW_ROUNDS = 3
REVIEWER_MODEL =
```
gpt-5.4
```
via Codex MCP, reasoning effort always
```
xhigh
```
REVIEWER_BACKEND =
codex
— Default: Codex MCP (xhigh). Override with
```
— reviewer: oracle-pro
```
for GPT-5.4 Pro via Oracle MCP. See
```
shared-references/reviewer-routing.md
```
.
AUDIT_DOC:
```
PROOF_AUDIT.md
```
at the paper directory root, alongside
```
main.tex
```
(cumulative log; when invoked via
```
/paper-writing
```
, this is
```
paper/PROOF_AUDIT.md
```
)
REPORT_TEX:
```
proof_audit_report.tex
```
(formal before/after PDF)
STATE_FILE:
```
PROOF_CHECK_STATE.json
```
(for recovery)
SKELETON_DOC:
```
PROOF_SKELETON.md
```
(micro-claim inventory)

MAX_REVIEW_ROUNDS = 3
REVIEWER_MODEL =
```
gpt-5.4
```
via Codex MCP，推理强度始终设为
```
xhigh
```
REVIEWER_BACKEND =
codex
— 默认：Codex MCP（xhigh）。可通过
```
— reviewer: oracle-pro
```
覆盖为Oracle MCP提供的GPT-5.4 Pro。详见
```
shared-references/reviewer-routing.md
```
。
AUDIT_DOC: 论文目录根目录下的
```
PROOF_AUDIT.md
```
，与
```
main.tex
```
同级（累积日志；通过
```
/paper-writing
```
调用时，路径为
```
paper/PROOF_AUDIT.md
```
）
REPORT_TEX:
```
proof_audit_report.tex
```
（正式的修正前后PDF文件源）
STATE_FILE:
```
PROOF_CHECK_STATE.json
```
（用于恢复检查状态）
SKELETON_DOC:
```
PROOF_SKELETON.md
```
（微论断清单）

Acceptance Gate (objective, replaces subjective scoring)

验收标准（客观标准，替代主观评分）

The proof passes when ALL of the following hold:

Zero open FATAL or CRITICAL issues
Every theorem/lemma has: (i) explicit hypotheses, (ii) proof with all interchanges justified, (iii) every application discharges hypotheses in the ledger
All big-O/Θ/o statements have declared parameter dependence and uniformity scope
Counterexample pass executed on all key lemmas (log candidates even if none found)

当且仅当满足以下所有条件时，证明通过审核：

无未解决的FATAL或CRITICAL级问题
每个定理/引理均满足：(i) 明确的假设条件，(ii) 所有推导步骤均有依据的证明过程，(iii) 每一次定理应用都在台账中完成假设条件的核验
所有大O/Θ/o符号声明均明确参数依赖关系和一致性范围
所有关键引理均完成反例测试（即使未找到反例，也需记录候选测试案例）

Issue Taxonomy (20 categories, 4 groups)

问题分类体系（20个类别，4个组别）

Group A: Logic & Proof Structure

组别A：逻辑与证明结构

Category	Description	Example
UNJUSTIFIED_ASSERTION	Claim stated without proof or reference	"The Hessian splits into Gram blocks"
UNPROVEN_SUBCLAIM	"Clearly" / "it follows" hides a nontrivial lemma	"By symmetry, the cross-terms vanish" without checking
QUANTIFIER_ERROR	Wrong order ∀/∃, missing "for sufficiently small κ"	"For all π, there exists ε" vs "there exists ε for all π"
IMPLICATION_REVERSAL	Uses (A⇒B) as (B⇒A), or claims equivalence with only one direction
CASE_INCOMPLETE	Misses boundary/degenerate cases	Singular covariance, zero weight, non-unique argmin
CIRCULAR_DEPENDENCY	Lemma uses theorem that depends on it
LOGICAL_GAP	A step is not justified by what precedes it	B=Θ(1) → β_K=0 without analyzing W

类别	描述	示例
UNJUSTIFIED_ASSERTION	提出无证明或无引用支撑的论断	"The Hessian splits into Gram blocks"
UNPROVEN_SUBCLAIM	使用「显然」「由此可得」等表述隐藏非平凡引理	"By symmetry, the cross-terms vanish" 未做验证
QUANTIFIER_ERROR	量词∀/∃顺序错误，遗漏「当κ足够小时」等限定	"For all π, there exists ε" 与 "there exists ε for all π" 混淆
IMPLICATION_REVERSAL	将(A⇒B)当作(B⇒A)使用，或仅证明单方向就宣称等价关系
CASE_INCOMPLETE	遗漏边界/退化情况	奇异协方差、零权重、非唯一 argmin
CIRCULAR_DEPENDENCY	引理使用了依赖于该引理的定理
LOGICAL_GAP	某一步骤无法由前置内容推导得出	B=Θ(1) → β_K=0 未分析W的影响

Group B: Analysis & Measure Theory

组别B：分析学与测度论

Category	Description	Example
ILLEGAL_INTERCHANGE	Swaps limit/expectation/derivative/integral without DCT/MCT/Fubini	Differentiating under E without domination
NONUNIFORM_CONVERGENCE	Pointwise convergence used as uniform	sup and limit swapped
MISSING_DOMINATION	DCT cited but no dominating function given
INTEGRABILITY_GAP	Uses E	X
REGULARITY_GAP	Differentiability/Lipschitz/convexity used but not established
STOCHASTIC_MODE_CONFUSION	Mixes a.s./in prob./in L²/in expectation

类别	描述	示例
ILLEGAL_INTERCHANGE	未依据DCT/MCT/Fubini定理就交换极限/期望/导数/积分顺序	未满足控制条件就对E求导
NONUNIFORM_CONVERGENCE	将逐点收敛当作一致收敛使用	交换上确界与极限顺序
MISSING_DOMINATION	引用DCT定理但未给出控制函数
INTEGRABILITY_GAP	使用E	X
REGULARITY_GAP	使用可微性/Lipschitz性/凸性但未证明其成立
STOCHASTIC_MODE_CONFUSION	混淆几乎必然/依概率/依L²/依期望等收敛模式

Group C: Model & Parameter Tracking

组别C：模型与参数追踪

Category	Description	Example
MISSING_DERIVATION	A quantity is used but never derived from the model	Risk functional with undefined B, W
HIDDEN_ASSUMPTION	Proof silently uses a condition not in the theorem	Gaussianity assumed but not stated
INSUFFICIENT_ASSUMPTION	Hypotheses too weak for proof (counterexample exists)	Moment conditions admitting 2-point distributions
DIMENSION_TRACKING	Parameter dependence (d, n, K, ...) not explicit	d enters only through κ
NORMALIZATION_MISMATCH	Coordinate/scaling conventions inconsistent	Rescaled vs raw coordinates
CONSTANT_DEPENDENCE_HIDDEN	"C" depends on d,n,K but treated as universal

类别	描述	示例
MISSING_DERIVATION	使用某一量但从未基于模型推导得出	使用未定义B、W的风险函数
HIDDEN_ASSUMPTION	证明中默认使用定理声明未提及的条件	假设高斯性但未在定理中声明
INSUFFICIENT_ASSUMPTION	假设条件过于薄弱，无法支撑证明（存在反例）	矩条件允许两点分布
DIMENSION_TRACKING	参数依赖关系（d, n, K等）未明确	d仅通过κ引入但未说明
NORMALIZATION_MISMATCH	坐标/缩放约定不一致	重缩放坐标与原始坐标混用
CONSTANT_DEPENDENCE_HIDDEN	"C"依赖于d,n,K但被当作通用常数处理

Group D: Scope & Claims

组别D：范围与论断

Category	Description	Example
SCOPE_OVERCLAIM	Conclusion stated more broadly than proof supports	"β_K=0" with only generic overlap
REFERENCE_MISMATCH	Cited theorem's hypotheses not verified at point of use

类别	描述	示例
SCOPE_OVERCLAIM	结论表述范围超出证明可支撑的范围	"β_K=0" 仅在通用重叠情况下成立却未限定
REFERENCE_MISMATCH	引用的定理在应用时未验证其假设条件

Two-Axis Severity System

双轴严重程度体系

Axis A — Proof Status (what is wrong)

轴A — 证明状态（问题类型）

Status	Meaning
INVALID	Statement false as written (counterexample exists or contradiction)
UNJUSTIFIED	Could be true, but current proof does not establish it
UNDERSTATED	True only after strengthening assumptions
OVERSTATED	True only after weakening conclusion / adding qualifiers
UNCLEAR	Ambiguous notation / definition drift (not wrong per se)

状态	含义
INVALID	所述内容本身错误（存在反例或矛盾）
UNJUSTIFIED	内容可能为真，但当前证明无法支撑
UNDERSTATED	仅在强化假设条件后才成立
OVERSTATED	仅在弱化结论/添加限定后才成立
UNCLEAR	符号模糊/定义漂移（本身并非错误）

Axis B — Impact (how much breaks)

轴B — 影响程度（破坏范围）

Impact	Meaning
GLOBAL	Breaks main theorem or core dependency chain
LOCAL	Affects a side result but not the main theorem
COSMETIC	Exposition only

影响	含义
GLOBAL	破坏主定理或核心依赖链
LOCAL	影响次要结果但不影响主定理
COSMETIC	仅影响表述方式

Severity Labels (derived)

严重程度标签（推导得出）

Label	Definition
FATAL	INVALID + GLOBAL
CRITICAL	(INVALID + LOCAL) or (UNJUSTIFIED + GLOBAL)
MAJOR	(UNJUSTIFIED + LOCAL) or (UNDERSTATED/OVERSTATED + GLOBAL)
MINOR	Clarity / notation / dimension bookkeeping that doesn't change claims

标签	定义
FATAL	INVALID + GLOBAL
CRITICAL	(INVALID + LOCAL) 或 (UNJUSTIFIED + GLOBAL)
MAJOR	(UNJUSTIFIED + LOCAL) 或 (UNDERSTATED/OVERSTATED + GLOBAL)
MINOR	不改变论断的表述清晰度/符号/维度记录问题

Side-Condition Checklists for Common Theorems

常见定理的附带条件检查清单

When the proof invokes any of the following, require explicit verification of ALL listed conditions:

Theorem	Required Conditions
DCT (Dominated Convergence)	Pointwise a.e. convergence + integrable dominating function
MCT (Monotone Convergence)	Monotone increasing + non-negative
Fubini/Tonelli	Product measurability + integrability (Fubini) or non-negative (Tonelli)
Leibniz integral rule	Continuity of integrand + dominating function for derivative
Implicit Function Theorem	Continuous differentiability + non-singular Jacobian
Taylor with remainder	Sufficient differentiability + remainder form (Lagrange/integral)
Jensen's inequality	Convexity of function + integrability
Cauchy-Schwarz	Correct inner product space + integrability of both factors
Weyl/Davis-Kahan	Symmetry/Hermiticity + perturbation bound conditions
Analytic continuation	Domain connectivity + identity theorem conditions
WLOG reduction	Invariance under claimed symmetry + reduction is reversible

当证明中引用以下任一定理时，需明确验证所有列出的条件：

定理	所需条件
DCT（控制收敛定理）	几乎处处逐点收敛 + 可积控制函数
MCT（单调收敛定理）	单调递增 + 非负
Fubini/Tonelli	乘积可测性 + 可积性（Fubini）或非负（Tonelli）
莱布尼茨积分法则	被积函数连续 + 导数存在控制函数
隐函数定理	连续可微 + 雅可比矩阵非奇异
带余项的泰勒定理	足够阶数的可微性 + 余项形式（拉格朗日/积分型）
Jensen不等式	函数凸性 + 可积性
柯西-施瓦茨不等式	正确的内积空间 + 两个因子均可积
Weyl/Davis-Kahan	对称性/埃尔米特性 + 扰动界条件
解析延拓	定义域连通性 + 恒等定理条件
WLOG简化	满足宣称的对称性不变性 + 简化可逆

Workflow

工作流

Phase 0: Preparation

阶段0：准备工作

Locate the proof: Find the main
```
.tex
```
file(s).
Read the entire proof: Extract list of all theorems/lemmas/propositions/corollaries/definitions/assumptions.
Read reference materials: Reference papers, prior results.
Build a section map: Structured list with line numbers and key claims.
Identify the main theorem: Central result, assumptions, claims.

定位证明: 找到主
```
.tex
```
文件。
通读证明: 提取所有定理/引理/命题/推论/定义/假设的列表。
阅读参考资料: 参考论文、已有成果。
构建章节映射: 包含行号和关键论断的结构化列表。
识别主定理: 核心结论、假设条件、论断内容。

Phase 0.5: Proof-Obligation Ledger

阶段0.5：证明义务台账

Build formal accounting artifacts. Save to

PROOF_SKELETON.md

构建正式的核算工件，保存至

PROOF_SKELETON.md

：

1. Dependency DAG

1. 依赖关系DAG

Nodes = Definitions / Assumptions / Lemmas / Theorems. Edges = "uses". Detect cycles (including semantic circularity where Lemma A uses a corollary that quietly depends on A).

节点 = 定义 / 假设 / 引理 / 定理。边 = "使用"。检测循环（包括引理A使用了间接依赖于A的推论这类语义循环）。

2. Assumption Ledger

2. 假设台账

For each theorem/lemma, list every hypothesis with WHERE each is verified (or mark "UNVERIFIED"). Track usage-minimal assumption sets — which assumptions were actually used vs merely stated.

针对每个定理/引理，列出所有假设条件及每个条件的核验位置（或标记为「UNVERIFIED」）。追踪最小必要假设集——实际使用的假设与仅声明的假设的差异。

3. Typed Symbol Table

3. 类型化符号表

Each symbol must have a type signature:

κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v

Flag any symbol whose meaning changes or whose type is inconsistent across uses.

每个符号必须有类型签名：

κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v

标记任何含义变化或类型使用不一致的符号。

4. Canonical Quantified Statements

4. 规范化量化表述

For each theorem/lemma, rewrite the statement with explicit quantifiers, domains, and limit order:

∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
  h_act^{(K,π)} = Θ(κ^{α_K^act})  [uniform in π on compact subsets]

If you cannot restate a theorem this precisely, mark it UNCLEAR — needs disambiguation.

针对每个定理/引理，用明确的量词、定义域和极限顺序重写表述：

∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
  h_act^{(K,π)} = Θ(κ^{α_K^act})  [uniform in π on compact subsets]

若无法精确重述定理，标记为UNCLEAR — 需要歧义消除。

5. Micro-Claim Inventory

5. 微论断清单

Every nontrivial step becomes a numbered micro-claim in sequent form:

MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
       ⊢ Goal: P̂_0 is positive definite
       Rule: monomials linearly independent on support of continuous distribution
       Side-conditions: positive density near origin ✓ (by GMM weak convergence)

Each micro-claim has: justification rule name + required conditions + where conditions are proven.

每个非平凡步骤均转换为编号的相继式形式微论断：

MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
       ⊢ Goal: P̂_0 is positive definite
       Rule: monomials linearly independent on support of continuous distribution
       Side-conditions: positive density near origin ✓ (by GMM weak convergence)

每个微论断包含：依据的规则名称 + 所需条件 + 条件的证明位置。

6. Limit-Order Map

6. 极限顺序映射

Track every asymptotic statement's limit order and uniformity scope:

h_act = Θ(κ^α)  [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n   [as n→∞, for fixed κ,K,π with x_K ≪ 1]

Flag any statement where limit order is ambiguous or uniformity is unclear.

追踪每个渐近表述的极限顺序和一致性范围：

h_act = Θ(κ^α)  [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n   [as n→∞, for fixed κ,K,π with x_K ≪ 1]

标记任何极限顺序模糊或一致性范围不明确的表述。

Phase 1: First Review (Codex GPT-5.4 xhigh)

阶段1：首次审核（Codex GPT-5.4 xhigh）

Submit the complete proof content with the following mandatory reviewer checklist in the prompt:

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are performing a rigorous mathematical proof review. For EVERY theorem,
    lemma, and proposition, check ALL of the following:

    ## MANDATORY CHECKS

    A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
    B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
       list each hypothesis and whether it was verified, with location.
    C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
       absolute values, missing conditions (convexity, PSD, integrability).
    D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
       interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
       and which conditions are verified/missing.
    E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
       w.h.p. Ensure transitions are justified.
    F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
       it is uniform over all parameters. List hidden parameter dependence.
    G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
       low-rank, or extreme-parameter construction.
    H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
       results.

    ## OUTPUT FORMAT (per issue)
    For each issue found, provide:
    - id: sequential number
    - status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
    - impact: GLOBAL / LOCAL / COSMETIC
    - category: [from taxonomy]
    - location: section/equation/line
    - statement: what the proof claims
    - why_invalid: why this is wrong or unjustified
    - counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
    - affects: which downstream results break if this is wrong
    - minimal_fix: how to fix it

    [FULL PROOF CONTENT HERE]

Save the threadId. Parse into structured issue list. Write to

PROOF_AUDIT.md

提交完整的证明内容，并在提示中包含以下强制审核清单：

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are performing a rigorous mathematical proof review. For EVERY theorem,
    lemma, and proposition, check ALL of the following:

    ## MANDATORY CHECKS

    A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
    B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
       list each hypothesis and whether it was verified, with location.
    C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
       absolute values, missing conditions (convexity, PSD, integrability).
    D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
       interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
       and which conditions are verified/missing.
    E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
       w.h.p. Ensure transitions are justified.
    F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
       it is uniform over all parameters. List hidden parameter dependence.
    G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
       low-rank, or extreme-parameter construction.
    H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
       results.

    ## OUTPUT FORMAT (per issue)
    For each issue found, provide:
    - id: sequential number
    - status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
    - impact: GLOBAL / LOCAL / COSMETIC
    - category: [from taxonomy]
    - location: section/equation/line
    - statement: what the proof claims
    - why_invalid: why this is wrong or unjustified
    - counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
    - affects: which downstream results break if this is wrong
    - minimal_fix: how to fix it

    [FULL PROOF CONTENT HERE]

保存threadId。将结果解析为结构化问题列表，写入

PROOF_AUDIT.md

。

Phase 1.5: Counterexample Red Team

阶段1.5：反例攻坚

For each CRITICAL or MAJOR issue, and for every key lemma that introduces:

a new inequality bound
an identifiability/uniqueness claim
a curvature/PSD/strong convexity assertion
a uniform-in-parameter claim
a convergence mode upgrade (pointwise → uniform, in prob → w.h.p.)

Systematically attempt to construct counterexamples using:

Strategy	Description
Dimensional collapse	Set d=1 or 2, K=2, n small
Degeneracy	Singular covariance, tiny weight, overlapping means, identical components
Extremal distributions	Two-point ±a, bounded non-subGaussian, heavy tails
Adversarial parameter scaling	Pick parameters making neglected terms dominate
Numeric falsification	Translate lemma to a function, brute-force optimize over small domain

Rule: Label "counterexample found" ONLY if algebraically verified. Otherwise log as "candidate counterexample — needs verification."

Record all attempts (successful or not) in

PROOF_AUDIT.md

针对每个CRITICAL或MAJOR级问题，以及每个引入以下内容的关键引理：

新的不等式界
可识别性/唯一性论断
曲率/PSD/强凸性断言
参数一致型论断
收敛模式升级（逐点→一致，依概率→大概率）

系统性尝试构造反例，策略如下：

策略	描述
维度坍缩	设置d=1或2，K=2，n取小值
退化构造	奇异协方差、极小权重、重叠均值、相同分量
极端分布	两点分布±a、有界非次高斯分布、重尾分布
对抗性参数缩放	选择参数使被忽略项占主导
数值证伪	将引理转换为函数，在小范围内暴力优化测试

规则：仅当代数验证通过时，标记为「counterexample found」。否则记录为「candidate counterexample — needs verification」。

所有尝试（成功或失败）均记录在

PROOF_AUDIT.md

中。

Phase 2: Fix Implementation

阶段2：漏洞修复

For each issue, ordered by severity (FATAL → CRITICAL → MAJOR → MINOR):

按严重程度排序（FATAL → CRITICAL → MAJOR → MINOR）处理每个问题：

Step 2a: Choose fix strategy

步骤2a：选择修复策略

For each issue, explicitly choose one of:

ADD_DERIVATION: Write missing proof steps
STRENGTHEN_ASSUMPTION: Add conditions to theorem statement
WEAKEN_CLAIM: Reduce conclusion scope
ADD_REFERENCE: Cite known result + verify its conditions apply

Log this choice — it is a scope-changing decision when it alters theorem statements.

针对每个问题，明确选择以下策略之一：

ADD_DERIVATION: 补充缺失的证明步骤
STRENGTHEN_ASSUMPTION: 为定理添加条件
WEAKEN_CLAIM: 缩小结论范围
ADD_REFERENCE: 引用已有成果并验证其条件适用

记录该选择——当修改定理表述时，这属于范围变更决策。

Step 2b: Derive the fix mathematically

步骤2b：数学推导修复

Complete mathematical derivation, not just a claim
If new proposition/lemma needed, write in full theorem-proof style

完成完整的数学推导，而非仅给出论断
若需新增命题/引理，以完整的定理-证明格式撰写

Step 2c: Implement in LaTeX

步骤2c：在LaTeX中实现修复

Edit the
```
.tex
```
file
Preserve existing
```
\label
```
references where possible

编辑
```
.tex
```
文件
尽可能保留现有
```
\label
```
引用

Step 2d: Record the fix

步骤2d：记录修复内容

markdown

undefined

markdown

undefined

Fix N: [SHORT TITLE]

修复N: [简短标题]

Issue: [id] [CATEGORY] — [description] Severity: FATAL / CRITICAL / MAJOR / MINOR Status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED Impact: GLOBAL / LOCAL / COSMETIC Fix strategy: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE Location: Section X, Lines Y-Z

BEFORE: [what the proof originally did] WHY WRONG: [mathematical problem, with counterexample if applicable] AFTER: [what the fix does] KEY EQUATION: [central new equation] PROOF OBLIGATIONS ADDED: [new conditions/lemmas introduced] DOWNSTREAM EFFECTS: [which results now need re-checking]

undefined

问题: [id] [类别] — [描述] 严重程度: FATAL / CRITICAL / MAJOR / MINOR 状态: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED 影响: GLOBAL / LOCAL / COSMETIC 修复策略: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE 位置: 第X节，第Y-Z行

修复前: [原证明内容] 问题原因: [数学层面的问题，如有反例需说明] 修复后: [修复后的内容] 核心方程: [新增的核心方程] 新增证明义务: [引入的新条件/引理] 下游影响: [哪些结果需要重新检查]

undefined

Step 2e: Compile check

步骤2e：编译检查

bash

pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"

bash

pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"

Phase 3: Re-Review (Codex GPT-5.4 xhigh)

阶段3：重新审核（Codex GPT-5.4 xhigh）

Use

codex-reply

with saved threadId. Include fix summaries. Request the same mandatory checklist.

Check acceptance gate. If not met, repeat Phases 2-3 (up to MAX_REVIEW_ROUNDS).

使用保存的threadId调用

codex-reply

，包含修复摘要，要求执行相同的强制审核清单。

检查验收标准。若未达标，重复阶段2-3（最多MAX_REVIEW_ROUNDS轮）。

Phase 3.5: Global Closure & Independent Verification

阶段3.5：全局收尾与独立验证

Global closure checks

全局收尾检查

After all fixes, verify the proof as a whole:

Statement–conclusion match: Does the proof end with EXACTLY what the theorem claims (quantifiers, constants, uniformity)?
All obligations discharged: Every node in the obligation DAG is proven or explicitly assumed (and the theorem statement includes it).
Case analysis coverage: Cases partition the domain AND include boundary/degenerate cases.
Induction correctness (if applicable): Base case, inductive step, correct use of IH, induction measure strictly decreases.
WLOG reductions: Each "without loss of generality" spawns a micro-claim proving the reduction is lossless.
No silent assumption strengthening: Any fix that strengthened assumptions has propagated to the main theorem statement.

完成所有修复后，整体验证证明：

表述-结论匹配: 证明结尾是否与定理宣称的内容完全一致（量词、常数、一致性）？
所有义务已核验: 义务DAG中的每个节点均已证明或明确假设（且定理表述包含该假设）。
案例分析覆盖: 案例划分了定义域且包含边界/退化情况。
归纳正确性（若适用）: 基础案例、归纳步骤、归纳假设的正确使用、归纳测度严格递减。
WLOG简化: 每个「不失一般性」均对应一个微论断，证明简化是无损的。
无隐性假设强化: 任何强化假设的修复均已同步更新至主定理表述。

Independent second review for FATAL/CRITICAL fixes

FATAL/CRITICAL级修复的独立二次审核

For any fix that resolved a FATAL or CRITICAL issue, submit the fixed section alone (without showing the previous critique) to a fresh Codex thread:

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    Blind review of the following proof section. You have NOT seen any prior
    review or discussion. Check every step for correctness, hidden assumptions,
    illegal interchanges, and counterexamples.
    [FIXED SECTION ONLY]

If the blind reviewer finds new issues, re-enter Phase 2.

针对任何解决了FATAL或CRITICAL级问题的修复，将修复后的单独章节（不展示之前的评审内容）提交至全新的Codex线程：

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    Blind review of the following proof section. You have NOT seen any prior
    review or discussion. Check every step for correctness, hidden assumptions,
    illegal interchanges, and counterexamples.
    [FIXED SECTION ONLY]

若盲审发现新问题，重新进入阶段2。

Regression proof-audit

回归审核

After fixes, re-run:

DAG acyclicity check (no new cycles introduced)
Counterexample suite on all DOWNSTREAM lemmas of modified results
Assumption-delta report: what became stronger/weaker due to fixes?

修复完成后，重新执行：

DAG无环检查（未引入新循环）
对修改结果的所有下游引理执行反例测试套件
假设变更报告：修复导致哪些假设变强/变弱？

Phase 3.9: Unrecoverable Proof Protocol

阶段3.9：不可修复证明处理流程

If acceptance gate is not met after MAX_REVIEW_ROUNDS, output a Proof Unrecoverable Report:

Minimal set of blocking FATAL/CRITICAL issues that could not be resolved
Salvage options ranked: (a) weaken claim, (b) strengthen assumptions, (c) add missing lemmas, (d) restructure argument
Which parts of the proof are likely still reusable
Recommended next steps for the author

Do NOT silently declare success. The report must be honest.

若经过MAX_REVIEW_ROUNDS轮后仍未达到验收标准，输出证明不可修复报告：

无法解决的阻塞性FATAL/CRITICAL级问题的最小集合
按优先级排序的挽救方案：(a) 弱化结论，(b) 强化假设，(c) 添加缺失引理，(d) 重构论证
证明中仍可复用的部分
给作者的下一步建议

不得默认宣称成功。报告必须如实反映情况。

Phase 4: Audit Report Generation

阶段4：生成审核报告

Generate

proof_audit_report.tex

with:

Overview table: All issues with two-axis severity, category, fix strategy, status
Before/After logic chain: Red (BEFORE) → Green (AFTER) comparison
For each fix: original proof → why wrong → counterexample (if any) → complete derivation → remaining subtleties
Proof-obligation diff: What was unverified before, what is verified now
Summary: Now proven / still assumed / open problems
Colored boxes: BEFORE (red), AFTER (green), WHY WRONG (orange), KEY INSIGHT (blue), WARNING (yellow)

Compile:

pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex

生成

proof_audit_report.tex

，包含：

概览表格: 所有问题的双轴严重程度、类别、修复策略、状态
逻辑链对比: 红色（修复前）→ 绿色（修复后）的对比
每个修复详情: 原证明 → 问题原因 → 反例（如有）→ 完整推导 → 剩余细节
证明义务差异: 修复前未核验的内容与当前已核验的内容
总结: 已证明内容 / 仍假设内容 / 未解决问题
彩色标注框: 修复前（红）、修复后（绿）、问题原因（橙）、核心洞察（蓝）、警告（黄）

编译：

pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex

Phase 5: State Persistence

阶段5：状态持久化

Write

PROOF_CHECK_STATE.json

json

{
  "status": "completed",
  "rounds": 2,
  "threadId": "...",
  "fatal_fixed": 0,
  "critical_fixed": 3,
  "major_fixed": 2,
  "minor_fixed": 1,
  "counterexamples_found": 1,
  "counterexample_candidates": 2,
  "acceptance_gate": "PASS",
  "timestamp": "..."
}

写入

PROOF_CHECK_STATE.json

：

json

{
  "status": "completed",
  "rounds": 2,
  "threadId": "...",
  "fatal_fixed": 0,
  "critical_fixed": 3,
  "major_fixed": 2,
  "minor_fixed": 1,
  "counterexamples_found": 1,
  "counterexample_candidates": 2,
  "acceptance_gate": "PASS",
  "timestamp": "..."
}

Key Rules

核心规则

Mathematical rigor

数学严谨性

Never accept a proof step on faith. "Clearly" / "it follows" / "by standard arguments" are red flags — each must spawn a micro-claim.
Hypothesis discharge: Every time a lemma is APPLIED, verify EACH of its hypotheses at that point. Use the side-condition checklists above.
Interchange discipline: Every swap of limit/expectation/derivative/integral must cite a theorem (DCT/MCT/Fubini/Leibniz) and verify its conditions with explicit dominating function or integrability proof.
Uniformity discipline: Every O(·)/Θ(·) must declare what parameters it is uniform over. "O(1)" that secretly depends on d,n,K is a CONSTANT_DEPENDENCE_HIDDEN issue.
Quantifier discipline: Check ∀/∃ order. "For sufficiently small κ" must specify: does κ₀ depend on K? On π? On d?
Counterexample-first: Before trying to fix a gap, first try to break it.
WLOG prohibition: Every "without loss of generality" must have an explicit micro-claim proving the reduction. No free WLOGs.
No silent assumption strengthening: Any fix that adds conditions must propagate to the theorem statement.

绝不轻信证明步骤。「显然」「由此可得」「标准论证」均为危险信号——每个此类表述都需生成一个微论断。
假设核验: 每次应用引理时，需在应用点验证其每个假设条件。使用上述附带条件检查清单。
推导顺序规范: 每次交换极限/期望/导数/积分顺序时，必须引用定理（DCT/MCT/Fubini/莱布尼茨）并明确验证其条件，如提供控制函数或可积性证明。
一致性规范: 每个O(·)/Θ(·)必须声明其对哪些参数一致。暗中依赖d,n,K的「O(1)」属于CONSTANT_DEPENDENCE_HIDDEN问题。
量词规范: 检查∀/∃顺序。「当κ足够小时」必须明确：κ₀是否依赖于K？是否依赖于π？是否依赖于d？
反例优先: 在尝试修复漏洞前，先尝试构造反例。
WLOG限制: 每个「不失一般性」必须有明确的微论断证明简化的无损性。不得随意使用WLOG。
无隐性假设强化: 任何添加条件的修复必须同步更新至定理表述。

Cross-model protocol

跨模型协议

Claude analyzes, Codex reviews: Claude reads proof, formulates questions, implements fixes. Codex provides adversarial review.
Codex reasoning always xhigh: Never downgrade.
Send full content: Don't summarize — send actual math for line-by-line checking.
Preserve threadId: Use
```
codex-reply
```
for follow-up rounds.

Claude分析，Codex审核: Claude读取证明、提出问题、实现修复。Codex提供对抗性审核。
Codex推理强度始终为xhigh: 不得降低。
发送完整内容: 不得总结——发送实际数学内容进行逐行检查。
保留threadId: 后续轮次使用
```
codex-reply
```
。

Fix quality

修复质量

Minimal fixes: Fix exactly what's broken, nothing more.
Full derivation: Every fix includes complete mathematical argument.
Explicit scope decisions: Each fix is tagged ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE.
Compile after each fix: LaTeX must compile cleanly.

最小修复: 仅修复问题本身，不做额外修改。
完整推导: 每个修复包含完整的数学论证。
明确范围决策: 每个修复标记为ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE。
每次修复后编译: LaTeX必须能正常编译。

Scope honesty

范围诚实性

Don't overclaim: If a fix makes a result conditional, say so.
Separate "proven" from "assumed": The audit report has an explicit section for this.
Log open problems: Issues requiring future work are listed, not hidden.

不夸大: 若修复使结果变为条件性成立，需明确说明。
区分「已证明」与「假设」: 审核报告需包含明确的对应章节。
记录未解决问题: 需列出需后续处理的问题，不得隐藏。

Output Files

输出文件

File	Content	When
`PROOF_SKELETON.md`	Dependency DAG + assumption ledger + micro-claims	Phase 0.5
`PROOF_AUDIT.md`	Cumulative round-by-round audit log	Updated each round
`PROOF_AUDIT.json`	Machine-readable submission verdict (see below)	Always emitted
`proof_audit_report.tex/.pdf`	Formal before/after report	Phase 4
`PROOF_CHECK_STATE.json`	State for recovery	Phase 5

文件	内容	生成阶段
`PROOF_SKELETON.md`	依赖关系DAG + 假设台账 + 微论断清单	阶段0.5
`PROOF_AUDIT.md`	逐轮累积审核日志	每轮更新
`PROOF_AUDIT.json`	机器可读的提交 verdict（见下文）	始终生成
`proof_audit_report.tex/.pdf`	正式的修复前后报告	阶段4
`PROOF_CHECK_STATE.json`	用于恢复的状态文件	阶段5

Submission Artifact Emission

提交工件生成

This skill always writes

PROOF_AUDIT.json

at the paper directory root (i.e.

paper/PROOF_AUDIT.json

when invoked from

/paper-writing

with paper-dir

paper/

;

<your-paper-dir>/PROOF_AUDIT.json

when invoked standalone), regardless of caller or whether the paper contains theorems. A paper with no

\begin{theorem}

\begin{lemma}

\begin{proof}

emits verdict

NOT_APPLICABLE

; silent skip is forbidden.

paper-writing

Phase 6 and

tools/verify_paper_audits.sh

both rely on this artifact existing at

<paper-dir>/PROOF_AUDIT.json

The artifact conforms to the schema in

shared-references/assurance-contract.md

json

{
  "audit_skill":      "proof-checker",
  "verdict":          "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
  "reason_code":      "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
  "summary":          "One-line human-readable verdict summary.",
  "audited_input_hashes": {
    "main.tex":                 "sha256:...",
    "sections/4.theory.tex":    "sha256:..."
  },
  "trace_path":       ".aris/traces/proof-checker/<date>_run<NN>/",
  "thread_id":        "<codex mcp thread id>",
  "reviewer_model":   "gpt-5.4",
  "reviewer_reasoning": "xhigh",
  "generated_at":     "<UTC ISO-8601>",
  "details": {
    "theorems_audited": <int>,
    "issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
                  "category": "quantifier|domination|...",
                  "location": "sections/4.theory.tex:L182",
                  "note": "..." }, ... ]
  }
}

PROOF_AUDIT.json

（即通过

/paper-writing

调用且论文目录为

paper/

时，路径为

paper/PROOF_AUDIT.json

；独立调用时，路径为

<your-paper-dir>/PROOF_AUDIT.json

），无论调用者是谁或论文是否包含定理。若论文中无

\begin{theorem}

\begin{lemma}

\begin{proof}

，则输出verdict为

NOT_APPLICABLE

；禁止静默跳过。

paper-writing

阶段6和

tools/verify_paper_audits.sh

均依赖于该工件存在于

<paper-dir>/PROOF_AUDIT.json

。

工件符合

shared-references/assurance-contract.md

中的 schema：

json

{
  "audit_skill":      "proof-checker",
  "verdict":          "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
  "reason_code":      "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
  "summary":          "单行人类可读的verdict摘要。",
  "audited_input_hashes": {
    "main.tex":                 "sha256:...",
    "sections/4.theory.tex":    "sha256:..."
  },
  "trace_path":       ".aris/traces/proof-checker/<date>_run<NN>/",
  "thread_id":        "<codex mcp thread id>",
  "reviewer_model":   "gpt-5.4",
  "reviewer_reasoning": "xhigh",
  "generated_at":     "<UTC ISO-8601>",
  "details": {
    "theorems_audited": <int>,
    "issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
                  "category": "quantifier|domination|...",
                  "location": "sections/4.theory.tex:L182",
                  "note": "..." }, ... ]
  }
}

audited_input_hashes

scope

audited_input_hashes

范围

Hash the declared input set actually reviewed — the theorem-bearing

.tex

files passed into this invocation — not a repo-wide union and not the reviewer's self-reported opened subset. The external verifier rehashes these entries; any mismatch flags

STALE

Path convention (must match

tools/verify_paper_audits.sh

): keys are paths relative to the paper directory (no

paper/

prefix — the verifier resolves relative to the paper dir; prefixing produces

paper/paper/...

and false-fails as STALE). Use absolute paths for files outside the paper dir.

对实际审核的声明输入集进行哈希——即本次调用传入的包含定理的

.tex

文件——而非整个仓库的文件集合或审核者自行打开的子集。外部验证者会重新哈希这些条目；任何不匹配都会标记为

STALE

。

路径约定（必须与

tools/verify_paper_audits.sh

一致）：键为相对于论文目录的路径（无

paper/

前缀——验证者会相对于论文目录解析；添加前缀会产生

paper/paper/...

路径并导致误判为STALE）。论文目录外的文件使用绝对路径。

Verdict decision table

Verdict决策表

Input state	Verdict	`reason_code` example
No theorems / lemmas / proofs in paper	`NOT_APPLICABLE`	`no_theorems`
Theorems present but referenced files unreadable	`BLOCKED`	`source_unreadable`
All proof obligations discharged, no gaps	`PASS`	`all_proofs_complete`
Only MINOR issues (notation / exposition)	`WARN`	`minor_gaps`
Any FATAL or CRITICAL issue (logic gap, wrong claim)	`FAIL`	`critical_gap`
Reviewer invocation failed (network / malformed)	`ERROR`	`reviewer_error`

MAJOR issues alone map to

WARN

FAIL

at the reviewer's discretion and must carry an explicit justification in

summary

details.issues

输入状态	Verdict	`reason_code` 示例
论文中无定理/引理/证明	`NOT_APPLICABLE`	`no_theorems`
存在定理但引用文件无法读取	`BLOCKED`	`source_unreadable`
所有证明义务已核验，无漏洞	`PASS`	`all_proofs_complete`
仅存在MINOR级问题（符号/表述）	`WARN`	`minor_gaps`
存在任何FATAL或CRITICAL级问题（逻辑漏洞、错误论断）	`FAIL`	`critical_gap`
审核者调用失败（网络/格式错误）	`ERROR`	`reviewer_error`

仅存在MAJOR级问题时，由审核者决定映射为

WARN

或

FAIL

，且必须在

summary

details.issues

中提供明确理由。

Thread independence

线程独立性

Every invocation uses a fresh

mcp__codex__codex

thread. Never

codex-reply

across proof-checker runs. Do not accept prior audit outputs (PAPER_CLAIM_AUDIT, CITATION_AUDIT, EXPERIMENT_LOG) as input — the fresh thread preserves reviewer independence per

shared-references/reviewer-independence.md

This skill never blocks by itself;

paper-writing

Phase 6 plus the verifier decide whether the verdict blocks finalization based on the

assurance

level.

每次调用使用全新的

mcp__codex__codex

线程。不得跨proof-checker运行调用

codex-reply

。不得接受之前的审核输出（PAPER_CLAIM_AUDIT、CITATION_AUDIT、EXPERIMENT_LOG）作为输入——全新线程确保审核者独立性，符合

shared-references/reviewer-independence.md

。

本工具自身不会阻塞流程；

paper-writing

阶段6和验证者会根据

assurance

级别决定verdict是否阻塞最终定稿。

Example Invocations

调用示例

/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"

/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"