proof-checker

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Proof Checker: Rigorous Mathematical Verification & Fixing

证明检查器:严谨的数学验证与修正

Systematically verify a mathematical proof via cross-model adversarial review, fix identified gaps, re-review until convergence, and generate a detailed audit report with proof-obligation accounting.
通过跨模型对抗性审核系统地验证数学证明,修复识别出的漏洞,重复审核直至收敛,并生成包含证明义务核算的详细审核报告。

Context: $ARGUMENTS

上下文:$ARGUMENTS

Constants

常量

  • MAX_REVIEW_ROUNDS = 3
  • REVIEWER_MODEL =
    gpt-5.4
    via Codex MCP, reasoning effort always
    xhigh
  • REVIEWER_BACKEND =
    codex
    — Default: Codex MCP (xhigh). Override with
    — reviewer: oracle-pro
    for GPT-5.4 Pro via Oracle MCP. See
    shared-references/reviewer-routing.md
    .
  • AUDIT_DOC:
    PROOF_AUDIT.md
    at the paper directory root, alongside
    main.tex
    (cumulative log; when invoked via
    /paper-writing
    , this is
    paper/PROOF_AUDIT.md
    )
  • REPORT_TEX:
    proof_audit_report.tex
    (formal before/after PDF)
  • STATE_FILE:
    PROOF_CHECK_STATE.json
    (for recovery)
  • SKELETON_DOC:
    PROOF_SKELETON.md
    (micro-claim inventory)
  • MAX_REVIEW_ROUNDS = 3
  • REVIEWER_MODEL =
    gpt-5.4
    via Codex MCP,推理强度始终设为
    xhigh
  • REVIEWER_BACKEND =
    codex
    — 默认:Codex MCP(xhigh)。可通过
    — reviewer: oracle-pro
    覆盖为Oracle MCP提供的GPT-5.4 Pro。详见
    shared-references/reviewer-routing.md
  • AUDIT_DOC: 论文目录根目录下的
    PROOF_AUDIT.md
    ,与
    main.tex
    同级(累积日志;通过
    /paper-writing
    调用时,路径为
    paper/PROOF_AUDIT.md
  • REPORT_TEX:
    proof_audit_report.tex
    (正式的修正前后PDF文件源)
  • STATE_FILE:
    PROOF_CHECK_STATE.json
    (用于恢复检查状态)
  • SKELETON_DOC:
    PROOF_SKELETON.md
    (微论断清单)

Acceptance Gate (objective, replaces subjective scoring)

验收标准(客观标准,替代主观评分)

The proof passes when ALL of the following hold:
  1. Zero open FATAL or CRITICAL issues
  2. Every theorem/lemma has: (i) explicit hypotheses, (ii) proof with all interchanges justified, (iii) every application discharges hypotheses in the ledger
  3. All big-O/Θ/o statements have declared parameter dependence and uniformity scope
  4. Counterexample pass executed on all key lemmas (log candidates even if none found)
当且仅当满足以下所有条件时,证明通过审核:
  1. 无未解决的FATAL或CRITICAL级问题
  2. 每个定理/引理均满足:(i) 明确的假设条件,(ii) 所有推导步骤均有依据的证明过程,(iii) 每一次定理应用都在台账中完成假设条件的核验
  3. 所有大O/Θ/o符号声明均明确参数依赖关系和一致性范围
  4. 所有关键引理均完成反例测试(即使未找到反例,也需记录候选测试案例)

Issue Taxonomy (20 categories, 4 groups)

问题分类体系(20个类别,4个组别)

Group A: Logic & Proof Structure

组别A:逻辑与证明结构

CategoryDescriptionExample
UNJUSTIFIED_ASSERTIONClaim stated without proof or reference"The Hessian splits into Gram blocks"
UNPROVEN_SUBCLAIM"Clearly" / "it follows" hides a nontrivial lemma"By symmetry, the cross-terms vanish" without checking
QUANTIFIER_ERRORWrong order ∀/∃, missing "for sufficiently small κ""For all π, there exists ε" vs "there exists ε for all π"
IMPLICATION_REVERSALUses (A⇒B) as (B⇒A), or claims equivalence with only one direction
CASE_INCOMPLETEMisses boundary/degenerate casesSingular covariance, zero weight, non-unique argmin
CIRCULAR_DEPENDENCYLemma uses theorem that depends on it
LOGICAL_GAPA step is not justified by what precedes itB=Θ(1) → β_K=0 without analyzing W
类别描述示例
UNJUSTIFIED_ASSERTION提出无证明或无引用支撑的论断"The Hessian splits into Gram blocks"
UNPROVEN_SUBCLAIM使用「显然」「由此可得」等表述隐藏非平凡引理"By symmetry, the cross-terms vanish" 未做验证
QUANTIFIER_ERROR量词∀/∃顺序错误,遗漏「当κ足够小时」等限定"For all π, there exists ε" 与 "there exists ε for all π" 混淆
IMPLICATION_REVERSAL将(A⇒B)当作(B⇒A)使用,或仅证明单方向就宣称等价关系
CASE_INCOMPLETE遗漏边界/退化情况奇异协方差、零权重、非唯一 argmin
CIRCULAR_DEPENDENCY引理使用了依赖于该引理的定理
LOGICAL_GAP某一步骤无法由前置内容推导得出B=Θ(1) → β_K=0 未分析W的影响

Group B: Analysis & Measure Theory

组别B:分析学与测度论

CategoryDescriptionExample
ILLEGAL_INTERCHANGESwaps limit/expectation/derivative/integral without DCT/MCT/FubiniDifferentiating under E without domination
NONUNIFORM_CONVERGENCEPointwise convergence used as uniformsup and limit swapped
MISSING_DOMINATIONDCT cited but no dominating function given
INTEGRABILITY_GAPUses EX
REGULARITY_GAPDifferentiability/Lipschitz/convexity used but not established
STOCHASTIC_MODE_CONFUSIONMixes a.s./in prob./in L²/in expectation
类别描述示例
ILLEGAL_INTERCHANGE未依据DCT/MCT/Fubini定理就交换极限/期望/导数/积分顺序未满足控制条件就对E求导
NONUNIFORM_CONVERGENCE将逐点收敛当作一致收敛使用交换上确界与极限顺序
MISSING_DOMINATION引用DCT定理但未给出控制函数
INTEGRABILITY_GAP使用EX
REGULARITY_GAP使用可微性/Lipschitz性/凸性但未证明其成立
STOCHASTIC_MODE_CONFUSION混淆几乎必然/依概率/依L²/依期望等收敛模式

Group C: Model & Parameter Tracking

组别C:模型与参数追踪

CategoryDescriptionExample
MISSING_DERIVATIONA quantity is used but never derived from the modelRisk functional with undefined B, W
HIDDEN_ASSUMPTIONProof silently uses a condition not in the theoremGaussianity assumed but not stated
INSUFFICIENT_ASSUMPTIONHypotheses too weak for proof (counterexample exists)Moment conditions admitting 2-point distributions
DIMENSION_TRACKINGParameter dependence (d, n, K, ...) not explicitd enters only through κ
NORMALIZATION_MISMATCHCoordinate/scaling conventions inconsistentRescaled vs raw coordinates
CONSTANT_DEPENDENCE_HIDDEN"C" depends on d,n,K but treated as universal
类别描述示例
MISSING_DERIVATION使用某一量但从未基于模型推导得出使用未定义B、W的风险函数
HIDDEN_ASSUMPTION证明中默认使用定理声明未提及的条件假设高斯性但未在定理中声明
INSUFFICIENT_ASSUMPTION假设条件过于薄弱,无法支撑证明(存在反例)矩条件允许两点分布
DIMENSION_TRACKING参数依赖关系(d, n, K等)未明确d仅通过κ引入但未说明
NORMALIZATION_MISMATCH坐标/缩放约定不一致重缩放坐标与原始坐标混用
CONSTANT_DEPENDENCE_HIDDEN"C"依赖于d,n,K但被当作通用常数处理

Group D: Scope & Claims

组别D:范围与论断

CategoryDescriptionExample
SCOPE_OVERCLAIMConclusion stated more broadly than proof supports"β_K=0" with only generic overlap
REFERENCE_MISMATCHCited theorem's hypotheses not verified at point of use
类别描述示例
SCOPE_OVERCLAIM结论表述范围超出证明可支撑的范围"β_K=0" 仅在通用重叠情况下成立却未限定
REFERENCE_MISMATCH引用的定理在应用时未验证其假设条件

Two-Axis Severity System

双轴严重程度体系

Axis A — Proof Status (what is wrong)

轴A — 证明状态(问题类型)

StatusMeaning
INVALIDStatement false as written (counterexample exists or contradiction)
UNJUSTIFIEDCould be true, but current proof does not establish it
UNDERSTATEDTrue only after strengthening assumptions
OVERSTATEDTrue only after weakening conclusion / adding qualifiers
UNCLEARAmbiguous notation / definition drift (not wrong per se)
状态含义
INVALID所述内容本身错误(存在反例或矛盾)
UNJUSTIFIED内容可能为真,但当前证明无法支撑
UNDERSTATED仅在强化假设条件后才成立
OVERSTATED仅在弱化结论/添加限定后才成立
UNCLEAR符号模糊/定义漂移(本身并非错误)

Axis B — Impact (how much breaks)

轴B — 影响程度(破坏范围)

ImpactMeaning
GLOBALBreaks main theorem or core dependency chain
LOCALAffects a side result but not the main theorem
COSMETICExposition only
影响含义
GLOBAL破坏主定理或核心依赖链
LOCAL影响次要结果但不影响主定理
COSMETIC仅影响表述方式

Severity Labels (derived)

严重程度标签(推导得出)

LabelDefinition
FATALINVALID + GLOBAL
CRITICAL(INVALID + LOCAL) or (UNJUSTIFIED + GLOBAL)
MAJOR(UNJUSTIFIED + LOCAL) or (UNDERSTATED/OVERSTATED + GLOBAL)
MINORClarity / notation / dimension bookkeeping that doesn't change claims
标签定义
FATALINVALID + GLOBAL
CRITICAL(INVALID + LOCAL) 或 (UNJUSTIFIED + GLOBAL)
MAJOR(UNJUSTIFIED + LOCAL) 或 (UNDERSTATED/OVERSTATED + GLOBAL)
MINOR不改变论断的表述清晰度/符号/维度记录问题

Side-Condition Checklists for Common Theorems

常见定理的附带条件检查清单

When the proof invokes any of the following, require explicit verification of ALL listed conditions:
TheoremRequired Conditions
DCT (Dominated Convergence)Pointwise a.e. convergence + integrable dominating function
MCT (Monotone Convergence)Monotone increasing + non-negative
Fubini/TonelliProduct measurability + integrability (Fubini) or non-negative (Tonelli)
Leibniz integral ruleContinuity of integrand + dominating function for derivative
Implicit Function TheoremContinuous differentiability + non-singular Jacobian
Taylor with remainderSufficient differentiability + remainder form (Lagrange/integral)
Jensen's inequalityConvexity of function + integrability
Cauchy-SchwarzCorrect inner product space + integrability of both factors
Weyl/Davis-KahanSymmetry/Hermiticity + perturbation bound conditions
Analytic continuationDomain connectivity + identity theorem conditions
WLOG reductionInvariance under claimed symmetry + reduction is reversible
当证明中引用以下任一定理时,需明确验证所有列出的条件:
定理所需条件
DCT(控制收敛定理)几乎处处逐点收敛 + 可积控制函数
MCT(单调收敛定理)单调递增 + 非负
Fubini/Tonelli乘积可测性 + 可积性(Fubini)或非负(Tonelli)
莱布尼茨积分法则被积函数连续 + 导数存在控制函数
隐函数定理连续可微 + 雅可比矩阵非奇异
带余项的泰勒定理足够阶数的可微性 + 余项形式(拉格朗日/积分型)
Jensen不等式函数凸性 + 可积性
柯西-施瓦茨不等式正确的内积空间 + 两个因子均可积
Weyl/Davis-Kahan对称性/埃尔米特性 + 扰动界条件
解析延拓定义域连通性 + 恒等定理条件
WLOG简化满足宣称的对称性不变性 + 简化可逆

Workflow

工作流

Phase 0: Preparation

阶段0:准备工作

  1. Locate the proof: Find the main
    .tex
    file(s).
  2. Read the entire proof: Extract list of all theorems/lemmas/propositions/corollaries/definitions/assumptions.
  3. Read reference materials: Reference papers, prior results.
  4. Build a section map: Structured list with line numbers and key claims.
  5. Identify the main theorem: Central result, assumptions, claims.
  1. 定位证明: 找到主
    .tex
    文件。
  2. 通读证明: 提取所有定理/引理/命题/推论/定义/假设的列表。
  3. 阅读参考资料: 参考论文、已有成果。
  4. 构建章节映射: 包含行号和关键论断的结构化列表。
  5. 识别主定理: 核心结论、假设条件、论断内容。

Phase 0.5: Proof-Obligation Ledger

阶段0.5:证明义务台账

Build formal accounting artifacts. Save to
PROOF_SKELETON.md
:
构建正式的核算工件,保存至
PROOF_SKELETON.md

1. Dependency DAG

1. 依赖关系DAG

Nodes = Definitions / Assumptions / Lemmas / Theorems. Edges = "uses". Detect cycles (including semantic circularity where Lemma A uses a corollary that quietly depends on A).
节点 = 定义 / 假设 / 引理 / 定理。边 = "使用"。检测循环(包括引理A使用了间接依赖于A的推论这类语义循环)。

2. Assumption Ledger

2. 假设台账

For each theorem/lemma, list every hypothesis with WHERE each is verified (or mark "UNVERIFIED"). Track usage-minimal assumption sets — which assumptions were actually used vs merely stated.
针对每个定理/引理,列出所有假设条件及每个条件的核验位置(或标记为「UNVERIFIED」)。追踪最小必要假设集——实际使用的假设与仅声明的假设的差异。

3. Typed Symbol Table

3. 类型化符号表

Each symbol must have a type signature:
κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v
Flag any symbol whose meaning changes or whose type is inconsistent across uses.
每个符号必须有类型签名
κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v
标记任何含义变化或类型使用不一致的符号。

4. Canonical Quantified Statements

4. 规范化量化表述

For each theorem/lemma, rewrite the statement with explicit quantifiers, domains, and limit order:
∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
  h_act^{(K,π)} = Θ(κ^{α_K^act})  [uniform in π on compact subsets]
If you cannot restate a theorem this precisely, mark it UNCLEAR — needs disambiguation.
针对每个定理/引理,用明确的量词、定义域和极限顺序重写表述:
∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
  h_act^{(K,π)} = Θ(κ^{α_K^act})  [uniform in π on compact subsets]
若无法精确重述定理,标记为UNCLEAR — 需要歧义消除

5. Micro-Claim Inventory

5. 微论断清单

Every nontrivial step becomes a numbered micro-claim in sequent form:
MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
       ⊢ Goal: P̂_0 is positive definite
       Rule: monomials linearly independent on support of continuous distribution
       Side-conditions: positive density near origin ✓ (by GMM weak convergence)
Each micro-claim has: justification rule name + required conditions + where conditions are proven.
每个非平凡步骤均转换为编号的相继式形式微论断:
MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
       ⊢ Goal: P̂_0 is positive definite
       Rule: monomials linearly independent on support of continuous distribution
       Side-conditions: positive density near origin ✓ (by GMM weak convergence)
每个微论断包含:依据的规则名称 + 所需条件 + 条件的证明位置。

6. Limit-Order Map

6. 极限顺序映射

Track every asymptotic statement's limit order and uniformity scope:
h_act = Θ(κ^α)  [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n   [as n→∞, for fixed κ,K,π with x_K ≪ 1]
Flag any statement where limit order is ambiguous or uniformity is unclear.
追踪每个渐近表述的极限顺序和一致性范围
h_act = Θ(κ^α)  [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n   [as n→∞, for fixed κ,K,π with x_K ≪ 1]
标记任何极限顺序模糊或一致性范围不明确的表述。

Phase 1: First Review (Codex GPT-5.4 xhigh)

阶段1:首次审核(Codex GPT-5.4 xhigh)

Submit the complete proof content with the following mandatory reviewer checklist in the prompt:
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are performing a rigorous mathematical proof review. For EVERY theorem,
    lemma, and proposition, check ALL of the following:

    ## MANDATORY CHECKS

    A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
    B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
       list each hypothesis and whether it was verified, with location.
    C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
       absolute values, missing conditions (convexity, PSD, integrability).
    D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
       interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
       and which conditions are verified/missing.
    E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
       w.h.p. Ensure transitions are justified.
    F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
       it is uniform over all parameters. List hidden parameter dependence.
    G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
       low-rank, or extreme-parameter construction.
    H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
       results.

    ## OUTPUT FORMAT (per issue)
    For each issue found, provide:
    - id: sequential number
    - status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
    - impact: GLOBAL / LOCAL / COSMETIC
    - category: [from taxonomy]
    - location: section/equation/line
    - statement: what the proof claims
    - why_invalid: why this is wrong or unjustified
    - counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
    - affects: which downstream results break if this is wrong
    - minimal_fix: how to fix it

    [FULL PROOF CONTENT HERE]
Save the threadId. Parse into structured issue list. Write to
PROOF_AUDIT.md
.
提交完整的证明内容,并在提示中包含以下强制审核清单
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are performing a rigorous mathematical proof review. For EVERY theorem,
    lemma, and proposition, check ALL of the following:

    ## MANDATORY CHECKS

    A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
    B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
       list each hypothesis and whether it was verified, with location.
    C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
       absolute values, missing conditions (convexity, PSD, integrability).
    D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
       interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
       and which conditions are verified/missing.
    E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
       w.h.p. Ensure transitions are justified.
    F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
       it is uniform over all parameters. List hidden parameter dependence.
    G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
       low-rank, or extreme-parameter construction.
    H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
       results.

    ## OUTPUT FORMAT (per issue)
    For each issue found, provide:
    - id: sequential number
    - status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
    - impact: GLOBAL / LOCAL / COSMETIC
    - category: [from taxonomy]
    - location: section/equation/line
    - statement: what the proof claims
    - why_invalid: why this is wrong or unjustified
    - counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
    - affects: which downstream results break if this is wrong
    - minimal_fix: how to fix it

    [FULL PROOF CONTENT HERE]
保存threadId。将结果解析为结构化问题列表,写入
PROOF_AUDIT.md

Phase 1.5: Counterexample Red Team

阶段1.5:反例攻坚

For each CRITICAL or MAJOR issue, and for every key lemma that introduces:
  • a new inequality bound
  • an identifiability/uniqueness claim
  • a curvature/PSD/strong convexity assertion
  • a uniform-in-parameter claim
  • a convergence mode upgrade (pointwise → uniform, in prob → w.h.p.)
Systematically attempt to construct counterexamples using:
StrategyDescription
Dimensional collapseSet d=1 or 2, K=2, n small
DegeneracySingular covariance, tiny weight, overlapping means, identical components
Extremal distributionsTwo-point ±a, bounded non-subGaussian, heavy tails
Adversarial parameter scalingPick parameters making neglected terms dominate
Numeric falsificationTranslate lemma to a function, brute-force optimize over small domain
Rule: Label "counterexample found" ONLY if algebraically verified. Otherwise log as "candidate counterexample — needs verification."
Record all attempts (successful or not) in
PROOF_AUDIT.md
.
针对每个CRITICAL或MAJOR级问题,以及每个引入以下内容的关键引理:
  • 新的不等式界
  • 可识别性/唯一性论断
  • 曲率/PSD/强凸性断言
  • 参数一致型论断
  • 收敛模式升级(逐点→一致,依概率→大概率)
系统性尝试构造反例,策略如下:
策略描述
维度坍缩设置d=1或2,K=2,n取小值
退化构造奇异协方差、极小权重、重叠均值、相同分量
极端分布两点分布±a、有界非次高斯分布、重尾分布
对抗性参数缩放选择参数使被忽略项占主导
数值证伪将引理转换为函数,在小范围内暴力优化测试
规则:仅当代数验证通过时,标记为「counterexample found」。否则记录为「candidate counterexample — needs verification」。
所有尝试(成功或失败)均记录在
PROOF_AUDIT.md
中。

Phase 2: Fix Implementation

阶段2:漏洞修复

For each issue, ordered by severity (FATAL → CRITICAL → MAJOR → MINOR):
按严重程度排序(FATAL → CRITICAL → MAJOR → MINOR)处理每个问题:

Step 2a: Choose fix strategy

步骤2a:选择修复策略

For each issue, explicitly choose one of:
  • ADD_DERIVATION: Write missing proof steps
  • STRENGTHEN_ASSUMPTION: Add conditions to theorem statement
  • WEAKEN_CLAIM: Reduce conclusion scope
  • ADD_REFERENCE: Cite known result + verify its conditions apply
Log this choice — it is a scope-changing decision when it alters theorem statements.
针对每个问题,明确选择以下策略之一:
  • ADD_DERIVATION: 补充缺失的证明步骤
  • STRENGTHEN_ASSUMPTION: 为定理添加条件
  • WEAKEN_CLAIM: 缩小结论范围
  • ADD_REFERENCE: 引用已有成果并验证其条件适用
记录该选择——当修改定理表述时,这属于范围变更决策。

Step 2b: Derive the fix mathematically

步骤2b:数学推导修复

  • Complete mathematical derivation, not just a claim
  • If new proposition/lemma needed, write in full theorem-proof style
  • 完成完整的数学推导,而非仅给出论断
  • 若需新增命题/引理,以完整的定理-证明格式撰写

Step 2c: Implement in LaTeX

步骤2c:在LaTeX中实现修复

  • Edit the
    .tex
    file
  • Preserve existing
    \label
    references where possible
  • 编辑
    .tex
    文件
  • 尽可能保留现有
    \label
    引用

Step 2d: Record the fix

步骤2d:记录修复内容

markdown
undefined
markdown
undefined

Fix N: [SHORT TITLE]

修复N: [简短标题]

Issue: [id] [CATEGORY] — [description] Severity: FATAL / CRITICAL / MAJOR / MINOR Status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED Impact: GLOBAL / LOCAL / COSMETIC Fix strategy: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE Location: Section X, Lines Y-Z
BEFORE: [what the proof originally did] WHY WRONG: [mathematical problem, with counterexample if applicable] AFTER: [what the fix does] KEY EQUATION: [central new equation] PROOF OBLIGATIONS ADDED: [new conditions/lemmas introduced] DOWNSTREAM EFFECTS: [which results now need re-checking]
undefined
问题: [id] [类别] — [描述] 严重程度: FATAL / CRITICAL / MAJOR / MINOR 状态: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED 影响: GLOBAL / LOCAL / COSMETIC 修复策略: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE 位置: 第X节,第Y-Z行
修复前: [原证明内容] 问题原因: [数学层面的问题,如有反例需说明] 修复后: [修复后的内容] 核心方程: [新增的核心方程] 新增证明义务: [引入的新条件/引理] 下游影响: [哪些结果需要重新检查]
undefined

Step 2e: Compile check

步骤2e:编译检查

bash
pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"
bash
pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"

Phase 3: Re-Review (Codex GPT-5.4 xhigh)

阶段3:重新审核(Codex GPT-5.4 xhigh)

Use
codex-reply
with saved threadId. Include fix summaries. Request the same mandatory checklist.
Check acceptance gate. If not met, repeat Phases 2-3 (up to MAX_REVIEW_ROUNDS).
使用保存的threadId调用
codex-reply
,包含修复摘要,要求执行相同的强制审核清单。
检查验收标准。若未达标,重复阶段2-3(最多MAX_REVIEW_ROUNDS轮)。

Phase 3.5: Global Closure & Independent Verification

阶段3.5:全局收尾与独立验证

Global closure checks

全局收尾检查

After all fixes, verify the proof as a whole:
  • Statement–conclusion match: Does the proof end with EXACTLY what the theorem claims (quantifiers, constants, uniformity)?
  • All obligations discharged: Every node in the obligation DAG is proven or explicitly assumed (and the theorem statement includes it).
  • Case analysis coverage: Cases partition the domain AND include boundary/degenerate cases.
  • Induction correctness (if applicable): Base case, inductive step, correct use of IH, induction measure strictly decreases.
  • WLOG reductions: Each "without loss of generality" spawns a micro-claim proving the reduction is lossless.
  • No silent assumption strengthening: Any fix that strengthened assumptions has propagated to the main theorem statement.
完成所有修复后,整体验证证明:
  • 表述-结论匹配: 证明结尾是否与定理宣称的内容完全一致(量词、常数、一致性)?
  • 所有义务已核验: 义务DAG中的每个节点均已证明或明确假设(且定理表述包含该假设)。
  • 案例分析覆盖: 案例划分了定义域且包含边界/退化情况。
  • 归纳正确性(若适用): 基础案例、归纳步骤、归纳假设的正确使用、归纳测度严格递减。
  • WLOG简化: 每个「不失一般性」均对应一个微论断,证明简化是无损的。
  • 无隐性假设强化: 任何强化假设的修复均已同步更新至主定理表述。

Independent second review for FATAL/CRITICAL fixes

FATAL/CRITICAL级修复的独立二次审核

For any fix that resolved a FATAL or CRITICAL issue, submit the fixed section alone (without showing the previous critique) to a fresh Codex thread:
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    Blind review of the following proof section. You have NOT seen any prior
    review or discussion. Check every step for correctness, hidden assumptions,
    illegal interchanges, and counterexamples.
    [FIXED SECTION ONLY]
If the blind reviewer finds new issues, re-enter Phase 2.
针对任何解决了FATAL或CRITICAL级问题的修复,将修复后的单独章节(不展示之前的评审内容)提交至全新的Codex线程
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    Blind review of the following proof section. You have NOT seen any prior
    review or discussion. Check every step for correctness, hidden assumptions,
    illegal interchanges, and counterexamples.
    [FIXED SECTION ONLY]
若盲审发现新问题,重新进入阶段2。

Regression proof-audit

回归审核

After fixes, re-run:
  • DAG acyclicity check (no new cycles introduced)
  • Counterexample suite on all DOWNSTREAM lemmas of modified results
  • Assumption-delta report: what became stronger/weaker due to fixes?
修复完成后,重新执行:
  • DAG无环检查(未引入新循环)
  • 对修改结果的所有下游引理执行反例测试套件
  • 假设变更报告:修复导致哪些假设变强/变弱?

Phase 3.9: Unrecoverable Proof Protocol

阶段3.9:不可修复证明处理流程

If acceptance gate is not met after MAX_REVIEW_ROUNDS, output a Proof Unrecoverable Report:
  1. Minimal set of blocking FATAL/CRITICAL issues that could not be resolved
  2. Salvage options ranked: (a) weaken claim, (b) strengthen assumptions, (c) add missing lemmas, (d) restructure argument
  3. Which parts of the proof are likely still reusable
  4. Recommended next steps for the author
Do NOT silently declare success. The report must be honest.
若经过MAX_REVIEW_ROUNDS轮后仍未达到验收标准,输出证明不可修复报告
  1. 无法解决的阻塞性FATAL/CRITICAL级问题的最小集合
  2. 按优先级排序的挽救方案:(a) 弱化结论,(b) 强化假设,(c) 添加缺失引理,(d) 重构论证
  3. 证明中仍可复用的部分
  4. 给作者的下一步建议
不得默认宣称成功。报告必须如实反映情况。

Phase 4: Audit Report Generation

阶段4:生成审核报告

Generate
proof_audit_report.tex
with:
  1. Overview table: All issues with two-axis severity, category, fix strategy, status
  2. Before/After logic chain: Red (BEFORE) → Green (AFTER) comparison
  3. For each fix: original proof → why wrong → counterexample (if any) → complete derivation → remaining subtleties
  4. Proof-obligation diff: What was unverified before, what is verified now
  5. Summary: Now proven / still assumed / open problems
  6. Colored boxes: BEFORE (red), AFTER (green), WHY WRONG (orange), KEY INSIGHT (blue), WARNING (yellow)
Compile:
pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex
生成
proof_audit_report.tex
,包含:
  1. 概览表格: 所有问题的双轴严重程度、类别、修复策略、状态
  2. 逻辑链对比: 红色(修复前)→ 绿色(修复后)的对比
  3. 每个修复详情: 原证明 → 问题原因 → 反例(如有)→ 完整推导 → 剩余细节
  4. 证明义务差异: 修复前未核验的内容与当前已核验的内容
  5. 总结: 已证明内容 / 仍假设内容 / 未解决问题
  6. 彩色标注框: 修复前(红)、修复后(绿)、问题原因(橙)、核心洞察(蓝)、警告(黄)
编译:
pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex

Phase 5: State Persistence

阶段5:状态持久化

Write
PROOF_CHECK_STATE.json
:
json
{
  "status": "completed",
  "rounds": 2,
  "threadId": "...",
  "fatal_fixed": 0,
  "critical_fixed": 3,
  "major_fixed": 2,
  "minor_fixed": 1,
  "counterexamples_found": 1,
  "counterexample_candidates": 2,
  "acceptance_gate": "PASS",
  "timestamp": "..."
}
写入
PROOF_CHECK_STATE.json
json
{
  "status": "completed",
  "rounds": 2,
  "threadId": "...",
  "fatal_fixed": 0,
  "critical_fixed": 3,
  "major_fixed": 2,
  "minor_fixed": 1,
  "counterexamples_found": 1,
  "counterexample_candidates": 2,
  "acceptance_gate": "PASS",
  "timestamp": "..."
}

Key Rules

核心规则

Mathematical rigor

数学严谨性

  • Never accept a proof step on faith. "Clearly" / "it follows" / "by standard arguments" are red flags — each must spawn a micro-claim.
  • Hypothesis discharge: Every time a lemma is APPLIED, verify EACH of its hypotheses at that point. Use the side-condition checklists above.
  • Interchange discipline: Every swap of limit/expectation/derivative/integral must cite a theorem (DCT/MCT/Fubini/Leibniz) and verify its conditions with explicit dominating function or integrability proof.
  • Uniformity discipline: Every O(·)/Θ(·) must declare what parameters it is uniform over. "O(1)" that secretly depends on d,n,K is a CONSTANT_DEPENDENCE_HIDDEN issue.
  • Quantifier discipline: Check ∀/∃ order. "For sufficiently small κ" must specify: does κ₀ depend on K? On π? On d?
  • Counterexample-first: Before trying to fix a gap, first try to break it.
  • WLOG prohibition: Every "without loss of generality" must have an explicit micro-claim proving the reduction. No free WLOGs.
  • No silent assumption strengthening: Any fix that adds conditions must propagate to the theorem statement.
  • 绝不轻信证明步骤。「显然」「由此可得」「标准论证」均为危险信号——每个此类表述都需生成一个微论断。
  • 假设核验: 每次应用引理时,需在应用点验证其每个假设条件。使用上述附带条件检查清单。
  • 推导顺序规范: 每次交换极限/期望/导数/积分顺序时,必须引用定理(DCT/MCT/Fubini/莱布尼茨)并明确验证其条件,如提供控制函数或可积性证明。
  • 一致性规范: 每个O(·)/Θ(·)必须声明其对哪些参数一致。暗中依赖d,n,K的「O(1)」属于CONSTANT_DEPENDENCE_HIDDEN问题。
  • 量词规范: 检查∀/∃顺序。「当κ足够小时」必须明确:κ₀是否依赖于K?是否依赖于π?是否依赖于d?
  • 反例优先: 在尝试修复漏洞前,先尝试构造反例。
  • WLOG限制: 每个「不失一般性」必须有明确的微论断证明简化的无损性。不得随意使用WLOG。
  • 无隐性假设强化: 任何添加条件的修复必须同步更新至定理表述。

Cross-model protocol

跨模型协议

  • Claude analyzes, Codex reviews: Claude reads proof, formulates questions, implements fixes. Codex provides adversarial review.
  • Codex reasoning always xhigh: Never downgrade.
  • Send full content: Don't summarize — send actual math for line-by-line checking.
  • Preserve threadId: Use
    codex-reply
    for follow-up rounds.
  • Claude分析,Codex审核: Claude读取证明、提出问题、实现修复。Codex提供对抗性审核。
  • Codex推理强度始终为xhigh: 不得降低。
  • 发送完整内容: 不得总结——发送实际数学内容进行逐行检查。
  • 保留threadId: 后续轮次使用
    codex-reply

Fix quality

修复质量

  • Minimal fixes: Fix exactly what's broken, nothing more.
  • Full derivation: Every fix includes complete mathematical argument.
  • Explicit scope decisions: Each fix is tagged ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE.
  • Compile after each fix: LaTeX must compile cleanly.
  • 最小修复: 仅修复问题本身,不做额外修改。
  • 完整推导: 每个修复包含完整的数学论证。
  • 明确范围决策: 每个修复标记为ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE。
  • 每次修复后编译: LaTeX必须能正常编译。

Scope honesty

范围诚实性

  • Don't overclaim: If a fix makes a result conditional, say so.
  • Separate "proven" from "assumed": The audit report has an explicit section for this.
  • Log open problems: Issues requiring future work are listed, not hidden.
  • 不夸大: 若修复使结果变为条件性成立,需明确说明。
  • 区分「已证明」与「假设」: 审核报告需包含明确的对应章节。
  • 记录未解决问题: 需列出需后续处理的问题,不得隐藏。

Output Files

输出文件

FileContentWhen
PROOF_SKELETON.md
Dependency DAG + assumption ledger + micro-claimsPhase 0.5
PROOF_AUDIT.md
Cumulative round-by-round audit logUpdated each round
PROOF_AUDIT.json
Machine-readable submission verdict (see below)Always emitted
proof_audit_report.tex/.pdf
Formal before/after reportPhase 4
PROOF_CHECK_STATE.json
State for recoveryPhase 5
文件内容生成阶段
PROOF_SKELETON.md
依赖关系DAG + 假设台账 + 微论断清单阶段0.5
PROOF_AUDIT.md
逐轮累积审核日志每轮更新
PROOF_AUDIT.json
机器可读的提交 verdict(见下文)始终生成
proof_audit_report.tex/.pdf
正式的修复前后报告阶段4
PROOF_CHECK_STATE.json
用于恢复的状态文件阶段5

Submission Artifact Emission

提交工件生成

This skill always writes
PROOF_AUDIT.json
at the paper directory root (i.e.
paper/PROOF_AUDIT.json
when invoked from
/paper-writing
with paper-dir
paper/
;
<your-paper-dir>/PROOF_AUDIT.json
when invoked standalone), regardless of caller or whether the paper contains theorems. A paper with no
\begin{theorem}
/
\begin{lemma}
/
\begin{proof}
emits verdict
NOT_APPLICABLE
; silent skip is forbidden.
paper-writing
Phase 6 and
tools/verify_paper_audits.sh
both rely on this artifact existing at
<paper-dir>/PROOF_AUDIT.json
.
The artifact conforms to the schema in
shared-references/assurance-contract.md
:
json
{
  "audit_skill":      "proof-checker",
  "verdict":          "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
  "reason_code":      "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
  "summary":          "One-line human-readable verdict summary.",
  "audited_input_hashes": {
    "main.tex":                 "sha256:...",
    "sections/4.theory.tex":    "sha256:..."
  },
  "trace_path":       ".aris/traces/proof-checker/<date>_run<NN>/",
  "thread_id":        "<codex mcp thread id>",
  "reviewer_model":   "gpt-5.4",
  "reviewer_reasoning": "xhigh",
  "generated_at":     "<UTC ISO-8601>",
  "details": {
    "theorems_audited": <int>,
    "issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
                  "category": "quantifier|domination|...",
                  "location": "sections/4.theory.tex:L182",
                  "note": "..." }, ... ]
  }
}
本工具始终在论文目录根目录写入
PROOF_AUDIT.json
(即通过
/paper-writing
调用且论文目录为
paper/
时,路径为
paper/PROOF_AUDIT.json
;独立调用时,路径为
<your-paper-dir>/PROOF_AUDIT.json
),无论调用者是谁或论文是否包含定理。若论文中无
\begin{theorem}
/
\begin{lemma}
/
\begin{proof}
,则输出verdict为
NOT_APPLICABLE
;禁止静默跳过。
paper-writing
阶段6和
tools/verify_paper_audits.sh
均依赖于该工件存在于
<paper-dir>/PROOF_AUDIT.json
工件符合
shared-references/assurance-contract.md
中的 schema:
json
{
  "audit_skill":      "proof-checker",
  "verdict":          "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
  "reason_code":      "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
  "summary":          "单行人类可读的verdict摘要。",
  "audited_input_hashes": {
    "main.tex":                 "sha256:...",
    "sections/4.theory.tex":    "sha256:..."
  },
  "trace_path":       ".aris/traces/proof-checker/<date>_run<NN>/",
  "thread_id":        "<codex mcp thread id>",
  "reviewer_model":   "gpt-5.4",
  "reviewer_reasoning": "xhigh",
  "generated_at":     "<UTC ISO-8601>",
  "details": {
    "theorems_audited": <int>,
    "issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
                  "category": "quantifier|domination|...",
                  "location": "sections/4.theory.tex:L182",
                  "note": "..." }, ... ]
  }
}

audited_input_hashes
scope

audited_input_hashes
范围

Hash the declared input set actually reviewed — the theorem-bearing
.tex
files passed into this invocation — not a repo-wide union and not the reviewer's self-reported opened subset. The external verifier rehashes these entries; any mismatch flags
STALE
.
Path convention (must match
tools/verify_paper_audits.sh
): keys are paths relative to the paper directory (no
paper/
prefix — the verifier resolves relative to the paper dir; prefixing produces
paper/paper/...
and false-fails as STALE). Use absolute paths for files outside the paper dir.
实际审核的声明输入集进行哈希——即本次调用传入的包含定理的
.tex
文件——而非整个仓库的文件集合或审核者自行打开的子集。外部验证者会重新哈希这些条目;任何不匹配都会标记为
STALE
路径约定(必须与
tools/verify_paper_audits.sh
一致):键为相对于论文目录的路径(无
paper/
前缀——验证者会相对于论文目录解析;添加前缀会产生
paper/paper/...
路径并导致误判为STALE)。论文目录外的文件使用绝对路径

Verdict decision table

Verdict决策表

Input stateVerdict
reason_code
example
No theorems / lemmas / proofs in paper
NOT_APPLICABLE
no_theorems
Theorems present but referenced files unreadable
BLOCKED
source_unreadable
All proof obligations discharged, no gaps
PASS
all_proofs_complete
Only MINOR issues (notation / exposition)
WARN
minor_gaps
Any FATAL or CRITICAL issue (logic gap, wrong claim)
FAIL
critical_gap
Reviewer invocation failed (network / malformed)
ERROR
reviewer_error
MAJOR issues alone map to
WARN
or
FAIL
at the reviewer's discretion and must carry an explicit justification in
summary
+
details.issues
.
输入状态Verdict
reason_code
示例
论文中无定理/引理/证明
NOT_APPLICABLE
no_theorems
存在定理但引用文件无法读取
BLOCKED
source_unreadable
所有证明义务已核验,无漏洞
PASS
all_proofs_complete
仅存在MINOR级问题(符号/表述)
WARN
minor_gaps
存在任何FATAL或CRITICAL级问题(逻辑漏洞、错误论断)
FAIL
critical_gap
审核者调用失败(网络/格式错误)
ERROR
reviewer_error
仅存在MAJOR级问题时,由审核者决定映射为
WARN
FAIL
,且必须在
summary
+
details.issues
中提供明确理由。

Thread independence

线程独立性

Every invocation uses a fresh
mcp__codex__codex
thread. Never
codex-reply
across proof-checker runs. Do not accept prior audit outputs (PAPER_CLAIM_AUDIT, CITATION_AUDIT, EXPERIMENT_LOG) as input — the fresh thread preserves reviewer independence per
shared-references/reviewer-independence.md
.
This skill never blocks by itself;
paper-writing
Phase 6 plus the verifier decide whether the verdict blocks finalization based on the
assurance
level.
每次调用使用全新的
mcp__codex__codex
线程。不得跨proof-checker运行调用
codex-reply
。不得接受之前的审核输出(PAPER_CLAIM_AUDIT、CITATION_AUDIT、EXPERIMENT_LOG)作为输入——全新线程确保审核者独立性,符合
shared-references/reviewer-independence.md
本工具自身不会阻塞流程;
paper-writing
阶段6和验证者会根据
assurance
级别决定verdict是否阻塞最终定稿。

Example Invocations

调用示例

/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"
/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"