proof-checker
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseProof Checker: Rigorous Mathematical Verification & Fixing
证明检查器:严谨的数学验证与修正
Systematically verify a mathematical proof via cross-model adversarial review, fix identified gaps, re-review until convergence, and generate a detailed audit report with proof-obligation accounting.
通过跨模型对抗性审核系统地验证数学证明,修复识别出的漏洞,重复审核直至收敛,并生成包含证明义务核算的详细审核报告。
Context: $ARGUMENTS
上下文:$ARGUMENTS
Constants
常量
- MAX_REVIEW_ROUNDS = 3
- REVIEWER_MODEL = via Codex MCP, reasoning effort always
gpt-5.4xhigh - REVIEWER_BACKEND = — Default: Codex MCP (xhigh). Override with
codexfor GPT-5.4 Pro via Oracle MCP. See— reviewer: oracle-pro.shared-references/reviewer-routing.md - AUDIT_DOC: at the paper directory root, alongside
PROOF_AUDIT.md(cumulative log; when invoked viamain.tex, this is/paper-writing)paper/PROOF_AUDIT.md - REPORT_TEX: (formal before/after PDF)
proof_audit_report.tex - STATE_FILE: (for recovery)
PROOF_CHECK_STATE.json - SKELETON_DOC: (micro-claim inventory)
PROOF_SKELETON.md
- MAX_REVIEW_ROUNDS = 3
- REVIEWER_MODEL = via Codex MCP,推理强度始终设为
gpt-5.4xhigh - REVIEWER_BACKEND = — 默认:Codex MCP(xhigh)。可通过
codex覆盖为Oracle MCP提供的GPT-5.4 Pro。详见— reviewer: oracle-pro。shared-references/reviewer-routing.md - AUDIT_DOC: 论文目录根目录下的,与
PROOF_AUDIT.md同级(累积日志;通过main.tex调用时,路径为/paper-writing)paper/PROOF_AUDIT.md - REPORT_TEX: (正式的修正前后PDF文件源)
proof_audit_report.tex - STATE_FILE: (用于恢复检查状态)
PROOF_CHECK_STATE.json - SKELETON_DOC: (微论断清单)
PROOF_SKELETON.md
Acceptance Gate (objective, replaces subjective scoring)
验收标准(客观标准,替代主观评分)
The proof passes when ALL of the following hold:
- Zero open FATAL or CRITICAL issues
- Every theorem/lemma has: (i) explicit hypotheses, (ii) proof with all interchanges justified, (iii) every application discharges hypotheses in the ledger
- All big-O/Θ/o statements have declared parameter dependence and uniformity scope
- Counterexample pass executed on all key lemmas (log candidates even if none found)
当且仅当满足以下所有条件时,证明通过审核:
- 无未解决的FATAL或CRITICAL级问题
- 每个定理/引理均满足:(i) 明确的假设条件,(ii) 所有推导步骤均有依据的证明过程,(iii) 每一次定理应用都在台账中完成假设条件的核验
- 所有大O/Θ/o符号声明均明确参数依赖关系和一致性范围
- 所有关键引理均完成反例测试(即使未找到反例,也需记录候选测试案例)
Issue Taxonomy (20 categories, 4 groups)
问题分类体系(20个类别,4个组别)
Group A: Logic & Proof Structure
组别A:逻辑与证明结构
| Category | Description | Example |
|---|---|---|
| UNJUSTIFIED_ASSERTION | Claim stated without proof or reference | "The Hessian splits into Gram blocks" |
| UNPROVEN_SUBCLAIM | "Clearly" / "it follows" hides a nontrivial lemma | "By symmetry, the cross-terms vanish" without checking |
| QUANTIFIER_ERROR | Wrong order ∀/∃, missing "for sufficiently small κ" | "For all π, there exists ε" vs "there exists ε for all π" |
| IMPLICATION_REVERSAL | Uses (A⇒B) as (B⇒A), or claims equivalence with only one direction | |
| CASE_INCOMPLETE | Misses boundary/degenerate cases | Singular covariance, zero weight, non-unique argmin |
| CIRCULAR_DEPENDENCY | Lemma uses theorem that depends on it | |
| LOGICAL_GAP | A step is not justified by what precedes it | B=Θ(1) → β_K=0 without analyzing W |
| 类别 | 描述 | 示例 |
|---|---|---|
| UNJUSTIFIED_ASSERTION | 提出无证明或无引用支撑的论断 | "The Hessian splits into Gram blocks" |
| UNPROVEN_SUBCLAIM | 使用「显然」「由此可得」等表述隐藏非平凡引理 | "By symmetry, the cross-terms vanish" 未做验证 |
| QUANTIFIER_ERROR | 量词∀/∃顺序错误,遗漏「当κ足够小时」等限定 | "For all π, there exists ε" 与 "there exists ε for all π" 混淆 |
| IMPLICATION_REVERSAL | 将(A⇒B)当作(B⇒A)使用,或仅证明单方向就宣称等价关系 | |
| CASE_INCOMPLETE | 遗漏边界/退化情况 | 奇异协方差、零权重、非唯一 argmin |
| CIRCULAR_DEPENDENCY | 引理使用了依赖于该引理的定理 | |
| LOGICAL_GAP | 某一步骤无法由前置内容推导得出 | B=Θ(1) → β_K=0 未分析W的影响 |
Group B: Analysis & Measure Theory
组别B:分析学与测度论
| Category | Description | Example |
|---|---|---|
| ILLEGAL_INTERCHANGE | Swaps limit/expectation/derivative/integral without DCT/MCT/Fubini | Differentiating under E without domination |
| NONUNIFORM_CONVERGENCE | Pointwise convergence used as uniform | sup and limit swapped |
| MISSING_DOMINATION | DCT cited but no dominating function given | |
| INTEGRABILITY_GAP | Uses E | X |
| REGULARITY_GAP | Differentiability/Lipschitz/convexity used but not established | |
| STOCHASTIC_MODE_CONFUSION | Mixes a.s./in prob./in L²/in expectation |
| 类别 | 描述 | 示例 |
|---|---|---|
| ILLEGAL_INTERCHANGE | 未依据DCT/MCT/Fubini定理就交换极限/期望/导数/积分顺序 | 未满足控制条件就对E求导 |
| NONUNIFORM_CONVERGENCE | 将逐点收敛当作一致收敛使用 | 交换上确界与极限顺序 |
| MISSING_DOMINATION | 引用DCT定理但未给出控制函数 | |
| INTEGRABILITY_GAP | 使用E | X |
| REGULARITY_GAP | 使用可微性/Lipschitz性/凸性但未证明其成立 | |
| STOCHASTIC_MODE_CONFUSION | 混淆几乎必然/依概率/依L²/依期望等收敛模式 |
Group C: Model & Parameter Tracking
组别C:模型与参数追踪
| Category | Description | Example |
|---|---|---|
| MISSING_DERIVATION | A quantity is used but never derived from the model | Risk functional with undefined B, W |
| HIDDEN_ASSUMPTION | Proof silently uses a condition not in the theorem | Gaussianity assumed but not stated |
| INSUFFICIENT_ASSUMPTION | Hypotheses too weak for proof (counterexample exists) | Moment conditions admitting 2-point distributions |
| DIMENSION_TRACKING | Parameter dependence (d, n, K, ...) not explicit | d enters only through κ |
| NORMALIZATION_MISMATCH | Coordinate/scaling conventions inconsistent | Rescaled vs raw coordinates |
| CONSTANT_DEPENDENCE_HIDDEN | "C" depends on d,n,K but treated as universal |
| 类别 | 描述 | 示例 |
|---|---|---|
| MISSING_DERIVATION | 使用某一量但从未基于模型推导得出 | 使用未定义B、W的风险函数 |
| HIDDEN_ASSUMPTION | 证明中默认使用定理声明未提及的条件 | 假设高斯性但未在定理中声明 |
| INSUFFICIENT_ASSUMPTION | 假设条件过于薄弱,无法支撑证明(存在反例) | 矩条件允许两点分布 |
| DIMENSION_TRACKING | 参数依赖关系(d, n, K等)未明确 | d仅通过κ引入但未说明 |
| NORMALIZATION_MISMATCH | 坐标/缩放约定不一致 | 重缩放坐标与原始坐标混用 |
| CONSTANT_DEPENDENCE_HIDDEN | "C"依赖于d,n,K但被当作通用常数处理 |
Group D: Scope & Claims
组别D:范围与论断
| Category | Description | Example |
|---|---|---|
| SCOPE_OVERCLAIM | Conclusion stated more broadly than proof supports | "β_K=0" with only generic overlap |
| REFERENCE_MISMATCH | Cited theorem's hypotheses not verified at point of use |
| 类别 | 描述 | 示例 |
|---|---|---|
| SCOPE_OVERCLAIM | 结论表述范围超出证明可支撑的范围 | "β_K=0" 仅在通用重叠情况下成立却未限定 |
| REFERENCE_MISMATCH | 引用的定理在应用时未验证其假设条件 |
Two-Axis Severity System
双轴严重程度体系
Axis A — Proof Status (what is wrong)
轴A — 证明状态(问题类型)
| Status | Meaning |
|---|---|
| INVALID | Statement false as written (counterexample exists or contradiction) |
| UNJUSTIFIED | Could be true, but current proof does not establish it |
| UNDERSTATED | True only after strengthening assumptions |
| OVERSTATED | True only after weakening conclusion / adding qualifiers |
| UNCLEAR | Ambiguous notation / definition drift (not wrong per se) |
| 状态 | 含义 |
|---|---|
| INVALID | 所述内容本身错误(存在反例或矛盾) |
| UNJUSTIFIED | 内容可能为真,但当前证明无法支撑 |
| UNDERSTATED | 仅在强化假设条件后才成立 |
| OVERSTATED | 仅在弱化结论/添加限定后才成立 |
| UNCLEAR | 符号模糊/定义漂移(本身并非错误) |
Axis B — Impact (how much breaks)
轴B — 影响程度(破坏范围)
| Impact | Meaning |
|---|---|
| GLOBAL | Breaks main theorem or core dependency chain |
| LOCAL | Affects a side result but not the main theorem |
| COSMETIC | Exposition only |
| 影响 | 含义 |
|---|---|
| GLOBAL | 破坏主定理或核心依赖链 |
| LOCAL | 影响次要结果但不影响主定理 |
| COSMETIC | 仅影响表述方式 |
Severity Labels (derived)
严重程度标签(推导得出)
| Label | Definition |
|---|---|
| FATAL | INVALID + GLOBAL |
| CRITICAL | (INVALID + LOCAL) or (UNJUSTIFIED + GLOBAL) |
| MAJOR | (UNJUSTIFIED + LOCAL) or (UNDERSTATED/OVERSTATED + GLOBAL) |
| MINOR | Clarity / notation / dimension bookkeeping that doesn't change claims |
| 标签 | 定义 |
|---|---|
| FATAL | INVALID + GLOBAL |
| CRITICAL | (INVALID + LOCAL) 或 (UNJUSTIFIED + GLOBAL) |
| MAJOR | (UNJUSTIFIED + LOCAL) 或 (UNDERSTATED/OVERSTATED + GLOBAL) |
| MINOR | 不改变论断的表述清晰度/符号/维度记录问题 |
Side-Condition Checklists for Common Theorems
常见定理的附带条件检查清单
When the proof invokes any of the following, require explicit verification of ALL listed conditions:
| Theorem | Required Conditions |
|---|---|
| DCT (Dominated Convergence) | Pointwise a.e. convergence + integrable dominating function |
| MCT (Monotone Convergence) | Monotone increasing + non-negative |
| Fubini/Tonelli | Product measurability + integrability (Fubini) or non-negative (Tonelli) |
| Leibniz integral rule | Continuity of integrand + dominating function for derivative |
| Implicit Function Theorem | Continuous differentiability + non-singular Jacobian |
| Taylor with remainder | Sufficient differentiability + remainder form (Lagrange/integral) |
| Jensen's inequality | Convexity of function + integrability |
| Cauchy-Schwarz | Correct inner product space + integrability of both factors |
| Weyl/Davis-Kahan | Symmetry/Hermiticity + perturbation bound conditions |
| Analytic continuation | Domain connectivity + identity theorem conditions |
| WLOG reduction | Invariance under claimed symmetry + reduction is reversible |
当证明中引用以下任一定理时,需明确验证所有列出的条件:
| 定理 | 所需条件 |
|---|---|
| DCT(控制收敛定理) | 几乎处处逐点收敛 + 可积控制函数 |
| MCT(单调收敛定理) | 单调递增 + 非负 |
| Fubini/Tonelli | 乘积可测性 + 可积性(Fubini)或非负(Tonelli) |
| 莱布尼茨积分法则 | 被积函数连续 + 导数存在控制函数 |
| 隐函数定理 | 连续可微 + 雅可比矩阵非奇异 |
| 带余项的泰勒定理 | 足够阶数的可微性 + 余项形式(拉格朗日/积分型) |
| Jensen不等式 | 函数凸性 + 可积性 |
| 柯西-施瓦茨不等式 | 正确的内积空间 + 两个因子均可积 |
| Weyl/Davis-Kahan | 对称性/埃尔米特性 + 扰动界条件 |
| 解析延拓 | 定义域连通性 + 恒等定理条件 |
| WLOG简化 | 满足宣称的对称性不变性 + 简化可逆 |
Workflow
工作流
Phase 0: Preparation
阶段0:准备工作
- Locate the proof: Find the main file(s).
.tex - Read the entire proof: Extract list of all theorems/lemmas/propositions/corollaries/definitions/assumptions.
- Read reference materials: Reference papers, prior results.
- Build a section map: Structured list with line numbers and key claims.
- Identify the main theorem: Central result, assumptions, claims.
- 定位证明: 找到主文件。
.tex - 通读证明: 提取所有定理/引理/命题/推论/定义/假设的列表。
- 阅读参考资料: 参考论文、已有成果。
- 构建章节映射: 包含行号和关键论断的结构化列表。
- 识别主定理: 核心结论、假设条件、论断内容。
Phase 0.5: Proof-Obligation Ledger
阶段0.5:证明义务台账
Build formal accounting artifacts. Save to :
PROOF_SKELETON.md构建正式的核算工件,保存至:
PROOF_SKELETON.md1. Dependency DAG
1. 依赖关系DAG
Nodes = Definitions / Assumptions / Lemmas / Theorems. Edges = "uses". Detect cycles (including semantic circularity where Lemma A uses a corollary that quietly depends on A).
节点 = 定义 / 假设 / 引理 / 定理。边 = "使用"。检测循环(包括引理A使用了间接依赖于A的推论这类语义循环)。
2. Assumption Ledger
2. 假设台账
For each theorem/lemma, list every hypothesis with WHERE each is verified (or mark "UNVERIFIED"). Track usage-minimal assumption sets — which assumptions were actually used vs merely stated.
针对每个定理/引理,列出所有假设条件及每个条件的核验位置(或标记为「UNVERIFIED」)。追踪最小必要假设集——实际使用的假设与仅声明的假设的差异。
3. Typed Symbol Table
3. 类型化符号表
Each symbol must have a type signature:
κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by vFlag any symbol whose meaning changes or whose type is inconsistent across uses.
每个符号必须有类型签名:
κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v标记任何含义变化或类型使用不一致的符号。
4. Canonical Quantified Statements
4. 规范化量化表述
For each theorem/lemma, rewrite the statement with explicit quantifiers, domains, and limit order:
∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
h_act^{(K,π)} = Θ(κ^{α_K^act}) [uniform in π on compact subsets]If you cannot restate a theorem this precisely, mark it UNCLEAR — needs disambiguation.
针对每个定理/引理,用明确的量词、定义域和极限顺序重写表述:
∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
h_act^{(K,π)} = Θ(κ^{α_K^act}) [uniform in π on compact subsets]若无法精确重述定理,标记为UNCLEAR — 需要歧义消除。
5. Micro-Claim Inventory
5. 微论断清单
Every nontrivial step becomes a numbered micro-claim in sequent form:
MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
⊢ Goal: P̂_0 is positive definite
Rule: monomials linearly independent on support of continuous distribution
Side-conditions: positive density near origin ✓ (by GMM weak convergence)Each micro-claim has: justification rule name + required conditions + where conditions are proven.
每个非平凡步骤均转换为编号的相继式形式微论断:
MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
⊢ Goal: P̂_0 is positive definite
Rule: monomials linearly independent on support of continuous distribution
Side-conditions: positive density near origin ✓ (by GMM weak convergence)每个微论断包含:依据的规则名称 + 所需条件 + 条件的证明位置。
6. Limit-Order Map
6. 极限顺序映射
Track every asymptotic statement's limit order and uniformity scope:
h_act = Θ(κ^α) [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n [as n→∞, for fixed κ,K,π with x_K ≪ 1]Flag any statement where limit order is ambiguous or uniformity is unclear.
追踪每个渐近表述的极限顺序和一致性范围:
h_act = Θ(κ^α) [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n [as n→∞, for fixed κ,K,π with x_K ≪ 1]标记任何极限顺序模糊或一致性范围不明确的表述。
Phase 1: First Review (Codex GPT-5.4 xhigh)
阶段1:首次审核(Codex GPT-5.4 xhigh)
Submit the complete proof content with the following mandatory reviewer checklist in the prompt:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are performing a rigorous mathematical proof review. For EVERY theorem,
lemma, and proposition, check ALL of the following:
## MANDATORY CHECKS
A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
list each hypothesis and whether it was verified, with location.
C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
absolute values, missing conditions (convexity, PSD, integrability).
D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
and which conditions are verified/missing.
E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
w.h.p. Ensure transitions are justified.
F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
it is uniform over all parameters. List hidden parameter dependence.
G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
low-rank, or extreme-parameter construction.
H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
results.
## OUTPUT FORMAT (per issue)
For each issue found, provide:
- id: sequential number
- status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
- impact: GLOBAL / LOCAL / COSMETIC
- category: [from taxonomy]
- location: section/equation/line
- statement: what the proof claims
- why_invalid: why this is wrong or unjustified
- counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
- affects: which downstream results break if this is wrong
- minimal_fix: how to fix it
[FULL PROOF CONTENT HERE]Save the threadId. Parse into structured issue list. Write to .
PROOF_AUDIT.md提交完整的证明内容,并在提示中包含以下强制审核清单:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are performing a rigorous mathematical proof review. For EVERY theorem,
lemma, and proposition, check ALL of the following:
## MANDATORY CHECKS
A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
list each hypothesis and whether it was verified, with location.
C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
absolute values, missing conditions (convexity, PSD, integrability).
D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
and which conditions are verified/missing.
E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
w.h.p. Ensure transitions are justified.
F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
it is uniform over all parameters. List hidden parameter dependence.
G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
low-rank, or extreme-parameter construction.
H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
results.
## OUTPUT FORMAT (per issue)
For each issue found, provide:
- id: sequential number
- status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
- impact: GLOBAL / LOCAL / COSMETIC
- category: [from taxonomy]
- location: section/equation/line
- statement: what the proof claims
- why_invalid: why this is wrong or unjustified
- counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
- affects: which downstream results break if this is wrong
- minimal_fix: how to fix it
[FULL PROOF CONTENT HERE]保存threadId。将结果解析为结构化问题列表,写入。
PROOF_AUDIT.mdPhase 1.5: Counterexample Red Team
阶段1.5:反例攻坚
For each CRITICAL or MAJOR issue, and for every key lemma that introduces:
- a new inequality bound
- an identifiability/uniqueness claim
- a curvature/PSD/strong convexity assertion
- a uniform-in-parameter claim
- a convergence mode upgrade (pointwise → uniform, in prob → w.h.p.)
Systematically attempt to construct counterexamples using:
| Strategy | Description |
|---|---|
| Dimensional collapse | Set d=1 or 2, K=2, n small |
| Degeneracy | Singular covariance, tiny weight, overlapping means, identical components |
| Extremal distributions | Two-point ±a, bounded non-subGaussian, heavy tails |
| Adversarial parameter scaling | Pick parameters making neglected terms dominate |
| Numeric falsification | Translate lemma to a function, brute-force optimize over small domain |
Rule: Label "counterexample found" ONLY if algebraically verified. Otherwise log as "candidate counterexample — needs verification."
Record all attempts (successful or not) in .
PROOF_AUDIT.md针对每个CRITICAL或MAJOR级问题,以及每个引入以下内容的关键引理:
- 新的不等式界
- 可识别性/唯一性论断
- 曲率/PSD/强凸性断言
- 参数一致型论断
- 收敛模式升级(逐点→一致,依概率→大概率)
系统性尝试构造反例,策略如下:
| 策略 | 描述 |
|---|---|
| 维度坍缩 | 设置d=1或2,K=2,n取小值 |
| 退化构造 | 奇异协方差、极小权重、重叠均值、相同分量 |
| 极端分布 | 两点分布±a、有界非次高斯分布、重尾分布 |
| 对抗性参数缩放 | 选择参数使被忽略项占主导 |
| 数值证伪 | 将引理转换为函数,在小范围内暴力优化测试 |
规则:仅当代数验证通过时,标记为「counterexample found」。否则记录为「candidate counterexample — needs verification」。
所有尝试(成功或失败)均记录在中。
PROOF_AUDIT.mdPhase 2: Fix Implementation
阶段2:漏洞修复
For each issue, ordered by severity (FATAL → CRITICAL → MAJOR → MINOR):
按严重程度排序(FATAL → CRITICAL → MAJOR → MINOR)处理每个问题:
Step 2a: Choose fix strategy
步骤2a:选择修复策略
For each issue, explicitly choose one of:
- ADD_DERIVATION: Write missing proof steps
- STRENGTHEN_ASSUMPTION: Add conditions to theorem statement
- WEAKEN_CLAIM: Reduce conclusion scope
- ADD_REFERENCE: Cite known result + verify its conditions apply
Log this choice — it is a scope-changing decision when it alters theorem statements.
针对每个问题,明确选择以下策略之一:
- ADD_DERIVATION: 补充缺失的证明步骤
- STRENGTHEN_ASSUMPTION: 为定理添加条件
- WEAKEN_CLAIM: 缩小结论范围
- ADD_REFERENCE: 引用已有成果并验证其条件适用
记录该选择——当修改定理表述时,这属于范围变更决策。
Step 2b: Derive the fix mathematically
步骤2b:数学推导修复
- Complete mathematical derivation, not just a claim
- If new proposition/lemma needed, write in full theorem-proof style
- 完成完整的数学推导,而非仅给出论断
- 若需新增命题/引理,以完整的定理-证明格式撰写
Step 2c: Implement in LaTeX
步骤2c:在LaTeX中实现修复
- Edit the file
.tex - Preserve existing references where possible
\label
- 编辑文件
.tex - 尽可能保留现有引用
\label
Step 2d: Record the fix
步骤2d:记录修复内容
markdown
undefinedmarkdown
undefinedFix N: [SHORT TITLE]
修复N: [简短标题]
Issue: [id] [CATEGORY] — [description]
Severity: FATAL / CRITICAL / MAJOR / MINOR
Status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED
Impact: GLOBAL / LOCAL / COSMETIC
Fix strategy: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE
Location: Section X, Lines Y-Z
BEFORE: [what the proof originally did]
WHY WRONG: [mathematical problem, with counterexample if applicable]
AFTER: [what the fix does]
KEY EQUATION: [central new equation]
PROOF OBLIGATIONS ADDED: [new conditions/lemmas introduced]
DOWNSTREAM EFFECTS: [which results now need re-checking]
undefined问题: [id] [类别] — [描述]
严重程度: FATAL / CRITICAL / MAJOR / MINOR
状态: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED
影响: GLOBAL / LOCAL / COSMETIC
修复策略: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE
位置: 第X节,第Y-Z行
修复前: [原证明内容]
问题原因: [数学层面的问题,如有反例需说明]
修复后: [修复后的内容]
核心方程: [新增的核心方程]
新增证明义务: [引入的新条件/引理]
下游影响: [哪些结果需要重新检查]
undefinedStep 2e: Compile check
步骤2e:编译检查
bash
pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"bash
pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"Phase 3: Re-Review (Codex GPT-5.4 xhigh)
阶段3:重新审核(Codex GPT-5.4 xhigh)
Use with saved threadId. Include fix summaries. Request the same mandatory checklist.
codex-replyCheck acceptance gate. If not met, repeat Phases 2-3 (up to MAX_REVIEW_ROUNDS).
使用保存的threadId调用,包含修复摘要,要求执行相同的强制审核清单。
codex-reply检查验收标准。若未达标,重复阶段2-3(最多MAX_REVIEW_ROUNDS轮)。
Phase 3.5: Global Closure & Independent Verification
阶段3.5:全局收尾与独立验证
Global closure checks
全局收尾检查
After all fixes, verify the proof as a whole:
- Statement–conclusion match: Does the proof end with EXACTLY what the theorem claims (quantifiers, constants, uniformity)?
- All obligations discharged: Every node in the obligation DAG is proven or explicitly assumed (and the theorem statement includes it).
- Case analysis coverage: Cases partition the domain AND include boundary/degenerate cases.
- Induction correctness (if applicable): Base case, inductive step, correct use of IH, induction measure strictly decreases.
- WLOG reductions: Each "without loss of generality" spawns a micro-claim proving the reduction is lossless.
- No silent assumption strengthening: Any fix that strengthened assumptions has propagated to the main theorem statement.
完成所有修复后,整体验证证明:
- 表述-结论匹配: 证明结尾是否与定理宣称的内容完全一致(量词、常数、一致性)?
- 所有义务已核验: 义务DAG中的每个节点均已证明或明确假设(且定理表述包含该假设)。
- 案例分析覆盖: 案例划分了定义域且包含边界/退化情况。
- 归纳正确性(若适用): 基础案例、归纳步骤、归纳假设的正确使用、归纳测度严格递减。
- WLOG简化: 每个「不失一般性」均对应一个微论断,证明简化是无损的。
- 无隐性假设强化: 任何强化假设的修复均已同步更新至主定理表述。
Independent second review for FATAL/CRITICAL fixes
FATAL/CRITICAL级修复的独立二次审核
For any fix that resolved a FATAL or CRITICAL issue, submit the fixed section alone (without showing the previous critique) to a fresh Codex thread:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
Blind review of the following proof section. You have NOT seen any prior
review or discussion. Check every step for correctness, hidden assumptions,
illegal interchanges, and counterexamples.
[FIXED SECTION ONLY]If the blind reviewer finds new issues, re-enter Phase 2.
针对任何解决了FATAL或CRITICAL级问题的修复,将修复后的单独章节(不展示之前的评审内容)提交至全新的Codex线程:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
Blind review of the following proof section. You have NOT seen any prior
review or discussion. Check every step for correctness, hidden assumptions,
illegal interchanges, and counterexamples.
[FIXED SECTION ONLY]若盲审发现新问题,重新进入阶段2。
Regression proof-audit
回归审核
After fixes, re-run:
- DAG acyclicity check (no new cycles introduced)
- Counterexample suite on all DOWNSTREAM lemmas of modified results
- Assumption-delta report: what became stronger/weaker due to fixes?
修复完成后,重新执行:
- DAG无环检查(未引入新循环)
- 对修改结果的所有下游引理执行反例测试套件
- 假设变更报告:修复导致哪些假设变强/变弱?
Phase 3.9: Unrecoverable Proof Protocol
阶段3.9:不可修复证明处理流程
If acceptance gate is not met after MAX_REVIEW_ROUNDS, output a Proof Unrecoverable Report:
- Minimal set of blocking FATAL/CRITICAL issues that could not be resolved
- Salvage options ranked: (a) weaken claim, (b) strengthen assumptions, (c) add missing lemmas, (d) restructure argument
- Which parts of the proof are likely still reusable
- Recommended next steps for the author
Do NOT silently declare success. The report must be honest.
若经过MAX_REVIEW_ROUNDS轮后仍未达到验收标准,输出证明不可修复报告:
- 无法解决的阻塞性FATAL/CRITICAL级问题的最小集合
- 按优先级排序的挽救方案:(a) 弱化结论,(b) 强化假设,(c) 添加缺失引理,(d) 重构论证
- 证明中仍可复用的部分
- 给作者的下一步建议
不得默认宣称成功。报告必须如实反映情况。
Phase 4: Audit Report Generation
阶段4:生成审核报告
Generate with:
proof_audit_report.tex- Overview table: All issues with two-axis severity, category, fix strategy, status
- Before/After logic chain: Red (BEFORE) → Green (AFTER) comparison
- For each fix: original proof → why wrong → counterexample (if any) → complete derivation → remaining subtleties
- Proof-obligation diff: What was unverified before, what is verified now
- Summary: Now proven / still assumed / open problems
- Colored boxes: BEFORE (red), AFTER (green), WHY WRONG (orange), KEY INSIGHT (blue), WARNING (yellow)
Compile:
pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex生成,包含:
proof_audit_report.tex- 概览表格: 所有问题的双轴严重程度、类别、修复策略、状态
- 逻辑链对比: 红色(修复前)→ 绿色(修复后)的对比
- 每个修复详情: 原证明 → 问题原因 → 反例(如有)→ 完整推导 → 剩余细节
- 证明义务差异: 修复前未核验的内容与当前已核验的内容
- 总结: 已证明内容 / 仍假设内容 / 未解决问题
- 彩色标注框: 修复前(红)、修复后(绿)、问题原因(橙)、核心洞察(蓝)、警告(黄)
编译:
pdflatex proof_audit_report.tex && pdflatex proof_audit_report.texPhase 5: State Persistence
阶段5:状态持久化
Write :
PROOF_CHECK_STATE.jsonjson
{
"status": "completed",
"rounds": 2,
"threadId": "...",
"fatal_fixed": 0,
"critical_fixed": 3,
"major_fixed": 2,
"minor_fixed": 1,
"counterexamples_found": 1,
"counterexample_candidates": 2,
"acceptance_gate": "PASS",
"timestamp": "..."
}写入:
PROOF_CHECK_STATE.jsonjson
{
"status": "completed",
"rounds": 2,
"threadId": "...",
"fatal_fixed": 0,
"critical_fixed": 3,
"major_fixed": 2,
"minor_fixed": 1,
"counterexamples_found": 1,
"counterexample_candidates": 2,
"acceptance_gate": "PASS",
"timestamp": "..."
}Key Rules
核心规则
Mathematical rigor
数学严谨性
- Never accept a proof step on faith. "Clearly" / "it follows" / "by standard arguments" are red flags — each must spawn a micro-claim.
- Hypothesis discharge: Every time a lemma is APPLIED, verify EACH of its hypotheses at that point. Use the side-condition checklists above.
- Interchange discipline: Every swap of limit/expectation/derivative/integral must cite a theorem (DCT/MCT/Fubini/Leibniz) and verify its conditions with explicit dominating function or integrability proof.
- Uniformity discipline: Every O(·)/Θ(·) must declare what parameters it is uniform over. "O(1)" that secretly depends on d,n,K is a CONSTANT_DEPENDENCE_HIDDEN issue.
- Quantifier discipline: Check ∀/∃ order. "For sufficiently small κ" must specify: does κ₀ depend on K? On π? On d?
- Counterexample-first: Before trying to fix a gap, first try to break it.
- WLOG prohibition: Every "without loss of generality" must have an explicit micro-claim proving the reduction. No free WLOGs.
- No silent assumption strengthening: Any fix that adds conditions must propagate to the theorem statement.
- 绝不轻信证明步骤。「显然」「由此可得」「标准论证」均为危险信号——每个此类表述都需生成一个微论断。
- 假设核验: 每次应用引理时,需在应用点验证其每个假设条件。使用上述附带条件检查清单。
- 推导顺序规范: 每次交换极限/期望/导数/积分顺序时,必须引用定理(DCT/MCT/Fubini/莱布尼茨)并明确验证其条件,如提供控制函数或可积性证明。
- 一致性规范: 每个O(·)/Θ(·)必须声明其对哪些参数一致。暗中依赖d,n,K的「O(1)」属于CONSTANT_DEPENDENCE_HIDDEN问题。
- 量词规范: 检查∀/∃顺序。「当κ足够小时」必须明确:κ₀是否依赖于K?是否依赖于π?是否依赖于d?
- 反例优先: 在尝试修复漏洞前,先尝试构造反例。
- WLOG限制: 每个「不失一般性」必须有明确的微论断证明简化的无损性。不得随意使用WLOG。
- 无隐性假设强化: 任何添加条件的修复必须同步更新至定理表述。
Cross-model protocol
跨模型协议
- Claude analyzes, Codex reviews: Claude reads proof, formulates questions, implements fixes. Codex provides adversarial review.
- Codex reasoning always xhigh: Never downgrade.
- Send full content: Don't summarize — send actual math for line-by-line checking.
- Preserve threadId: Use for follow-up rounds.
codex-reply
- Claude分析,Codex审核: Claude读取证明、提出问题、实现修复。Codex提供对抗性审核。
- Codex推理强度始终为xhigh: 不得降低。
- 发送完整内容: 不得总结——发送实际数学内容进行逐行检查。
- 保留threadId: 后续轮次使用。
codex-reply
Fix quality
修复质量
- Minimal fixes: Fix exactly what's broken, nothing more.
- Full derivation: Every fix includes complete mathematical argument.
- Explicit scope decisions: Each fix is tagged ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE.
- Compile after each fix: LaTeX must compile cleanly.
- 最小修复: 仅修复问题本身,不做额外修改。
- 完整推导: 每个修复包含完整的数学论证。
- 明确范围决策: 每个修复标记为ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE。
- 每次修复后编译: LaTeX必须能正常编译。
Scope honesty
范围诚实性
- Don't overclaim: If a fix makes a result conditional, say so.
- Separate "proven" from "assumed": The audit report has an explicit section for this.
- Log open problems: Issues requiring future work are listed, not hidden.
- 不夸大: 若修复使结果变为条件性成立,需明确说明。
- 区分「已证明」与「假设」: 审核报告需包含明确的对应章节。
- 记录未解决问题: 需列出需后续处理的问题,不得隐藏。
Output Files
输出文件
| File | Content | When |
|---|---|---|
| Dependency DAG + assumption ledger + micro-claims | Phase 0.5 |
| Cumulative round-by-round audit log | Updated each round |
| Machine-readable submission verdict (see below) | Always emitted |
| Formal before/after report | Phase 4 |
| State for recovery | Phase 5 |
| 文件 | 内容 | 生成阶段 |
|---|---|---|
| 依赖关系DAG + 假设台账 + 微论断清单 | 阶段0.5 |
| 逐轮累积审核日志 | 每轮更新 |
| 机器可读的提交 verdict(见下文) | 始终生成 |
| 正式的修复前后报告 | 阶段4 |
| 用于恢复的状态文件 | 阶段5 |
Submission Artifact Emission
提交工件生成
This skill always writes at the paper directory
root (i.e. when invoked from
with paper-dir ; when invoked
standalone), regardless of caller or whether the paper contains theorems.
A paper with no / / emits
verdict ; silent skip is forbidden.
Phase 6 and both rely on this artifact
existing at .
PROOF_AUDIT.jsonpaper/PROOF_AUDIT.json/paper-writingpaper/<your-paper-dir>/PROOF_AUDIT.json\begin{theorem}\begin{lemma}\begin{proof}NOT_APPLICABLEpaper-writingtools/verify_paper_audits.sh<paper-dir>/PROOF_AUDIT.jsonThe artifact conforms to the schema in :
shared-references/assurance-contract.mdjson
{
"audit_skill": "proof-checker",
"verdict": "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
"reason_code": "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
"summary": "One-line human-readable verdict summary.",
"audited_input_hashes": {
"main.tex": "sha256:...",
"sections/4.theory.tex": "sha256:..."
},
"trace_path": ".aris/traces/proof-checker/<date>_run<NN>/",
"thread_id": "<codex mcp thread id>",
"reviewer_model": "gpt-5.4",
"reviewer_reasoning": "xhigh",
"generated_at": "<UTC ISO-8601>",
"details": {
"theorems_audited": <int>,
"issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
"category": "quantifier|domination|...",
"location": "sections/4.theory.tex:L182",
"note": "..." }, ... ]
}
}本工具始终在论文目录根目录写入(即通过调用且论文目录为时,路径为;独立调用时,路径为),无论调用者是谁或论文是否包含定理。若论文中无//,则输出verdict为;禁止静默跳过。阶段6和均依赖于该工件存在于。
PROOF_AUDIT.json/paper-writingpaper/paper/PROOF_AUDIT.json<your-paper-dir>/PROOF_AUDIT.json\begin{theorem}\begin{lemma}\begin{proof}NOT_APPLICABLEpaper-writingtools/verify_paper_audits.sh<paper-dir>/PROOF_AUDIT.json工件符合中的 schema:
shared-references/assurance-contract.mdjson
{
"audit_skill": "proof-checker",
"verdict": "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
"reason_code": "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
"summary": "单行人类可读的verdict摘要。",
"audited_input_hashes": {
"main.tex": "sha256:...",
"sections/4.theory.tex": "sha256:..."
},
"trace_path": ".aris/traces/proof-checker/<date>_run<NN>/",
"thread_id": "<codex mcp thread id>",
"reviewer_model": "gpt-5.4",
"reviewer_reasoning": "xhigh",
"generated_at": "<UTC ISO-8601>",
"details": {
"theorems_audited": <int>,
"issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
"category": "quantifier|domination|...",
"location": "sections/4.theory.tex:L182",
"note": "..." }, ... ]
}
}audited_input_hashes
scope
audited_input_hashesaudited_input_hashes
范围
audited_input_hashesHash the declared input set actually reviewed — the theorem-bearing
files passed into this invocation — not a repo-wide union and not
the reviewer's self-reported opened subset. The external verifier rehashes
these entries; any mismatch flags .
.texSTALEPath convention (must match ): keys are
paths relative to the paper directory (no prefix — the
verifier resolves relative to the paper dir; prefixing produces
and false-fails as STALE). Use absolute paths for
files outside the paper dir.
tools/verify_paper_audits.shpaper/paper/paper/...对实际审核的声明输入集进行哈希——即本次调用传入的包含定理的文件——而非整个仓库的文件集合或审核者自行打开的子集。外部验证者会重新哈希这些条目;任何不匹配都会标记为。
.texSTALE路径约定(必须与一致):键为相对于论文目录的路径(无前缀——验证者会相对于论文目录解析;添加前缀会产生路径并导致误判为STALE)。论文目录外的文件使用绝对路径。
tools/verify_paper_audits.shpaper/paper/paper/...Verdict decision table
Verdict决策表
| Input state | Verdict | |
|---|---|---|
| No theorems / lemmas / proofs in paper | | |
| Theorems present but referenced files unreadable | | |
| All proof obligations discharged, no gaps | | |
| Only MINOR issues (notation / exposition) | | |
| Any FATAL or CRITICAL issue (logic gap, wrong claim) | | |
| Reviewer invocation failed (network / malformed) | | |
MAJOR issues alone map to or at the reviewer's discretion and
must carry an explicit justification in + .
WARNFAILsummarydetails.issues| 输入状态 | Verdict | |
|---|---|---|
| 论文中无定理/引理/证明 | | |
| 存在定理但引用文件无法读取 | | |
| 所有证明义务已核验,无漏洞 | | |
| 仅存在MINOR级问题(符号/表述) | | |
| 存在任何FATAL或CRITICAL级问题(逻辑漏洞、错误论断) | | |
| 审核者调用失败(网络/格式错误) | | |
仅存在MAJOR级问题时,由审核者决定映射为或,且必须在 + 中提供明确理由。
WARNFAILsummarydetails.issuesThread independence
线程独立性
Every invocation uses a fresh thread. Never
across proof-checker runs. Do not accept prior audit outputs
(PAPER_CLAIM_AUDIT, CITATION_AUDIT, EXPERIMENT_LOG) as input — the fresh
thread preserves reviewer independence per
.
mcp__codex__codexcodex-replyshared-references/reviewer-independence.mdThis skill never blocks by itself; Phase 6 plus the
verifier decide whether the verdict blocks finalization based on the
level.
paper-writingassurance每次调用使用全新的线程。不得跨proof-checker运行调用。不得接受之前的审核输出(PAPER_CLAIM_AUDIT、CITATION_AUDIT、EXPERIMENT_LOG)作为输入——全新线程确保审核者独立性,符合。
mcp__codex__codexcodex-replyshared-references/reviewer-independence.md本工具自身不会阻塞流程;阶段6和验证者会根据级别决定verdict是否阻塞最终定稿。
paper-writingassuranceExample Invocations
调用示例
/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"