Santa Method

Multi-agent adversarial verification framework. Make a list, check it twice. If it's naughty, fix it until it's nice.
The core insight: a single agent reviewing its own output shares the same biases, knowledge gaps, and systematic errors that produced the output. Two independent reviewers with no shared context break this failure mode.

When to Activate

Invoke this skill when:
  • Output will be published, deployed, or consumed by end users
  • Compliance, regulatory, or brand constraints must be enforced
  • Code ships to production without human review
  • Content accuracy matters (technical docs, educational material, customer-facing copy)
  • Batch generation at scale where spot-checking misses systemic patterns
  • Hallucination risk is elevated (claims, statistics, API references, legal language)
Do NOT use for internal drafts, exploratory research, or tasks with deterministic verification (use build/test/lint pipelines for those).

Architecture

┌──────────────┐
│  GENERATOR   │  Phase 1: Make a List
│  (Agent A)   │  Produce the deliverable
└──────┬───────┘
       │ output
       ▼
┌───────────────────────────────┐
│    DUAL INDEPENDENT REVIEW    │  Phase 2: Check It Twice
│                               │
│ ┌────────────┐ ┌────────────┐ │  Two agents, same rubric,
│ │ Reviewer B │ │ Reviewer C │ │  no shared context
│ └──────┬─────┘ └──────┬─────┘ │
│        │              │       │
└────────┼──────────────┼───────┘
         │              │
         ▼              ▼
┌───────────────────────────────┐
│          VERDICT GATE         │  Phase 3: Naughty or Nice
│                               │
│ B passes AND C passes → NICE  │  Both must pass.
│ Otherwise → NAUGHTY           │  No exceptions.
└───────┬───────────────┬───────┘
        │               │
      NICE           NAUGHTY
        │               │
        ▼               ▼
    [ SHIP ]    ┌──────────────┐
                │  FIX CYCLE   │  Phase 4: Fix Until Nice
                │              │
                │ iteration++  │  Collect all flags.
                │ if i > MAX:  │  Fix all issues.
                │   escalate   │  Re-run both reviewers.
                │ else:        │  Loop until convergence.
                │   goto Ph.2  │
                └──────────────┘
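The four phases above can be wired into one driver loop. Below is a minimal runnable sketch; `generate`, `run_reviewer`, `fix`, and `escalate_to_human` are toy stand-ins for real agent calls, and the review criterion is deliberately trivial:

```python
# Toy stand-ins for real agent calls (assumptions for illustration only).
def generate(spec):
    return f"draft for: {spec}"

def run_reviewer(spec, output, rubric):
    ok = "draft" not in output  # toy criterion: anything still marked draft fails
    return {"verdict": "PASS" if ok else "FAIL",
            "critical_issues": [] if ok else ["still a draft"]}

def fix(output, issues):
    return output.replace("draft", "final")

def escalate_to_human(output, issues):
    return None  # hand off; nothing ships automatically

def santa(task_spec, rubric, max_iterations=3):
    """Drive the four phases: generate, dual review, verdict gate, fix-until-nice."""
    output = generate(task_spec)                            # Phase 1: Make a List
    issues = []
    for _ in range(max_iterations):
        review_b = run_reviewer(task_spec, output, rubric)  # Phase 2: two
        review_c = run_reviewer(task_spec, output, rubric)  # independent reviews
        if review_b["verdict"] == "PASS" and review_c["verdict"] == "PASS":
            return output                                   # Phase 3: NICE, ship
        issues = review_b["critical_issues"] + review_c["critical_issues"]
        output = fix(output, issues)                        # Phase 4: fix and loop
    return escalate_to_human(output, issues)                # budget exhausted
```

In the real skill, `run_reviewer` spawns an isolated subagent with the shared rubric; both calls share one toy function here only to keep the sketch self-contained.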

Phase Details

Phase 1: Make a List (Generate)

Execute the primary task. No changes to your normal generation workflow. Santa Method is a post-generation verification layer, not a generation strategy.
```python
# The generator runs as normal
output = generate(task_spec)
```

Phase 2: Check It Twice (Independent Dual Review)

Spawn two review agents in parallel. Critical invariants:
  1. Context isolation — neither reviewer sees the other's assessment
  2. Identical rubric — both receive the same evaluation criteria
  3. Same inputs — both receive the original spec AND the generated output
  4. Structured output — each returns a typed verdict, not prose
```python
REVIEWER_PROMPT = """
You are an independent quality reviewer. You have NOT seen any other review of this output.

## Task Specification
{task_spec}

## Output Under Review
{output}

## Evaluation Rubric
{rubric}

## Instructions
Evaluate the output against EACH rubric criterion. For each:
- PASS: criterion fully met, no issues
- FAIL: specific issue found (cite the exact problem)

Return your assessment as structured JSON:
{{
  "verdict": "PASS" | "FAIL",
  "checks": [
    {{"criterion": "...", "result": "PASS|FAIL", "detail": "..."}}
  ],
  "critical_issues": ["..."],  // blockers that must be fixed
  "suggestions": ["..."]       // non-blocking improvements
}}

Be rigorous. Your job is to find problems, not to approve.
"""

# Spawn reviewers in parallel (Claude Code subagents)
review_b = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer B")
review_c = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer C")
# Both run concurrently — neither sees the other
```

Note the doubled braces in the JSON example: the template goes through `str.format`, so literal `{` and `}` must be escaped as `{{` and `}}`.
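Because the verdict gate consumes typed verdicts, it helps to validate each reviewer's JSON reply before comparing them. A minimal sketch using only the standard library; the field names follow the JSON shape in the prompt above:

```python
import json

REQUIRED_FIELDS = {"verdict", "checks", "critical_issues", "suggestions"}

def parse_review(raw: str) -> dict:
    """Parse a reviewer reply and reject anything that isn't a typed verdict."""
    review = json.loads(raw)
    missing = REQUIRED_FIELDS - review.keys()
    if missing:
        raise ValueError(f"reviewer reply missing fields: {sorted(missing)}")
    if review["verdict"] not in ("PASS", "FAIL"):
        raise ValueError(f"invalid verdict: {review['verdict']!r}")
    return review

raw = '{"verdict": "FAIL", "checks": [], "critical_issues": ["dead URL"], "suggestions": []}'
review = parse_review(raw)
# review["verdict"] == "FAIL"; malformed replies raise instead of slipping through
```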

Rubric Design

The rubric is the most important input. Vague rubrics produce vague reviews. Every criterion must have an objective pass/fail condition.
| Criterion | Pass Condition | Failure Signal |
|---|---|---|
| Factual accuracy | All claims verifiable against source material or common knowledge | Invented statistics, wrong version numbers, nonexistent APIs |
| Hallucination-free | No fabricated entities, quotes, URLs, or references | Links to pages that don't exist, attributed quotes with no source |
| Completeness | Every requirement in the spec is addressed | Missing sections, skipped edge cases, incomplete coverage |
| Compliance | Passes all project-specific constraints | Banned terms used, tone violations, regulatory non-compliance |
| Internal consistency | No contradictions within the output | Section A says X, section B says not-X |
| Technical correctness | Code compiles/runs, algorithms are sound | Syntax errors, logic bugs, wrong complexity claims |
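To guarantee both reviewers receive identical criteria, the rubric can be carried as structured data and rendered into each prompt. A minimal sketch; the criterion wording is condensed from the table above:

```python
RUBRIC = [
    ("Factual accuracy", "All claims verifiable against source material or common knowledge"),
    ("Hallucination-free", "No fabricated entities, quotes, URLs, or references"),
    ("Completeness", "Every requirement in the spec is addressed"),
    ("Internal consistency", "No contradictions within the output"),
]

def render_rubric(rubric):
    """Render the rubric as numbered pass conditions for the reviewer prompt."""
    return "\n".join(f"{i}. {name}: {cond}" for i, (name, cond) in enumerate(rubric, 1))

rubric_text = render_rubric(RUBRIC)
# rubric_text starts with "1. Factual accuracy: ..."
```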

Domain-Specific Rubric Extensions

Content/Marketing:
  • Brand voice adherence
  • SEO requirements met (keyword density, meta tags, structure)
  • No competitor trademark misuse
  • CTA present and correctly linked
Code:
  • Type safety (no `any` leaks, proper null handling)
  • Error handling coverage
  • Security (no secrets in code, input validation, injection prevention)
  • Test coverage for new paths
Compliance-Sensitive (regulated, legal, financial):
  • No outcome guarantees or unsubstantiated claims
  • Required disclaimers present
  • Approved terminology only
  • Jurisdiction-appropriate language

Phase 3: Naughty or Nice (Verdict Gate)

```python
def santa_verdict(review_b, review_c):
    """Both reviewers must pass. No partial credit."""
    if review_b.verdict == "PASS" and review_c.verdict == "PASS":
        return "NICE", [], []  # Ship it

    # Merge flags from both reviewers, deduplicate
    all_issues = dedupe(review_b.critical_issues + review_c.critical_issues)
    all_suggestions = dedupe(review_b.suggestions + review_c.suggestions)

    return "NAUGHTY", all_issues, all_suggestions
```
Why both must pass: if only one reviewer catches an issue, that issue is real. The other reviewer's blind spot is exactly the failure mode Santa Method exists to eliminate.
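`dedupe` is left abstract above; an order-preserving version is enough, and keeps the first reviewer's phrasing when both flag the same issue:

```python
def dedupe(items):
    """Order-preserving de-duplication of flagged issue strings."""
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

merged = dedupe(["broken link", "missing CTA", "broken link"])
# merged == ["broken link", "missing CTA"]
```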

Phase 4: Fix Until Nice (Convergence Loop)

```python
MAX_ITERATIONS = 3

for iteration in range(MAX_ITERATIONS):
    verdict, issues, suggestions = santa_verdict(review_b, review_c)

    if verdict == "NICE":
        log_santa_result(output, iteration, "passed")
        return ship(output)

    # Fix all critical issues (suggestions are optional)
    output = fix_agent.execute(
        output=output,
        issues=issues,
        instruction="Fix ONLY the flagged issues. Do not refactor or add unrequested changes."
    )

    # Re-run BOTH reviewers on fixed output (fresh agents, no memory of previous round)
    review_b = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))
    review_c = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))

# Exhausted iterations — escalate
log_santa_result(output, MAX_ITERATIONS, "escalated")
escalate_to_human(output, issues)
```

Critical: each review round uses **fresh agents**. Reviewers must not carry memory from previous rounds, as prior context creates anchoring bias.

Implementation Patterns

Pattern A: Claude Code Subagents (Recommended)

Subagents provide true context isolation. Each reviewer is a separate process with no shared state.

```python
# In a Claude Code session, use the Agent tool to spawn reviewers.
# Pseudocode for Agent tool invocation; both agents run in parallel for speed.
reviewer_b = Agent(
    description="Santa Review B",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
reviewer_c = Agent(
    description="Santa Review C",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
```

Pattern B: Sequential Inline (Fallback)

When subagents aren't available, simulate isolation with explicit context resets:
  1. Generate output
  2. New context: "You are Reviewer 1. Evaluate ONLY against this rubric. Find problems."
  3. Record findings verbatim
  4. Clear context completely
  5. New context: "You are Reviewer 2. Evaluate ONLY against this rubric. Find problems."
  6. Compare both reviews, fix, repeat
The subagent pattern is strictly superior — inline simulation risks context bleed between reviewers.

Pattern C: Batch Sampling

For large batches (100+ items), full Santa on every item is cost-prohibitive. Use stratified sampling:
  1. Run Santa on a random sample (10-15% of batch, minimum 5 items)
  2. Categorize failures by type (hallucination, compliance, completeness, etc.)
  3. If systematic patterns emerge, apply targeted fixes to the entire batch
  4. Re-sample and re-verify the fixed batch
  5. Continue until a clean sample passes

```python
import random

def santa_batch(items, rubric, sample_rate=0.15):
    sample = random.sample(items, max(5, int(len(items) * sample_rate)))

    for item in sample:
        result = santa_full(item, rubric)
        if result.verdict == "NAUGHTY":
            pattern = classify_failure(result.issues)
            items = batch_fix(items, pattern)  # Fix all items matching the pattern
            return santa_batch(items, rubric)  # Re-sample

    return items  # Clean sample → ship batch
```
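`classify_failure` and `batch_fix` are intentionally abstract. A keyword-based classifier is often enough to surface systematic patterns; the categories and keyword lists below are illustrative assumptions:

```python
# Coarse failure taxonomy; keyword lists are illustrative, not exhaustive.
FAILURE_KEYWORDS = {
    "hallucination": ("fabricated", "invented", "nonexistent", "no source"),
    "compliance": ("banned", "disclaimer", "tone", "regulatory"),
    "completeness": ("missing", "skipped", "incomplete"),
}

def classify_failure(issues):
    """Map flagged issues to a coarse failure category by keyword match."""
    for category, words in FAILURE_KEYWORDS.items():
        if any(w in issue.lower() for issue in issues for w in words):
            return category
    return "other"

category = classify_failure(["Invented statistics in section 2"])
# category == "hallucination"
```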

Failure Modes and Mitigations

| Failure Mode | Symptom | Mitigation |
|---|---|---|
| Infinite loop | Reviewers keep finding new issues after fixes | Max iteration cap (3). Escalate. |
| Rubber stamping | Both reviewers pass everything | Adversarial prompt: "Your job is to find problems, not approve." |
| Subjective drift | Reviewers flag style preferences, not errors | Tight rubric with objective pass/fail criteria only |
| Fix regression | Fixing issue A introduces issue B | Fresh reviewers each round catch regressions |
| Reviewer agreement bias | Both reviewers miss the same thing | Mitigated by independence, not eliminated. For critical output, add a third reviewer or human spot-check. |
| Cost explosion | Too many iterations on large outputs | Batch sampling pattern. Budget caps per verification cycle. |

Integration with Other Skills

| Skill | Relationship |
|---|---|
| Verification Loop | Use for deterministic checks (build, lint, test). Santa for semantic checks (accuracy, hallucinations). Run verification-loop first, Santa second. |
| Eval Harness | Santa Method results feed eval metrics. Track pass@k across Santa runs to measure generator quality over time. |
| Continuous Learning v2 | Santa findings become instincts. Repeated failures on the same criterion → learned behavior to avoid the pattern. |
| Strategic Compact | Run Santa BEFORE compacting. Don't lose review context mid-verification. |

Metrics

Track these to measure Santa Method effectiveness:
  • First-pass rate: % of outputs that pass Santa on round 1 (target: >70%)
  • Mean iterations to convergence: average rounds to NICE (target: <1.5)
  • Issue taxonomy: distribution of failure types (hallucination vs. completeness vs. compliance)
  • Reviewer agreement: % of issues flagged by both reviewers vs. only one (low agreement = rubric needs tightening)
  • Escape rate: issues found post-ship that Santa should have caught (target: 0)
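These metrics fall directly out of the `log_santa_result` calls in the convergence loop. A minimal aggregation sketch; the run-record shape is an assumption:

```python
def santa_metrics(runs):
    """Aggregate run logs into first-pass rate, mean iterations, escalation rate.

    Each record is assumed to be {"iterations": int, "status": "passed" | "escalated"},
    where iterations == 0 means the output passed on the first review round.
    """
    passed = [r for r in runs if r["status"] == "passed"]
    return {
        "first_pass_rate": sum(r["iterations"] == 0 for r in passed) / len(runs),
        "mean_iterations": sum(r["iterations"] for r in passed) / max(len(passed), 1),
        "escalation_rate": sum(r["status"] == "escalated" for r in runs) / len(runs),
    }

m = santa_metrics([
    {"iterations": 0, "status": "passed"},
    {"iterations": 2, "status": "passed"},
    {"iterations": 3, "status": "escalated"},
])
# first_pass_rate == 1/3, mean_iterations == 1.0
```

Escape rate cannot be computed from run logs alone; it needs a post-ship issue tracker joined back to the shipped outputs.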

Cost Analysis

Santa Method costs approximately 2-3x the token cost of generation alone per verification cycle. For most high-stakes output, this is a bargain:
Cost of Santa = (generation tokens) + 2×(review tokens per round) × (avg rounds)
Cost of NOT Santa = (reputation damage) + (correction effort) + (trust erosion)
For batch operations, the sampling pattern reduces cost to ~15-20% of full verification while catching >90% of systematic issues.
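The formulas above can be made concrete with a quick estimate; the token counts are illustrative assumptions:

```python
def santa_cost_tokens(gen_tokens, review_tokens_per_round, avg_rounds):
    """Token cost of one Santa cycle: generation plus two reviews per round."""
    return gen_tokens + 2 * review_tokens_per_round * avg_rounds

cost = santa_cost_tokens(gen_tokens=4000, review_tokens_per_round=2500, avg_rounds=1.5)
overhead = cost / 4000
# cost == 11500.0 tokens, overhead == 2.875, inside the 2-3x range quoted above
```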