santa-method
Santa Method
Multi-agent adversarial verification framework. Make a list, check it twice. If it's naughty, fix it until it's nice.
The core insight: a single agent reviewing its own output shares the same biases, knowledge gaps, and systematic errors that produced the output. Two independent reviewers with no shared context break this failure mode.
When to Activate
Invoke this skill when:
- Output will be published, deployed, or consumed by end users
- Compliance, regulatory, or brand constraints must be enforced
- Code ships to production without human review
- Content accuracy matters (technical docs, educational material, customer-facing copy)
- Batch generation at scale where spot-checking misses systemic patterns
- Hallucination risk is elevated (claims, statistics, API references, legal language)
Do NOT use for internal drafts, exploratory research, or tasks with deterministic verification (use build/test/lint pipelines for those).
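The activation criteria above can be expressed as a simple gate. A minimal sketch, assuming tags describe each task; the tag names and the `should_run_santa` helper are illustrative, not part of the method:

```python
# Illustrative activation gate mirroring the criteria above
HIGH_STAKES = {"publish", "deploy", "production", "customer_facing", "regulated"}

def should_run_santa(task_tags, has_deterministic_checks=False):
    """Santa applies to high-stakes output; deterministic tasks use build/test/lint."""
    if has_deterministic_checks:
        return False  # verification-loop territory, not Santa's
    return bool(HIGH_STAKES & set(task_tags))
```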
Architecture
┌─────────────┐
│  GENERATOR  │   Phase 1: Make a List
│  (Agent A)  │   Produce the deliverable
└──────┬──────┘
       │ output
       ▼
┌────────────────────────────────┐
│    DUAL INDEPENDENT REVIEW     │   Phase 2: Check It Twice
│                                │
│ ┌────────────┐  ┌────────────┐ │   Two agents, same rubric,
│ │ Reviewer B │  │ Reviewer C │ │   no shared context
│ └─────┬──────┘  └─────┬──────┘ │
│       │               │        │
└───────┼───────────────┼────────┘
        │               │
        ▼               ▼
┌────────────────────────────────┐
│          VERDICT GATE          │   Phase 3: Naughty or Nice
│                                │
│ B passes AND C passes → NICE   │   Both must pass.
│ Otherwise → NAUGHTY            │   No exceptions.
└───────┬───────────────┬────────┘
        │               │
      NICE           NAUGHTY
        │               │
        ▼               ▼
   [ SHIP ]      ┌──────────────┐
                 │  FIX CYCLE   │   Phase 4: Fix Until Nice
                 │              │
                 │ iteration++  │   Collect all flags.
                 │ if i > MAX:  │   Fix all issues.
                 │   escalate   │   Re-run both reviewers.
                 │ else:        │   Loop until convergence.
                 │   goto Ph.2  │
                 └──────────────┘

Phase Details
Phase 1: Make a List (Generate)
Execute the primary task. No changes to your normal generation workflow. Santa Method is a post-generation verification layer, not a generation strategy.
```python
# The generator runs as normal
output = generate(task_spec)
```

Phase 2: Check It Twice (Independent Dual Review)
Spawn two review agents in parallel. Critical invariants:
- Context isolation — neither reviewer sees the other's assessment
- Identical rubric — both receive the same evaluation criteria
- Same inputs — both receive the original spec AND the generated output
- Structured output — each returns a typed verdict, not prose
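The "structured output" invariant is enforceable: parse each reply into a typed verdict before it reaches the gate, and fail closed on malformed replies. A sketch assuming Python dataclasses; the `Review`/`Check` names and the fail-closed policy are illustrative:

```python
import json
from dataclasses import dataclass, field

@dataclass
class Check:
    criterion: str
    result: str   # "PASS" | "FAIL"
    detail: str = ""

@dataclass
class Review:
    verdict: str  # "PASS" | "FAIL"
    checks: list = field(default_factory=list)
    critical_issues: list = field(default_factory=list)
    suggestions: list = field(default_factory=list)

def parse_review(raw: str) -> Review:
    """Parse a reviewer's JSON reply; any malformed reply counts as FAIL."""
    try:
        data = json.loads(raw)
        return Review(
            verdict=data.get("verdict", "FAIL"),
            checks=[Check(**c) for c in data.get("checks", [])],
            critical_issues=data.get("critical_issues", []),
            suggestions=data.get("suggestions", []),
        )
    except (json.JSONDecodeError, TypeError):
        return Review(verdict="FAIL", critical_issues=["unparseable reviewer output"])
```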
```python
REVIEWER_PROMPT = """
You are an independent quality reviewer. You have NOT seen any other review of this output.

Task Specification
{task_spec}

Output Under Review
{output}

Evaluation Rubric
{rubric}

Instructions
Evaluate the output against EACH rubric criterion. For each:
- PASS: criterion fully met, no issues
- FAIL: specific issue found (cite the exact problem)

Return your assessment as structured JSON:
{{
  "verdict": "PASS" | "FAIL",
  "checks": [
    {{"criterion": "...", "result": "PASS|FAIL", "detail": "..."}}
  ],
  "critical_issues": ["..."],  // blockers that must be fixed
  "suggestions": ["..."]       // non-blocking improvements
}}

Be rigorous. Your job is to find problems, not to approve.
"""
# Note: JSON braces are doubled so str.format leaves them intact.

# Spawn reviewers in parallel (Claude Code subagents)
review_b = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer B")
review_c = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer C")
# Both run concurrently — neither sees the other
```

Rubric Design
The rubric is the most important input. Vague rubrics produce vague reviews. Every criterion must have an objective pass/fail condition.
| Criterion | Pass Condition | Failure Signal |
|---|---|---|
| Factual accuracy | All claims verifiable against source material or common knowledge | Invented statistics, wrong version numbers, nonexistent APIs |
| Hallucination-free | No fabricated entities, quotes, URLs, or references | Links to pages that don't exist, attributed quotes with no source |
| Completeness | Every requirement in the spec is addressed | Missing sections, skipped edge cases, incomplete coverage |
| Compliance | Passes all project-specific constraints | Banned terms used, tone violations, regulatory non-compliance |
| Internal consistency | No contradictions within the output | Section A says X, section B says not-X |
| Technical correctness | Code compiles/runs, algorithms are sound | Syntax errors, logic bugs, wrong complexity claims |
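A rubric like the table above can be carried as structured data and rendered into the reviewer prompt, which guarantees both reviewers see literally identical criteria. A minimal sketch; the encoding and `render_rubric` helper are illustrative:

```python
# Each criterion pairs with an objective pass condition (subset of the table above)
RUBRIC = [
    {"criterion": "Factual accuracy",
     "pass": "All claims verifiable against source material or common knowledge"},
    {"criterion": "Hallucination-free",
     "pass": "No fabricated entities, quotes, URLs, or references"},
    {"criterion": "Internal consistency",
     "pass": "No contradictions within the output"},
]

def render_rubric(rubric):
    """Render criteria as a numbered list for insertion into the reviewer prompt."""
    return "\n".join(f"{i}. {c['criterion']}: {c['pass']}"
                     for i, c in enumerate(rubric, 1))
```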
Domain-Specific Rubric Extensions
Content/Marketing:
- Brand voice adherence
- SEO requirements met (keyword density, meta tags, structure)
- No competitor trademark misuse
- CTA present and correctly linked
Code:
- Type safety (no `any` leaks, proper null handling)
- Error handling coverage
- Security (no secrets in code, input validation, injection prevention)
- Test coverage for new paths
Compliance-Sensitive (regulated, legal, financial):
- No outcome guarantees or unsubstantiated claims
- Required disclaimers present
- Approved terminology only
- Jurisdiction-appropriate language
Phase 3: Naughty or Nice (Verdict Gate)
```python
def santa_verdict(review_b, review_c):
    """Both reviewers must pass. No partial credit."""
    if review_b.verdict == "PASS" and review_c.verdict == "PASS":
        return "NICE", [], []  # Ship it; empty lists keep the 3-way unpack consistent
    # Merge flags from both reviewers, deduplicate
    all_issues = dedupe(review_b.critical_issues + review_c.critical_issues)
    all_suggestions = dedupe(review_b.suggestions + review_c.suggestions)
    return "NAUGHTY", all_issues, all_suggestions
```

Why both must pass: if only one reviewer catches an issue, that issue is real. The other reviewer's blind spot is exactly the failure mode Santa Method exists to eliminate.
Phase 4: Fix Until Nice (Convergence Loop)
```python
MAX_ITERATIONS = 3

for iteration in range(MAX_ITERATIONS):
    verdict, issues, suggestions = santa_verdict(review_b, review_c)
    if verdict == "NICE":
        log_santa_result(output, iteration, "passed")
        return ship(output)
    # Fix all critical issues (suggestions are optional)
    output = fix_agent.execute(
        output=output,
        issues=issues,
        instruction="Fix ONLY the flagged issues. Do not refactor or add unrequested changes."
    )
    # Re-run BOTH reviewers on fixed output (fresh agents, no memory of previous round)
    review_b = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))
    review_c = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))

# Exhausted iterations — escalate
log_santa_result(output, MAX_ITERATIONS, "escalated")
escalate_to_human(output, issues)
```

Critical: each review round uses **fresh agents**. Reviewers must not carry memory from previous rounds, as prior context creates anchoring bias.

Implementation Patterns
Pattern A: Claude Code Subagents (Recommended)
Subagents provide true context isolation. Each reviewer is a separate process with no shared state.
```python
# In a Claude Code session, use the Agent tool to spawn reviewers.
# Both agents run in parallel for speed.

# Pseudocode for Agent tool invocation
reviewer_b = Agent(
    description="Santa Review B",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
reviewer_c = Agent(
    description="Santa Review C",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
```

Pattern B: Sequential Inline (Fallback)
When subagents aren't available, simulate isolation with explicit context resets:
- Generate output
- New context: "You are Reviewer 1. Evaluate ONLY against this rubric. Find problems."
- Record findings verbatim
- Clear context completely
- New context: "You are Reviewer 2. Evaluate ONLY against this rubric. Find problems."
- Compare both reviews, fix, repeat
The subagent pattern is strictly superior — inline simulation risks context bleed between reviewers.
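The reset discipline above can be sketched in code. Assumptions: a `new_session` factory that returns a conversation with no prior context, and an `ask` method on it; both are hypothetical stand-ins for whatever session API is available:

```python
def sequential_santa(output, rubric, new_session):
    """Fallback dual review in one process: a fresh session per reviewer
    stands in for true context isolation."""
    findings = []
    for name in ("Reviewer 1", "Reviewer 2"):
        session = new_session()  # full context reset between reviewers
        findings.append(session.ask(
            f"You are {name}. Evaluate ONLY against this rubric. Find problems.\n"
            f"RUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
        ))
    return findings
```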
Pattern C: Batch Sampling
For large batches (100+ items), full Santa on every item is cost-prohibitive. Use stratified sampling:
- Run Santa on a random sample (10-15% of batch, minimum 5 items)
- Categorize failures by type (hallucination, compliance, completeness, etc.)
- If systematic patterns emerge, apply targeted fixes to the entire batch
- Re-sample and re-verify the fixed batch
- Continue until a clean sample passes
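The 10-15% sample rate is not arbitrary: for a *systematic* defect touching a meaningful fraction of items, even a small sample detects it with high probability. A sketch under an independence assumption (real defects may cluster, which this ignores):

```python
def detection_probability(defect_rate, sample_size):
    """P(at least one affected item lands in the sample), assuming
    independent, uniformly spread defects."""
    return 1 - (1 - defect_rate) ** sample_size

# A defect touching 20% of items is near-certain to surface in 15 samples
p = detection_probability(0.20, 15)  # ≈ 0.965
```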
```python
import random

def santa_batch(items, rubric, sample_rate=0.15):
    sample = random.sample(items, max(5, int(len(items) * sample_rate)))
    for item in sample:
        result = santa_full(item, rubric)
        if result.verdict == "NAUGHTY":
            pattern = classify_failure(result.issues)
            items = batch_fix(items, pattern)   # Fix all items matching pattern
            return santa_batch(items, rubric)   # Re-sample
    return items  # Clean sample → ship batch
```

Failure Modes and Mitigations
| Failure Mode | Symptom | Mitigation |
|---|---|---|
| Infinite loop | Reviewers keep finding new issues after fixes | Max iteration cap (3). Escalate. |
| Rubber stamping | Both reviewers pass everything | Adversarial prompt: "Your job is to find problems, not approve." |
| Subjective drift | Reviewers flag style preferences, not errors | Tight rubric with objective pass/fail criteria only |
| Fix regression | Fixing issue A introduces issue B | Fresh reviewers each round catch regressions |
| Reviewer agreement bias | Both reviewers miss the same thing | Mitigated by independence, not eliminated. For critical output, add a third reviewer or human spot-check. |
| Cost explosion | Too many iterations on large outputs | Batch sampling pattern. Budget caps per verification cycle. |
Integration with Other Skills
| Skill | Relationship |
|---|---|
| Verification Loop | Use for deterministic checks (build, lint, test). Santa for semantic checks (accuracy, hallucinations). Run verification-loop first, Santa second. |
| Eval Harness | Santa Method results feed eval metrics. Track pass@k across Santa runs to measure generator quality over time. |
| Continuous Learning v2 | Santa findings become instincts. Repeated failures on the same criterion → learned behavior to avoid the pattern. |
| Strategic Compact | Run Santa BEFORE compacting. Don't lose review context mid-verification. |
Metrics
Track these to measure Santa Method effectiveness:
- First-pass rate: % of outputs that pass Santa on round 1 (target: >70%)
- Mean iterations to convergence: average rounds to NICE (target: <1.5)
- Issue taxonomy: distribution of failure types (hallucination vs. completeness vs. compliance)
- Reviewer agreement: % of issues flagged by both reviewers vs. only one (low agreement = rubric needs tightening)
- Escape rate: issues found post-ship that Santa should have caught (target: 0)
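The first three metrics fall out of a simple log. A minimal sketch, assuming each run is recorded as the round on which it reached NICE, or `None` on escalation; the representation is illustrative:

```python
def santa_metrics(rounds_per_run):
    """rounds_per_run: round each output reached NICE on, or None if escalated."""
    total = len(rounds_per_run)
    converged = [r for r in rounds_per_run if r is not None]
    return {
        "first_pass_rate": sum(1 for r in rounds_per_run if r == 1) / total,
        "mean_iterations": sum(converged) / len(converged),
        "escalation_rate": (total - len(converged)) / total,
    }
```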
Cost Analysis
Santa Method costs approximately 2-3x the token cost of generation alone per verification cycle. For most high-stakes output, this is a bargain:
Cost of Santa = (generation tokens) + 2 × (review tokens per round) × (avg rounds)
Cost of NOT Santa = (reputation damage) + (correction effort) + (trust erosion)

For batch operations, the sampling pattern reduces cost to ~15-20% of full verification while catching >90% of systematic issues.
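Plugging illustrative numbers into the token formula (the counts below are assumptions, not benchmarks):

```python
def santa_cost_tokens(gen_tokens, review_tokens_per_reviewer, avg_rounds):
    """Token cost of one Santa cycle: generation plus two reviewers per round."""
    return gen_tokens + 2 * review_tokens_per_reviewer * avg_rounds

# e.g. 4k to generate, 2k per review, 1.5 rounds on average
total = santa_cost_tokens(4000, 2000, 1.5)  # 10000 tokens, 2.5x generation alone
```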