Santa Method


Multi-agent adversarial verification framework. Make a list, check it twice. If it's naughty, fix it until it's nice.
The core insight: a single agent reviewing its own output shares the same biases, knowledge gaps, and systematic errors that produced the output. Two independent reviewers with no shared context break this failure mode.

When to Activate


Invoke this skill when:
  • Output will be published, deployed, or consumed by end users
  • Compliance, regulatory, or brand constraints must be enforced
  • Code ships to production without human review
  • Content accuracy matters (technical docs, educational material, customer-facing copy)
  • Batch generation at scale where spot-checking misses systemic patterns
  • Hallucination risk is elevated (claims, statistics, API references, legal language)
Do NOT use for internal drafts, exploratory research, or tasks with deterministic verification (use build/test/lint pipelines for those).
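The activation decision above can be sketched as a simple predicate. The trigger names and the function itself are illustrative, not part of the method:

```python
# Triggers mirroring the activation list above (names are illustrative).
ACTIVATION_TRIGGERS = {
    "published_or_deployed",
    "compliance_constraints",
    "unreviewed_production_code",
    "accuracy_critical",
    "batch_at_scale",
    "elevated_hallucination_risk",
}

def should_run_santa(traits, deterministically_verifiable=False):
    """Santa applies when any trigger fires; deterministically
    verifiable tasks belong to build/test/lint pipelines instead."""
    if deterministically_verifiable:
        return False
    return bool(ACTIVATION_TRIGGERS & set(traits))
```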

Architecture


┌──────────────┐
│  GENERATOR   │  Phase 1: Make a List
│  (Agent A)   │  Produce the deliverable
└──────┬───────┘
       │ output
       ▼
┌────────────────────────────────┐
│    DUAL INDEPENDENT REVIEW     │  Phase 2: Check It Twice
│                                │
│  ┌────────────┐ ┌────────────┐ │  Two agents, same rubric,
│  │ Reviewer B │ │ Reviewer C │ │  no shared context
│  └─────┬──────┘ └─────┬──────┘ │
│        │              │        │
└────────┼──────────────┼────────┘
         │              │
         ▼              ▼
┌────────────────────────────────┐
│          VERDICT GATE          │  Phase 3: Naughty or Nice
│                                │
│  B passes AND C passes → NICE  │  Both must pass.
│  Otherwise → NAUGHTY           │  No exceptions.
└──────┬──────────────┬──────────┘
       │              │
    NICE           NAUGHTY
       │              │
       ▼              ▼
   [ SHIP ]    ┌──────────────┐
               │  FIX CYCLE   │  Phase 4: Fix Until Nice
               │              │
               │ iteration++  │  Collect all flags.
               │ if i > MAX:  │  Fix all issues.
               │   escalate   │  Re-run both reviewers.
               │ else:        │  Loop until convergence.
               │   goto Ph.2  │
               └──────────────┘

Phase Details


Phase 1: Make a List (Generate)


Execute the primary task. No changes to your normal generation workflow. Santa Method is a post-generation verification layer, not a generation strategy.
```python
# The generator runs as normal
output = generate(task_spec)
```

Phase 2: Check It Twice (Independent Dual Review)


Spawn two review agents in parallel. Critical invariants:
  1. Context isolation — neither reviewer sees the other's assessment
  2. Identical rubric — both receive the same evaluation criteria
  3. Same inputs — both receive the original spec AND the generated output
  4. Structured output — each returns a typed verdict, not prose
```python
REVIEWER_PROMPT = """
You are an independent quality reviewer. You have NOT seen any other review of this output.

## Task Specification
{task_spec}

## Output Under Review
{output}

## Evaluation Rubric
{rubric}

## Instructions
Evaluate the output against EACH rubric criterion. For each:
- PASS: criterion fully met, no issues
- FAIL: specific issue found (cite the exact problem)

Return your assessment as structured JSON:
{{
  "verdict": "PASS" | "FAIL",
  "checks": [{{"criterion": "...", "result": "PASS|FAIL", "detail": "..."}}],
  "critical_issues": ["..."],  // blockers that must be fixed
  "suggestions": ["..."]       // non-blocking improvements
}}

Be rigorous. Your job is to find problems, not to approve.
"""

# Spawn reviewers in parallel (Claude Code subagents).
# Both run concurrently — neither sees the other.
review_b = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer B")
review_c = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer C")
```

(Literal braces in the JSON skeleton are doubled so they survive `str.format`.)

Rubric Design


The rubric is the most important input. Vague rubrics produce vague reviews. Every criterion must have an objective pass/fail condition.
| Criterion | Pass Condition | Failure Signal |
| --- | --- | --- |
| Factual accuracy | All claims verifiable against source material or common knowledge | Invented statistics, wrong version numbers, nonexistent APIs |
| Hallucination-free | No fabricated entities, quotes, URLs, or references | Links to pages that don't exist, attributed quotes with no source |
| Completeness | Every requirement in the spec is addressed | Missing sections, skipped edge cases, incomplete coverage |
| Compliance | Passes all project-specific constraints | Banned terms used, tone violations, regulatory non-compliance |
| Internal consistency | No contradictions within the output | Section A says X, section B says not-X |
| Technical correctness | Code compiles/runs, algorithms are sound | Syntax errors, logic bugs, wrong complexity claims |
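One way to feed these criteria into the reviewer prompt is as structured data rendered into `{rubric}`. The shape below is a sketch, not a prescribed schema:

```python
# Illustrative encoding of the core rubric as structured data.
RUBRIC = [
    ("Factual accuracy", "All claims verifiable against source material or common knowledge"),
    ("Hallucination-free", "No fabricated entities, quotes, URLs, or references"),
    ("Completeness", "Every requirement in the spec is addressed"),
    ("Compliance", "Passes all project-specific constraints"),
    ("Internal consistency", "No contradictions within the output"),
    ("Technical correctness", "Code compiles/runs, algorithms are sound"),
]

def render_rubric(rubric):
    """Render criteria as a bulleted list for the reviewer prompt."""
    return "\n".join(f"- {name}: {condition}" for name, condition in rubric)
```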

Domain-Specific Rubric Extensions


Content/Marketing:
  • Brand voice adherence
  • SEO requirements met (keyword density, meta tags, structure)
  • No competitor trademark misuse
  • CTA present and correctly linked
Code:
  • Type safety (no `any` leaks, proper null handling)
  • Error handling coverage
  • Security (no secrets in code, input validation, injection prevention)
  • Test coverage for new paths
Compliance-Sensitive (regulated, legal, financial):
  • No outcome guarantees or unsubstantiated claims
  • Required disclaimers present
  • Approved terminology only
  • Jurisdiction-appropriate language
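Extensions like these can be appended to the core rubric per deliverable type. The mapping and function below are a hypothetical sketch of that composition:

```python
# Hypothetical per-domain rubric extensions, abbreviated from the lists above.
DOMAIN_EXTENSIONS = {
    "content": ["Brand voice adherence", "SEO requirements met",
                "No competitor trademark misuse", "CTA present and correctly linked"],
    "code": ["Type safety", "Error handling coverage",
             "Security", "Test coverage for new paths"],
    "compliance": ["No outcome guarantees", "Required disclaimers present",
                   "Approved terminology only", "Jurisdiction-appropriate language"],
}

def build_rubric(base, domain):
    """Core criteria plus any domain-specific extensions."""
    return list(base) + DOMAIN_EXTENSIONS.get(domain, [])
```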
内容/营销领域:
  • 符合品牌语调
  • 满足SEO要求(关键词密度、元标签、结构)
  • 无竞争对手商标滥用
  • 包含正确链接的行动号召(CTA)
代码领域:
  • 类型安全(无
    any
    类型泄漏、正确的空值处理)
  • 错误处理覆盖
  • 安全性(代码中无密钥、输入验证、注入防护)
  • 新路径的测试覆盖
合规敏感领域(受监管、法律、金融):
  • 无结果保证或无依据声明
  • 包含必要的免责声明
  • 仅使用批准术语
  • 符合管辖区域的语言要求

Phase 3: Naughty or Nice (Verdict Gate)


```python
def santa_verdict(review_b, review_c):
    """Both reviewers must pass. No partial credit."""
    # Always return a (verdict, issues, suggestions) tuple so callers can unpack.
    if review_b.verdict == "PASS" and review_c.verdict == "PASS":
        return "NICE", [], []  # Ship it

    # Merge flags from both reviewers, deduplicate
    all_issues = dedupe(review_b.critical_issues + review_c.critical_issues)
    all_suggestions = dedupe(review_b.suggestions + review_c.suggestions)

    return "NAUGHTY", all_issues, all_suggestions
```
Why both must pass: if only one reviewer catches an issue, that issue is real. The other reviewer's blind spot is exactly the failure mode Santa Method exists to eliminate.
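The `dedupe` helper is assumed rather than defined. A minimal order-preserving version that also collapses trivially-different copies of the same flag might look like:

```python
def dedupe(items):
    """Drop duplicate flags while preserving first-seen order.
    Normalizes case and whitespace so near-identical copies collapse."""
    seen = set()
    result = []
    for item in items:
        key = " ".join(item.lower().split())
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result
```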

Phase 4: Fix Until Nice (Convergence Loop)


```python
MAX_ITERATIONS = 3

for iteration in range(MAX_ITERATIONS):
    verdict, issues, suggestions = santa_verdict(review_b, review_c)

    if verdict == "NICE":
        log_santa_result(output, iteration, "passed")
        return ship(output)

    # Fix all critical issues (suggestions are optional)
    output = fix_agent.execute(
        output=output,
        issues=issues,
        instruction="Fix ONLY the flagged issues. Do not refactor or add unrequested changes."
    )

    # Re-run BOTH reviewers on fixed output (fresh agents, no memory of previous round)
    review_b = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))
    review_c = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))

# Exhausted iterations — escalate
log_santa_result(output, MAX_ITERATIONS, "escalated")
escalate_to_human(output, issues)
```

Critical: each review round uses **fresh agents**. Reviewers must not carry memory from previous rounds, as prior context creates anchoring bias.

Implementation Patterns


Pattern A: Claude Code Subagents (Recommended)


Subagents provide true context isolation. Each reviewer is a separate process with no shared state.
```python
# In a Claude Code session, use the Agent tool to spawn reviewers.
# Both agents run in parallel for speed.

# Pseudocode for Agent tool invocation
reviewer_b = Agent(
    description="Santa Review B",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
reviewer_c = Agent(
    description="Santa Review C",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
```

Pattern B: Sequential Inline (Fallback)


When subagents aren't available, simulate isolation with explicit context resets:
  1. Generate output
  2. New context: "You are Reviewer 1. Evaluate ONLY against this rubric. Find problems."
  3. Record findings verbatim
  4. Clear context completely
  5. New context: "You are Reviewer 2. Evaluate ONLY against this rubric. Find problems."
  6. Compare both reviews, fix, repeat
The subagent pattern is strictly superior — inline simulation risks context bleed between reviewers.
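The steps above can be sketched as follows, assuming some stateless `complete(prompt)` call that opens a fresh context on every invocation (the function is a placeholder, not a real API):

```python
def inline_santa_review(task_spec, output, rubric, complete):
    """Simulate reviewer isolation with two independent, stateless calls."""
    findings = []
    for name in ("Reviewer 1", "Reviewer 2"):
        # Each call is a brand-new context: no shared history,
        # no visibility into the other reviewer's assessment.
        prompt = (
            f"You are {name}. Evaluate ONLY against this rubric. Find problems.\n\n"
            f"RUBRIC:\n{rubric}\n\nSPEC:\n{task_spec}\n\nOUTPUT:\n{output}"
        )
        findings.append(complete(prompt))  # record verbatim
    return findings
```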

Pattern C: Batch Sampling


For large batches (100+ items), full Santa on every item is cost-prohibitive. Use stratified sampling:
  1. Run Santa on a random sample (10-15% of batch, minimum 5 items)
  2. Categorize failures by type (hallucination, compliance, completeness, etc.)
  3. If systematic patterns emerge, apply targeted fixes to the entire batch
  4. Re-sample and re-verify the fixed batch
  5. Continue until a clean sample passes
```python
import random

def santa_batch(items, rubric, sample_rate=0.15):
    sample = random.sample(items, max(5, int(len(items) * sample_rate)))

    for item in sample:
        result = santa_full(item, rubric)
        if result.verdict == "NAUGHTY":
            pattern = classify_failure(result.issues)
            items = batch_fix(items, pattern)  # Fix all items matching the pattern
            return santa_batch(items, rubric)  # Re-sample the fixed batch

    return items  # Clean sample → ship the batch
```

Failure Modes and Mitigations


| Failure Mode | Symptom | Mitigation |
| --- | --- | --- |
| Infinite loop | Reviewers keep finding new issues after fixes | Max iteration cap (3). Escalate. |
| Rubber stamping | Both reviewers pass everything | Adversarial prompt: "Your job is to find problems, not approve." |
| Subjective drift | Reviewers flag style preferences, not errors | Tight rubric with objective pass/fail criteria only |
| Fix regression | Fixing issue A introduces issue B | Fresh reviewers each round catch regressions |
| Reviewer agreement bias | Both reviewers miss the same thing | Mitigated by independence, not eliminated. For critical output, add a third reviewer or human spot-check. |
| Cost explosion | Too many iterations on large outputs | Batch sampling pattern. Budget caps per verification cycle. |

Integration with Other Skills


| Skill | Relationship |
| --- | --- |
| Verification Loop | Use for deterministic checks (build, lint, test); Santa for semantic checks (accuracy, hallucinations). Run verification-loop first, Santa second. |
| Eval Harness | Santa Method results feed eval metrics. Track pass@k across Santa runs to measure generator quality over time. |
| Continuous Learning v2 | Santa findings become instincts. Repeated failures on the same criterion → learned behavior to avoid the pattern. |
| Strategic Compact | Run Santa BEFORE compacting. Don't lose review context mid-verification. |

Metrics


Track these to measure Santa Method effectiveness:
  • First-pass rate: % of outputs that pass Santa on round 1 (target: >70%)
  • Mean iterations to convergence: average rounds to NICE (target: <1.5)
  • Issue taxonomy: distribution of failure types (hallucination vs. completeness vs. compliance)
  • Reviewer agreement: % of issues flagged by both reviewers vs. only one (low agreement = rubric needs tightening)
  • Escape rate: issues found post-ship that Santa should have caught (target: 0)
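The first two metrics can be computed directly from `log_santa_result` records. The log shape below, with round counts starting at 1, is an assumption:

```python
def santa_metrics(runs):
    """First-pass rate and mean rounds-to-NICE from run logs.
    Each run is assumed logged as {"rounds": int, "result": "passed" | "escalated"}."""
    passed = [r for r in runs if r["result"] == "passed"]
    first_pass_rate = sum(1 for r in passed if r["rounds"] == 1) / len(runs)
    mean_rounds = sum(r["rounds"] for r in passed) / max(1, len(passed))
    return {"first_pass_rate": first_pass_rate, "mean_rounds": mean_rounds}
```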

Cost Analysis


Santa Method costs approximately 2-3x the token cost of generation alone per verification cycle. For most high-stakes output, this is a bargain:
Cost of Santa = (generation tokens) + 2×(review tokens per round) × (avg rounds)
Cost of NOT Santa = (reputation damage) + (correction effort) + (trust erosion)
For batch operations, the sampling pattern reduces cost to ~15-20% of full verification while catching >90% of systematic issues.
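The cost formula above as a trivial calculator, with the sampling discount modeled as an optional factor on the review term (the numbers in the test are made up):

```python
def santa_cost(gen_tokens, review_tokens_per_round, avg_rounds, sample_rate=1.0):
    """Token cost per deliverable: generation plus two reviewers per round.
    sample_rate < 1.0 approximates the batch-sampling discount."""
    return gen_tokens + 2 * review_tokens_per_round * avg_rounds * sample_rate
```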