santa-method
Santa Method
Multi-agent adversarial verification framework. Make a list, check it twice. If it's naughty, fix it until it's nice.
The core insight: a single agent reviewing its own output shares the same biases, knowledge gaps, and systematic errors that produced the output. Two independent reviewers with no shared context break this failure mode.
When to Activate
Invoke this skill when:
- Output will be published, deployed, or consumed by end users
- Compliance, regulatory, or brand constraints must be enforced
- Code ships to production without human review
- Content accuracy matters (technical docs, educational material, customer-facing copy)
- Batch generation at scale where spot-checking misses systemic patterns
- Hallucination risk is elevated (claims, statistics, API references, legal language)
Do NOT use for internal drafts, exploratory research, or tasks with deterministic verification (use build/test/lint pipelines for those).
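The activation criteria above can be expressed as a simple gate. A minimal sketch, assuming tags describe each task; the tag names and the `should_run_santa` helper are illustrative, not part of the method:

```python
# Illustrative activation gate mirroring the criteria above
HIGH_STAKES = {"publish", "deploy", "production", "customer_facing", "regulated"}

def should_run_santa(task_tags, has_deterministic_checks=False):
    """Santa applies to high-stakes output; deterministic tasks use build/test/lint."""
    if has_deterministic_checks:
        return False  # verification-loop territory, not Santa's
    return bool(HIGH_STAKES & set(task_tags))
```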
Architecture
┌─────────────┐
│  GENERATOR  │   Phase 1: Make a List
│  (Agent A)  │   Produce the deliverable
└──────┬──────┘
       │ output
       ▼
┌────────────────────────────────┐
│    DUAL INDEPENDENT REVIEW     │   Phase 2: Check It Twice
│                                │
│ ┌────────────┐  ┌────────────┐ │   Two agents, same rubric,
│ │ Reviewer B │  │ Reviewer C │ │   no shared context
│ └─────┬──────┘  └─────┬──────┘ │
│       │               │        │
└───────┼───────────────┼────────┘
        │               │
        ▼               ▼
┌────────────────────────────────┐
│          VERDICT GATE          │   Phase 3: Naughty or Nice
│                                │
│ B passes AND C passes → NICE   │   Both must pass.
│ Otherwise → NAUGHTY            │   No exceptions.
└───────┬───────────────┬────────┘
        │               │
      NICE           NAUGHTY
        │               │
        ▼               ▼
   [ SHIP ]      ┌──────────────┐
                 │  FIX CYCLE   │   Phase 4: Fix Until Nice
                 │              │
                 │ iteration++  │   Collect all flags.
                 │ if i > MAX:  │   Fix all issues.
                 │   escalate   │   Re-run both reviewers.
                 │ else:        │   Loop until convergence.
                 │   goto Ph.2  │
                 └──────────────┘

Phase Details
Phase 1: Make a List (Generate)
Execute the primary task. No changes to your normal generation workflow. Santa Method is a post-generation verification layer, not a generation strategy.
```python
# The generator runs as normal
output = generate(task_spec)
```

Phase 2: Check It Twice (Independent Dual Review)
Spawn two review agents in parallel. Critical invariants:
- Context isolation — neither reviewer sees the other's assessment
- Identical rubric — both receive the same evaluation criteria
- Same inputs — both receive the original spec AND the generated output
- Structured output — each returns a typed verdict, not prose
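The "structured output" invariant is enforceable: parse each reply into a typed verdict before it reaches the gate, and fail closed on malformed replies. A sketch assuming Python dataclasses; the `Review`/`Check` names and the fail-closed policy are illustrative:

```python
import json
from dataclasses import dataclass, field

@dataclass
class Check:
    criterion: str
    result: str   # "PASS" | "FAIL"
    detail: str = ""

@dataclass
class Review:
    verdict: str  # "PASS" | "FAIL"
    checks: list = field(default_factory=list)
    critical_issues: list = field(default_factory=list)
    suggestions: list = field(default_factory=list)

def parse_review(raw: str) -> Review:
    """Parse a reviewer's JSON reply; any malformed reply counts as FAIL."""
    try:
        data = json.loads(raw)
        return Review(
            verdict=data.get("verdict", "FAIL"),
            checks=[Check(**c) for c in data.get("checks", [])],
            critical_issues=data.get("critical_issues", []),
            suggestions=data.get("suggestions", []),
        )
    except (json.JSONDecodeError, TypeError):
        return Review(verdict="FAIL", critical_issues=["unparseable reviewer output"])
```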
```python
REVIEWER_PROMPT = """
You are an independent quality reviewer. You have NOT seen any other review of this output.

Task Specification
{task_spec}

Output Under Review
{output}

Evaluation Rubric
{rubric}

Instructions
Evaluate the output against EACH rubric criterion. For each:
- PASS: criterion fully met, no issues
- FAIL: specific issue found (cite the exact problem)

Return your assessment as structured JSON:
{{
  "verdict": "PASS" | "FAIL",
  "checks": [
    {{"criterion": "...", "result": "PASS|FAIL", "detail": "..."}}
  ],
  "critical_issues": ["..."],  // blockers that must be fixed
  "suggestions": ["..."]       // non-blocking improvements
}}

Be rigorous. Your job is to find problems, not to approve.
"""
# Note: JSON braces are doubled so str.format leaves them intact.

# Spawn reviewers in parallel (Claude Code subagents)
review_b = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer B")
review_c = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer C")
# Both run concurrently — neither sees the other
```

Rubric Design
The rubric is the most important input. Vague rubrics produce vague reviews. Every criterion must have an objective pass/fail condition.
| Criterion | Pass Condition | Failure Signal |
|---|---|---|
| Factual accuracy | All claims verifiable against source material or common knowledge | Invented statistics, wrong version numbers, nonexistent APIs |
| Hallucination-free | No fabricated entities, quotes, URLs, or references | Links to pages that don't exist, attributed quotes with no source |
| Completeness | Every requirement in the spec is addressed | Missing sections, skipped edge cases, incomplete coverage |
| Compliance | Passes all project-specific constraints | Banned terms used, tone violations, regulatory non-compliance |
| Internal consistency | No contradictions within the output | Section A says X, section B says not-X |
| Technical correctness | Code compiles/runs, algorithms are sound | Syntax errors, logic bugs, wrong complexity claims |
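A rubric like the table above can be carried as structured data and rendered into the reviewer prompt, which guarantees both reviewers see literally identical criteria. A minimal sketch; the encoding and `render_rubric` helper are illustrative:

```python
# Each criterion pairs with an objective pass condition (subset of the table above)
RUBRIC = [
    {"criterion": "Factual accuracy",
     "pass": "All claims verifiable against source material or common knowledge"},
    {"criterion": "Hallucination-free",
     "pass": "No fabricated entities, quotes, URLs, or references"},
    {"criterion": "Internal consistency",
     "pass": "No contradictions within the output"},
]

def render_rubric(rubric):
    """Render criteria as a numbered list for insertion into the reviewer prompt."""
    return "\n".join(f"{i}. {c['criterion']}: {c['pass']}"
                     for i, c in enumerate(rubric, 1))
```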
Domain-Specific Rubric Extensions
Content/Marketing:
- Brand voice adherence
- SEO requirements met (keyword density, meta tags, structure)
- No competitor trademark misuse
- CTA present and correctly linked
Code:
- Type safety (no `any` leaks, proper null handling)
- Error handling coverage
- Security (no secrets in code, input validation, injection prevention)
- Test coverage for new paths
Compliance-Sensitive (regulated, legal, financial):
- No outcome guarantees or unsubstantiated claims
- Required disclaimers present
- Approved terminology only
- Jurisdiction-appropriate language
Phase 3: Naughty or Nice (Verdict Gate)
```python
def santa_verdict(review_b, review_c):
    """Both reviewers must pass. No partial credit."""
    if review_b.verdict == "PASS" and review_c.verdict == "PASS":
        return "NICE", [], []  # Ship it; empty lists keep the 3-way unpack consistent
    # Merge flags from both reviewers, deduplicate
    all_issues = dedupe(review_b.critical_issues + review_c.critical_issues)
    all_suggestions = dedupe(review_b.suggestions + review_c.suggestions)
    return "NAUGHTY", all_issues, all_suggestions
```

Why both must pass: if only one reviewer catches an issue, that issue is real. The other reviewer's blind spot is exactly the failure mode Santa Method exists to eliminate.
Phase 4: Fix Until Nice (Convergence Loop)
```python
MAX_ITERATIONS = 3

for iteration in range(MAX_ITERATIONS):
    verdict, issues, suggestions = santa_verdict(review_b, review_c)
    if verdict == "NICE":
        log_santa_result(output, iteration, "passed")
        return ship(output)
    # Fix all critical issues (suggestions are optional)
    output = fix_agent.execute(
        output=output,
        issues=issues,
        instruction="Fix ONLY the flagged issues. Do not refactor or add unrequested changes."
    )
    # Re-run BOTH reviewers on fixed output (fresh agents, no memory of previous round)
    review_b = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))
    review_c = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))

# Exhausted iterations — escalate
log_santa_result(output, MAX_ITERATIONS, "escalated")
escalate_to_human(output, issues)
```

Critical: each review round uses **fresh agents**. Reviewers must not carry memory from previous rounds, as prior context creates anchoring bias.

Implementation Patterns
Pattern A: Claude Code Subagents (Recommended)
Subagents provide true context isolation. Each reviewer is a separate process with no shared state.
```python
# In a Claude Code session, use the Agent tool to spawn reviewers.
# Both agents run in parallel for speed.

# Pseudocode for Agent tool invocation
reviewer_b = Agent(
    description="Santa Review B",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
reviewer_c = Agent(
    description="Santa Review C",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
)
```

Pattern B: Sequential Inline (Fallback)
When subagents aren't available, simulate isolation with explicit context resets:
- Generate output
- New context: "You are Reviewer 1. Evaluate ONLY against this rubric. Find problems."
- Record findings verbatim
- Clear context completely
- New context: "You are Reviewer 2. Evaluate ONLY against this rubric. Find problems."
- Compare both reviews, fix, repeat
The subagent pattern is strictly superior — inline simulation risks context bleed between reviewers.
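The reset discipline above can be sketched in code. Assumptions: a `new_session` factory that returns a conversation with no prior context, and an `ask` method on it; both are hypothetical stand-ins for whatever session API is available:

```python
def sequential_santa(output, rubric, new_session):
    """Fallback dual review in one process: a fresh session per reviewer
    stands in for true context isolation."""
    findings = []
    for name in ("Reviewer 1", "Reviewer 2"):
        session = new_session()  # full context reset between reviewers
        findings.append(session.ask(
            f"You are {name}. Evaluate ONLY against this rubric. Find problems.\n"
            f"RUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
        ))
    return findings
```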
Pattern C: Batch Sampling
For large batches (100+ items), full Santa on every item is cost-prohibitive. Use stratified sampling:
- Run Santa on a random sample (10-15% of batch, minimum 5 items)
- Categorize failures by type (hallucination, compliance, completeness, etc.)
- If systematic patterns emerge, apply targeted fixes to the entire batch
- Re-sample and re-verify the fixed batch
- Continue until a clean sample passes
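The 10-15% sample rate is not arbitrary: for a *systematic* defect touching a meaningful fraction of items, even a small sample detects it with high probability. A sketch under an independence assumption (real defects may cluster, which this ignores):

```python
def detection_probability(defect_rate, sample_size):
    """P(at least one affected item lands in the sample), assuming
    independent, uniformly spread defects."""
    return 1 - (1 - defect_rate) ** sample_size

# A defect touching 20% of items is near-certain to surface in 15 samples
p = detection_probability(0.20, 15)  # ≈ 0.965
```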
```python
import random

def santa_batch(items, rubric, sample_rate=0.15):
    sample = random.sample(items, max(5, int(len(items) * sample_rate)))
    for item in sample:
        result = santa_full(item, rubric)
        if result.verdict == "NAUGHTY":
            pattern = classify_failure(result.issues)
            items = batch_fix(items, pattern)   # Fix all items matching pattern
            return santa_batch(items, rubric)   # Re-sample
    return items  # Clean sample → ship batch
```

Failure Modes and Mitigations
| Failure Mode | Symptom | Mitigation |
|---|---|---|
| Infinite loop | Reviewers keep finding new issues after fixes | Max iteration cap (3). Escalate. |
| Rubber stamping | Both reviewers pass everything | Adversarial prompt: "Your job is to find problems, not approve." |
| Subjective drift | Reviewers flag style preferences, not errors | Tight rubric with objective pass/fail criteria only |
| Fix regression | Fixing issue A introduces issue B | Fresh reviewers each round catch regressions |
| Reviewer agreement bias | Both reviewers miss the same thing | Mitigated by independence, not eliminated. For critical output, add a third reviewer or human spot-check. |
| Cost explosion | Too many iterations on large outputs | Batch sampling pattern. Budget caps per verification cycle. |
Integration with Other Skills
| Skill | Relationship |
|---|---|
| Verification Loop | Use for deterministic checks (build, lint, test). Santa for semantic checks (accuracy, hallucinations). Run verification-loop first, Santa second. |
| Eval Harness | Santa Method results feed eval metrics. Track pass@k across Santa runs to measure generator quality over time. |
| Continuous Learning v2 | Santa findings become instincts. Repeated failures on the same criterion → learned behavior to avoid the pattern. |
| Strategic Compact | Run Santa BEFORE compacting. Don't lose review context mid-verification. |
Metrics
Track these to measure Santa Method effectiveness:
- First-pass rate: % of outputs that pass Santa on round 1 (target: >70%)
- Mean iterations to convergence: average rounds to NICE (target: <1.5)
- Issue taxonomy: distribution of failure types (hallucination vs. completeness vs. compliance)
- Reviewer agreement: % of issues flagged by both reviewers vs. only one (low agreement = rubric needs tightening)
- Escape rate: issues found post-ship that Santa should have caught (target: 0)
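The first three metrics fall out of a simple log. A minimal sketch, assuming each run is recorded as the round on which it reached NICE, or `None` on escalation; the representation is illustrative:

```python
def santa_metrics(rounds_per_run):
    """rounds_per_run: round each output reached NICE on, or None if escalated."""
    total = len(rounds_per_run)
    converged = [r for r in rounds_per_run if r is not None]
    return {
        "first_pass_rate": sum(1 for r in rounds_per_run if r == 1) / total,
        "mean_iterations": sum(converged) / len(converged),
        "escalation_rate": (total - len(converged)) / total,
    }
```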
Cost Analysis
Santa Method costs approximately 2-3x the token cost of generation alone per verification cycle. For most high-stakes output, this is a bargain:
Cost of Santa = (generation tokens) + 2 × (review tokens per round) × (avg rounds)
Cost of NOT Santa = (reputation damage) + (correction effort) + (trust erosion)

For batch operations, the sampling pattern reduces cost to ~15-20% of full verification while catching >90% of systematic issues.
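Plugging illustrative numbers into the token formula (the counts below are assumptions, not benchmarks):

```python
def santa_cost_tokens(gen_tokens, review_tokens_per_reviewer, avg_rounds):
    """Token cost of one Santa cycle: generation plus two reviewers per round."""
    return gen_tokens + 2 * review_tokens_per_reviewer * avg_rounds

# e.g. 4k to generate, 2k per review, 1.5 rounds on average
total = santa_cost_tokens(4000, 2000, 1.5)  # 10000 tokens, 2.5x generation alone
```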