Multi-agent adversarial verification with a convergence loop: two independent review agents must both pass the output before it ships.
Install:

```shell
npx skill4agent add affaan-m/everything-claude-code santa-method
```

```
┌──────────────┐
│  GENERATOR   │   Phase 1: Make a List
│  (Agent A)   │   Produce the deliverable
└──────┬───────┘
       │ output
       ▼
┌───────────────────────────────┐
│    DUAL INDEPENDENT REVIEW    │   Phase 2: Check It Twice
│                               │
│ ┌────────────┐ ┌────────────┐ │   Two agents, same rubric,
│ │ Reviewer B │ │ Reviewer C │ │   no shared context
│ └─────┬──────┘ └──────┬─────┘ │
│       │               │       │
└───────┼───────────────┼───────┘
        │               │
        ▼               ▼
┌───────────────────────────────┐
│         VERDICT GATE          │   Phase 3: Naughty or Nice
│                               │
│ B passes AND C passes → NICE  │   Both must pass.
│ Otherwise → NAUGHTY           │   No exceptions.
└───────┬───────────────┬───────┘
        │               │
      NICE           NAUGHTY
        │               │
        ▼               ▼
    [ SHIP ]    ┌──────────────┐
                │  FIX CYCLE   │   Phase 4: Fix Until Nice
                │              │
                │ iteration++  │   Collect all flags.
                │ if i > MAX:  │   Fix all issues.
                │   escalate   │   Re-run both reviewers.
                │ else:        │   Loop until convergence.
                │   goto Ph.2  │
                └──────────────┘
```

Phase 1 runs the generator as normal:

```python
output = generate(task_spec)
```

Phase 2 sends that output to two reviewers with the same prompt. Note the braces in the JSON schema are doubled so they survive `str.format`:

```python
REVIEWER_PROMPT = """
You are an independent quality reviewer. You have NOT seen any other review of this output.

## Task Specification
{task_spec}

## Output Under Review
{output}

## Evaluation Rubric
{rubric}

## Instructions
Evaluate the output against EACH rubric criterion. For each:
- PASS: criterion fully met, no issues
- FAIL: specific issue found (cite the exact problem)

Return your assessment as structured JSON:
{{
  "verdict": "PASS" | "FAIL",
  "checks": [
    {{"criterion": "...", "result": "PASS|FAIL", "detail": "..."}}
  ],
  "critical_issues": ["..."],   // blockers that must be fixed
  "suggestions": ["..."]        // non-blocking improvements
}}

Be rigorous. Your job is to find problems, not to approve.
"""

# Spawn reviewers in parallel (Claude Code subagents).
# Both run concurrently — neither sees the other.
review_b = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer B")
review_c = Agent(prompt=REVIEWER_PROMPT.format(...), description="Santa Reviewer C")
```

The evaluation rubric, objective pass/fail criteria only:

| Criterion | Pass Condition | Failure Signal |
|---|---|---|
| Factual accuracy | All claims verifiable against source material or common knowledge | Invented statistics, wrong version numbers, nonexistent APIs |
| Hallucination-free | No fabricated entities, quotes, URLs, or references | Links to pages that don't exist, attributed quotes with no source |
| Completeness | Every requirement in the spec is addressed | Missing sections, skipped edge cases, incomplete coverage |
| Compliance | Passes all project-specific constraints | Banned terms used, tone violations, regulatory non-compliance |
| Internal consistency | No contradictions within the output | Section A says X, section B says not-X |
| Technical correctness | Code compiles/runs, algorithms are sound | Syntax errors, logic bugs, wrong complexity claims |
The verdict gate: both reviewers must pass, no partial credit. The original returned a bare string on the pass path; returning a uniform 3-tuple keeps the caller's unpacking valid. A small `dedupe` helper is included since the gate depends on it:

```python
def dedupe(items):
    # Drop duplicate flags while preserving order.
    return list(dict.fromkeys(items))

def santa_verdict(review_b, review_c):
    """Both reviewers must pass. No partial credit."""
    if review_b.verdict == "PASS" and review_c.verdict == "PASS":
        return "NICE", [], []  # Ship it
    # Merge flags from both reviewers, deduplicate
    all_issues = dedupe(review_b.critical_issues + review_c.critical_issues)
    all_suggestions = dedupe(review_b.suggestions + review_c.suggestions)
    return "NAUGHTY", all_issues, all_suggestions
```

The convergence loop, wrapped in a function so the early `return` is valid:

```python
MAX_ITERATIONS = 3

def santa_loop(output, review_b, review_c):
    for iteration in range(MAX_ITERATIONS):
        verdict, issues, suggestions = santa_verdict(review_b, review_c)
        if verdict == "NICE":
            log_santa_result(output, iteration, "passed")
            return ship(output)
        # Fix all critical issues (suggestions are optional)
        output = fix_agent.execute(
            output=output,
            issues=issues,
            instruction="Fix ONLY the flagged issues. Do not refactor or add unrequested changes.",
        )
        # Re-run BOTH reviewers on the fixed output
        # (fresh agents, no memory of the previous round)
        review_b = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))
        review_c = Agent(prompt=REVIEWER_PROMPT.format(output=output, ...))
    # Exhausted iterations — escalate
    log_santa_result(output, MAX_ITERATIONS, "escalated")
    escalate_to_human(output, issues)
```

In a Claude Code session, use the Agent tool to spawn the reviewers; both run in parallel for speed:

```python
# Pseudocode for Agent tool invocation
reviewer_b = Agent(
    description="Santa Review B",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}",
)
reviewer_c = Agent(
    description="Santa Review C",
    prompt=f"Review this output for quality...\n\nRUBRIC:\n{rubric}\n\nOUTPUT:\n{output}",
)
```

For large batches, sample-verify instead of running the full cycle on every item. The sample size is clamped to the batch size so `random.sample` cannot be asked for more items than exist:

```python
import random

def santa_batch(items, rubric, sample_rate=0.15):
    # Sample at least 5 items, but never more than the batch holds.
    k = min(len(items), max(5, int(len(items) * sample_rate)))
    sample = random.sample(items, k)
    for item in sample:
        result = santa_full(item, rubric)
        if result.verdict == "NAUGHTY":
            pattern = classify_failure(result.issues)
            items = batch_fix(items, pattern)   # Fix all items matching the pattern
            return santa_batch(items, rubric)   # Re-sample after the batch fix
    return items  # Clean sample → ship the batch
```

Failure modes and mitigations:

| Failure Mode | Symptom | Mitigation |
|---|---|---|
| Infinite loop | Reviewers keep finding new issues after fixes | Max iteration cap (3). Escalate. |
| Rubber stamping | Both reviewers pass everything | Adversarial prompt: "Your job is to find problems, not approve." |
| Subjective drift | Reviewers flag style preferences, not errors | Tight rubric with objective pass/fail criteria only |
| Fix regression | Fixing issue A introduces issue B | Fresh reviewers each round catch regressions |
| Reviewer agreement bias | Both reviewers miss the same thing | Mitigated by independence, not eliminated. For critical output, add a third reviewer or human spot-check. |
| Cost explosion | Too many iterations on large outputs | Batch sampling pattern. Budget caps per verification cycle. |
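
The cost-explosion mitigation can be enforced mechanically alongside the iteration cap. A sketch of a per-cycle budget guard; the `ReviewBudget` class and token figures are hypothetical, and real counts would come from your API usage metadata:

```python
class ReviewBudget:
    """Stop iterating when cumulative review spend exceeds a cap,
    even if MAX_ITERATIONS has not been reached."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.spent = 0

    def charge(self, tokens):
        # Record spend; return False once the cap is blown.
        self.spent += tokens
        return self.spent <= self.max_tokens

budget = ReviewBudget(max_tokens=50_000)
print(budget.charge(20_000))  # round 1 fits the budget
print(budget.charge(20_000))  # round 2 still fits
print(budget.charge(20_000))  # round 3 blows the cap: escalate instead of looping
```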
| Skill | Relationship |
|---|---|
| Verification Loop | Use for deterministic checks (build, lint, test). Santa for semantic checks (accuracy, hallucinations). Run verification-loop first, Santa second. |
| Eval Harness | Santa Method results feed eval metrics. Track pass@k across Santa runs to measure generator quality over time. |
| Continuous Learning v2 | Santa findings become instincts. Repeated failures on the same criterion → learned behavior to avoid the pattern. |
| Strategic Compact | Run Santa BEFORE compacting. Don't lose review context mid-verification. |
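
The ordering rule in the Verification Loop row (deterministic checks first, Santa second) can be sketched as a small pipeline. All names here are hypothetical stand-ins, not the skill's API:

```python
def verify_then_santa(output, deterministic_checks, semantic_check):
    # Deterministic checks first: cheap, objective, fail fast.
    for check in deterministic_checks:
        if not check(output):
            return "NAUGHTY (deterministic)"
    # Semantic review second: the Santa pass.
    return semantic_check(output)

# Stub usage: a lint-style check that rejects TODO markers,
# followed by a stubbed-out Santa pass.
checks = [lambda text: "TODO" not in text]
result = verify_then_santa("final draft", checks, lambda text: "NICE")
print(result)  # → NICE
```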
```
Cost of Santa     = (generation tokens) + 2 × (review tokens per round) × (avg rounds)
Cost of NOT Santa = (reputation damage) + (correction effort) + (trust erosion)
```
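
To make the first line concrete, here is the arithmetic with illustrative numbers; all values are assumptions, not measured benchmarks:

```python
generation_tokens = 4_000
review_tokens_per_round = 1_500   # per reviewer, so x2 per round
avg_rounds = 1.6                  # most outputs converge in 1-2 rounds

santa_cost = generation_tokens + 2 * review_tokens_per_round * avg_rounds
print(santa_cost)  # → 8800.0
```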