/council — Multi-Model Consensus Council
Spawn parallel judges with different perspectives, consolidate into consensus. Works for any task — validation, research, brainstorming.
Quick Start
```bash
/council --quick validate recent                        # fast inline check
/council validate this plan                             # validation (2 agents)
/council brainstorm caching approaches                  # brainstorm
/council validate the implementation                    # validation (critique triggers map here)
/council research kubernetes upgrade strategies         # research
/council research the CI/CD pipeline bottlenecks        # research (analyze triggers map here)
/council --preset=security-audit validate the auth system   # preset personas
/council --deep --explorers=3 research upgrade automation   # deep + explorers
/council --debate validate the auth system              # adversarial 2-round review
/council --deep --debate validate the migration plan    # thorough + debate
/council                                                # infers from context
```
Council works independently — no RPI workflow, no ratchet chain, no CLI required. Zero setup beyond plugin install.
Modes
| Mode | Agents | Execution Backend | Use Case |
|---|---|---|---|
| `--quick` | 0 (inline) | Self | Fast single-agent check, no spawning |
| default | 2 | Runtime-native (Codex sub-agents preferred; Claude teams fallback) | Independent judges (no perspective labels) |
| `--deep` | 3 | Runtime-native | Thorough review |
| `--mixed` | 3+3 | Runtime-native + Codex CLI | Cross-vendor consensus |
| `--debate` | 2+ | Runtime-native | Adversarial refinement (2 rounds) |
```bash
/council --quick validate recent    # inline single-agent check, no spawning
/council recent                     # 2 runtime-native judges
/council --deep recent              # 3 runtime-native judges
/council --mixed recent             # runtime-native + Codex CLI
```
Spawn Backend Selection (MANDATORY)
Council must auto-select a backend using capability detection:
- If Codex experimental sub-agents (`spawn_agent`) are available, use them
- Else if Claude native teams are available, use them
- Else use `Task(run_in_background=true)` fallback
This keeps council universal across Claude and Codex sessions.
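The fallback chain above can be sketched as follows. The tool names used for detection are illustrative assumptions; the actual probe depends on what the runtime exposes:

```python
def select_backend(available_tools):
    """Pick a spawn backend by capability detection.

    Preference order mirrors the doc: Codex experimental sub-agents,
    then Claude native teams, then the background Task fallback.
    The tool names checked here are illustrative assumptions.
    """
    if "spawn_agent" in available_tools:    # Codex sub-agents
        return "codex_subagents"
    if "SendMessage" in available_tools:    # Claude native teams
        return "claude_teams"
    return "background_fallback"            # Task(run_in_background=true)
```

The returned string matches the backend labels used in Phase 1a of the execution flow.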
When to Use
Use `--debate` for high-stakes or ambiguous reviews where judges are likely to disagree:
- Security audits, architecture decisions, migration plans
- Reviews where multiple valid perspectives exist
- Cases where a missed finding has real consequences
Skip `--debate` for routine validation where consensus is expected. Debate adds R2 latency (judges stay alive and process a second round via backend messaging).
Incompatibilities:
- `--quick` and `--debate` cannot be combined. `--quick` runs inline with no spawning; `--debate` requires multi-agent rounds. If both are passed, exit with error: "Error: --quick and --debate are incompatible."
- `--debate` is only supported with validate mode. Brainstorm and research do not produce PASS/WARN/FAIL verdicts. If combined, exit with error: "Error: --debate is only supported with validate mode."
Task Types
| Type | Trigger Words | Perspective Focus |
|---|---|---|
| validate | validate, check, review, assess, critique, feedback, improve | Is this correct? What's wrong? What could be better? |
| brainstorm | brainstorm, explore, options, approaches | What are the alternatives? Pros/cons? |
| research | research, investigate, deep dive, explore deeply, analyze, examine, evaluate, compare | What can we discover? What are the properties, trade-offs, and structure? |
Natural language works — the skill infers task type from your prompt.
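The trigger-word mapping in the table above can be sketched as a simple keyword match. Checking research triggers first ensures "explore deeply" maps to research rather than brainstorm's bare "explore" (the ordering and default are assumptions):

```python
TRIGGERS = {
    "validate": ["validate", "check", "review", "assess", "critique",
                 "feedback", "improve"],
    "brainstorm": ["brainstorm", "explore", "options", "approaches"],
    "research": ["research", "investigate", "deep dive", "explore deeply",
                 "analyze", "examine", "evaluate", "compare"],
}

def infer_task_type(prompt, default="validate"):
    """Return the first task type whose trigger words appear in the prompt.

    Research is checked first so multi-word triggers like "explore deeply"
    win over brainstorm's bare "explore".
    """
    text = prompt.lower()
    for task in ("research", "brainstorm", "validate"):
        if any(word in text for word in TRIGGERS[task]):
            return task
    return default
```

With no trigger match the sketch falls back to validation, the most conservative mode.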
Architecture
Execution Flow
┌─────────────────────────────────────────────────────────────────┐
│ Phase 1: Build Packet (JSON) │
│ - Task type (validate/brainstorm/research) │
│ - Target description │
│ - Context (files, diffs, prior decisions) │
│ - Perspectives to assign │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 1a: Select spawn backend │
│ codex_subagents | claude_teams | background_fallback │
│ Team lead = spawner (this agent) │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────┴─────────────────┐
▼ ▼
┌───────────────────────┐ ┌───────────────────────┐
│ RUNTIME-NATIVE JUDGES│ │ CODEX AGENTS │
│ (spawn_agent or teams)│ │ (Bash tool, parallel)│
│ │ │ Agent 1 (independent │
│ Agent 1 (independent │ │ or with preset) │
│ or with preset) │ │ Agent 2 │
│ Agent 2 │ │ Agent 3 │
│ Agent 3 (--deep only)│ │ (--mixed only) │
│ (--deep/--mixed only)│ │ │
│ │ │ Output: JSON + MD │
│ Write files, then │ │ Files: .agents/ │
│ wait()/SendMessage to │ │ council/codex-* │
│ lead │ │ │
│ Files: .agents/ │ └───────────────────────┘
│ council/claude-* │ │
└───────────────────────┘ │
│ │
└─────────────────┬─────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 2: Consolidation (Team Lead) │
│ - Receive completion from backend channel (wait/SendMessage) │
│ - Read all agent output files │
│ - If schema_version is missing from a judge's output, treat │
│ as version 0 (backward compatibility) │
│ - Compute consensus verdict │
│ - Identify shared findings │
│ - Surface disagreements with attribution │
│ - Generate Markdown report for human │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Phase 3: Cleanup │
│ - Cleanup backend resources (close_agent / TeamDelete / none) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Output: Markdown Council Report │
│ - Consensus: PASS/WARN/FAIL │
│ - Shared findings │
│ - Disagreements (if any) │
│ - Recommendations │
└─────────────────────────────────────────────────────────────────┘
Graceful Degradation
| Failure | Behavior |
|---|---|
| 1 of N agents times out | Proceed with N-1, note in report |
| All Codex CLI agents fail | Proceed with runtime-native judges only, note degradation |
| All agents fail | Return error, suggest retry |
| Codex CLI not installed | Skip Codex CLI judges, continue runtime-native mode (warn user) |
| Codex sub-agents unavailable | Fall back to Claude teams |
| Native teams unavailable | Fall back to `Task(run_in_background=true)` fire-and-forget |
| Output dir missing | Create automatically |
Timeout: 120s per agent (configurable in seconds via environment variable; see Configuration).
Minimum quorum: At least 1 agent must respond for a valid council. If 0 agents respond, return error.
Pre-Flight Checks
- Runtime-native backend: select via capability detection (codex_subagents -> claude_teams -> `Task(run_in_background=true)`).
- Codex CLI judges (--mixed only): check that the Codex CLI is installed, test model availability, and test required feature support. Downgrade mixed mode when unavailable.
- Agent count: verify `judges * (1 + explorers) <= MAX_AGENTS (12)`
- Output dir: ensure `.agents/council/` exists (created automatically if missing)
Quick Mode (`--quick`)
Single-agent inline validation. No subprocess spawning, no Task tool, no Codex. The current agent performs a structured self-review using the same output schema as a full council.
When to use: Routine checks, mid-implementation sanity checks, pre-commit quick scan.
Execution: gather context (files, diffs) -> perform a structured self-review inline using the council output_schema (verdict, confidence, findings, recommendation) -> write the report to `.agents/council/YYYY-MM-DD-quick-<target>.md`, labeled `Mode: quick (single-agent)`.
Limitations: No cross-perspective disagreement, no cross-vendor insights, lower confidence ceiling. Not suitable for security audits or architecture decisions.
Packet Format (JSON)
The packet sent to each agent. File contents are included inline — agents receive the actual code/plan text in the packet, not just paths. This ensures both Claude and Codex agents can analyze without needing file access.
```json
{
  "council_packet": {
    "version": "1.0",
    "mode": "validate | brainstorm | research",
    "target": "Implementation of user authentication system",
    "context": {
      "files": [
        {
          "path": "src/auth/jwt.py",
          "content": "<file contents inlined here>"
        },
        {
          "path": "src/auth/middleware.py",
          "content": "<file contents inlined here>"
        }
      ],
      "diff": "git diff output if applicable",
      "spec": {
        "source": "bead na-0042 | plan doc | none",
        "content": "The spec/bead description text (optional — included when wrapper provides it)"
      },
      "prior_decisions": [
        "Using JWT, not sessions",
        "Refresh tokens required"
      ]
    },
    "perspective": "skeptic (only when --preset or --perspectives used)",
    "perspective_description": "What could go wrong? (only when --preset or --perspectives used)",
    "output_schema": {
      "verdict": "PASS | WARN | FAIL",
      "confidence": "HIGH | MEDIUM | LOW",
      "key_insight": "Single sentence summary",
      "findings": [
        {
          "severity": "critical | significant | minor",
          "category": "security | architecture | performance | style",
          "description": "What was found",
          "location": "file:line if applicable",
          "recommendation": "How to address"
        }
      ],
      "recommendation": "Concrete next step",
      "schema_version": 1
    }
  }
}
```
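Phase 1 packet assembly can be sketched as below, with file contents read and inlined so judges need no file access. The helper name and defaults are hypothetical; the field layout follows the schema above:

```python
import json
from pathlib import Path

OUTPUT_SCHEMA = {
    "verdict": "PASS | WARN | FAIL",
    "confidence": "HIGH | MEDIUM | LOW",
    "key_insight": "Single sentence summary",
    "findings": [],
    "recommendation": "Concrete next step",
    "schema_version": 1,
}

def build_packet(mode, target, paths, prior_decisions=(), diff=""):
    """Assemble a council_packet with file contents inlined, so both
    Claude and Codex judges can analyze without file access."""
    files = [{"path": str(p), "content": Path(p).read_text()} for p in paths]
    return {
        "council_packet": {
            "version": "1.0",
            "mode": mode,
            "target": target,
            "context": {
                "files": files,
                "diff": diff,
                "prior_decisions": list(prior_decisions),
            },
            "output_schema": OUTPUT_SCHEMA,
        }
    }
```

Because everything is inlined, the packet serializes to a single JSON string that can be passed verbatim to any backend.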
Perspectives
Perspectives & Presets: use the Read tool on `skills/council/references/personas.md` for persona definitions, preset configurations, and custom perspective details.
Auto-Escalation: when `--preset` or `--perspectives` specifies more perspectives than the current judge count, automatically escalate the judge count to match. An explicit judge-count override flag takes precedence over auto-escalation.
Explorer Sub-Agents
Explorer Details: use the Read tool on `skills/council/references/explorers.md` for explorer architecture, prompts, sub-question generation, and timeout configuration.
Summary: judges can spawn explorer sub-agents (`--explorers`, max 5) for parallel deep-dive research. Total agents = `judges * (1 + explorers)`, capped at MAX_AGENTS=12.
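The cap formula above can be sketched as a pre-flight check (function names are illustrative):

```python
MAX_AGENTS = 12

def check_agent_cap(judges, explorers_per_judge):
    """Verify judges * (1 + explorers) <= MAX_AGENTS, per the pre-flight
    rules; each judge plus its explorers counts toward the cap."""
    total = judges * (1 + explorers_per_judge)
    if total > MAX_AGENTS:
        raise ValueError(f"{total} agents exceeds MAX_AGENTS={MAX_AGENTS}")
    return total
```

For example, `--deep --explorers=3` gives 3 judges with 3 explorers each: 3 * (1 + 3) = 12, exactly at the cap.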
Debate Phase (`--debate`)
Debate Protocol: use the Read tool on `skills/council/references/debate-protocol.md` for the full debate execution flow, R1-to-R2 verdict injection, timeout handling, and cost analysis.
Summary: two-round adversarial review. R1 produces independent verdicts. R2 sends the other judges' verdicts via backend messaging (Codex sub-agent messaging or Claude `SendMessage`) for steel-manning and revision. Only supported with validate mode.
Agent Prompts
Agent Prompts: use the Read tool on `skills/council/references/agent-prompts.md` for judge prompts (default and perspective-based), the consolidation prompt, and the debate R2 message template.
Consensus Rules
| Condition | Verdict |
|---|---|
| All PASS | PASS |
| Any FAIL | FAIL |
| Mixed PASS/WARN | WARN |
| All WARN | WARN |
Disagreement handling:
- If Claude says PASS and Codex says FAIL → DISAGREE (surface both)
- Severity-weighted: Security FAIL outweighs style WARN
DISAGREE resolution: When vendors disagree, the spawner presents both positions with reasoning and defers to the user. No automatic tie-breaking — cross-vendor disagreement is a signal worth human attention.
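The rules table and disagreement handling above reduce to a small fold over judge verdicts. A sketch (severity weighting omitted; function names are illustrative):

```python
def consensus(verdicts):
    """Compute the council verdict per the rules table:
    any FAIL -> FAIL, all PASS -> PASS, any other mix -> WARN."""
    if not verdicts:
        raise ValueError("no verdicts to consolidate")
    if "FAIL" in verdicts:
        return "FAIL"
    if all(v == "PASS" for v in verdicts):
        return "PASS"
    return "WARN"

def cross_vendor_disagree(by_vendor):
    """True when one vendor's judges pass what another vendor's fail,
    e.g. {"claude": ["PASS"], "codex": ["FAIL"]} -> surface DISAGREE."""
    summaries = {consensus(v) for v in by_vendor.values()}
    return "PASS" in summaries and "FAIL" in summaries
```

When `cross_vendor_disagree` is true, the spawner presents both positions and defers to the user rather than tie-breaking automatically.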
Output Format
Report Templates: use the Read tool on `skills/council/references/output-format.md` for the full report templates (validate, brainstorm, research) and debate report additions (verdict shifts, convergence detection).
All reports write to `.agents/council/YYYY-MM-DD-<type>-<target>.md`.
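Report path construction from the naming scheme above can be sketched as follows; the slug rules for the target description are an assumption:

```python
import re
from datetime import date
from pathlib import Path

def report_path(task_type, target, when=None):
    """Build .agents/council/YYYY-MM-DD-<type>-<target>.md.

    The target description is slugged to lowercase alphanumerics joined
    by hyphens (slug rules are an assumption, not specified by the doc).
    """
    when = when or date.today()
    slug = re.sub(r"[^a-z0-9]+", "-", target.lower()).strip("-")
    return Path(".agents/council") / f"{when.isoformat()}-{task_type}-{slug}.md"
```

For example, a validate run against "Auth System!" on 2026-02-06 would land at `.agents/council/2026-02-06-validate-auth-system.md` under these assumptions.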
Configuration
Partial Completion
Minimum quorum: 1 agent. Recommended: 80% of judges. On timeout, proceed with remaining judges and note in report. On user cancellation, shutdown all judges and generate partial report with INCOMPLETE marker.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| | 120 | Agent timeout in seconds |
| | gpt-5.3-codex | Default Codex model for --mixed |
| | opus | Claude model for agents |
| | sonnet | Model for explorer sub-agents |
| | 60 | Explorer timeout in seconds |
| | 90 | Maximum wait time for R2 debate completion after debate messages are sent. Shorter than R1 since judges already have context. |
Flags
| Flag | Description |
|---|---|
| `--deep` | 3 Claude agents instead of 2 |
| `--mixed` | Add 3 Codex agents |
| `--debate` | Enable adversarial debate round (2 rounds via backend messaging, same agents). Incompatible with `--quick`. |
| | Override timeout in seconds (default: 120) |
| `--perspectives` | Custom perspective names |
| `--preset` | Built-in persona preset (security-audit, architecture, research, ops, code-review, plan-review, retrospective) |
| | Override agent count per vendor (e.g., 4 Claude, or 4+4 with --mixed). Subject to MAX_AGENTS=12 cap. |
| `--explorers` | Explorer sub-agents per judge (default: 0, max: 5). Max effective value depends on judge count. Total agents capped at 12. |
| | Override explorer model (default: sonnet) |
CLI Spawning Commands
CLI Spawning: use the Read tool on `skills/council/references/cli-spawning.md` for team setup, Claude/Codex agent spawning, parallel execution, debate R2 commands, cleanup, and model selection.
Examples
```bash
/council validate recent                                          # 2 judges, recent commits
/council --deep --preset=architecture research the auth system    # 3 judges with architecture personas
/council --mixed validate this plan                               # 3 Claude + 3 Codex
/council --deep --explorers=3 research upgrade patterns           # 12 agents (3 judges x 4)
/council --preset=security-audit --deep validate the API          # attacker, defender, compliance
/council brainstorm caching strategies for the API                # 2 judges explore options
/council research Redis vs Memcached for session storage          # 2 judges assess trade-offs
/council validate the implementation plan in PLAN.md              # structured plan feedback
```
Migration from /judge
The `/judge` skill is deprecated. Use `/council`.
Runtime-Native Architecture
Council uses runtime-native spawning as primary:
- Codex sessions: experimental sub-agents (`spawn_agent`, `wait`, `close_agent`, and sub-agent messaging)
- Claude sessions: native teams (`SendMessage`, `TeamDelete`, shared task state)
- Fallback: `Task(run_in_background=true)`
Deliberation Protocol
The `--debate` flag implements the deliberation protocol pattern:
Independent assessment → evidence exchange → position revision → convergence analysis
Runtime-native backends make this pattern first-class:
- R1: Judges spawn as sub-agents/teammates, assess independently, return verdicts to lead
- R2: the team lead sends the other judges' verdicts via sub-agent messaging (Codex) or `SendMessage` (Claude). Judges wake from idle with full R1 context.
- Consolidation: Team lead reads all output files, computes consensus
- Cleanup: `close_agent` (Codex) or `shutdown_request` + `TeamDelete` (Claude)
Communication Rules
- Judges → team lead only. Judges never message each other directly. This prevents anchoring.
- Team lead → judges. Only the team lead sends follow-ups (via backend messaging).
- No shared task mutation by judges. Team lead manages coordination state.
Ralph Wiggum Compliance
Council maintains fresh-context isolation (Ralph Wiggum pattern) with one documented exception: `--debate` reuses judge context across R1 and R2. This is intentional. Judges persist within a single atomic council invocation; they do NOT persist across separate council calls. The rationale:
- Judges benefit from their own R1 analytical context (reasoning chain, not just the verdict JSON) when evaluating other judges' positions in R2
- Re-spawning with only the verdict summary (~200 tokens) would lose the judge's working memory of WHY they reached their verdict
- The exception is bounded: max 2 rounds, within one invocation, with explicit cleanup (close_agent or shutdown_request + TeamDelete)
Without `--debate`, council is fully Ralph-compliant: each judge is a fresh spawn, executes once, writes output, and terminates.
Fallback
If the runtime-native backend is unavailable, fall back to `Task(run_in_background=true)` fire-and-forget. In fallback mode:
- `--debate` reverts to R2 re-spawning with truncated R1 verdicts
- The debate report must include `**Fidelity:** degraded (fallback — R1 verdicts truncated for R2 re-spawn)` in the header so users know results may be lower fidelity
- Non-debate mode works identically (judges write files, team lead reads them)
Judge Naming
Convention: `council-YYYYMMDD-<target>` (e.g., `council-20260206-auth-system`).
Judge names: use numbered judge names for independent judges, or the persona name when using presets/perspectives (e.g., `skeptic`). Use the same logical names across both Codex and Claude backends.
See Also
- Code validation skill — complexity scoring + council (uses the spec when one is found)
- `skills/pre-mortem/SKILL.md` — Plan validation (uses council, always 3 judges)
- `skills/post-mortem/SKILL.md` — Work wrap-up (uses council, always 3 judges + retro)
- Multi-agent orchestration skill
- `skills/standards/SKILL.md` — Language-specific coding standards
- Codebase exploration skill (complementary to council research mode)