Witness Observer
Per-rig observer that monitors polecat health and reports anomalies. The witness is the PMU (Performance Monitoring Unit) of the Gastown chipset -- it watches execution units for stalls, detects degraded performance, and raises alerts without interfering with computation. The witness is strictly read-only with respect to agent work. It observes and reports; it never modifies.
Activation Triggers
This skill activates when:
- The agent is assigned to monitor a rig's worker agents
- Multiple polecats are running and health monitoring is needed
- Stall detection is required for long-running work items
- The mayor needs a supervisory agent to watch active polecats
Core Capabilities
Patrol Loop
The witness runs a periodic patrol that checks all active agents in its rig for health indicators.
Patrol cycle:
```
SCAN             EVALUATE        ACT                 WAIT
  |                 |             |                    |
  v                 v             v                    v
list agents  ->  check each  ->  nudge/escalate  ->  sleep interval
(active ones)    for stalls      if needed           (default 5 min)
```
Implementation:
```typescript
const state = new StateManager({ stateDir: '.chipset/state/' });
const patrolInterval = 5 * 60 * 1000;  // 5 minutes (configurable)
const stallThreshold = 30 * 60 * 1000; // 30 minutes (configurable)

async function patrol(): Promise<void> {
  // Get all agents that should be working
  const agents = await state.listAgents({ role: 'polecat' });
  const active = agents.filter(a => a.status === 'active');

  for (const agent of active) {
    const hook = await state.getHook(agent.id);
    if (!hook || hook.status !== 'active') continue;

    // Check last activity timestamp
    const lastActivity = new Date(hook.lastActivity).getTime();
    const elapsed = Date.now() - lastActivity;

    if (elapsed > stallThreshold) {
      await handleStall(agent, hook, elapsed);
    }
  }
}
```
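The `patrol` function covers one pass of SCAN, EVALUATE, and ACT; the WAIT stage is not shown. The following is a minimal sketch of a driver loop, not the skill's actual scheduler. The names `runPatrolLoop`, `shouldStop`, `sleep`, and `onError` are hypothetical and do not appear in the original. The sleep function is injectable so the loop can be exercised without real delays.

```typescript
type SleepFn = (ms: number) => Promise<void>;

// Default sleep based on setTimeout; replaceable in tests.
const realSleep: SleepFn = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical driver: runs one patrol pass, then waits, until asked to stop.
// Returns the number of completed cycles.
async function runPatrolLoop(
  patrolOnce: () => Promise<void>,
  intervalMs: number,
  shouldStop: () => boolean,
  sleep: SleepFn = realSleep,
  onError: (err: unknown) => void = () => {},
): Promise<number> {
  let cycles = 0;
  while (!shouldStop()) {
    try {
      await patrolOnce();
    } catch (err) {
      // Assumption: a failed cycle is reported but does not end monitoring.
      onError(err);
    }
    cycles++;
    if (shouldStop()) break; // skip the final sleep when stopping
    await sleep(intervalMs);
  }
  return cycles;
}
```

Under these assumptions, a call such as `runPatrolLoop(patrol, patrolInterval, () => false)` would patrol every five minutes until the process exits.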
Stall Detection
A stall is detected when an agent has hooked work but has not updated its activity timestamp within the threshold period (default 30 minutes).
Stall indicators:
| Indicator | What It Means |
|---|---|
| Hook active, no activity for 30+ min | Agent may be stuck, crashed, or idle |
| Agent status is 'active' but hook timestamp stale | Session may have ended without cleanup |
| Multiple consecutive patrol cycles with no change | Persistent stall, needs escalation |
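The indicators above reduce to a timestamp comparison per hook. The sketch below shows one way to express that check as a pure function, which keeps it testable with a fixed clock. `HookSnapshot`, `elapsedSinceActivity`, and `findStalled` are hypothetical names, and treating an unparseable timestamp as stalled is an assumption, not documented behavior.

```typescript
// Hypothetical, simplified view of a hook; field names follow the patrol code.
interface HookSnapshot {
  agentId: string;
  status: string;       // 'active' when work is hooked
  lastActivity: string; // ISO 8601 timestamp
}

const DEFAULT_STALL_THRESHOLD_MS = 30 * 60 * 1000;

function elapsedSinceActivity(hook: HookSnapshot, nowMs: number): number {
  const last = new Date(hook.lastActivity).getTime();
  // Assumption: an invalid timestamp counts as infinitely stale, so the
  // agent is surfaced instead of being silently skipped (NaN > x is false).
  return Number.isNaN(last) ? Number.POSITIVE_INFINITY : nowMs - last;
}

// Returns the active hooks whose last activity is older than the threshold.
function findStalled(
  hooks: HookSnapshot[],
  nowMs: number,
  thresholdMs: number = DEFAULT_STALL_THRESHOLD_MS,
): HookSnapshot[] {
  return hooks.filter(
    (h) => h.status === 'active' && elapsedSinceActivity(h, nowMs) > thresholdMs,
  );
}
```

Because the comparison is strict, a hook that is exactly 30 minutes old is not yet reported as stalled.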
Stall classification:
```typescript
type StallSeverity = 'warning' | 'alert' | 'critical';

function classifyStall(elapsed: number, nudgesSent: number): StallSeverity {
  if (nudgesSent >= 2) return 'critical';        // Nudged twice, still stalled
  if (elapsed > 60 * 60 * 1000) return 'alert';  // Over 1 hour
  return 'warning';                              // First detection
}
```
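A few worked inputs make the precedence of the checks easier to see. The example reproduces `classifyStall` as given so that it is self-contained; the `MIN` constant and the `results` object are added here for illustration only.

```typescript
type StallSeverity = 'warning' | 'alert' | 'critical';

// Reproduced from the classification above.
function classifyStall(elapsed: number, nudgesSent: number): StallSeverity {
  if (nudgesSent >= 2) return 'critical';
  if (elapsed > 60 * 60 * 1000) return 'alert';
  return 'warning';
}

const MIN = 60 * 1000; // one minute in milliseconds

const results = {
  firstDetection: classifyStall(35 * MIN, 0), // 'warning'
  secondNudge: classifyStall(45 * MIN, 1),    // 'warning' (one nudge so far)
  exactlyOneHour: classifyStall(60 * MIN, 1), // 'warning' (threshold is strict)
  longFirstSight: classifyStall(90 * MIN, 0), // 'alert'
  twiceNudged: classifyStall(40 * MIN, 2),    // 'critical'
};
```

The nudge count is checked first, so two unanswered nudges yield `critical` regardless of elapsed time. An agent first seen after more than an hour of inactivity is classified `alert` even though it has never been nudged.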
Nudge Protocol
When a stall is detected, the witness follows a graduated escalation protocol.
Step 1 -- Send nudge to stalled agent:
```typescript
// witnessId, getNudgeCount, and recordNudge are defined elsewhere (not shown here).
async function handleStall(
  agent: AgentIdentity,
  hook: HookState,
  elapsed: number
): Promise<void> {
  const severity = classifyStall(elapsed, getNudgeCount(agent.id));
  const minutes = Math.floor(elapsed / 60000);
  // Avoid printing "undefined" when the hook carries no work item
  const beadId = hook.workItem?.beadId ?? 'unknown';

  if (severity === 'warning') {
    // First nudge: ask agent if it's still working
    const nudge: AgentMessage = {
      from: witnessId,
      to: agent.id,
      channel: 'nudge',
      payload: `HEALTH_CHECK: no activity for ${minutes}m on ${beadId}`,
      timestamp: new Date().toISOString(),
      durable: false,
    };
    // Write nudge file
    recordNudge(agent.id);
    return;
  }

  if (severity === 'alert' || severity === 'critical') {
    // Escalate to mayor
    await escalateToMayor(agent, hook, severity, elapsed);
  }
}
```
Step 2 -- Wait for response (next patrol cycle):
If the agent responds to the nudge (updates its hook activity timestamp or sends mail), the stall is resolved. No further action needed.
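The handler in Step 1 calls `getNudgeCount` and `recordNudge` without showing them. The sketch below is one possible in-memory shape for that bookkeeping, together with the Step 2 check. The `NudgeTracker` class, its `resolveIfResponded` method, and the extra `nowMs` parameter are assumptions introduced for illustration. Being in-memory, the counts would reset when the witness restarts.

```typescript
// Hypothetical in-memory nudge bookkeeping, keyed by agent id.
class NudgeTracker {
  private readonly sent = new Map<string, { count: number; lastNudgeAt: number }>();

  // Record that a nudge was sent to the agent at the given time (epoch ms).
  recordNudge(agentId: string, nowMs: number): void {
    const prev = this.sent.get(agentId);
    this.sent.set(agentId, {
      count: (prev ? prev.count : 0) + 1,
      lastNudgeAt: nowMs,
    });
  }

  // Number of nudges sent since the agent last showed activity.
  getNudgeCount(agentId: string): number {
    const entry = this.sent.get(agentId);
    return entry ? entry.count : 0;
  }

  // Step 2: if the hook shows activity newer than the last nudge, the stall
  // is considered resolved and the counter is cleared. Returns true if so.
  resolveIfResponded(agentId: string, hookLastActivityMs: number): boolean {
    const entry = this.sent.get(agentId);
    if (!entry) return false;
    if (hookLastActivityMs > entry.lastNudgeAt) {
      this.sent.delete(agentId);
      return true;
    }
    return false;
  }
}
```

Calling `resolveIfResponded` at the start of each patrol cycle, before classification, would keep a recovered agent from carrying old nudges into a later stall.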
Step 3 -- Escalate if unresolved:
```typescript
async function escalateToMayor(
  agent: AgentIdentity,
  hook: HookState,
  severity: StallSeverity,
  elapsed: number
): Promise<void> {
  const minutes = Math.floor(elapsed / 60000);
  // Avoid printing "undefined" when the hook carries no work item
  const beadId = hook.workItem?.beadId ?? 'unknown';

  const escalation: AgentMessage = {
    from: witnessId,
    to: 'mayor',
    channel: 'mail',
    payload: `STALL_${severity.toUpperCase()}: ${agent.id} idle ${minutes}m on ${beadId}`,
    timestamp: new Date().toISOString(),
    durable: true,
  };
  // Write escalation to .chipset/state/mail/mayor/{timestamp}-{witnessId}.json
}
```
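The escalation is written to a path built from the timestamp and the witness id. As a small sketch of that naming scheme, the helper below assembles the path from the template in the comment above. `escalationMailPath` is a hypothetical name, and replacing colons in the timestamp is an assumption made here for filesystem portability; the original does not say whether the timestamp is sanitized.

```typescript
// Hypothetical helper: builds
//   {stateDir}mail/mayor/{timestamp}-{witnessId}.json
function escalationMailPath(
  witnessId: string,
  timestamp: string,
  stateDir: string = '.chipset/state/',
): string {
  const base = stateDir.endsWith('/') ? stateDir : stateDir + '/';
  // Assumption: colons in ISO timestamps are replaced, since some
  // filesystems reject them in file names.
  const safeTimestamp = timestamp.replace(/:/g, '-');
  return `${base}mail/mayor/${safeTimestamp}-${witnessId}.json`;
}
```

Because ISO 8601 timestamps sort lexicographically, file names built this way list in chronological order within the mayor's mail directory.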
Health Reporting
The witness provides aggregate health summaries when queried by the mayor.
```typescript
interface RigHealthReport {
  rigName: string;
  timestamp: string;
  totalAgents: number;
  activeAgents: number;
  stalledAgents: number;
  idleAgents: number;
  terminatedAgents: number;
  stalledDetails: Array<{
    agentId: string;
    beadId: string;
    stalledMinutes: number;
    nudgesSent: number;
  }>;
}
```
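To illustrate how such a report might be assembled, the sketch below aggregates per-agent snapshots into a `RigHealthReport`. `AgentSnapshot` and `buildHealthReport` are hypothetical, and the status values `'idle'` and `'terminated'` are inferred from the report's field names. It also assumes stalled agents are a subset of active agents, which the original does not state.

```typescript
interface RigHealthReport {
  rigName: string;
  timestamp: string;
  totalAgents: number;
  activeAgents: number;
  stalledAgents: number;
  idleAgents: number;
  terminatedAgents: number;
  stalledDetails: Array<{
    agentId: string;
    beadId: string;
    stalledMinutes: number;
    nudgesSent: number;
  }>;
}

// Hypothetical per-agent input; field names and status values are assumptions.
interface AgentSnapshot {
  agentId: string;
  status: 'active' | 'idle' | 'terminated';
  beadId?: string;
  lastActivityMs?: number; // epoch ms of last hook activity, if any
  nudgesSent: number;
}

function buildHealthReport(
  rigName: string,
  agents: AgentSnapshot[],
  nowMs: number,
  stallThresholdMs: number = 30 * 60 * 1000,
): RigHealthReport {
  const count = (s: AgentSnapshot['status']): number =>
    agents.filter((a) => a.status === s).length;
  // Agents without a recorded activity time are treated as not stalled.
  const elapsedMs = (a: AgentSnapshot): number =>
    a.lastActivityMs === undefined ? 0 : nowMs - a.lastActivityMs;

  const stalledDetails = agents
    .filter((a) => a.status === 'active' && elapsedMs(a) > stallThresholdMs)
    .map((a) => ({
      agentId: a.agentId,
      beadId: a.beadId ?? 'unknown',
      stalledMinutes: Math.floor(elapsedMs(a) / 60000),
      nudgesSent: a.nudgesSent,
    }));

  return {
    rigName,
    timestamp: new Date(nowMs).toISOString(),
    totalAgents: agents.length,
    activeAgents: count('active'),
    stalledAgents: stalledDetails.length,
    idleAgents: count('idle'),
    terminatedAgents: count('terminated'),
    stalledDetails,
  };
}
```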
Communication Protocol
Messages the Witness SENDS
| Channel | Target | Purpose | Durability |
|---|---|---|---|
| nudge | Stalled polecats | "Are you still working?" health check | Non-durable |
| mail | Mayor | Stall alerts (warning, alert, critical) | Durable |
| | Mayor | Health report summaries | Durable |
Messages the Witness RECEIVES
| Channel | Source | Content |
|---|---|---|
| | Mayor | Instructions (adjust thresholds, focus on specific agent) |
| | Polecats | Status responses to nudges |
Error Handling
False Positive Stalls
If an agent is working but updates are slow (large commits, long test runs), the witness may detect a false positive. The nudge protocol handles this: the agent responds to the nudge, and the witness records the response as activity.
Witness Restart
If the witness itself restarts, it resumes patrol from scratch. It reads current agent and hook state from the filesystem -- there is no witness-specific state that needs recovery. The patrol loop is stateless between cycles.
Unresponsive Agent
If an agent does not respond to two nudges across two patrol cycles, the witness sends a critical escalation to the mayor. The mayor decides whether to terminate and replace the agent.
Boundary: What the Witness Does NOT Do
The witness NEVER:
- Modifies agent work -- does not edit files, change branches, or alter code
- Resolves conflicts -- conflict resolution is outside the observer's scope
- Terminates agents -- only the mayor can terminate; the witness recommends
- Reassigns work -- hook management belongs to the mayor
- Changes agent status -- the witness reads status but does not write it (except its own)
- Runs tests or builds -- the witness observes; it does not validate output quality
The witness is a sensor. It detects anomalies, nudges the affected agent, and reports to the mayor. It takes no corrective action itself.
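One way to make the read-only boundary hard to cross by accident is to hand the witness a view that exposes only read methods. The sketch below illustrates the idea with a hypothetical `AgentStore` interface; it is not the real `StateManager` API, which this document does not show. `ReadOnlyAgentView`, `asReadOnly`, and `InMemoryAgentStore` are likewise illustrative names.

```typescript
// Hypothetical store shape, for illustration only.
interface AgentStore {
  listAgentIds(): string[];
  getStatus(agentId: string): string | undefined;
  setStatus(agentId: string, status: string): void;
}

// The witness would only ever see the read half of the interface.
type ReadOnlyAgentView = Pick<AgentStore, 'listAgentIds' | 'getStatus'>;

// Wraps a store so that write methods are absent at both the type level
// and at runtime (the returned object has no setStatus property).
function asReadOnly(store: AgentStore): ReadOnlyAgentView {
  return Object.freeze({
    listAgentIds: () => store.listAgentIds(),
    getStatus: (agentId: string) => store.getStatus(agentId),
  });
}

// Tiny in-memory store used to exercise the wrapper.
class InMemoryAgentStore implements AgentStore {
  private readonly statuses = new Map<string, string>();

  listAgentIds(): string[] {
    return Array.from(this.statuses.keys());
  }

  getStatus(agentId: string): string | undefined {
    return this.statuses.get(agentId);
  }

  setStatus(agentId: string, status: string): void {
    this.statuses.set(agentId, status);
  }
}
```

With this shape, code that receives a `ReadOnlyAgentView` fails to compile if it tries to call `setStatus`, so the constraint is enforced by the type checker as well as by convention.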
Integration with Other Gastown Skills
| Skill | Relationship |
|---|---|
| | Witness reports stalls and health TO mayor |
| | Witness monitors polecat health, sends nudges |
| | Witness can observe refinery queue depth and merge failures |
| | Witness reads state via StateManager (read-only) |
References
- references/gastown-origin.md -- How this pattern derives from Gastown's witness.go patrol
- -- Read-only constraints and observation-only scope