Diagnose gateway failures by reading daemon logs, session transcripts, Redis state, and OTEL telemetry. Full Telegram path triage: daemon process → Redis channel → command queue → pi session → model API → Telegram delivery. Use when: 'gateway broken', 'telegram not working', 'why is gateway down', 'gateway not responding', 'check gateway logs', 'what happened to gateway', 'gateway diagnose', 'gateway errors', 'review gateway logs', 'fallback activated', 'gateway stuck', or any request to understand why the gateway failed. Distinct from the gateway skill (operations) — this skill is diagnostic.
npx skill4agent add joelhooks/joelclaw gateway-diagnose
# Automated health check: runs all layers, returns structured findings
joelclaw gateway diagnose [--hours 1] [--lines 100]
# Session context — what happened recently? Exchanges, tools, errors.
joelclaw gateway review [--hours 1] [--max 20]
Commands: diagnose, review
| Artifact | Path | What's in it |
|---|---|---|
| Daemon stdout | | Startup info, event flow, responses, fallback messages |
| Daemon stderr | | Errors, stack traces, retries, fallback activations — check this first |
| PID file | | Current daemon process ID |
| Session ID | | Current pi session ID |
| Session transcripts | | Full pi session history (most recent by mtime) |
| Gateway working dir | | Has |
| Launchd plist | | Service config, env vars, log paths |
| Start script | | Secret leasing, env setup, bun invocation |
| Tripwire | | Last heartbeat timestamp (updated every 15 min) |
| WS port | | WebSocket port for TUI attach (default 3018) |
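The tripwire row above says the heartbeat is refreshed every 15 minutes, so an old file suggests the daemon has stopped ticking. A minimal sketch of that staleness check follows. It assumes the heartbeat is reflected in the file's modification time, which this document does not confirm. `heartbeat_status` is a hypothetical helper name, the 20-minute default is an assumed slack over the 15-minute interval, and the tripwire path has to be supplied by the caller because it is not given here.

```shell
# heartbeat_status FILE [MAX_MINUTES]
# Prints "missing", "stale", or "fresh" for a heartbeat file.
# Assumption: the file's mtime tracks the last heartbeat.
heartbeat_status() {
  hb_file=$1
  hb_max_minutes=${2:-20}   # assumed slack over the 15-minute interval
  if [ ! -e "$hb_file" ]; then
    echo "missing"
  elif [ -n "$(find "$hb_file" -mmin +"$hb_max_minutes" -print 2>/dev/null)" ]; then
    # find printed the path, so it was modified more than MAX_MINUTES ago
    echo "stale"
  else
    echo "fresh"
  fi
}
```

Call it as `heartbeat_status "$TRIPWIRE_FILE"`, where `TRIPWIRE_FILE` is a placeholder for the real tripwire path. A result of "stale" or "missing" is a reason to move on to the process checks.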
# Is the daemon running?
launchctl list | grep gateway
ps aux | grep gateway | grep -v grep
# What's the PID and uptime?
cat /tmp/joelclaw/gateway.pid
# Compare PID to launchctl list output: mismatch = stale PID file
launchctl kickstart -k
joelclaw gateway status
Expected status fields: redis: "connected", activeSessions, gateway, alive: true, pending: 0
# Default: last 100 lines. Adjust for time range.
tail -100 /tmp/joelclaw/gateway.err
| Pattern | Meaning | Root Cause |
|---|---|---|
| | Command queue tried to prompt while session streaming | Session busy: long turn, compaction, or initialization race |
| | Model timeout or consecutive failures triggered model swap | Primary model API down or slow |
| | Timeout: prompt dispatched but no response | Model API issue, auth failure, or session not ready |
| | Drain loop retry (3 attempts, 2s each) | Turn taking longer than expected |
| | No turn_end for 10+ minutes after prompt | Hung tool call or model hang |
| | 3+ consecutive prompt failures | Triggers self-restart via graceful shutdown |
| | Typesense unreachable | k8s port-forward or Typesense pod issue (secondary) |
| | Nth failure in a row | Check model API, session state |
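Several rows above describe repeated or consecutive failures, so it can help to see which messages dominate the stderr tail before reading it line by line. A rough sketch, assuming plain-text log lines: `summarize_errors` is a hypothetical helper, not part of the joelclaw CLI. It collapses every run of digits to `N` so that lines differing only in counters or timestamps group together.

```shell
# summarize_errors FILE [LINES]
# Prints the ten most frequent lines in the last LINES lines of FILE
# (default 100), with digit runs collapsed to N so counters and
# timestamps do not split otherwise identical messages.
summarize_errors() {
  tail -n "${2:-100}" "$1" |
    sed -E 's/[0-9]+/N/g' |
    sort | uniq -c | sort -rn | head -n 10
}
```

For example, `summarize_errors /tmp/joelclaw/gateway.err 500` widens the window to the last 500 lines. The counts only say how often a message appears, so read the raw log for ordering and context.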
tail -100 /tmp/joelclaw/gateway.log
[gateway] daemon started
[gateway:telegram] message received
[gateway:store] persisted inbound message
[gateway:fallback] prompt dispatched
[gateway] response ready
[gateway:fallback] activated
[redis] suppressed N noise event(s)
[gateway:store] replayed unacked messages
joelclaw gateway test
# Wait 5 seconds
joelclaw gateway events
# Find most recent gateway session
ls -lt ~/.joelclaw/sessions/gateway/*.jsonl | head -1
# Read last N lines of the session JSONL
tail -50 ~/.joelclaw/sessions/gateway/<session-file>.jsonl
"type": "turn_end"
"type": "error"
turn_start
turn_end
# Gateway-specific events
joelclaw otel search "gateway" --hours 1
# Fallback events
joelclaw otel search "fallback" --hours 1
# Queue events
joelclaw otel search "command-queue" --hours 1
# Quick API reachability test (auth error = API reachable)
curl -s -m 10 https://api.anthropic.com/v1/messages \
-H "x-api-key: test" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{}' | jq .error.type
# Expected: "authentication_error" (means API is reachable)
# Check gateway queue directly
kubectl exec -n joelclaw redis-0 -- redis-cli LLEN joelclaw:notify:gateway
# Check message store
kubectl exec -n joelclaw redis-0 -- redis-cli XLEN gateway:messages
# Check unacked messages (these replay on restart)
kubectl exec -n joelclaw redis-0 -- redis-cli XRANGE gateway:messages - + COUNT 5
replayUnacked()
joelclaw gateway restart
kubectl get pods -n joelclaw
model_fallback.swapped
model_fallback.primary_restored
model_fallback.probe_failed
[gateway:fallback] activated
recovered
Telegram → channels/telegram.ts → enqueueToGateway()
Redis → channels/redis.ts → enqueueToGateway()
↓
command-queue.ts
(serial FIFO)
↓
session.prompt(text)
↓
pi SDK (isStreaming gate)
↓
Model API (claude-opus-4-6)
↓
turn_end → idleWaiter resolves
↓
Response routed to origin channel
idleWaiter
turn_end
| File | What to look for |
|---|---|
| | Session creation, event handler, idle waiter, watchdog |
| | |
| | Timeout tracking, fallback swap, recovery probes |
| | Event batching, prompt building, sleep mode |
| | Bot polling, message routing |
| | Tripwire writer only (ADR-0103: no prompt injection) |
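The flow above only returns to idle when a turn_end arrives, so a transcript with more turn_start than turn_end entries points at a turn that never finished. A minimal sketch of that check on a session JSONL file follows. It assumes turn_start is recorded as a `"type"` value in the same way as the `"type": "turn_end"` and `"type": "error"` entries; the document names turn_start but does not show its format. `turn_balance` is a hypothetical helper name.

```shell
# turn_balance FILE
# Counts lines carrying turn_start, turn_end, and error type markers
# and reports how many turns appear to be still open.
# Assumption: each event is one JSONL line with a "type" field.
turn_balance() {
  tb_file=$1
  if [ ! -f "$tb_file" ]; then
    echo "no such file: $tb_file"
    return 1
  fi
  # grep -c prints 0 but exits non-zero when nothing matches, hence || true
  tb_starts=$(grep -c '"type": *"turn_start"' "$tb_file" || true)
  tb_ends=$(grep -c '"type": *"turn_end"' "$tb_file" || true)
  tb_errors=$(grep -c '"type": *"error"' "$tb_file" || true)
  echo "turn_start=$tb_starts turn_end=$tb_ends error=$tb_errors open_turns=$((tb_starts - tb_ends))"
}
```

Run it against the newest file under ~/.joelclaw/sessions/gateway/. This is a line-based text match, so message content that happens to quote the same strings would inflate the counts; treat a non-zero open_turns as a prompt to read the tail of the transcript, not as proof of a hang.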