debug
Investigate stuck runs and execution failures by tracing Symphony and Codex logs with issue/session identifiers; use when runs stall, retry repeatedly, or fail unexpectedly.
Source: openai/symphony
NPX Install
npx skill4agent add openai/symphony debug
Debug
Goals
- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.
Log Sources
- Primary runtime log: `log/symphony.log`
  - Default comes from `SymphonyElixir.LogFile` (`log/symphony.log`).
  - Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
- Rotated runtime logs: `log/symphony.log*`
  - Check these when the relevant run is older.
Correlation Keys
- `issue_identifier`: human ticket key (example: `MT-625`)
- `issue_id`: Linear UUID (stable internal ID)
- `session_id`: Codex thread-turn pair (`<thread_id>-<turn_id>`)
See `elixir/docs/logging.md` for the logging conventions.
Quick Triage (Stuck Run)
- Confirm scheduler/worker symptoms for the ticket.
- Find recent lines for the ticket (`issue_identifier` first).
- Extract `session_id` from matching lines.
- Trace that `session_id` across start, stream, completion/failure, and stall handling logs.
- Decide class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.
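The second and third triage steps can be chained into one helper. This is a minimal sketch, not part of Symphony: `sessions_for_ticket` is a hypothetical name, and it assumes log lines carry `key=value` fields such as `issue_identifier=MT-625` and `session_id=<thread>-<turn>`. It uses `grep` so it runs where `rg` is not installed; on large logs, the same pipeline with `rg` should be faster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct session_id values logged for one ticket.
# Assumption: fields are written as key=value and end at a space or semicolon.
sessions_for_ticket() {
  local key="$1"
  shift
  # -h drops file-name prefixes so duplicates across rotated logs collapse.
  # -w avoids matching MT-62 against MT-625; -F treats the key literally.
  # With no file arguments, grep reads from stdin.
  grep -h -w -F -- "issue_identifier=${key}" "$@" \
    | grep -o -E 'session_id=[^ ;]+' \
    | sort -u
}
```

Usage, under the same assumptions: `sessions_for_ticket MT-625 log/symphony.log*`.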
Commands
```bash
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
#    (filter by ticket first; --no-filename lets sort -u dedupe across rotated logs)
rg --no-filename "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error" log/symphony.log*
```
Investigation Flow
- Locate the ticket slice:
  - Search by `issue_identifier=<KEY>`.
  - If noise is high, add `issue_id=<UUID>`.
- Establish timeline:
  - Identify the first `Codex session started ... session_id=...` line.
  - Follow with `Codex session completed`, `ended with error`, or worker exit lines.
- Classify the problem:
  - Stall loop: `Issue stalled ... restarting with backoff`.
  - App-server startup: `Codex session failed ...`.
  - Turn execution failure: `turn_failed`, `turn_cancelled`, `turn_timeout`, or `ended with error`.
  - Worker crash: `Agent task exited ... reason=...`.
- Validate scope:
  - Check whether failures are isolated to one issue/session or repeating across multiple tickets.
- Capture evidence:
  - Save key log lines with timestamps, `issue_identifier`, `issue_id`, and `session_id`.
  - Record probable root cause and the exact failing stage.
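The classification step above can be sketched as a small tally over a log slice. This is an illustrative helper, not part of Symphony: `classify_failures` and the class labels are hypothetical names, and the patterns are only the message fragments quoted in this document. The exact log wording may differ from what is assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and count lines per failure class.
# Assumption: the message fragments below appear verbatim in the log lines.
classify_failures() {
  awk '
    /Issue stalled/        { n["stall-loop"]++ }
    /Codex session failed/ { n["app-server-startup"]++ }
    /turn_failed|turn_cancelled|turn_timeout|ended with error/ { n["turn-execution"]++ }
    /Agent task exited/    { n["worker-crash"]++ }
    END { for (k in n) print k, n[k] }
  ' | sort   # awk iteration order is unspecified, so sort for stable output
}
```

Usage, under the same assumptions: `rg --no-filename "issue_identifier=MT-625" log/symphony.log* | classify_failures`. A slice dominated by one class points at the stage to inspect first; several classes at once suggest a retry loop worth tracing session by session.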
Reading Codex Session Logs
In Symphony, Codex session diagnostics are emitted into `log/symphony.log` and
keyed by `session_id`. Read them as a lifecycle:
- `Codex session started ... session_id=...`
- Session stream/lifecycle events for the same `session_id`
- Terminal event:
  - `Codex session completed ...`, or
  - `Codex session ended with error ...`, or
  - `Issue stalled ... restarting with backoff`
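To see how far one session got through this lifecycle, a rough sketch follows. `session_outcome` and its state labels are hypothetical, and it assumes that every relevant line, including any stall line, carries the `session_id=...` field and that input arrives in chronological order. If stall lines are logged without a session ID, that state will not be detected here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the last lifecycle state logged for one session.
# Assumptions: lines are chronological and each carries session_id=<id>.
session_outcome() {
  local sid="$1"
  shift
  # -w keeps session_id=t1-u1 from matching session_id=t1-u10.
  # With no file arguments, grep reads from stdin.
  grep -h -w -F -- "session_id=${sid}" "$@" | awk '
    /Codex session started/          { state = "started" }
    /Codex session completed/        { state = "completed" }
    /Codex session ended with error/ { state = "ended-with-error" }
    /Issue stalled/                  { state = "stalled" }
    END { print (state == "" ? "not-found" : state) }
  '
}
```

A result of `started` means no terminal event was found for that session in the files searched, which is the case worth checking against rotated logs before treating the run as stuck.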
For one specific session investigation, keep the trace narrow:
- Capture one `session_id` for the ticket.
- Build a timestamped slice for only that session:
  `rg -n "session_id=<thread>-<turn>" log/symphony.log*`
- Mark the exact failing stage:
  - Startup failure before stream events (`Codex session failed ...`).
  - Turn/runtime failure after stream events (`turn_*` / `ended with error`).
  - Stall recovery (`Issue stalled ... restarting with backoff`).
- Pair findings with `issue_identifier` and `issue_id` from nearby lines to confirm you are not mixing concurrent retries.
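One way to check the last point mechanically is to confirm that a slice mentions only one ticket. The sketch below is illustrative: `check_single_issue` is a hypothetical name, and it assumes `issue_identifier=...` appears as a `key=value` field ending at a space or semicolon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a log slice on stdin, print the distinct
# issue_identifier values, and return non-zero if more than one appears.
check_single_issue() {
  local ids count
  ids="$(grep -o -E 'issue_identifier=[^ ;]+' | sort -u)"
  # Count non-empty lines; awk always exits 0, even when there are none.
  count="$(printf '%s\n' "$ids" | awk 'NF { n++ } END { print n + 0 }')"
  if [ -n "$ids" ]; then
    printf '%s\n' "$ids"
  fi
  [ "$count" -le 1 ]
}
```

Usage, under the same assumptions: `rg --no-filename "session_id=<thread>-<turn>" log/symphony.log* | check_single_issue || echo "slice mixes tickets"`.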
Always pair session findings with `issue_identifier` / `issue_id` to avoid mixing
concurrent runs.
Notes
- Prefer `rg` over `grep` for speed on large logs.
- Check rotated logs (`log/symphony.log*`) before concluding data is missing.
- If required context fields are missing in new log statements, align with `elixir/docs/logging.md` conventions.