Debug

Goals

- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.

Log Sources

Primary runtime log: `log/symphony.log`
- Default comes from `SymphonyElixir.LogFile` (`log/symphony.log`).
- Includes orchestrator, agent runner, and Codex app-server lifecycle logs.

Rotated runtime logs: `log/symphony.log*`
- Check these when the relevant run is older.

Correlation Keys

- `issue_identifier`: human ticket key (example: `MT-625`)
- `issue_id`: Linear UUID (stable internal ID)
- `session_id`: Codex thread-turn pair (`<thread_id>-<turn_id>`)

`elixir/docs/logging.md` requires these fields for issue/session lifecycle logs. Use them as your join keys during debugging.
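
To make the join keys concrete, here is a minimal sketch that pulls one key out of a log line. It assumes fields appear as `key=value` and that a value ends at the first space or semicolon, which is the same assumption the `session_id=[^ ;]+` pattern in this guide makes. The `field` helper and the sample line are hypothetical and not part of Symphony.

```bash
# Hypothetical helper: read log lines on stdin and print the value of the
# first key=value field named "$1".
# Assumption: a value ends at the first space or semicolon.
field() {
  local key="$1"
  grep -oE -- "${key}=[^ ;]+" | head -n 1 | cut -d= -f2-
}

# Illustrative log line; the real Symphony line format may differ.
line='Codex session started issue_identifier=MT-625 issue_id=<linear-uuid> session_id=<thread_id>-<turn_id>'

printf '%s\n' "$line" | field issue_identifier   # prints MT-625
printf '%s\n' "$line" | field session_id         # prints <thread_id>-<turn_id>
```

Because the helper matches on `key=`, asking for `issue_id` does not accidentally return the `issue_identifier` value.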

Quick Triage (Stuck Run)

1. Confirm scheduler/worker symptoms for the ticket.
2. Find recent lines for the ticket (`issue_identifier` first).
3. Extract `session_id` from matching lines.
4. Trace that `session_id` across start, stream, completion/failure, and stall handling logs.
5. Decide class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.

Commands

```bash
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
#    (filter to the ticket first; --no-filename keeps sort -u from treating
#    the same session in two rotated files as distinct)
rg --no-filename "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
```
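
For the stuck/retry step, one possible search is sketched here. The signal list is an assumption assembled from the log phrases this guide names in its Investigation Flow; it is not necessarily the pattern the original command used. `stuck_lines` is a hypothetical helper name.

```bash
# Assumed signal phrases, taken from the failure classes named in this guide.
stuck_signals='Issue stalled|Codex session failed|ended with error|turn_failed|turn_cancelled|turn_timeout|Agent task exited'

# Hypothetical helper: print numbered log lines that carry any signal phrase.
# Uses rg when installed (the guide's preference), otherwise grep -E.
stuck_lines() {
  if command -v rg >/dev/null 2>&1; then
    rg -n -e "$stuck_signals" "$@"
  else
    grep -nE -e "$stuck_signals" "$@"
  fi
}
```

A typical call would be `stuck_lines log/symphony.log*`, optionally piped through a second filter on `issue_identifier=<KEY>` to keep only one ticket.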

Investigation Flow

Locate the ticket slice:
- Search by `issue_identifier=<KEY>`.
- If noise is high, add `issue_id=<UUID>`.

Establish timeline:
- Identify first `Codex session started ... session_id=...`.
- Follow with `Codex session completed`, `ended with error`, or worker exit lines.

Classify the problem:
- Stall loop: `Issue stalled ... restarting with backoff`.
- App-server startup: `Codex session failed ...`.
- Turn execution failure: `turn_failed`, `turn_cancelled`, `turn_timeout`, or `ended with error`.
- Worker crash: `Agent task exited ... reason=...`.

Validate scope:
- Check whether failures are isolated to one issue/session or repeating across multiple tickets.

Capture evidence:
- Save key log lines with timestamps, `issue_identifier`, `issue_id`, and `session_id`.
- Record probable root cause and the exact failing stage.
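
The classification step above can be sketched as a small lookup from log phrase to failure class. The class labels (`stall_loop`, `app_server_startup`, and so on) are hypothetical names chosen for this example; only the matched phrases come from this guide.

```bash
# Hypothetical helper: map one log line to a failure class.
# The first matching pattern wins, so order encodes priority.
classify_line() {
  case "$1" in
    *"Issue stalled"*)        echo "stall_loop" ;;
    *"Codex session failed"*) echo "app_server_startup" ;;
    *turn_failed*|*turn_cancelled*|*turn_timeout*|*"ended with error"*)
                              echo "turn_failure" ;;
    *"Agent task exited"*)    echo "worker_crash" ;;
    *)                        echo "unclassified" ;;
  esac
}
```

Feeding a saved ticket slice through it line by line and counting the labels (for example with `sort | uniq -c`) gives a quick view of whether one class dominates.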

Reading Codex Session Logs

In Symphony, Codex session diagnostics are emitted into `log/symphony.log` and keyed by `session_id`. Read them as a lifecycle:

1. `Codex session started ... session_id=...`
2. Session stream/lifecycle events for the same `session_id`
3. Terminal event:
   - `Codex session completed ...`, or
   - `Codex session ended with error ...`, or
   - `Issue stalled ... restarting with backoff`

For one specific session investigation, keep the trace narrow:

1. Capture one `session_id` for the ticket.
2. Build a timestamped slice for only that session: `rg -n "session_id=<thread>-<turn>" log/symphony.log*`
3. Mark the exact failing stage:
   - Startup failure before stream events (`Codex session failed ...`).
   - Turn/runtime failure after stream events (`turn_*` / `ended with error`).
   - Stall recovery (`Issue stalled ... restarting with backoff`).
4. Pair findings with `issue_identifier` and `issue_id` from nearby lines to confirm you are not mixing concurrent retries.

Always pair session findings with `issue_identifier` and `issue_id` to avoid mixing concurrent runs.
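
As a rough illustration of marking the failing stage for one session, the sketch below builds the session slice and reports a coarse stage. It assumes every relevant line, including the stall line, carries `session_id=<id>` as a whitespace-separated field; the helper name `session_stage` and its stage labels are hypothetical.

```bash
# Hypothetical helper: session_stage <session_id> <log files...>
# Prints a coarse stage label for one session. Labels are illustrative.
# Assumption: each relevant line carries session_id=<id> as a
# whitespace-separated field, optionally followed by a semicolon.
session_stage() {
  local sid="$1"; shift
  local slice
  # Exact field match, so session "a-1" does not also pick up "a-10".
  slice="$(awk -v want="session_id=${sid}" '{
    for (i = 1; i <= NF; i++) {
      f = $i
      sub(/;.*$/, "", f)
      if (f == want) { print; next }
    }
  }' "$@")"

  case "$slice" in
    "")                          echo "no_lines" ;;
    *"Codex session failed"*)    echo "startup_failure" ;;
    *turn_failed*|*turn_cancelled*|*turn_timeout*|*"ended with error"*)
                                 echo "turn_failure" ;;
    *"Issue stalled"*)           echo "stall_recovery" ;;
    *"Codex session completed"*) echo "completed" ;;
    *)                           echo "in_progress_or_unknown" ;;
  esac
}
```

A call such as `session_stage "<thread>-<turn>" log/symphony.log*` returns one label; a `no_lines` result is a cue to double-check the ID and the rotated logs before drawing conclusions.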

Notes

- Prefer `rg` over `grep` for speed on large logs.
- Check rotated logs (`log/symphony.log*`) before concluding data is missing.
- If required context fields are missing in new log statements, align with `elixir/docs/logging.md` conventions.
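
One way to check rotated logs before concluding data is missing is to count hits per file, so that files with zero matches are visible as well. This is a sketch only; `hits_per_file` is a hypothetical helper, and it uses plain `grep -c` because the per-file count is all that is needed.

```bash
# Hypothetical helper: hits_per_file <literal-pattern> <log files...>
# Prints "<file>:<count>" for every file, including files with 0 matches.
# -F treats the pattern as a literal string, so a short key such as
# issue_identifier=MT-62 would also count lines for MT-625.
hits_per_file() {
  local pattern="$1"; shift
  local f
  for f in "$@"; do
    # grep -c still prints 0 when nothing matches (its exit status is 1).
    printf '%s:%s\n' "$f" "$(grep -cF -- "$pattern" "$f")"
  done
}
```

For example, `hits_per_file "issue_identifier=MT-625" log/symphony.log*` shows at a glance which rotated file, if any, holds the run.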
