Debug

Goals

- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.

Log Sources

- Primary runtime log: `log/symphony.log`
  - Default comes from `SymphonyElixir.LogFile` (`log/symphony.log`).
  - Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
- Rotated runtime logs: `log/symphony.log*`
  - Check these when the relevant run is older.

Correlation Keys

- `issue_identifier`: human ticket key (example: `MT-625`)
- `issue_id`: Linear UUID (stable internal ID)
- `session_id`: Codex thread-turn pair (`<thread_id>-<turn_id>`)

`elixir/docs/logging.md` requires these fields for issue/session lifecycle logs. Use them as your join keys during debugging.
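
As a rough sketch of using these fields as join keys, the helper below pulls one field's value out of log lines on stdin. It assumes fields are written as `key=value` and end at a space or semicolon; `log_field` is a hypothetical name, not part of Symphony. It uses `grep` on stdin only so that it can post-process lines already selected by `rg`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key=value field per matching line.
# Assumption: values contain no spaces or semicolons.
log_field() {
  local key="$1"
  # -o prints only the matched "key=value" text; sed then strips the "key=" prefix.
  grep -oE "${key}=[^ ;]+" | sed "s/^${key}=//"
}
```

For example, `rg --no-filename "issue_identifier=MT-625" log/symphony.log* | log_field issue_id | sort -u` would list the Linear UUIDs seen alongside that ticket key.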

Quick Triage (Stuck Run)

1. Confirm scheduler/worker symptoms for the ticket.
2. Find recent lines for the ticket (`issue_identifier` first).
3. Extract `session_id` from matching lines.
4. Trace that `session_id` across start, stream, completion/failure, and stall handling logs.
5. Decide class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.

Commands

```bash
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
#    (filter to the ticket first, then extract only the session_id fields)
rg --no-filename "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
```
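
The command for step 5 is not preserved here. One possible form, assuming the failure phrases named in this guide appear verbatim in the log, is an alternation over those signals. The `STUCK_SIGNALS` pattern and the `stuck_signals` function are illustrative names, and the phrase list should be adjusted to the messages your build actually emits.

```bash
#!/usr/bin/env bash
# Hypothetical pattern assembled from the failure phrases mentioned in this guide.
STUCK_SIGNALS='Issue stalled|turn_timeout|turn_failed|turn_cancelled|Codex session failed|ended with error|Agent task exited'

# Print matching lines with line numbers.
# Uses rg when available and falls back to grep -E otherwise.
stuck_signals() {
  if command -v rg >/dev/null 2>&1; then
    rg -n "$STUCK_SIGNALS" "$@"
  else
    grep -nE "$STUCK_SIGNALS" "$@"
  fi
}
```

Typical use would be `stuck_signals log/symphony.log*`, optionally piped through a second filter on `issue_identifier=<KEY>`.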

Investigation Flow

1. Locate the ticket slice:
   - Search by `issue_identifier=<KEY>`.
   - If noise is high, add `issue_id=<UUID>`.
2. Establish timeline:
   - Identify first `Codex session started ... session_id=...`.
   - Follow with `Codex session completed`, `ended with error`, or worker exit lines.
3. Classify the problem:
   - Stall loop: `Issue stalled ... restarting with backoff`.
   - App-server startup: `Codex session failed ...`.
   - Turn execution failure: `turn_failed`, `turn_cancelled`, `turn_timeout`, or `ended with error`.
   - Worker crash: `Agent task exited ... reason=...`.
4. Validate scope:
   - Check whether failures are isolated to one issue/session or repeating across multiple tickets.
5. Capture evidence:
   - Save key log lines with timestamps, `issue_identifier`, `issue_id`, and `session_id`.
   - Record probable root cause and the exact failing stage.
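
The classification step can be sketched as a small shell function that maps a log line to a failure class by the phrases listed in this section. The class labels (`stall-loop`, `worker-crash`, and so on) and the function names are made up for illustration; only the matched phrases come from this guide.

```bash
#!/usr/bin/env bash
# Map one log line to a failure class. Labels are illustrative, not Symphony terms.
classify_line() {
  case "$1" in
    *"Issue stalled"*)        echo "stall-loop" ;;
    *"Codex session failed"*) echo "app-server-startup" ;;
    *turn_failed*|*turn_cancelled*|*turn_timeout*|*"ended with error"*)
                              echo "turn-failure" ;;
    *"Agent task exited"*)    echo "worker-crash" ;;
    *)                        echo "unclassified" ;;
  esac
}

# Tally classes for all lines on stdin, which helps show whether one class dominates.
classify_counts() {
  local line
  while IFS= read -r line; do
    classify_line "$line"
  done | sort | uniq -c
}
```

Feeding a ticket slice through it, for example `rg --no-filename "issue_identifier=<KEY>" log/symphony.log* | classify_counts`, gives a quick count per class before reading individual lines.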

Reading Codex Session Logs

In Symphony, Codex session diagnostics are emitted into `log/symphony.log` and keyed by `session_id`. Read them as a lifecycle:

1. `Codex session started ... session_id=...`
2. Session stream/lifecycle events for the same `session_id`
3. Terminal event:
   - `Codex session completed ...`, or
   - `Codex session ended with error ...`, or
   - `Issue stalled ... restarting with backoff`

For one specific session investigation, keep the trace narrow:

1. Capture one `session_id` for the ticket.
2. Build a timestamped slice for only that session: `rg -n "session_id=<thread>-<turn>" log/symphony.log*`
3. Mark the exact failing stage:
   - Startup failure before stream events (`Codex session failed ...`).
   - Turn/runtime failure after stream events (`turn_*` / `ended with error`).
   - Stall recovery (`Issue stalled ... restarting with backoff`).
4. Pair findings with `issue_identifier` and `issue_id` from nearby lines to confirm you are not mixing concurrent retries.

Always pair session findings with `issue_identifier` and `issue_id` to avoid mixing concurrent runs.
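
To make the lifecycle reading concrete, here is a minimal sketch that scans one session slice on stdin and reports whether a start line was seen and which terminal event came last. `session_outcome` and its output labels are hypothetical. It also assumes every relevant line in the slice carries the phrases quoted in this section, which may not hold for stall lines that are logged without a `session_id`.

```bash
#!/usr/bin/env bash
# Summarize one session slice: was a start line seen, and what was the last terminal event?
session_outcome() {
  local started="no" outcome="none" line
  while IFS= read -r line; do
    case "$line" in
      *"Codex session started"*)          started="yes" ;;
      *"Codex session completed"*)        outcome="completed" ;;
      *"Codex session ended with error"*) outcome="ended-with-error" ;;
      *"Codex session failed"*)           outcome="startup-failed" ;;
      *"Issue stalled"*)                  outcome="stalled" ;;
    esac
  done
  echo "started=${started} outcome=${outcome}"
}
```

A slice built with `rg --no-filename "session_id=<thread>-<turn>" log/symphony.log* | session_outcome` that prints `outcome=none` suggests the session has no terminal event yet, so check the rotated logs and any stall lines for the same ticket.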

Notes

- Prefer `rg` over `grep` for speed on large logs.
- Check rotated logs (`log/symphony.log*`) before concluding data is missing.
- If required context fields are missing in new log statements, align with `elixir/docs/logging.md` conventions.
