codex-token-summary

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Codex Token 消耗统计

Codex Token Consumption Statistics

快速使用（推荐使用 Python 脚本）

Quick Start (Recommended to Use Python Script)

本技能提供参数化的 Python 脚本，无需手工编写 SQL/JSON 解析代码。脚本位于本目录

codex_token_summary.py

，支持灵活的时间范围、输出格式控制。

This skill provides a parameterized Python script, eliminating the need to manually write SQL/JSON parsing code. The script is located in this directory as

codex_token_summary.py

, supporting flexible time ranges and output format control.

常用命令

Common Commands

bash

undefined

bash

undefined

最近 7 天简单表格（默认）

Simple table for the last 7 days (default)

python codex_token_summary.py

Last 30 days

python codex_token_summary.py --days 30

自定义日期范围

Custom date range

python codex_token_summary.py --start-date 2026-04-01 --end-date 2026-05-18

详细格式（含 Token 分类）

Detailed format (including Token categories)

python codex_token_summary.py --days 7 --format detailed

输出为 JSON

Output as JSON

python codex_token_summary.py --days 14 --output json

包含周末

Include weekends

python codex_token_summary.py --days 30 --no-exclude-weekends

undefined

python codex_token_summary.py --days 30 --no-exclude-weekends

undefined

Windows 环境初始化

Windows Environment Initialization

若 Windows 上未安装 Python，使用同目录的

setup_python_env.ps1

一键安装并配置：

powershell

undefined

If Python is not installed on Windows, use the

setup_python_env.ps1

in the same directory to install and configure it with one click:

powershell

undefined

以管理员身份运行 PowerShell，然后执行

Run PowerShell as administrator, then execute

.\setup_python_env.ps1


该脚本会自动检测 Python、从 python.org 下载官方安装包、加入 PATH，并安装脚本所需的依赖（仅 Python < 3.9 时需 `backports.zoneinfo`）。

.\setup_python_env.ps1


This script will automatically detect Python, download the official installation package from python.org, add it to PATH, and install dependencies required by the script (only `backports.zoneinfo` is needed when Python < 3.9).

脚本参数速查

Quick Reference for Script Parameters

参数	说明	示例
`--days N`	最近 N 天（默认 7）	`--days 30`
`--start-date YYYY-MM-DD`	自定义开始日期	`--start-date 2026-04-01`
`--end-date YYYY-MM-DD`	自定义结束日期	`--end-date 2026-05-18`
`--format simple\|detailed`	输出详细程度（默认 simple）	`--format detailed`
`--output json\|table`	输出类型（默认 table）	`--output json`
`--no-exclude-weekends`	包含周末	`--no-exclude-weekends`
`--tz TIMEZONE`	时区（默认 Asia/Shanghai）	`--tz America/New_York`

详见本目录

README_SCRIPT.md

。

Parameter	Description	Example
`--days N`	Last N days (default: 7)	`--days 30`
`--start-date YYYY-MM-DD`	Custom start date	`--start-date 2026-04-01`
`--end-date YYYY-MM-DD`	Custom end date	`--end-date 2026-05-18`
`--format simple\|detailed`	Output detail level (default: simple)	`--format detailed`
`--output json\|table`	Output type (default: table)	`--output json`
`--no-exclude-weekends`	Include weekends	`--no-exclude-weekends`
`--tz TIMEZONE`	Time zone (default: Asia/Shanghai)	`--tz America/New_York`

See

README_SCRIPT.md

in this directory for details.

适用场景

Applicable Scenarios

当用户要求统计本机 Codex / Codex Desktop 在一段时间内的 Token 消耗，尤其是需要按任务目的、会话、模型、Token 类型占比输出表格时，使用本技能。

默认使用中文输出。除非用户另有说明，统计“最近 7 天”时按自然日回看 7 天。

This skill is used when users need to count the Token consumption of the local Codex / Codex Desktop over a period of time, especially when requiring tables output by task purpose, session, model, and Token type proportion.

Chinese output is used by default. Unless specified otherwise by the user, "last 7 days" refers to looking back 7 natural days.

数据源

Data Sources

主要读取两个本地数据源：

```
~/.codex/state_*.sqlite
```
- 表：
```
threads
```
- 用途：获取线程 ID、任务目的、项目路径、模型、rollout 文件路径、创建/更新时间、总 Token 计数。
- 应在
```
~/.codex/
```
  下发现
```
state_*.sqlite
```
  ，再按 schema 判断哪个是状态库。
- 关键字段：
  - ```
  id
```
- ```
rollout_path
```
  - ```
  cwd
```
- ```
title
```
  - ```
  preview
```
- ```
first_user_message
```
  - ```
  created_at
```
- ```
updated_at
```
  - ```
  tokens_used
```
- ```
model
```
  - ```
  reasoning_effort
```
- ```
agent_role
```
  - ```
  thread_source
```
兼容来源：
- 旧的
```
~/.codex/session_index.jsonl
```
  里 session 名称字段通常叫
```
thread_name
```
  ，只在状态库缺失或需要交叉校验时使用。

~/.codex/sessions/**/*.jsonl

和

~/.codex/archived_sessions/*.jsonl

路径通常来自
```
threads.rollout_path
```
，不要只扫描
```
sessions/
```
，因为较早或归档会话可能在
```
archived_sessions/
```
。
用途：读取
```
token_count
```
事件，按事件时间做增量统计。

关键事件：

JSONL 行的
```
type == "event_msg"
```
```
payload.type == "token_count"
```
```
payload.info.total_token_usage
```

Mainly reads two local data sources:

```
~/.codex/state_*.sqlite
```
- Table:
```
threads
```
- Purpose: Obtain thread ID, task purpose, project path, model, rollout file path, creation/update time, total Token count.
- Should find
```
state_*.sqlite
```
  under
```
~/.codex/
```
  , then determine which one is the state database based on the schema.
- Key fields:
  - ```
  id
```
- ```
rollout_path
```
  - ```
  cwd
```
- ```
title
```
  - ```
  preview
```
- ```
first_user_message
```
  - ```
  created_at
```
- ```
updated_at
```
  - ```
  tokens_used
```
- ```
model
```
  - ```
  reasoning_effort
```
- ```
agent_role
```
  - ```
  thread_source
```
Compatible source:
- The session name field in the old
```
~/.codex/session_index.jsonl
```
  is usually called
```
thread_name
```
  , which is only used when the state database is missing or cross-verification is needed.

~/.codex/sessions/**/*.jsonl

and

~/.codex/archived_sessions/*.jsonl

The path usually comes from
```
threads.rollout_path
```
; do not only scan
```
sessions/
```
, as older or archived sessions may be in
```
archived_sessions/
```
.
Purpose: Read
```
token_count
```
events and perform incremental statistics by event time.

Key events:

```
type == "event_msg"
```
in JSONL lines
```
payload.type == "token_count"
```
```
payload.info.total_token_usage
```

访问方式

Access Method

优先发现可用的 Codex 状态 SQLite，再查询线程元数据，最后按

rollout_path

逐个读取 JSONL。

状态库发现规则：

在
```
~/.codex/
```
下查找
```
state_*.sqlite
```
。
对候选文件执行
```
.tables
```
或查询
```
sqlite_master
```
。
选择包含
```
threads
```
表的文件。
校验
```
threads
```
表至少包含这些字段：
```
id
```
、
```
rollout_path
```
、
```
cwd
```
、
```
created_at
```
、
```
tokens_used
```
。
如果多个文件都匹配，优先选择最近修改时间最新的文件；必要时输出候选列表让用户确认。
不要把
```
logs_*.sqlite
```
当主统计源。日志库可用于调试，但项目维度和 rollout 路径来自状态库的
```
threads
```
表。

示例查询：

bash

STATE_DB="$(find ~/.codex -maxdepth 1 -name 'state_*.sqlite' -type f -print | sort | tail -1)"
sqlite3 -json "$STATE_DB" \
  "select id, rollout_path, cwd, title, preview, first_user_message, tokens_used, created_at, updated_at, model, reasoning_effort, agent_role, thread_source from threads"

注意windows下使用符合windows powershell的脚本命令查询

推荐用脚本聚合，而不是只使用

threads.tokens_used

。原因是

threads.tokens_used

是线程级累计值，无法按天切分，也无法拆出缓存输入、输出、思考等类型。

Prioritize discovering available Codex state SQLite, then query thread metadata, and finally read JSONL files one by one according to

rollout_path

State database discovery rules:

Search for
```
state_*.sqlite
```
under
```
~/.codex/
```
.
Execute
```
.tables
```
or query
```
sqlite_master
```
on candidate files.
Select the file that contains the
```
threads
```
table.
Verify that the
```
threads
```
table contains at least these fields:
```
id
```
,
```
rollout_path
```
,
```
cwd
```
,
```
created_at
```
,
```
tokens_used
```
.
If multiple files match, prioritize the one with the latest modification time; if necessary, output a candidate list for user confirmation.
Do not use
```
logs_*.sqlite
```
as the main statistics source. The log database can be used for debugging, but project dimensions and rollout paths come from the
```
threads
```
table of the state database.

Example query:

bash

STATE_DB="$(find ~/.codex -maxdepth 1 -name 'state_*.sqlite' -type f -print | sort | tail -1)"
sqlite3 -json "$STATE_DB" \
  "select id, rollout_path, cwd, title, preview, first_user_message, tokens_used, created_at, updated_at, model, reasoning_effort, agent_role, thread_source from threads"

Note: Use script commands compatible with Windows PowerShell for queries on Windows

It is recommended to aggregate using the script instead of only using

threads.tokens_used

. The reason is that

threads.tokens_used

is a thread-level cumulative value, which cannot be split by day or broken down into types like cached input, output, reasoning, etc.

时间范围

Time Range

默认口径：

时区：
```
Asia/Shanghai
```
当前日期以运行环境日期为准。
“最近 7 天”表示包含今天在内的 7 个自然日。
排除
```
weekday >= 5
```
的日期，即周六和周日。
用 JSONL
```
token_count
```
事件的
```
timestamp
```
判断是否落入统计范围。

示例：如果当前日期是

2026-05-18

，统计范围为

2026-05-12

到

2026-05-18

，实际计入工作日为

2026-05-12

、

2026-05-13

、

2026-05-14

、

2026-05-15

、

2026-05-18

。

Default specifications:

Time zone:
```
Asia/Shanghai
```
Current date is based on the runtime environment date.
"Last 7 days" means 7 natural days including today.
Exclude dates where
```
weekday >= 5
```
, i.e., Saturdays and Sundays.
Use the
```
timestamp
```
of JSONL
```
token_count
```
events to determine if they fall within the statistical range.

Example: If the current date is

2026-05-18

, the statistical range is from

2026-05-12

2026-05-18

, and the actual working days included are

2026-05-12

2026-05-13

2026-05-14

2026-05-15

2026-05-18

聚合算法

Aggregation Algorithm

对每个 rollout JSONL 文件：

读取对应
```
threads
```
行，生成“任务目的”作为主维度。
顺序读取 JSONL 中的
```
token_count
```
事件。
每个事件里的
```
total_token_usage
```
是线程累计值，不是单次增量。
对同一文件内相邻
```
token_count
```
做差，得到本次事件增量。
只累加事件时间落入统计工作日范围内的增量。
按“任务目的”聚合总量。
同时保留项目路径、模型、会话 ID 等明细，便于解释异常大头

任务目的生成规则：

优先使用
```
threads.title
```
。
如果
```
title
```
为空或明显是系统审查/子 agent 长指令，可回退到
```
threads.preview
```
。
如果
```
preview
```
也不可用，可回退到
```
threads.first_user_message
```
的首行或前 80 个字符。
保留
```
thread_source
```
和
```
agent_role
```
，因为子 agent 的任务目的通常来自 spawn 指令，不一定等同用户原始需求。
对过长任务目的做展示截断，但聚合键应使用完整值，避免不同任务被合并。

如果某个线程创建于统计范围内但找不到任何

token_count

事件，需要在结果里说明缺失，不要静默忽略。

For each rollout JSONL file:

Read the corresponding
```
threads
```
row and generate "task purpose" as the main dimension.
Sequentially read
```
token_count
```
events in JSONL.
```
total_token_usage
```
in each event is a cumulative value for the thread, not a single increment.
Calculate the difference between adjacent
```
token_count
```
events in the same file to get the increment of this event.
Only accumulate increments whose event time falls within the statistical working days.
Aggregate the total amount by "task purpose".
At the same time, retain details such as project path, model, session ID to facilitate explaining abnormal large amounts.

Task purpose generation rules:

Prioritize using
```
threads.title
```
.
If
```
title
```
is empty or obviously a system review/sub-agent long instruction, fall back to
```
threads.preview
```
.
If
```
preview
```
is also unavailable, fall back to the first line or first 80 characters of
```
threads.first_user_message
```
.
Retain
```
thread_source
```
and
```
agent_role
```
, because the task purpose of sub-agents usually comes from spawn instructions, which may not be equivalent to the user's original requirement.
Truncate overly long task purposes for display, but use the complete value as the aggregation key to avoid merging different tasks.

If a thread was created within the statistical range but no

token_count

events are found, it needs to be explained in the results instead of being silently ignored.

Token 分类

Token Categories

Codex 本地记录能直接拆出的字段：

```
input_tokens
```
```
cached_input_tokens
```
```
output_tokens
```
```
reasoning_output_tokens
```
```
total_tokens
```

推荐输出分类：

新输入/检索上下文

：

input_tokens - cached_input_tokens

```
缓存输入
```
：
```
cached_input_tokens
```

输出/写代码

：

output_tokens - reasoning_output_tokens

```
思考
```
：
```
reasoning_output_tokens
```

Fields that can be directly extracted from local Codex records:

```
input_tokens
```
```
cached_input_tokens
```
```
output_tokens
```
```
reasoning_output_tokens
```
```
total_tokens
```

Recommended output categories:

New Input/Retrieval Context

input_tokens - cached_input_tokens

```
Cached Input
```
:
```
cached_input_tokens
```

Output/Code Writing

output_tokens - reasoning_output_tokens

```
Reasoning
```
:
```
reasoning_output_tokens
```

统计维度

Statistical Dimensions

主维度：

任务目的：优先来自

threads.title

，必要时回退到

threads.preview

或

threads.first_user_message

默认指标：

```
项目
```
```
模型
```
```
Token 总量
```
```
占比
```
```
新输入/检索上下文
```
```
缓存输入
```
```
输出/写代码
```
```
思考
```

可选辅助维度：

日期
会话 ID：
```
id
```
项目路径：
```
cwd
```
模型：
```
model
```
推理强度：
```
reasoning_effort
```
子 agent 类型：
```
agent_role
```
线程来源：
```
thread_source
```

Main dimension:

Task purpose: Prioritized from
```
threads.title
```
, fall back to
```
threads.preview
```
or
```
threads.first_user_message
```
if necessary.

Default metrics:

```
Project
```
```
Model
```
```
Total Tokens
```
```
Proportion
```
```
New Input/Retrieval Context
```
```
Cached Input
```
```
Output/Code Writing
```
```
Reasoning
```

Optional auxiliary dimensions:

Date
Session ID:
```
id
```
Project path:
```
cwd
```
Model:
```
model
```
Reasoning effort:
```
reasoning_effort
```
Sub-agent type:
```
agent_role
```
Thread source:
```
thread_source
```

模型

Model

模型字段：

优先使用
```
threads.model
```
。
如果
```
threads.model
```
为空，可从 rollout JSONL 的
```
turn_context.payload.model
```
中寻找模型字段。
不要把
```
session_meta.payload.model_provider
```
当成模型名；它通常只是供应商，例如
```
openai
```
。
推理强度优先使用
```
threads.reasoning_effort
```
；如果直接读 JSONL，可用
```
turn_context.payload.effort
```
。
同一个任务目的下如果出现多个模型，表格中用逗号列出，或显示为
```
mixed: model-a / model-b
```
。

Model field:

Prioritize using
```
threads.model
```
.
If
```
threads.model
```
is empty, look for the model field from
```
turn_context.payload.model
```
in the rollout JSONL.
Do not treat
```
session_meta.payload.model_provider
```
as the model name; it is usually just the provider, e.g.,
```
openai
```
.
Prioritize using
```
threads.reasoning_effort
```
for reasoning intensity; if reading JSONL directly, use
```
turn_context.payload.effort
```
.
If multiple models appear under the same task purpose, list them with commas in the table, or display as
```
mixed: model-a / model-b
```
.

输出格式

Output Format

先用一段话说明统计口径，再输出表格。默认输出“简单”方案；只有用户明确要求详细分类、Token 类型占比、输入/缓存/输出/思考拆分时，才输出“详细”方案。

First use a paragraph to explain the statistical specifications, then output the table. The "simple" scheme is output by default; the "detailed" scheme is only output when the user explicitly requests detailed classification, Token type proportion, or split of input/cache/output/reasoning.

简单方案（默认）

Simple Scheme (Default)

先输出

合计

摘要行，再输出明细表格。

合计

展示总 Token 量和 100% 占比。

markdown

**合计 Token**：XXX（100%）

表格字段固定为：

任务目的	模型	Token 总量	占比

示例：

markdown

| `统计近7天项目Token消耗` | `gpt-5.5` | 255,187,782 | 99.55% |

明细表格中不再包含

合计

行。

First output the

Total

summary row, then the detailed table. The

Total

shows the total Token amount and 100% proportion.

markdown

**Total Tokens**: XXX (100%)

Fixed table fields:

Task Purpose	Model	Total Tokens	Proportion

Example:

markdown

| `Count project Token consumption in the last 7 days` | `gpt-5.5` | 255,187,782 | 99.55% |

The

Total

row is no longer included in the detailed table.

详细方案

Detailed Scheme

用户要求详细输出时，先输出

合计

摘要行，再输出明细表格。

合计

展示各分类汇总值及占总量的百分比。

markdown

**合计 Token**：XXX（100%）| 新输入/检索上下文：XXX (XX%) | 缓存输入：XXX (XX%) | 输出/写代码：XXX (XX%) | 思考：XXX (XX%)

明细表格使用当前完整字段：

任务目的	项目	模型	Token 总量	占比	新输入/检索上下文	缓存输入	输出/写代码	思考

每个分类字段建议显示为：

text

数量 (占该任务总量百分比)

例如：

markdown

| `统计近7天项目Token消耗` | `/path/to/project` | `gpt-5.5` | 255,187,782 | 99.55% | 25,302,445 (9.91%) | 229,055,744 (89.76%) | 619,303 (0.24%) | 210,290 (0.08%) |

明细表格中不再包含

合计

行。

When the user requests detailed output, first output the

Total

summary row, then the detailed table. The

Total

shows the aggregated value of each category and its percentage of the total amount.

markdown

**Total Tokens**: XXX (100%) | New Input/Retrieval Context: XXX (XX%) | Cached Input: XXX (XX%) | Output/Code Writing: XXX (XX%) | Reasoning: XXX (XX%)

The detailed table uses the complete current fields:

Task Purpose	Project	Model	Total Tokens	Proportion	New Input/Retrieval Context	Cached Input	Output/Code Writing	Reasoning

Each category field is recommended to be displayed as:

text

Amount (percentage of the total amount of this task)

Example:

markdown

| `Count project Token consumption in the last 7 days` | `/path/to/project` | `gpt-5.5` | 255,187,782 | 99.55% | 25,302,445 (9.91%) | 229,055,744 (89.76%) | 619,303 (0.24%) | 210,290 (0.08%) |

The

Total

row is no longer included in the detailed table.

结果说明

Result Explanation

表格后应补充一段限制说明：

数据来自本机
```
~/.codex/state_*.sqlite
```
中符合 schema 的状态库，以及 rollout JSONL。
分类中的
```
缓存输入
```
、
```
输出
```
、
```
思考
```
是 Codex 记录的结构化 Token 字段。
“搜索、写代码、读文件”等语义类型没有独立精确字段，只能通过 Token 字段近似解释。
如果存在缺失 rollout 文件或缺失
```
token_count
```
事件的线程，需要列出受影响线程数量和路径。

A paragraph of limitation explanation should be added after the table:

Data comes from the state database that conforms to the schema in the local
```
~/.codex/state_*.sqlite
```
, and rollout JSONL.
```
Cached Input
```
,
```
Output
```
,
```
Reasoning
```
in the categories are structured Token fields recorded by Codex.
Semantic types such as "search, code writing, file reading" do not have independent precise fields and can only be approximately explained through Token fields.
If there are threads with missing rollout files or missing
```
token_count
```
events, list the number of affected threads and their paths.

Recommended Verification

统计完成后做两项校验：

检查是否有统计范围内创建的线程没有
```
token_count
```
事件。
对比项目聚合总量与会话增量总量之和是否一致。

Perform two verifications after statistics are completed:

Check if there are threads created within the statistical range without
```
token_count
```
events.
Compare whether the total aggregated amount of projects is consistent with the sum of session incremental amounts.

codex-token-summary

Original

Translation

Codex Token 消耗统计

Codex Token Consumption Statistics

快速使用（推荐使用 Python 脚本）

Quick Start (Recommended to Use Python Script)

常用命令

Common Commands

最近 7 天简单表格（默认）

Simple table for the last 7 days (default)

最近 30 天

Last 30 days

自定义日期范围

Custom date range

详细格式（含 Token 分类）

Detailed format (including Token categories)

输出为 JSON

Output as JSON

包含周末

Include weekends

Windows 环境初始化

Windows Environment Initialization

以管理员身份运行 PowerShell，然后执行

Run PowerShell as administrator, then execute

脚本参数速查

Quick Reference for Script Parameters

适用场景

Applicable Scenarios

数据源

Data Sources

访问方式

Access Method

时间范围

Time Range

聚合算法

Aggregation Algorithm

Token 分类

Token Categories

统计维度

Statistical Dimensions

模型

Model

输出格式

Output Format

简单方案（默认）

Simple Scheme (Default)

详细方案

Detailed Scheme

结果说明

Result Explanation

推荐校验

Recommended Verification