codex-history-ingest

Ingest Codex CLI conversation history into the Obsidian wiki. Use this skill when the user wants to mine their past Codex sessions for knowledge, import their ~/.codex folder, extract insights from previous coding sessions, or says things like "process my Codex history", "add my Codex conversations to the wiki", or "what have I discussed in Codex before". Also triggers when the user mentions .codex sessions, rollout files, session_index.jsonl, or Codex transcript logs.

NPX Install

npx skill4agent add ar9av/obsidian-wiki codex-history-ingest

Codex History Ingest — Conversation Mining

You are extracting knowledge from the user's past Codex sessions and distilling it into the Obsidian wiki. Session logs are rich but noisy: focus on durable knowledge, not operational telemetry.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest codex`).

Before You Start

  1. Read `.env` to get `OBSIDIAN_VAULT_PATH` and `CODEX_HISTORY_PATH` (default `CODEX_HISTORY_PATH` to `~/.codex` if unset)
  2. Read `.manifest.json` at the vault root to check what has already been ingested
  3. Read `index.md` at the vault root to understand what the wiki already contains
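The three reads above can be sketched in Python. This is a minimal sketch that assumes `.env` holds plain `KEY=VALUE` lines (no `export` prefixes or multi-line values) and that `.manifest.json` is ordinary JSON; `parse_env`, `resolve_paths`, and `load_vault_state` are hypothetical helper names, not part of the skill.

```python
from __future__ import annotations

import json
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (not a full dotenv parser)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_paths(env: dict[str, str]) -> tuple[Path, Path]:
    """Return (vault, codex_history), defaulting the history path to ~/.codex."""
    vault = env.get("OBSIDIAN_VAULT_PATH")
    if not vault:
        raise ValueError("OBSIDIAN_VAULT_PATH is not set")
    history = env.get("CODEX_HISTORY_PATH") or "~/.codex"
    return Path(vault).expanduser(), Path(history).expanduser()


def load_vault_state(vault: Path) -> tuple[dict, str]:
    """Read .manifest.json (empty dict if missing) and index.md (empty string if missing)."""
    manifest_path = vault / ".manifest.json"
    index_path = vault / "index.md"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8")) if manifest_path.exists() else {}
    index = index_path.read_text(encoding="utf-8") if index_path.exists() else ""
    return manifest, index
```

A caller might run `resolve_paths(parse_env(Path(".env").read_text(encoding="utf-8")))` and then `load_vault_state(vault)`. Missing files yield empty defaults, so a first run behaves like an ingest into an empty vault.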

Ingest Modes

Append Mode (default)

Check `.manifest.json` for each source file. Only process:
  • Files not in the manifest (new session rollouts, new index files)
  • Files whose modification time is newer than their `ingested_at` timestamp in the manifest
Use this mode for regular syncs.

Full Mode

Process everything regardless of the manifest. Use this after `wiki-rebuild` or when the user explicitly asks for a full re-ingest.

Codex Data Layout

Codex stores local artifacts under `~/.codex/`:
```text
~/.codex/
├── sessions/                          # Session rollout logs by date
│   └── YYYY/MM/DD/
│       └── rollout-<timestamp>-<id>.jsonl
├── archived_sessions/                 # Archived rollout logs
├── session_index.jsonl                # Lightweight index of thread id/name/updated_at
├── history.jsonl                      # Local transcript history (if persistence enabled)
├── config.toml                        # User config (contains history settings)
└── state_*.sqlite / logs_*.sqlite     # Runtime DBs (usually skip)
```

Key data sources ranked by value

  1. `session_index.jsonl`: best inventory source for IDs, titles, and freshness
  2. `sessions/**/rollout-*.jsonl`: rich structured transcript events
  3. `history.jsonl`: useful fallback and timeline aid, if enabled
Avoid ingesting SQLite internals unless the user explicitly asks.

Step 1: Survey and Compute Delta

Scan `CODEX_HISTORY_PATH` (written as `~/.codex` below) and compare against `.manifest.json`:
  • `~/.codex/session_index.jsonl`
  • `~/.codex/sessions/**/rollout-*.jsonl`
  • `~/.codex/archived_sessions/**` (optional; only if the user asks for archived history)
  • `~/.codex/history.jsonl` (optional fallback)
Classify each file:
  • New: not in the manifest
  • Modified: in the manifest, but modified more recently than its `ingested_at` timestamp
  • Unchanged: already ingested and not modified since
Report a concise delta summary before deep parsing.
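The survey, the New/Modified/Unchanged classification, and the append/full mode choice can be sketched together. This assumes the manifest's per-file entries are available as a dict keyed by the source path string, with `ingested_at` stored as an ISO 8601 timestamp; that key layout and the helper names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path


def iter_sources(root: Path, include_archived: bool = False) -> list[Path]:
    """List candidate source files under the Codex history root."""
    files: list[Path] = []
    index = root / "session_index.jsonl"
    if index.is_file():
        files.append(index)
    files.extend(sorted(root.glob("sessions/**/rollout-*.jsonl")))
    if include_archived:
        files.extend(sorted(p for p in root.glob("archived_sessions/**/*") if p.is_file()))
    history = root / "history.jsonl"
    if history.is_file():
        files.append(history)
    return files


def classify(mtime: float, entry: dict | None) -> str:
    """Return 'new', 'modified', or 'unchanged' for one file."""
    if entry is None:
        return "new"
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    ingested = datetime.fromisoformat(entry["ingested_at"].replace("Z", "+00:00"))
    if ingested.tzinfo is None:
        ingested = ingested.replace(tzinfo=timezone.utc)  # assume naive timestamps are UTC
    modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return "modified" if modified > ingested else "unchanged"


def compute_delta(root: Path, sources: dict[str, dict],
                  include_archived: bool = False) -> dict[str, list[Path]]:
    delta: dict[str, list[Path]] = {"new": [], "modified": [], "unchanged": []}
    for path in iter_sources(root, include_archived):
        delta[classify(path.stat().st_mtime, sources.get(str(path)))].append(path)
    return delta


def select(delta: dict[str, list[Path]], full: bool = False) -> list[Path]:
    """Append mode processes new + modified files; full mode processes everything."""
    picked = delta["new"] + delta["modified"]
    return picked + delta["unchanged"] if full else picked


def summarize(delta: dict[str, list[Path]]) -> str:
    return ", ".join(f"{name}={len(paths)}" for name, paths in delta.items())
```

`summarize(compute_delta(codex_root, sources))` gives a one-line report of the form `new=N, modified=N, unchanged=N` for the pre-parse summary, and `select(delta, full=True)` covers Full Mode.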

Step 2: Parse Session Index First

`session_index.jsonl` typically has one entry per line, like:
```json
{"id":"...","thread_name":"...","updated_at":"..."}
```
Use it to:
  • Build a canonical session inventory
  • Prioritize recent/high-signal sessions
  • Map rollout IDs to human-readable thread names
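A sketch of building the inventory from `session_index.jsonl`, assuming `updated_at` values share one ISO 8601 format (so plain string comparison orders them) and that rollout filenames end in `-<id>.jsonl`, as in the layout above. The function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Iterable


def load_session_index(lines: Iterable[str]) -> dict[str, dict]:
    """Build {id: record}, keeping the most recent record when an id repeats."""
    inventory: dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a torn or partial line instead of aborting
        if not isinstance(record, dict) or not record.get("id"):
            continue
        previous = inventory.get(record["id"])
        # ISO 8601 strings in one consistent format sort chronologically.
        if previous is None or (record.get("updated_at") or "") >= (previous.get("updated_at") or ""):
            inventory[record["id"]] = record
    return inventory


def recent_first(inventory: dict[str, dict]) -> list[dict]:
    """Order sessions so the most recently updated are processed first."""
    return sorted(inventory.values(), key=lambda r: r.get("updated_at") or "", reverse=True)


def thread_name_for(rollout_filename: str, inventory: dict[str, dict]) -> str | None:
    """Match rollout-<timestamp>-<id>.jsonl to an index entry by its id suffix."""
    for session_id, record in inventory.items():
        if rollout_filename.endswith(f"-{session_id}.jsonl"):
            return record.get("thread_name")
    return None
```

Matching on the `-<id>.jsonl` suffix avoids having to parse the timestamp segment, whose exact format this skill does not pin down.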

Step 3: Parse Rollout JSONL Safely

Each line of a `rollout-*.jsonl` file is one event envelope (shown pretty-printed here; `type` is one of the four listed values):
```json
{
  "timestamp": "...",
  "type": "session_meta|turn_context|event_msg|response_item",
  "payload": { ... }
}
```
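"Safely" here mostly means tolerating bad lines: a session that is still being written can leave a truncated final line. A minimal sketch that keeps well-formed envelopes of the four listed types and counts everything else instead of aborting; `parse_events` is a hypothetical name.

```python
from __future__ import annotations

import json
from typing import Iterable

# Envelope types listed in the format above.
KNOWN_TYPES = {"session_meta", "turn_context", "event_msg", "response_item"}


def parse_events(lines: Iterable[str]) -> tuple[list[dict], int]:
    """Return (valid envelopes, count of skipped lines)."""
    events: list[dict] = []
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # e.g. a truncated final line
            continue
        if (
            not isinstance(record, dict)
            or not isinstance(record.get("type"), str)
            or record["type"] not in KNOWN_TYPES
            or not isinstance(record.get("payload"), dict)
        ):
            skipped += 1  # unknown or malformed envelope
            continue
        events.append(record)
    return events, skipped
```

Typical use: `with open(path, encoding="utf-8", errors="replace") as fh: events, skipped = parse_events(fh)`. Reporting `skipped` makes it visible when a newer Codex version introduces envelope types this filter does not know.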

Extraction rules

  • Prioritize user intent and assistant-visible outputs
  • Favor `response_item` records with user/assistant message content
  • Use `event_msg` selectively for meaningful milestones; ignore pure telemetry
  • Treat `session_meta` as metadata (cwd, model, IDs), not user knowledge

Skip/noise filters

  • Token accounting events
  • Tool plumbing with no semantic content
  • Raw command output unless it contains reusable decisions/patterns
  • Repeated plan snapshots unless they add novel decisions
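The extraction rules and noise filters above might translate into a filter like this. The payload shapes are assumptions, not something this skill specifies: `response_item` messages are taken to carry `type: "message"`, a `role`, and a `content` list of parts typed `input_text`/`output_text`; `session_meta` is taken to carry `id`, `cwd`, and `model`; and the injected-context prefixes are assumed markers. Check all of them against `references/codex-data-format.md` and your own rollout files.

```python
from __future__ import annotations

from typing import Iterable

# Assumed payload shapes; verify against real rollout files.
KEEP_ROLES = {"user", "assistant"}            # drops system/developer messages
TEXT_PARTS = {"input_text", "output_text"}    # assumed content-part types
INJECTED_PREFIXES = ("<environment_context>", "<user_instructions>")  # assumed markers
META_FIELDS = ("id", "cwd", "model")


def extract_messages(events: Iterable[dict]) -> tuple[dict, list[dict]]:
    """Return (session metadata, user/assistant messages) from parsed envelopes."""
    meta: dict = {}
    messages: list[dict] = []
    for event in events:
        kind = event.get("type")
        payload = event.get("payload") or {}
        if kind == "session_meta":
            # Metadata only: used for project scoping, never as wiki content.
            meta.update({k: payload[k] for k in META_FIELDS if k in payload})
        elif kind == "response_item" and payload.get("type") == "message":
            if payload.get("role") not in KEEP_ROLES:
                continue
            parts = payload.get("content") or []
            text = "\n".join(
                p.get("text") or ""
                for p in parts
                if isinstance(p, dict) and p.get("type") in TEXT_PARTS
            ).strip()
            if not text or text.startswith(INJECTED_PREFIXES):
                continue  # empty, or injected context rather than user intent
            messages.append({"role": payload["role"], "text": text,
                             "timestamp": event.get("timestamp")})
        # event_msg and turn_context are skipped: token accounting and tool
        # plumbing are noise; add targeted handling only for real milestones.
    return meta, messages
```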

Critical privacy filter

Rollout logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim system/developer prompts or secrets.
  • Remove API keys, tokens, passwords, credentials
  • Redact private identifiers unless relevant and approved
  • Summarize instead of quoting raw transcripts
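A first-pass redaction helper for the privacy filter above. The regexes are illustrative and deliberately incomplete (OpenAI-style `sk-` keys, GitHub `gh*_` tokens, AWS access key IDs, bearer tokens, `password=`/`token:`-style assignments, and email addresses); treat a match as a cue to redact or ask the user, and a non-match as no guarantee that the text is safe.

```python
import re

# (pattern, replacement) pairs, applied in order. Illustrative, not exhaustive.
RULES = [
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "[REDACTED:api_key]"),  # OpenAI-style keys
    (re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"), "[REDACTED:github_token]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws_key_id]"),
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]{16,}"), "Bearer [REDACTED]"),
    # key=value / key: value assignments for common secret names; keeps the key.
    (re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED:email]"),
]


def redact(text: str) -> str:
    """Apply every rule in order and return the redacted text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def looks_sensitive(text: str) -> bool:
    """True if any rule matches; a cue to ask the user before storing."""
    return any(pattern.search(text) for pattern, _ in RULES)
```

Running `redact()` on every extracted message before summarizing keeps secrets out of drafts, and `looks_sensitive()` on the raw text flags when to ask the user first.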

Step 4: Cluster by Topic

Do not create one wiki page per session.
  • Group by stable topics across many sessions
  • Split mixed sessions into separate themes
  • Merge recurring concepts across dates/projects
  • Use `cwd` from session metadata to infer project scope
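Project scope from `cwd` can be derived mechanically, while topic clustering itself remains a judgment call. This sketch assumes sessions were started from the repository root, so the last path component names the project; `slugify`, `project_from_cwd`, and `group_by_project` are hypothetical helpers fed with `id`/`cwd` pairs from each session's metadata.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Iterable


def slugify(name: str) -> str:
    """Lowercase and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def project_from_cwd(cwd: str | None) -> str:
    """Use the last path component of cwd as the project name (POSIX or Windows paths)."""
    parts = [p for p in re.split(r"[\\/]+", cwd or "") if p]
    return (slugify(parts[-1]) if parts else "") or "unscoped"


def group_by_project(sessions: Iterable[dict]) -> dict[str, list[str]]:
    """Map project slug -> session ids; input dicts need 'id' and optionally 'cwd'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for session in sessions:
        groups[project_from_cwd(session.get("cwd"))].append(session["id"])
    return dict(groups)
```

Keying on the folder name merges same-named checkouts in different locations and mislabels sessions started in a subdirectory; a small manual alias map is a reasonable fix if that happens.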

Step 5: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:
  • Project-specific architecture/process -> `projects/<name>/...`
  • General concepts -> `concepts/`
  • Recurring techniques/debug playbooks -> `skills/`
  • Tools/services -> `entities/`
  • Cross-session patterns -> `synthesis/`
For each impacted project, create or update `projects/<name>/<name>.md` (the project name as the filename, never `_project.md`).
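The routing table can be encoded as a small lookup so page paths stay consistent across runs. The page-kind names (`concept`, `skill`, `entity`, `synthesis`, `project`) are hypothetical labels for the categories above, and slugs are assumed to be already normalized.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Page kind -> top-level wiki folder, following the routing table above.
FOLDERS = {
    "concept": "concepts",
    "skill": "skills",
    "entity": "entities",
    "synthesis": "synthesis",
}


def project_overview_path(project: str) -> PurePosixPath:
    """projects/<name>/<name>.md, never _project.md."""
    return PurePosixPath("projects", project, f"{project}.md")


def page_path(kind: str, slug: str, project: str | None = None) -> PurePosixPath:
    """Vault-relative path for a page of the given kind."""
    if slug == "_project":
        raise ValueError("use project_overview_path() instead of _project.md")
    if kind == "project":
        if not project:
            raise ValueError("project pages need a project name")
        return PurePosixPath("projects", project, f"{slug}.md")
    if kind not in FOLDERS:
        raise ValueError(f"unknown page kind: {kind!r}")
    return PurePosixPath(FOLDERS[kind], f"{slug}.md")
```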

Writing rules

  • Distill knowledge, not chronology
  • Avoid "on date X we discussed..." unless date context is essential
  • Add `summary:` frontmatter to each new or updated page (1-2 sentences, <= 200 chars)
  • Add provenance markers:
    • `^[extracted]` when directly grounded in explicit session content
    • `^[inferred]` when synthesizing patterns across events/sessions
    • `^[ambiguous]` when sessions conflict
  • Add or update the `provenance:` frontmatter mix for each changed page
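The skill does not define the shape of the `provenance:` mix, so this sketch assumes it is the share of each marker type in the page body, rounded to two decimals. It also enforces the 200-character `summary:` limit (the 1-2 sentence rule is left to the writer) and emits the summary as a JSON-quoted string, which YAML 1.2 reads as a double-quoted scalar.

```python
from __future__ import annotations

import json
import re

MARKER = re.compile(r"\^\[(extracted|inferred|ambiguous)\]")


def provenance_mix(body: str) -> dict[str, float]:
    """Share of each provenance marker in the page body, rounded to 2 places."""
    counts = {"extracted": 0, "inferred": 0, "ambiguous": 0}
    for match in MARKER.finditer(body):
        counts[match.group(1)] += 1
    total = sum(counts.values())
    return {k: round(v / total, 2) if total else 0.0 for k, v in counts.items()}


def frontmatter(summary: str, body: str) -> str:
    """Build a YAML frontmatter block with summary and provenance mix."""
    summary = " ".join(summary.split())  # collapse newlines and repeated spaces
    if not summary or len(summary) > 200:
        raise ValueError("summary must be 1-200 characters")
    lines = ["---", f"summary: {json.dumps(summary, ensure_ascii=False)}", "provenance:"]
    lines += [f"  {kind}: {share}" for kind, share in provenance_mix(body).items()]
    lines.append("---")
    return "\n".join(lines)
```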

Step 6: Update Manifest, Log, and Index

Update `.manifest.json`

For each processed source file, record:
  • `ingested_at`, `size_bytes`, `modified_at`
  • `source_type`: `codex_rollout` | `codex_index` | `codex_history`
  • `project`: inferred project name (when applicable)
  • `pages_created`, `pages_updated`
Add or update a top-level project/session summary block:
```json
{
  "project-name": {
    "source_path": "~/.codex/sessions/...",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "index_updated_at": "TIMESTAMP"
  }
}
```
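A sketch of the manifest bookkeeping. Assumptions: `sources` is the per-file section of `.manifest.json` keyed by path (the skill does not name that key), `pages_created`/`pages_updated` are stored as sorted lists of page paths rather than counts, and timestamps are ISO 8601 in UTC. `sessions_ingested` is recomputed from the per-file entries so it cannot drift from them. All function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import PurePath


def source_type_for(path: str) -> str:
    """Map a Codex file name to its manifest source_type."""
    name = PurePath(path).name
    if name == "session_index.jsonl":
        return "codex_index"
    if name == "history.jsonl":
        return "codex_history"
    if name.startswith("rollout-") and name.endswith(".jsonl"):
        return "codex_rollout"
    raise ValueError(f"not a Codex source file: {path}")


def record_source(sources: dict, path: str, size_bytes: int, mtime: float,
                  project: str | None, created: list[str], updated: list[str],
                  now: datetime | None = None) -> dict:
    """Write one per-file manifest entry with the fields listed above."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "ingested_at": now.isoformat(),
        "size_bytes": size_bytes,
        "modified_at": datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(),
        "source_type": source_type_for(path),
        "pages_created": sorted(created),
        "pages_updated": sorted(updated),
    }
    if project:
        entry["project"] = project
    sources[path] = entry
    return entry


def update_project_summary(summary: dict, sources: dict, project: str, *,
                           source_path: str, sessions_total: int,
                           index_updated_at: str, now: datetime | None = None) -> dict:
    """Refresh the per-project block; counts rollout entries tagged with the project."""
    now = now or datetime.now(timezone.utc)
    ingested = sum(
        1 for e in sources.values()
        if e.get("project") == project and e.get("source_type") == "codex_rollout"
    )
    block = summary.setdefault(project, {})
    block.update({
        "source_path": source_path,
        "last_ingested": now.isoformat(),
        "sessions_ingested": ingested,
        "sessions_total": sessions_total,
        "index_updated_at": index_updated_at,
    })
    return block
```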

Update special files

Update `index.md` to reflect the new and updated pages, and append an entry to `log.md`:
```text
- [TIMESTAMP] CODEX_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
```
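A helper that formats this `log.md` entry, assuming `TIMESTAMP` means an ISO 8601 UTC time (the skill leaves the format open); `log_line` and `append_log` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timezone


def log_line(sessions: int, pages_updated: int, pages_created: int,
             mode: str, now: datetime | None = None) -> str:
    """Format one CODEX_HISTORY_INGEST entry for log.md."""
    if mode not in ("append", "full"):
        raise ValueError(f"mode must be 'append' or 'full', got {mode!r}")
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (
        f"- [{stamp.strftime('%Y-%m-%dT%H:%M:%SZ')}] CODEX_HISTORY_INGEST "
        f"sessions={sessions} pages_updated={pages_updated} "
        f"pages_created={pages_created} mode={mode}"
    )


def append_log(log_path: str, line: str) -> None:
    # Append-only: earlier entries in log.md are never rewritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```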

Privacy and Compliance

  • Distill and synthesize; avoid raw transcript dumps
  • Default to redaction for anything that looks sensitive
  • Ask the user before storing personal/sensitive details
  • Keep references to other people minimal and purpose-bound

Reference

See `references/codex-data-format.md` for field-level parsing notes and extraction guidance.