codex-history-ingest
Ingest Codex CLI conversation history into the Obsidian wiki. Use this skill when the user wants to mine their past Codex sessions for knowledge, import their ~/.codex folder, extract insights from previous coding sessions, or says things like "process my Codex history", "add my Codex conversations to the wiki", or "what have I discussed in Codex before". Also triggers when the user mentions .codex sessions, rollout files, session_index.jsonl, or Codex transcript logs.
Source: ar9av/obsidian-wiki
NPX Install
npx skill4agent add ar9av/obsidian-wiki codex-history-ingest
SKILL.md Content
Codex History Ingest: Conversation Mining
You are extracting knowledge from the user's past Codex sessions and distilling it into the Obsidian wiki. Session logs are rich but noisy: focus on durable knowledge, not operational telemetry.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest codex`).
Before You Start
- Read `.env` to get `OBSIDIAN_VAULT_PATH` and `CODEX_HISTORY_PATH` (default to `~/.codex` if unset)
- Read `.manifest.json` at the vault root to check what has already been ingested
- Read `index.md` at the vault root to understand what the wiki already contains
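The lookup above could be sketched roughly as follows. This is a minimal illustration, assuming `.env` holds plain `KEY=VALUE` lines; the helper names `load_env` and `resolve_paths` are hypothetical and not part of the skill.

```python
from pathlib import Path


def load_env(env_path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_paths(env: dict[str, str]) -> tuple[Path, Path]:
    """Return (vault_path, codex_history_path); the latter defaults to ~/.codex."""
    vault = env.get("OBSIDIAN_VAULT_PATH")
    if not vault:
        raise ValueError("OBSIDIAN_VAULT_PATH is not set")
    history = env.get("CODEX_HISTORY_PATH") or "~/.codex"
    return Path(vault).expanduser(), Path(history).expanduser()
```

Raising on a missing vault path is a design choice of this sketch: without a vault there is nowhere to read `.manifest.json` or `index.md` from, so failing early seems safer than guessing.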
Ingest Modes
Append Mode (default)
Check `.manifest.json` for each source file. Only process:
- Files not in the manifest (new session rollouts, new index files)
- Files whose modification time is newer than `ingested_at` in the manifest
Use this mode for regular syncs.
Full Mode
Process everything regardless of the manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.
Codex Data Layout
Codex stores local artifacts under `~/.codex/`.
```
~/.codex/
├── sessions/                          # Session rollout logs by date
│   └── YYYY/MM/DD/
│       └── rollout-<timestamp>-<id>.jsonl
├── archived_sessions/                 # Archived rollout logs
├── session_index.jsonl                # Lightweight index of thread id/name/updated_at
├── history.jsonl                      # Local transcript history (if persistence enabled)
├── config.toml                        # User config (contains history settings)
└── state_*.sqlite / logs_*.sqlite     # Runtime DBs (usually skip)
```
Key data sources ranked by value
- `session_index.jsonl`: best inventory source for IDs, titles, and freshness
- `sessions/**/rollout-*.jsonl`: rich structured transcript events
- `history.jsonl`: useful fallback/timeline aid if enabled
Avoid ingesting SQLite internals unless the user explicitly asks.
Step 1: Survey and Compute Delta
Scan `CODEX_HISTORY_PATH` and compare against `.manifest.json`:
- `~/.codex/session_index.jsonl`
- `~/.codex/sessions/**/rollout-*.jsonl`
- `~/.codex/archived_sessions/**` (optional; only if user asks for archived history)
- `~/.codex/history.jsonl` (optional fallback)
Classify each file:
- New: not in manifest
- Modified: in manifest but file is newer than `ingested_at`
- Unchanged: already ingested and unchanged
Report a concise delta summary before deep parsing.
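The classification above might look like the sketch below. It assumes, purely for illustration, that the manifest is a mapping from source path to an entry whose `ingested_at` is an ISO-8601 timestamp; the real manifest layout may differ, and `classify` and `survey` are hypothetical names.

```python
from datetime import datetime, timezone
from pathlib import Path


def classify(path: Path, manifest: dict) -> str:
    """Return 'new', 'modified', or 'unchanged' for one source file."""
    entry = manifest.get(str(path))
    if entry is None:
        return "new"
    # Note: a trailing "Z" is only accepted by fromisoformat on Python 3.11+.
    ingested_at = datetime.fromisoformat(entry["ingested_at"])
    if ingested_at.tzinfo is None:
        ingested_at = ingested_at.replace(tzinfo=timezone.utc)
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return "modified" if modified > ingested_at else "unchanged"


def survey(history_root: Path, manifest: dict) -> dict[str, list[Path]]:
    """Bucket the default sources (index plus session rollouts) by status."""
    candidates = [history_root / "session_index.jsonl"]
    candidates += sorted(history_root.glob("sessions/**/rollout-*.jsonl"))
    delta: dict[str, list[Path]] = {"new": [], "modified": [], "unchanged": []}
    for path in candidates:
        if path.is_file():
            delta[classify(path, manifest)].append(path)
    return delta
```

The bucket sizes (`len(delta["new"])` and so on) are enough for the concise delta summary; archived sessions and `history.jsonl` are left out here because the text treats them as optional.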
Step 2: Parse Session Index First
`session_index.jsonl` holds one JSON object per line:
```json
{"id":"...","thread_name":"...","updated_at":"..."}
```
Use it to:
- Build a canonical session inventory
- Prioritize recent/high-signal sessions
- Map rollout IDs to human-readable thread names
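A minimal sketch of building that inventory follows. It relies only on the three fields shown above and assumes `updated_at` values are ISO-8601 strings that sort correctly as text; `load_session_index` and `most_recent` are hypothetical helper names.

```python
import json
from pathlib import Path


def _stamp(record: dict) -> str:
    """Sort key: updated_at as a string (assumed ISO-8601), empty if missing."""
    return str(record.get("updated_at", ""))


def load_session_index(index_path: Path) -> dict[str, dict]:
    """Map session id -> latest index record, skipping malformed lines."""
    inventory: dict[str, dict] = {}
    with index_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate partially written lines
            if not isinstance(record, dict) or "id" not in record:
                continue
            previous = inventory.get(record["id"])
            if previous is None or _stamp(record) >= _stamp(previous):
                inventory[record["id"]] = record
    return inventory


def most_recent(inventory: dict[str, dict], limit: int = 10) -> list[dict]:
    """Newest sessions first, for prioritizing what to parse deeply."""
    return sorted(inventory.values(), key=_stamp, reverse=True)[:limit]
```

Looking up `inventory[session_id]["thread_name"]` then gives the human-readable name for a rollout, provided the rollout's id matches the index id.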
Step 3: Parse Rollout JSONL Safely
Each `rollout-*.jsonl` line is an event envelope with:
```json
{
  "timestamp": "...",
  "type": "session_meta|turn_context|event_msg|response_item",
  "payload": { ... }
}
```
Extraction rules
- Prioritize user intent and assistant-visible outputs
- Favor `response_item` records with user/assistant message content
- Use `event_msg` selectively for meaningful milestones; ignore pure telemetry
- Treat `session_meta` as metadata (cwd, model, ids), not user knowledge
Skip/noise filters
- Token accounting events
- Tool plumbing with no semantic content
- Raw command output unless it contains reusable decisions/patterns
- Repeated plan snapshots unless they add novel decisions
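The extraction rules and noise filters above could be applied along these lines. Only the envelope fields (`timestamp`, `type`, `payload`) come from the text; the payload-level `type` field and the `token_count` subtype name are assumptions made for illustration and should be checked against real logs.

```python
import json
from typing import Iterable, Iterator

KNOWN_TYPES = {"session_meta", "turn_context", "event_msg", "response_item"}
# Hypothetical payload subtype names treated as noise; adjust to the real logs.
NOISE_SUBTYPES = {"token_count"}


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield well-formed envelopes, skipping blank, malformed, or unknown lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if event.get("type") in KNOWN_TYPES and isinstance(event.get("payload"), dict):
            yield event


def split_events(lines: Iterable[str]) -> tuple[dict, list[dict]]:
    """Return (session metadata, candidate knowledge events)."""
    meta: dict = {}
    candidates: list[dict] = []
    for event in iter_events(lines):
        payload = event["payload"]
        if event["type"] == "session_meta":
            meta.update(payload)  # metadata only, never wiki content
        elif event["type"] == "response_item":
            candidates.append(event)
        elif event["type"] == "event_msg" and payload.get("type") not in NOISE_SUBTYPES:
            candidates.append(event)
    return meta, candidates
```

The candidates still need human-style judgment afterwards: this pass only discards what is structurally noise, it does not decide what is durable knowledge.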
Critical privacy filter
Rollout logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim system/developer prompts or secrets.
- Remove API keys, tokens, passwords, credentials
- Redact private identifiers unless relevant and approved
- Summarize instead of quoting raw transcripts
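As a rough illustration of the redaction step, the sketch below masks a few common secret shapes before any text is summarized. The patterns are examples chosen for this sketch, not a list taken from the skill, and a regex pass is no substitute for reviewing what gets stored.

```python
import re

# Illustrative patterns only; a real deployment would need a broader, reviewed list.
TOKEN_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),                # "sk-" style API keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                      # AWS access key ids
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]{16,}=*"),   # bearer tokens
]
# key: value or key=value assignments whose key name suggests a secret.
KEY_VALUE = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[:=]\s*)\S+"
)


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace secret-looking substrings with a placeholder."""
    out = text
    for pattern in TOKEN_PATTERNS:
        out = pattern.sub(placeholder, out)
    # Keep the key name so the note stays readable; drop only the value.
    return KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}{placeholder}", out)
```

For example, `redact("password: hunter2")` yields `password: [REDACTED]`, while ordinary prose without secret-like shapes passes through unchanged.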
Step 4: Cluster by Topic
Do not create one wiki page per session.
- Group by stable topics across many sessions
- Split mixed sessions into separate themes
- Merge recurring concepts across dates/projects
- Use `cwd` from metadata to infer project scope
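One way to picture the grouping is sketched below. It assumes each distilled note is a dict with hypothetical `cwd` and `topic` fields, and that the last path component of `cwd` is a reasonable project name; both are simplifications for illustration (Windows-style paths are not handled).

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Optional


def project_from_cwd(cwd: Optional[str]) -> str:
    """Hypothetical heuristic: the last path component of cwd names the project."""
    if not cwd:
        return "unscoped"
    name = PurePosixPath(cwd.rstrip("/")).name
    return name or "unscoped"


def group_notes_by_topic(notes: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group distilled notes by (project, topic) rather than by session."""
    clusters: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for note in notes:
        key = (project_from_cwd(note.get("cwd")), note["topic"].strip().lower())
        clusters[key].append(note)
    return dict(clusters)
```

Because the key ignores the session id, a mixed session contributes notes to several clusters, and a recurring topic collects notes from many sessions.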
Step 5: Distill into Wiki Pages
Route extracted knowledge using existing wiki conventions:
- Project-specific architecture/process -> `projects/<name>/...`
- General concepts -> `concepts/`
- Recurring techniques/debug playbooks -> `skills/`
- Tools/services -> `entities/`
- Cross-session patterns -> `synthesis/`
For each impacted project, create/update `projects/<name>/<name>.md` (project name as filename, never `_project.md`).
Writing rules
- Distill knowledge, not chronology
- Avoid "on date X we discussed..." unless date context is essential
- Add `summary:` frontmatter on each new/updated page (1-2 sentences, <= 200 chars)
- Add provenance markers:
  - `^[extracted]` when directly grounded in explicit session content
  - `^[inferred]` when synthesizing patterns across events/sessions
  - `^[ambiguous]` when sessions conflict
- Add/update `provenance:` frontmatter mix for each changed page
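The text does not spell out how the `provenance:` mix is computed. One plausible reading, sketched here as an assumption, is the share of each marker among all markers on the page; the summary check simply applies the 200-character limit stated above.

```python
import re
from collections import Counter

MARKER = re.compile(r"\^\[(extracted|inferred|ambiguous)\]")
KINDS = ("extracted", "inferred", "ambiguous")


def provenance_mix(body: str) -> dict[str, float]:
    """Fraction of each provenance marker found in the page body (assumed format)."""
    counts = Counter(MARKER.findall(body))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: round(counts[kind] / total, 2) for kind in KINDS}


def summary_ok(summary: str, max_chars: int = 200) -> bool:
    """Check the summary is non-empty and within the length limit."""
    return 0 < len(summary.strip()) <= max_chars
```

A page with two `^[extracted]` markers, one `^[inferred]`, and one `^[ambiguous]` would give a mix of 0.5, 0.25, and 0.25 under this reading.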
Step 6: Update Manifest, Log, and Index
Update .manifest.json
For each processed source file:
- `ingested_at`, `size_bytes`, `modified_at`
- `source_type`: `codex_rollout` | `codex_index` | `codex_history`
- `project`: inferred project name (when applicable)
- `pages_created`, `pages_updated`
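A per-file entry with the fields listed above might be assembled as in this sketch. The field names come from the list; storing `pages_created` and `pages_updated` as lists of page paths and using ISO-8601 UTC timestamps are assumptions, and `manifest_entry` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

SOURCE_TYPES = {"codex_rollout", "codex_index", "codex_history"}


def manifest_entry(
    path: Path,
    source_type: str,
    pages_created: list[str],
    pages_updated: list[str],
    project: Optional[str] = None,
) -> dict:
    """Build the manifest record for one processed source file."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type}")
    stat = path.stat()
    entry = {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "modified_at": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "source_type": source_type,
        "pages_created": pages_created,
        "pages_updated": pages_updated,
    }
    if project:
        entry["project"] = project  # only when a project could be inferred
    return entry
```

Recording `modified_at` alongside `ingested_at` lets a later append-mode run compare timestamps without re-reading the file.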
Add/update a top-level project/session summary block:
```json
{
  "project-name": {
    "source_path": "~/.codex/sessions/...",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "index_updated_at": "TIMESTAMP"
  }
}
```
Update special files
Update `index.md` and `log.md`:
```
- [TIMESTAMP] CODEX_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
```
Privacy and Compliance
- Distill and synthesize; avoid raw transcript dumps
- Default to redaction for anything that looks sensitive
- Ask the user before storing personal/sensitive details
- Keep references to other people minimal and purpose-bound
Reference
See `references/codex-data-format.md` for field-level parsing notes and extraction guidance.