claude-history-ingest
Claude History Ingest: Conversation Mining
You are extracting knowledge from the user's past Claude Code conversations and distilling it into the Obsidian wiki. Conversations are rich but messy; your job is to find the signal and compile it.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest claude`).
Before You Start
- Read `.env` to get `OBSIDIAN_VAULT_PATH` and `CLAUDE_HISTORY_PATH` (defaults to `~/.claude`)
- Read `.manifest.json` at the vault root to check what's already been ingested
- Read `index.md` at the vault root to know what the wiki already contains
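The configuration lookup above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the skill's actual loader: the helper name `load_settings` and the plain `KEY=VALUE` parsing are hypothetical, and only the `~/.claude` default for `CLAUDE_HISTORY_PATH` comes from the list above.

```python
import os


def load_settings(env_text: str) -> dict:
    """Parse KEY=VALUE lines from .env text (hypothetical helper).

    Blank lines and '#' comments are ignored; surrounding quotes are stripped.
    """
    settings = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    # Documented default: CLAUDE_HISTORY_PATH falls back to ~/.claude
    settings.setdefault("CLAUDE_HISTORY_PATH", "~/.claude")
    # Expand a leading "~" in path-like settings so they can be used directly
    return {
        key: os.path.expanduser(value) if key.endswith("_PATH") else value
        for key, value in settings.items()
    }
```

A caller would typically pass in the contents of the `.env` file, for example `load_settings(open(".env").read())`, and then read `.manifest.json` and `index.md` relative to the resulting `OBSIDIAN_VAULT_PATH`.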
Ingest Modes
Append Mode (default)
Check `.manifest.json` for each source file (conversation JSONL, memory file). Only process:
- Files not in the manifest (new conversations, new memory files, new projects)
- Files whose modification time is newer than their `ingested_at` in the manifest
This is usually what you want: the user ran a few new sessions and wants to capture the delta.
Full Mode
Process everything regardless of the manifest. Use after a `wiki-rebuild` or if the user explicitly asks.
Claude Code Data Layout
Claude Code stores everything under `~/.claude/`. Here is the actual structure:

    ~/.claude/
    ├── projects/                        # Per-project directories
    │   ├── -Users-name-project-a/       # Path-derived name (slashes → dashes)
    │   │   ├── <session-uuid>.jsonl     # Conversation data (JSONL)
    │   │   └── memory/                  # Structured memories
    │   │       ├── MEMORY.md            # Memory index
    │   │       ├── user_*.md            # User profile memories
    │   │       ├── feedback_*.md        # Workflow feedback memories
    │   │       └── project_*.md         # Project context memories
    │   ├── -Users-name-project-b/
    │   │   └── ...
    ├── sessions/                        # Session metadata (JSON)
    │   └── <pid>.json                   # {pid, sessionId, cwd, startedAt, kind, entrypoint}
    ├── history.jsonl                    # Global session history
    ├── tasks/                           # Subagent task data
    ├── plans/                           # Saved plans
    └── settings.json

Key data sources ranked by value:
- Memory files (`projects/*/memory/*.md`): Pre-distilled, already wiki-friendly. These contain the user's preferences, project decisions, and feedback. Gold.
- Conversation JSONL (`projects/*/*.jsonl`): Full conversation transcripts. Rich but noisy.
- Session metadata (`sessions/*.json`): Tells you which project, when, and what CWD.
Step 1: Survey and Compute Delta
Scan `CLAUDE_HISTORY_PATH` and compare against `.manifest.json`:

    # Find all projects
    Glob: ~/.claude/projects/*/

    # Find memory files (highest value)
    Glob: ~/.claude/projects/*/memory/*.md

    # Find conversation JSONL files
    Glob: ~/.claude/projects/*/*.jsonl

Build an inventory and classify each file:
- **New**: not in manifest → needs ingesting
- **Modified**: in manifest but file is newer → needs re-ingesting
- **Unchanged**: in manifest and not modified → skip in append mode
Report to the user: "Found X projects, Y conversations, Z memory files. Delta: A new, B modified."
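The survey and classification can be sketched in Python along these lines. The function names are hypothetical, and two details are assumptions rather than anything the manifest format guarantees: entries are keyed by the file's path string, and `ingested_at` is an ISO 8601 timestamp with an explicit UTC offset.

```python
from datetime import datetime, timezone
from pathlib import Path


def survey(history_root: Path) -> list:
    """Collect memory files, then conversation JSONL files, under projects/."""
    projects = history_root / "projects"
    memory = sorted(projects.glob("*/memory/*.md"))
    conversations = sorted(projects.glob("*/*.jsonl"))
    return memory + conversations


def classify(path: Path, manifest: dict) -> str:
    """Return 'new', 'modified', or 'unchanged' for one source file.

    Assumption: manifest[str(path)]["ingested_at"] is an ISO 8601 string
    that includes a UTC offset, e.g. "2026-03-15T10:30:00+00:00".
    """
    entry = manifest.get(str(path))
    if entry is None:
        return "new"
    ingested = datetime.fromisoformat(entry["ingested_at"])
    modified = datetime.fromtimestamp(path.stat().st_mtime, timezone.utc)
    return "modified" if modified > ingested else "unchanged"


def select(paths: list, manifest: dict, mode: str = "append") -> list:
    """Append mode keeps only the delta; full mode keeps everything."""
    if mode == "full":
        return list(paths)
    return [p for p in paths if classify(p, manifest) != "unchanged"]
```

Counting the results of `classify` per category gives the numbers for the report to the user.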
Step 2: Ingest Memory Files First
Memory files are already structured with YAML frontmatter:

    ---
    name: memory-name
    description: one-line description
    type: user|feedback|project|reference
    ---
    Memory content here.

For each memory file:
- Read it and parse the frontmatter
- `user` type → feeds into an entity page about the user, or concept pages about their domain
- `feedback` type → feeds into skills pages (workflow patterns, what works, what doesn't)
- `project` type → feeds into entity pages for the project
- `reference` type → feeds into reference pages pointing to external resources
The `MEMORY.md` index file in each project is a quick summary; read it first to decide which individual memory files are worth reading in full.
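As a rough illustration of the parse-and-route step, the sketch below splits a memory file into frontmatter and body and looks up its destination. The names `parse_memory`, `destination`, and the `"unclassified"` fallback are hypothetical. The parser handles only flat `key: value` pairs, which is all the example frontmatter uses; a real YAML parser would be more robust.

```python
# Routing table taken from the list above: memory type -> wiki destination
DESTINATIONS = {
    "user": "entity page about the user, or concept pages",
    "feedback": "skills pages",
    "project": "entity pages for the project",
    "reference": "reference pages",
}


def parse_memory(text: str) -> tuple:
    """Split a memory file into (frontmatter dict, body string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as plain text


def destination(meta: dict) -> str:
    """Look up where a memory of this type feeds into the wiki."""
    return DESTINATIONS.get(meta.get("type", ""), "unclassified")
```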
Step 3: Parse Conversation JSONL
Each JSONL file is one conversation session. Each line is a JSON object:

    {
      "type": "user|assistant|progress|file-history-snapshot",
      "message": {
        "role": "user|assistant",
        "content": "text string"
      },
      "uuid": "...",
      "timestamp": "2026-03-15T10:30:00.000Z",
      "sessionId": "...",
      "cwd": "/path/to/project",
      "version": "2.1.59"
    }

For assistant messages, `content` may be an array of content blocks:

    {
      "content": [
        {"type": "thinking", "text": "..."},
        {"type": "text", "text": "The actual response..."},
        {"type": "tool_use", "name": "Read", "input": {...}}
      ]
    }

What to extract from conversations:
- Filter to `type: "user"` and `type: "assistant"` entries only
- For assistant entries, extract `text` blocks (skip `thinking` and `tool_use`; those are noise)
- The `cwd` field tells you which project this conversation belongs to
- The project directory name (e.g., `-Users-name-Documents-projects-my-app`) tells you the project path
Skip these:
- `type: "progress"`: internal agent progress updates
- `type: "file-history-snapshot"`: file state tracking
- Subagent conversations (under `subagents/` subdirectories), unless the user specifically asks
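The filtering rules above can be sketched in Python as follows. The function name `extract_turns` is hypothetical, and the sketch assumes the content blocks live under `message.content`, as in the first JSON example. Lines that fail to parse are skipped rather than treated as fatal.

```python
import json


def extract_turns(jsonl_text: str) -> list:
    """Return (type, text) pairs for user and assistant entries only."""
    turns = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(entry, dict):
            continue
        if entry.get("type") not in ("user", "assistant"):
            continue  # drops progress and file-history-snapshot entries
        message = entry.get("message") or {}
        content = message.get("content", "")
        if isinstance(content, list):
            # Keep only text blocks; thinking and tool_use are skipped
            parts = [
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            ]
            content = "\n".join(parts)
        if isinstance(content, str) and content.strip():
            turns.append((entry["type"], content.strip()))
    return turns
```

Reading one session is then a matter of passing the file contents, for example `extract_turns(path.read_text(encoding="utf-8"))`.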
Step 4: Cluster by Topic
Don't create one wiki page per conversation. Instead:
- Group extracted knowledge by topic across conversations
- A single conversation about "debugging auth + setting up CI" → two separate topics
- Three conversations across different days about "React performance" → one merged topic
- The project directory name gives you a natural first-level grouping
Step 5: Distill into Wiki Pages
Each Claude project maps to a project directory in the vault. The project directory name from `~/.claude/projects/` encodes the original path; decode it to get a clean project name:

    -Users/Documents/projects/my-Project → myproject
    -Users/Documents/projects/Another-app → anotherapp

Project-specific vs. global knowledge
| What you found | Where it goes | Example |
|---|---|---|
| Project architecture decisions | | |
| Project-specific debugging | | |
| General concept the user learned | | |
| Recurring problem across projects | | |
| A tool/service used | | |
| Patterns across many conversations | | |
For each project with content, create or update the project overview page at `projects/<name>/<name>.md`, named after the project, not `_project.md`. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
**Important:** Distill the knowledge, not the conversation. Don't write "In a conversation on March 15, the user asked about X." Write the knowledge itself, with the conversation as a source attribution.
Write a `summary:` frontmatter field on every new/updated page: 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.
Mark provenance per the convention in `llm-wiki` (Provenance Markers section):
- Memory files are mostly extracted: the user wrote them by hand and they're already distilled. Treat memory-derived claims as extracted unless you're stitching together claims from multiple memory files.
- Conversation distillation is mostly inferred. You're synthesizing a coherent claim from many turns of dialogue, often filling in implicit reasoning. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
- Use `^[ambiguous]` when the user changed their mind across sessions or when assistant and user contradicted each other and the resolution is unclear.
- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
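A small check for the `summary:` constraint might look like the sketch below. The helper name `check_summary` is hypothetical, and the single-line requirement is an added assumption (it keeps the value easy to store as a plain frontmatter scalar); only the 200-character limit comes from the text above.

```python
def check_summary(summary: str, limit: int = 200) -> list:
    """Return a list of problems with a summary value (empty if it is fine)."""
    problems = []
    text = summary.strip()
    if not text:
        problems.append("summary is empty")
    if len(text) > limit:
        problems.append(f"summary is {len(text)} chars, limit is {limit}")
    if "\n" in text:
        # Assumption: keep the summary on one line
        problems.append("summary should fit on one line")
    return problems
```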
Step 6: Update Manifest, Journal, and Special Files
Update .manifest.json
For each source file processed (conversation JSONL, memory file), add/update its entry with:
- `ingested_at`, `size_bytes`, `modified_at`
- `source_type`: `"claude_conversation"` or `"claude_memory"`
- `project`: the decoded project name
- `pages_created` and `pages_updated` lists
Also update the `projects` section of the manifest:

    {
      "project-name": {
        "source_path": "~/.claude/projects/-Users-...",
        "vault_path": "projects/project-name",
        "last_ingested": "TIMESTAMP",
        "conversations_ingested": 5,
        "conversations_total": 8,
        "memory_files_ingested": 3
      }
    }

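A per-file manifest entry with the fields listed above could be assembled roughly like this. The function name `manifest_entry` is hypothetical, and the timestamp format (ISO 8601 in UTC) is an assumption, since the manifest only shows `TIMESTAMP` as a placeholder.

```python
from datetime import datetime, timezone
from pathlib import Path


def manifest_entry(path: Path, source_type: str, project: str,
                   pages_created: list, pages_updated: list) -> dict:
    """Build the manifest record for one processed source file."""
    if source_type not in ("claude_conversation", "claude_memory"):
        raise ValueError(f"unexpected source_type: {source_type}")
    stat = path.stat()
    return {
        # Assumption: timestamps are stored as ISO 8601 strings in UTC
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "modified_at": datetime.fromtimestamp(
            stat.st_mtime, timezone.utc
        ).isoformat(),
        "source_type": source_type,
        "project": project,
        "pages_created": pages_created,
        "pages_updated": pages_updated,
    }
```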
Create journal entry + update special files
Update `index.md` and `log.md` per the standard process:

    - [TIMESTAMP] CLAUDE_HISTORY_INGEST projects=N conversations=M pages_updated=X pages_created=Y mode=append|full

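For illustration, the log line template can be filled in with a helper like the one below. The name `log_line` is hypothetical, and the timestamp format is an assumption because the template only says `[TIMESTAMP]`.

```python
from datetime import datetime, timezone


def log_line(projects: int, conversations: int, pages_updated: int,
             pages_created: int, mode: str, when=None) -> str:
    """Format one CLAUDE_HISTORY_INGEST entry for log.md."""
    if mode not in ("append", "full"):
        raise ValueError(f"mode must be 'append' or 'full', got {mode!r}")
    when = when or datetime.now(timezone.utc)
    # Assumption: timestamps are written in UTC with a trailing "Z"
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"- [{stamp}] CLAUDE_HISTORY_INGEST projects={projects} "
        f"conversations={conversations} pages_updated={pages_updated} "
        f"pages_created={pages_created} mode={mode}"
    )
```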
Privacy
- Distill and synthesize — don't copy raw conversation text verbatim
- Skip anything that looks like secrets, API keys, passwords, tokens
- If you encounter personal/sensitive content, ask the user before including it
- The user's conversations may reference other people — be thoughtful about what goes in the wiki
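One way to approximate the "looks like secrets" rule is a conservative pattern screen, sketched below. The patterns are illustrative assumptions, not an exhaustive or authoritative list of secret formats, and the names `looks_sensitive` and `drop_sensitive` are hypothetical. The screen deliberately errs toward false positives, so ordinary prose that happens to contain something like "secret: ..." will also be dropped.

```python
import re

# Heuristic patterns only (assumptions for illustration)
SECRET_PATTERNS = [
    # key = value style assignments with a suspicious key name
    re.compile(r"(?i)(api[_-]?key|secret|password|passwd|token)\b\s*[:=]\s*\S+"),
    # PEM private key headers
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # long opaque alphanumeric strings
    re.compile(r"\b[A-Za-z0-9_]{32,}\b"),
]


def looks_sensitive(text: str) -> bool:
    """Return True if any heuristic pattern matches the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def drop_sensitive(snippets: list) -> list:
    """Keep only snippets that do not trip the heuristics."""
    return [s for s in snippets if not looks_sensitive(s)]
```

A screen like this only covers the mechanical part of the list; personal or sensitive content about the user or other people still calls for asking the user.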
Reference
See `references/claude-data-format.md` for more details on the data structures.