claude-history-ingest

Claude History Ingest — Conversation Mining

You are extracting knowledge from the user's past Claude Code conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.

This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest claude`).

Before You Start

- Read `.env` to get `OBSIDIAN_VAULT_PATH` and `CLAUDE_HISTORY_PATH` (defaults to `~/.claude`)
- Read `.manifest.json` at the vault root to check what's already been ingested
- Read `index.md` at the vault root to know what the wiki already contains
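
The three reads above could be sketched roughly as follows. This is a minimal sketch, assuming `.env` holds plain `KEY=value` lines and that a missing manifest means nothing has been ingested yet; the helper names `parse_env` and `load_context` are hypothetical and not part of the skill.

```python
import json
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")
    return env


def load_context(env_file=".env"):
    """Resolve both paths, then load the manifest and index if they exist."""
    env = parse_env(Path(env_file).read_text(encoding="utf-8"))
    vault = Path(env["OBSIDIAN_VAULT_PATH"]).expanduser()
    # CLAUDE_HISTORY_PATH falls back to ~/.claude, as stated above.
    history = Path(env.get("CLAUDE_HISTORY_PATH", "~/.claude")).expanduser()
    manifest_file = vault / ".manifest.json"
    index_file = vault / "index.md"
    manifest = {}
    if manifest_file.exists():  # assumption: absent manifest == empty manifest
        manifest = json.loads(manifest_file.read_text(encoding="utf-8"))
    index = index_file.read_text(encoding="utf-8") if index_file.exists() else ""
    return {"vault": vault, "history": history, "manifest": manifest, "index": index}
```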

Ingest Modes

Append Mode (default)

Check `.manifest.json` for each source file (conversation JSONL, memory file). Only process:

- Files not in the manifest (new conversations, new memory files, new projects)
- Files whose modification time is newer than their `ingested_at` in the manifest

This is usually what you want — the user ran a few new sessions and wants to capture the delta.
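
The append-mode rule can be expressed as a single predicate. The sketch below assumes `ingested_at` is stored as an ISO 8601 timestamp (the document does not specify the format) and that a timestamp without an offset means UTC; `needs_ingest` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def needs_ingest(path: Path, entry: Optional[dict]) -> bool:
    """Return True when a source file is new or has changed since its last ingest."""
    if entry is None:  # not in the manifest: new file
        return True
    # Normalise a trailing "Z" so older Python versions can parse it.
    ingested_at = datetime.fromisoformat(entry["ingested_at"].replace("Z", "+00:00"))
    if ingested_at.tzinfo is None:  # assumption: naive timestamps are UTC
        ingested_at = ingested_at.replace(tzinfo=timezone.utc)
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return modified > ingested_at
```

In full mode the predicate is simply bypassed and every file is processed.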

Full Mode

Process everything regardless of manifest. Use after a `wiki-rebuild` or if the user explicitly asks.

Claude Code Data Layout

Claude Code stores everything under `~/.claude/`. Here is the actual structure:

~/.claude/
├── projects/                          # Per-project directories
│   ├── -Users-name-project-a/         # Path-derived name (slashes → dashes)
│   │   ├── <session-uuid>.jsonl       # Conversation data (JSONL)
│   │   └── memory/                    # Structured memories
│   │       ├── MEMORY.md              # Memory index
│   │       ├── user_*.md              # User profile memories
│   │       ├── feedback_*.md          # Workflow feedback memories
│   │       └── project_*.md           # Project context memories
│   ├── -Users-name-project-b/
│   │   └── ...
├── sessions/                          # Session metadata (JSON)
│   └── <pid>.json                     # {pid, sessionId, cwd, startedAt, kind, entrypoint}
├── history.jsonl                      # Global session history
├── tasks/                             # Subagent task data
├── plans/                             # Saved plans
└── settings.json

Key data sources ranked by value:

- Memory files (`projects/*/memory/*.md`): Pre-distilled, already wiki-friendly. These contain the user's preferences, project decisions, and feedback. Gold.
- Conversation JSONL (`projects/*/*.jsonl`): Full conversation transcripts. Rich but noisy.
- Session metadata (`sessions/*.json`): Tells you which project, when, and what CWD.

Step 1: Survey and Compute Delta

Scan `CLAUDE_HISTORY_PATH` and compare against `.manifest.json`:

- Find all projects. Glob: `~/.claude/projects/*/`
- Find memory files (highest value). Glob: `~/.claude/projects/*/memory/*.md`
- Find conversation JSONL files. Glob: `~/.claude/projects/*/*.jsonl`


Build an inventory and classify each file:

- **New** — not in manifest → needs ingesting
- **Modified** — in manifest but file is newer → needs re-ingesting
- **Unchanged** — in manifest and not modified → skip in append mode

Report to the user: "Found X projects, Y conversations, Z memory files. Delta: A new, B modified."
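
The survey and classification could look roughly like this. It is a sketch under stated assumptions: `entries` maps a file path string to its manifest record (the manifest's per-file layout is not spelled out here), `ingested_at` is ISO 8601, and `survey` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def survey(history: Path, entries: dict) -> dict:
    """Glob source files under `history` and classify each against `entries`."""
    root = history / "projects"
    projects = [p for p in root.glob("*") if p.is_dir()]
    memory = sorted(root.glob("*/memory/*.md"))
    conversations = sorted(root.glob("*/*.jsonl"))

    status = {}
    for path in memory + conversations:
        entry = entries.get(str(path))
        if entry is None:
            status[path] = "new"
            continue
        ingested = datetime.fromisoformat(entry["ingested_at"].replace("Z", "+00:00"))
        if ingested.tzinfo is None:  # assumption: naive timestamps are UTC
            ingested = ingested.replace(tzinfo=timezone.utc)
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        status[path] = "modified" if mtime > ingested else "unchanged"

    counts = Counter(status.values())
    report = (
        f"Found {len(projects)} projects, {len(conversations)} conversations, "
        f"{len(memory)} memory files. Delta: {counts['new']} new, "
        f"{counts['modified']} modified."
    )
    return {"status": status, "report": report}
```

Because the conversation glob is only one level deep, files under nested directories are not picked up by this sketch.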

Step 2: Ingest Memory Files First

Memory files are already structured with YAML frontmatter:

---
name: memory-name
description: one-line description
type: user|feedback|project|reference
---

Memory content here.

For each memory file:

- Read it and parse the frontmatter
- `user` type → feeds into an entity page about the user, or concept pages about their domain
- `feedback` type → feeds into skills pages (workflow patterns, what works, what doesn't)
- `project` type → feeds into entity pages for the project
- `reference` type → feeds into reference pages pointing to external resources

The `MEMORY.md` index file in each project is a quick summary. Read it first to decide which individual memory files are worth reading in full.
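
Parsing and routing a memory file might be sketched as below. The parser handles only the flat `key: value` frontmatter shown above rather than full YAML, and the names `parse_memory`, `destination`, and `ROUTES` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Destination per memory type, mirroring the routing list above.
ROUTES = {
    "user": "entity page about the user, or concept pages about their domain",
    "feedback": "skills pages",
    "project": "entity pages for the project",
    "reference": "reference pages",
}


def parse_memory(text: str) -> Tuple[Dict[str, str], str]:
    """Split a memory file into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    try:
        # Index of the closing '---' delimiter.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text.strip()
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def destination(meta: Dict[str, str]) -> Optional[str]:
    """Return where a memory feeds into, or None for an unrecognised type."""
    return ROUTES.get(meta.get("type", ""))
```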

Step 3: Parse Conversation JSONL

Each JSONL file is one conversation session. Each line is a JSON object:

{
  "type": "user|assistant|progress|file-history-snapshot",
  "message": {
    "role": "user|assistant",
    "content": "text string"
  },
  "uuid": "...",
  "timestamp": "2026-03-15T10:30:00.000Z",
  "sessionId": "...",
  "cwd": "/path/to/project",
  "version": "2.1.59"
}

For assistant messages, `content` may be an array of content blocks:

{
  "content": [
    {"type": "thinking", "text": "..."},
    {"type": "text", "text": "The actual response..."},
    {"type": "tool_use", "name": "Read", "input": {...}}
  ]
}

What to extract from conversations:

- Filter to `type: "user"` and `type: "assistant"` entries only
- For assistant entries, extract `text` blocks (skip `thinking` and `tool_use`; those are noise)
- The `cwd` field tells you which project this conversation belongs to
- The project directory name (e.g., `-Users-name-Documents-projects-my-app`) tells you the project path

Skip these:

- `type: "progress"`: internal agent progress updates
- `type: "file-history-snapshot"`: file state tracking
- Subagent conversations (under `subagents/` subdirectories), unless the user specifically asks
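
The filtering rules above could be sketched as a small generator. It assumes the entry shape shown in the JSON examples and silently skips lines that fail to parse (for instance a truncated final line), which is a choice made for this sketch; `extract_turns` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator, Tuple

KEEP_TYPES = {"user", "assistant"}


def extract_turns(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (type, text) for user and assistant entries, dropping noise."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are skipped, not fatal
        if not isinstance(entry, dict) or entry.get("type") not in KEEP_TYPES:
            continue  # drops progress and file-history-snapshot entries
        content = (entry.get("message") or {}).get("content")
        if isinstance(content, str):
            text = content
        elif isinstance(content, list):
            # Keep only text blocks; thinking and tool_use blocks are ignored.
            text = "\n".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        else:
            text = ""
        if text.strip():
            yield entry["type"], text.strip()
```

A caller would typically pass an open file handle, for example `extract_turns(open(path, encoding="utf-8"))`.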

Step 4: Cluster by Topic

Don't create one wiki page per conversation. Instead:

- Group extracted knowledge by topic across conversations
- A single conversation about "debugging auth + setting up CI" → two separate topics
- Three conversations across different days about "React performance" → one merged topic
- The project directory name gives you a natural first-level grouping

Step 5: Distill into Wiki Pages

Each Claude project maps to a project directory in the vault. The project directory name from `~/.claude/projects/` encodes the original path. Decode it to get a clean project name:

-Users-name-Documents-projects-my-project   → my-project
-Users-name-Documents-projects-another-app  → another-app
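
Decoding is ambiguous from the directory name alone, because a dash may be either a path separator or part of the project's own name. One way around this, sketched below, is to lean on the `cwd` recorded in the conversation entries. The encoding rule is taken from the layout comment (slashes → dashes); both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def encode_project_dir(path: str) -> str:
    """Encode an absolute path as described in the layout: slashes become dashes."""
    return path.replace("/", "-")


def project_name(dir_name: str, cwd: Optional[str] = None) -> str:
    """Prefer the last component of `cwd`; otherwise return the raw name."""
    if cwd and encode_project_dir(cwd) == dir_name:
        return PurePosixPath(cwd).name
    # Without a matching cwd there is no reliable way to split on dashes,
    # so this sketch returns the encoded name minus its leading dash.
    return dir_name.lstrip("-")
```

For example, `project_name("-Users-name-Documents-projects-my-app", "/Users/name/Documents/projects/my-app")` yields `my-app`, which a naive split on dashes would not recover.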

Project-specific vs. global knowledge

| What you found | Where it goes | Example |
|---|---|---|
| Project architecture decisions | `projects/<name>/concepts/` | `projects/my-project/concepts/main-architecture.md` |
| Project-specific debugging | `projects/<name>/skills/` | `projects/my-project/skills/api-rate-limiting.md` |
| General concept the user learned | `concepts/` (global) | `concepts/react-server-components.md` |
| Recurring problem across projects | `skills/` (global) | `skills/debugging-hydration-errors.md` |
| A tool/service used | `entities/` (global) | `entities/vercel-functions.md` |
| Patterns across many conversations | `synthesis/` (global) | `synthesis/common-debugging-patterns.md` |

For each project with content, create or update the project overview page at `projects/<name>/<name>.md`, named after the project, not `_project.md`. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.

Important: Distill the knowledge, not the conversation. Don't write "In a conversation on March 15, the user asked about X." Write the knowledge itself, with the conversation as a source attribution.

Write a `summary:` frontmatter field on every new/updated page: 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.

Mark provenance per the convention in `llm-wiki` (Provenance Markers section):

- Memory files are mostly extracted: the user wrote them by hand and they're already distilled. Treat memory-derived claims as extracted unless you're stitching together claims from multiple memory files.
- Conversation distillation is mostly inferred. You're synthesizing a coherent claim from many turns of dialogue, often filling in implicit reasoning. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
- Use `^[ambiguous]` when the user changed their mind across sessions or when assistant and user contradicted each other and the resolution is unclear.
- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.

Step 6: Update Manifest, Journal, and Special Files

Update `.manifest.json`

For each source file processed (conversation JSONL, memory file), add/update its entry with:

- `ingested_at`, `size_bytes`, `modified_at`
- `source_type`: `"claude_conversation"` or `"claude_memory"`
- `project`: the decoded project name
- `pages_created` and `pages_updated` lists

Also update the `projects` section of the manifest:

{
  "project-name": {
    "source_path": "~/.claude/projects/-Users-...",
    "vault_path": "projects/project-name",
    "last_ingested": "TIMESTAMP",
    "conversations_ingested": 5,
    "conversations_total": 8,
    "memory_files_ingested": 3
  }
}
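
Recording a per-file entry might look like the sketch below. The field names come from the list above; the top-level `"sources"` key, the use of the file path as the entry key, and the ISO 8601 timestamp format are assumptions made for illustration, as is the helper name `record_source`.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List

SOURCE_TYPES = ("claude_conversation", "claude_memory")


def record_source(manifest: dict, path: Path, source_type: str, project: str,
                  pages_created: List[str], pages_updated: List[str]) -> dict:
    """Add or replace the manifest entry for one processed source file."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unexpected source_type: {source_type!r}")
    stat = path.stat()
    entry = {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "modified_at": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "source_type": source_type,
        "project": project,
        "pages_created": list(pages_created),
        "pages_updated": list(pages_updated),
    }
    # "sources" is a hypothetical container key; the document does not name it.
    manifest.setdefault("sources", {})[str(path)] = entry
    return entry
```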

Create journal entry + update special files

Update `index.md` and `log.md` per the standard process:

- [TIMESTAMP] CLAUDE_HISTORY_INGEST projects=N conversations=M pages_updated=X pages_created=Y mode=append|full

Privacy

- Distill and synthesize; don't copy raw conversation text verbatim
- Skip anything that looks like secrets, API keys, passwords, tokens
- If you encounter personal/sensitive content, ask the user before including it
- The user's conversations may reference other people, so be thoughtful about what goes in the wiki
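
A first-pass filter for secret-looking text could be sketched as follows. The patterns are illustrative heuristics chosen for this example, not an exhaustive or authoritative list, and they will produce both false positives and misses; `looks_like_secret` and `drop_secret_lines` are hypothetical names.

```python
import re

# Heuristic patterns only; extend or tighten them for your own data.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                 # shape of an AWS access key ID
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),       # GitHub-style token prefixes
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),              # "sk-" prefixed API keys
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM private key header
    re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b\s*[:=]\s*\S+"),
]


def looks_like_secret(text: str) -> bool:
    """Return True if any heuristic pattern matches the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def drop_secret_lines(text: str) -> str:
    """Remove whole lines that match a secret pattern."""
    return "\n".join(line for line in text.splitlines() if not looks_like_secret(line))
```

A filter like this only covers the mechanical part of the list; personal or sensitive content still needs the user's judgment.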

Reference

See `references/claude-data-format.md` for more details on the data structures.
