claude-history-ingest

Claude History Ingest — Conversation Mining

You are extracting knowledge from the user's past Claude Code conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest claude`).

Before You Start

  1. Read `.env` to get `OBSIDIAN_VAULT_PATH` and `CLAUDE_HISTORY_PATH` (defaults to `~/.claude`)
  2. Read `.manifest.json` at the vault root to check what's already been ingested
  3. Read `index.md` at the vault root to know what the wiki already contains
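A minimal Python sketch of the first step, assuming a plain `KEY=VALUE` `.env` file with no variable interpolation or multi-line values. The helper names `load_env` and `resolve_paths` are hypothetical, not part of any existing tooling.

```python
import os
from pathlib import Path


def load_env(env_path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines (hypothetical helper; no interpolation)."""
    values = {}
    path = Path(env_path)
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quote characters, e.g. KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_paths(env: dict) -> tuple:
    """Return (vault_path, history_path); CLAUDE_HISTORY_PATH defaults to ~/.claude."""
    vault = Path(os.path.expanduser(env["OBSIDIAN_VAULT_PATH"]))
    history = Path(os.path.expanduser(env.get("CLAUDE_HISTORY_PATH", "~/.claude")))
    return vault, history
```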

Ingest Modes

Append Mode (default)

Check `.manifest.json` for each source file (conversation JSONL, memory file). Only process:
  • Files not in the manifest (new conversations, new memory files, new projects)
  • Files whose modification time is newer than their `ingested_at` in the manifest
This is usually what you want: the user ran a few new sessions and wants to capture the delta.
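The append-mode rule can be sketched in Python as follows. The manifest layout is an assumption: this sketch expects per-file entries under a hypothetical `sources` key, keyed by the source path, each carrying an ISO-8601 `ingested_at` string. `needs_ingest` is a hypothetical name.

```python
from datetime import datetime, timezone
from pathlib import Path


def needs_ingest(path: Path, manifest: dict) -> bool:
    """Append-mode check: True for new files or files modified since ingested_at.

    Assumes the hypothetical layout manifest["sources"][<path>]["ingested_at"].
    """
    entry = manifest.get("sources", {}).get(str(path))
    if entry is None:
        return True  # not in the manifest: a new file
    # Accept both "...Z" and "+00:00" suffixes on older Python versions.
    ingested_at = datetime.fromisoformat(entry["ingested_at"].replace("Z", "+00:00"))
    if ingested_at.tzinfo is None:
        ingested_at = ingested_at.replace(tzinfo=timezone.utc)  # treat naive stamps as UTC
    modified_at = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return modified_at > ingested_at  # touched after the last ingest: re-ingest
```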

Full Mode

Process everything regardless of the manifest. Use after a `wiki-rebuild` or if the user explicitly asks.

Claude Code Data Layout

Claude Code stores everything under `~/.claude/`. Here is the actual structure:
```
~/.claude/
├── projects/                          # Per-project directories
│   ├── -Users-name-project-a/         # Path-derived name (slashes → dashes)
│   │   ├── <session-uuid>.jsonl       # Conversation data (JSONL)
│   │   └── memory/                    # Structured memories
│   │       ├── MEMORY.md              # Memory index
│   │       ├── user_*.md              # User profile memories
│   │       ├── feedback_*.md          # Workflow feedback memories
│   │       └── project_*.md           # Project context memories
│   ├── -Users-name-project-b/
│   │   └── ...
├── sessions/                          # Session metadata (JSON)
│   └── <pid>.json                     # {pid, sessionId, cwd, startedAt, kind, entrypoint}
├── history.jsonl                      # Global session history
├── tasks/                             # Subagent task data
├── plans/                             # Saved plans
└── settings.json
```

Key data sources ranked by value:

  1. Memory files (`projects/*/memory/*.md`): pre-distilled, already wiki-friendly. These contain the user's preferences, project decisions, and feedback. Gold.
  2. Conversation JSONL (`projects/*/*.jsonl`): full conversation transcripts. Rich but noisy.
  3. Session metadata (`sessions/*.json`): tells you which project, when, and what CWD.
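Session metadata can be joined to conversation entries through `sessionId`, since both carry that field. A minimal sketch, assuming each `sessions/<pid>.json` holds a single JSON object with the fields listed in the layout; `load_session_index` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_session_index(history_path: Path) -> dict:
    """Map sessionId -> metadata dict from sessions/<pid>.json (hypothetical helper)."""
    index = {}
    for meta_file in sorted((history_path / "sessions").glob("*.json")):
        try:
            meta = json.loads(meta_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or partially written file: skip it
        if not isinstance(meta, dict):
            continue  # not the expected {pid, sessionId, cwd, ...} object
        session_id = meta.get("sessionId")
        if session_id:
            index[session_id] = meta
    return index
```

With this index, a conversation entry's `sessionId` gives quick access to its `cwd` and `startedAt` without rescanning the JSONL.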

Step 1: Survey and Compute Delta

Scan `CLAUDE_HISTORY_PATH` and compare against `.manifest.json`:
```
# Find all projects
Glob: ~/.claude/projects/*/

# Find memory files (highest value)
Glob: ~/.claude/projects/*/memory/*.md

# Find conversation JSONL files
Glob: ~/.claude/projects/*/*.jsonl
```
Build an inventory and classify each file:

- **New**: not in manifest → needs ingesting
- **Modified**: in manifest but file is newer → needs re-ingesting
- **Unchanged**: in manifest and not modified → skip in append mode

Report to the user: "Found X projects, Y conversations, Z memory files. Delta: A new, B modified."
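The survey, classification, and report can be sketched in Python with `pathlib` globs equivalent to the patterns above. It assumes the same hypothetical manifest layout as elsewhere in this sketch set (`manifest["sources"][<path>]["ingested_at"]` as an ISO-8601 string). The `*/*.jsonl` pattern only matches files directly inside a project directory, so nested `subagents/` transcripts are not picked up. `survey` is a hypothetical name.

```python
from datetime import datetime, timezone
from pathlib import Path


def _is_newer(path: Path, iso_stamp: str) -> bool:
    """True if the file's mtime is later than an ISO-8601 timestamp."""
    stamp = datetime.fromisoformat(iso_stamp.replace("Z", "+00:00"))
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc) > stamp


def survey(history_path: Path, manifest: dict) -> tuple:
    """Classify memory + conversation files and build the user-facing report."""
    projects_dir = history_path / "projects"
    projects = [p for p in projects_dir.glob("*") if p.is_dir()]
    memory = sorted(projects_dir.glob("*/memory/*.md"))
    convos = sorted(projects_dir.glob("*/*.jsonl"))
    sources = manifest.get("sources", {})  # hypothetical per-file section
    buckets = {"new": [], "modified": [], "unchanged": []}
    for f in memory + convos:
        entry = sources.get(str(f))
        if entry is None:
            buckets["new"].append(f)
        elif _is_newer(f, entry["ingested_at"]):
            buckets["modified"].append(f)
        else:
            buckets["unchanged"].append(f)
    report = (f"Found {len(projects)} projects, {len(convos)} conversations, "
              f"{len(memory)} memory files. Delta: {len(buckets['new'])} new, "
              f"{len(buckets['modified'])} modified.")
    return buckets, report
```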

Step 2: Ingest Memory Files First

Memory files are already structured with YAML frontmatter:
```markdown
---
name: memory-name
description: one-line description
type: user|feedback|project|reference
---

Memory content here.
```
For each memory file:
  • Read it and parse the frontmatter
  • `user` type → feeds into an entity page about the user, or concept pages about their domain
  • `feedback` type → feeds into skills pages (workflow patterns, what works, what doesn't)
  • `project` type → feeds into entity pages for the project
  • `reference` type → feeds into reference pages pointing to external resources
The `MEMORY.md` index file in each project is a quick summary; read it first to decide which individual memory files are worth reading in full.
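A sketch of the frontmatter split and type routing, assuming the frontmatter holds only flat `key: value` pairs like the example above; for anything richer, a real YAML parser (such as PyYAML's `yaml.safe_load`) is the safer choice. `parse_memory` and `ROUTES` are hypothetical names.

```python
# memory `type` -> where its content feeds in the wiki (mirrors the list above).
ROUTES = {
    "user": "user entity page / domain concept pages",
    "feedback": "skills pages",
    "project": "project entity page",
    "reference": "reference pages",
}


def parse_memory(text: str) -> tuple:
    """Split '---' frontmatter from the body; handles flat key: value lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: treat the whole file as body
```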

Step 3: Parse Conversation JSONL

Each JSONL file is one conversation session. Each line is a JSON object:
```json
{
  "type": "user|assistant|progress|file-history-snapshot",
  "message": {
    "role": "user|assistant",
    "content": "text string"
  },
  "uuid": "...",
  "timestamp": "2026-03-15T10:30:00.000Z",
  "sessionId": "...",
  "cwd": "/path/to/project",
  "version": "2.1.59"
}
```
For assistant messages, `content` may be an array of content blocks:
```json
{
  "content": [
    {"type": "thinking", "thinking": "..."},
    {"type": "text", "text": "The actual response..."},
    {"type": "tool_use", "name": "Read", "input": {...}}
  ]
}
```
What to extract from conversations:
  • Filter to `type: "user"` and `type: "assistant"` entries only
  • For assistant entries, extract `text` blocks (skip `thinking` and `tool_use`; those are noise)
  • The `cwd` field tells you which project this conversation belongs to
  • The project directory name (e.g., `-Users-name-Documents-projects-my-app`) tells you the project path
Skip these:
  • `type: "progress"`: internal agent progress updates
  • `type: "file-history-snapshot"`: file state tracking
  • Subagent conversations (under `subagents/` subdirectories), unless the user specifically asks
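A sketch of these filtering rules in Python. It assumes user `content` may also arrive as a list of blocks (for example tool results), in which case only `text` blocks are kept, and it skips unparseable lines such as a truncated final line instead of failing. `extract_turns` is a hypothetical name.

```python
import json
from pathlib import Path


def extract_turns(jsonl_path: Path) -> list:
    """Return (type, text) pairs for user/assistant entries, dropping noise."""
    turns = []
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a truncated final line
            if not isinstance(entry, dict) or entry.get("type") not in ("user", "assistant"):
                continue  # progress, file-history-snapshot, ...
            content = (entry.get("message") or {}).get("content")
            if isinstance(content, str):
                texts = [content]
            elif isinstance(content, list):
                # Keep only "text" blocks; thinking, tool_use, tool_result are skipped.
                texts = [b.get("text", "") for b in content
                         if isinstance(b, dict) and b.get("type") == "text"]
            else:
                texts = []
            text = "\n".join(t for t in texts if t.strip())
            if text:
                turns.append((entry["type"], text))
    return turns
```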

Step 4: Cluster by Topic

Don't create one wiki page per conversation. Instead:
  • Group extracted knowledge by topic across conversations
  • A single conversation about "debugging auth + setting up CI" → two separate topics
  • Three conversations across different days about "React performance" → one merged topic
  • The project directory name gives you a natural first-level grouping

Step 5: Distill into Wiki Pages

Each Claude project maps to a project directory in the vault. The project directory name under `~/.claude/projects/` encodes the original path; decode it to get a clean project name:
```
-Users/Documents/projects/my-Project   → myproject
-Users/Documents/projects/Another-app  → anotherapp
```
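Because a `-` in the encoded directory name can be either a former path separator or a literal dash, decoding from the name alone is lossy. The sketch below therefore prefers the `cwd` recorded in the conversation entries (assumed to be a POSIX path) and only falls back to the last token of the directory name. The normalization (lowercase, non-alphanumerics dropped) is inferred from the two examples above; `clean_name` and `project_name` are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def clean_name(segment: str) -> str:
    """Lowercase and drop non-alphanumerics: 'my-Project' -> 'myproject'."""
    return re.sub(r"[^a-z0-9]", "", segment.lower())


def project_name(dir_name: str, cwd: Optional[str] = None) -> str:
    """Derive a vault project name from a ~/.claude/projects/ directory name."""
    if cwd:
        # The real working directory is unambiguous; use its last segment.
        return clean_name(PurePosixPath(cwd).name)
    # Fallback: last token of the encoded name. Lossy, because 'my-app'
    # and 'my/app' encode the same way, so this yields 'app' for both.
    tokens = [t for t in re.split(r"[-/]", dir_name) if t]
    return clean_name(tokens[-1]) if tokens else ""
```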

Project-specific vs. global knowledge

| What you found | Where it goes | Example |
|---|---|---|
| Project architecture decisions | `projects/<name>/concepts/` | `projects/my-project/concepts/main-architecture.md` |
| Project-specific debugging | `projects/<name>/skills/` | `projects/my-project/skills/api-rate-limiting.md` |
| General concept the user learned | `concepts/` (global) | `concepts/react-server-components.md` |
| Recurring problem across projects | `skills/` (global) | `skills/debugging-hydration-errors.md` |
| A tool/service used | `entities/` (global) | `entities/vercel-functions.md` |
| Patterns across many conversations | `synthesis/` (global) | `synthesis/common-debugging-patterns.md` |
For each project with content, create or update the project overview page at `projects/<name>/<name>.md`, named after the project, not `_project.md`. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
Important: Distill the knowledge, not the conversation. Don't write "In a conversation on March 15, the user asked about X." Write the knowledge itself, with the conversation as a source attribution.
Write a `summary:` frontmatter field on every new/updated page: 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.
Mark provenance per the convention in `llm-wiki` (Provenance Markers section):
  • Memory files are mostly extracted: the user wrote them by hand and they're already distilled. Treat memory-derived claims as extracted unless you're stitching together claims from multiple memory files.
  • Conversation distillation is mostly inferred. You're synthesizing a coherent claim from many turns of dialogue, often filling in implicit reasoning. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
  • Use `^[ambiguous]` when the user changed their mind across sessions or when assistant and user contradicted each other and the resolution is unclear.
  • Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
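One way to emit the `summary:` and `provenance:` fields together, sketched in Python. The shape of the `provenance` block here (a rough share of claims per marker type) is an assumption; follow whatever the llm-wiki Provenance Markers section actually specifies. `render_frontmatter` is a hypothetical helper.

```python
import json


def render_frontmatter(summary: str, provenance: dict) -> str:
    """Build a frontmatter block with `summary:` and a `provenance:` mix.

    The provenance shape (marker -> rough share of claims) is hypothetical.
    """
    summary = " ".join(summary.split())  # collapse line breaks and extra spaces
    if len(summary) > 200:
        raise ValueError(f"summary is {len(summary)} chars; the limit is 200")
    if abs(sum(provenance.values()) - 1.0) > 0.01:
        raise ValueError("provenance shares should add up to about 1.0")
    # A JSON string literal doubles as a YAML double-quoted scalar, so quotes are escaped.
    lines = ["---", f"summary: {json.dumps(summary, ensure_ascii=False)}", "provenance:"]
    lines += [f"  {marker}: {share:.2f}" for marker, share in provenance.items()]
    lines.append("---")
    return "\n".join(lines)
```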

Step 6: Update Manifest, Journal, and Special Files

Update `.manifest.json`

For each source file processed (conversation JSONL, memory file), add/update its entry with:
  • `ingested_at`, `size_bytes`, `modified_at`
  • `source_type`: `"claude_conversation"` or `"claude_memory"`
  • `project`: the decoded project name
  • `pages_created` and `pages_updated` lists
Also update the `projects` section of the manifest:
```json
{
  "project-name": {
    "source_path": "~/.claude/projects/-Users-...",
    "vault_path": "projects/project-name",
    "last_ingested": "TIMESTAMP",
    "conversations_ingested": 5,
    "conversations_total": 8,
    "memory_files_ingested": 3
  }
}
```
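A sketch of the per-file upsert plus the project rollup. Only the field names come from the lists above; where per-file entries live (a hypothetical `sources` key) and the ISO-8601 UTC timestamp format are assumptions. Counts are recomputed from the manifest rather than incremented, so re-ingesting a modified file does not double-count it. `record_source` is a hypothetical name.

```python
from datetime import datetime, timezone
from pathlib import Path


def _iso(ts: datetime) -> str:
    """UTC ISO-8601 string ending in Z."""
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def record_source(manifest: dict, path: Path, source_type: str, project: str,
                  pages_created: list, pages_updated: list) -> None:
    """Upsert one source-file entry and refresh its project rollup in place."""
    if source_type not in ("claude_conversation", "claude_memory"):
        raise ValueError(f"unexpected source_type: {source_type!r}")
    stat = path.stat()
    now = _iso(datetime.now(timezone.utc))
    sources = manifest.setdefault("sources", {})  # hypothetical per-file section
    sources[str(path)] = {
        "ingested_at": now,
        "size_bytes": stat.st_size,
        "modified_at": _iso(datetime.fromtimestamp(stat.st_mtime, timezone.utc)),
        "source_type": source_type,
        "project": project,
        "pages_created": pages_created,
        "pages_updated": pages_updated,
    }
    # Conversations sit at <project>/<uuid>.jsonl, memories at <project>/memory/<file>.md.
    project_dir = path.parent if source_type == "claude_conversation" else path.parent.parent
    mine = [e for e in sources.values() if e.get("project") == project]
    manifest.setdefault("projects", {}).setdefault(project, {}).update({
        "source_path": str(project_dir),
        "vault_path": f"projects/{project}",
        "last_ingested": now,
        # Recounted from the manifest so re-ingesting a file never double-counts.
        "conversations_ingested": sum(e.get("source_type") == "claude_conversation" for e in mine),
        "conversations_total": len(list(project_dir.glob("*.jsonl"))),
        "memory_files_ingested": sum(e.get("source_type") == "claude_memory" for e in mine),
    })
```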

Create journal entry + update special files

Update `index.md` and `log.md` per the standard process:
```
- [TIMESTAMP] CLAUDE_HISTORY_INGEST projects=N conversations=M pages_updated=X pages_created=Y mode=append|full
```
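A small formatter for that log line, assuming an ISO-8601 UTC timestamp in place of `TIMESTAMP` (the skill does not pin down the format). `log_line` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional


def log_line(projects: int, conversations: int, pages_updated: int,
             pages_created: int, mode: str, when: Optional[datetime] = None) -> str:
    """Format one CLAUDE_HISTORY_INGEST entry for log.md (UTC stamp assumed)."""
    if mode not in ("append", "full"):
        raise ValueError(f"mode must be 'append' or 'full', got {mode!r}")
    ts = when or datetime.now(timezone.utc)
    if ts.tzinfo is not None:
        ts = ts.astimezone(timezone.utc)  # naive values are assumed to be UTC already
    stamp = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f"- [{stamp}] CLAUDE_HISTORY_INGEST projects={projects} "
            f"conversations={conversations} pages_updated={pages_updated} "
            f"pages_created={pages_created} mode={mode}")
```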

Privacy

  • Distill and synthesize — don't copy raw conversation text verbatim
  • Skip anything that looks like secrets, API keys, passwords, tokens
  • If you encounter personal/sensitive content, ask the user before including it
  • The user's conversations may reference other people — be thoughtful about what goes in the wiki
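A heuristic pre-filter for the secrets rule, sketched in Python. The patterns are illustrative assumptions (AWS access key IDs, GitHub token prefixes, PEM private-key headers, and `key = value` pairs with secret-like names), not an exhaustive detector; anything flagged should be left out, and judgment is still needed for what it misses. `looks_secret` and `drop_secret_lines` are hypothetical names.

```python
import re

# Illustrative patterns only: they miss many secret formats and can flag
# harmless text, so a hit means "leave it out or ask the user".
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # AWS access key ID
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"),        # GitHub token prefixes
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(?:api[_-]?key|secret|passw(?:or)?d|token)\b\s*[:=]\s*\S+"),
]


def looks_secret(text: str) -> bool:
    """True if any heuristic secret pattern matches."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def drop_secret_lines(text: str) -> str:
    """Remove every line that matches a secret pattern before distilling."""
    return "\n".join(line for line in text.splitlines() if not looks_secret(line))
```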

Reference

See `references/claude-data-format.md` for more details on the data structures.