copilot-history-ingest

# Copilot History Ingest — Conversation Mining

You are extracting knowledge from the user's past GitHub Copilot CLI conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.

This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest copilot`).

## Before You Start

1. Read `.env` to get `OBSIDIAN_VAULT_PATH`, `COPILOT_HISTORY_PATH` (defaults to `~/.copilot/session-state`), and `COPILOT_VSCODE_STORAGE_PATH` (the VS Code `workspaceStorage` directory; platform-specific — ask the user if absent from `.env`)
2. Read `.manifest.json` at the vault root to check what's already been ingested
3. Read `index.md` at the vault root to know what the wiki already contains
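Step 1 of the checklist can be sketched in a few lines of Python — a minimal, hypothetical `.env` reader that assumes plain `KEY=value` lines (no `export` statements or multi-line values):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into a dict."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

if Path(".env").exists():
    env = load_env()
    vault = env.get("OBSIDIAN_VAULT_PATH")
    # Fall back to the documented default when COPILOT_HISTORY_PATH is unset
    history = env.get("COPILOT_HISTORY_PATH",
                      os.path.expanduser("~/.copilot/session-state"))
```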

## Ingest Modes

### Append Mode (default)

Check `.manifest.json` for each source file (events JSONL, transcript JSONL, checkpoint, session-store DB). Only process:

- Sessions not in the manifest (new sessions)
- Sessions whose `updated_at` is newer than their `ingested_at` in the manifest

This is usually what you want — the user ran a few new sessions and wants to capture the delta.

### Full Mode

Process everything regardless of manifest. Use after a `wiki-rebuild` or if the user explicitly asks.

## GitHub Copilot Data Layout

Copilot stores data in three locations. Scan all three.

### Source 1: `~/.copilot/session-state/` (CLI sessions)

```
~/.copilot/session-state/
├── <session-uuid>/
│   ├── workspace.yaml           # Session metadata (id, cwd, summary_count, created_at, updated_at)
│   ├── vscode.metadata.json     # VS Code context (workspaceFolder, repositoryProperties, customTitle)
│   ├── events.jsonl             # Full event log — all turns, tool calls, reasoning
│   ├── session.db               # Per-session SQLite (todos/todo_deps only — skip for ingestion)
│   ├── index.md                 # Session summary written at session end
│   ├── checkpoints/             # Checkpoint JSON files (mid-session summaries)
│   │   └── <uuid>.json          # title, overview, history, work_done, technical_details,
│   │                            #   important_files, next_steps
│   ├── files/                   # Artifacts produced during session (plans, diagrams, etc.)
│   └── research/                # Research artifacts
└── ...
```

### Source 2: `~/.copilot/session-store.db` (Global SQLite)

The canonical cross-session database. This is the highest-value source: structured, queryable, and pre-summarised.

```
sessions       — id, cwd, repository, branch, summary, created_at, updated_at, host_type
turns          — session_id, turn_index, user_message, assistant_response, timestamp
checkpoints    — session_id, checkpoint_number, title, overview, history, work_done,
                 technical_details, important_files, next_steps, created_at
session_files  — session_id, file_path, tool_name, turn_index, first_seen_at
session_refs   — session_id, ref_type (commit/pr/issue), ref_value, turn_index, created_at
search_index   — FTS5 virtual table (content, session_id, source_type, source_id)
```
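As a sketch of what querying this store looks like — assuming the table and column names above, and opening the database read-only so ingestion can never mutate Copilot's own state:

```python
import sqlite3
from pathlib import Path

db_path = Path.home() / ".copilot" / "session-store.db"

def recent_sessions(conn, limit=10):
    """Most recently updated sessions with their checkpoint counts."""
    return conn.execute(
        """
        SELECT s.id, s.repository, s.summary, s.updated_at,
               COUNT(c.id) AS checkpoint_count
        FROM sessions s
        LEFT JOIN checkpoints c ON c.session_id = s.id
        GROUP BY s.id
        ORDER BY s.updated_at DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

if db_path.exists():
    # mode=ro guarantees a read-only connection
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    for row in recent_sessions(conn):
        print(row)
```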

### Source 3: VS Code Workspace Storage (`<workspaceStorage>/<hash>/GitHub.copilot-chat/`)

VS Code extension data, keyed by workspace hash. The path is platform-specific and must come from `.env` or user input.
```
<hash>/GitHub.copilot-chat/
├── transcripts/
│   └── <session-uuid>.jsonl     # Conversation transcripts (same JSONL format as events.jsonl)
├── memory-tool/
│   └── memories/
│       └── <base64-session-id>/ # Per-session saved artifacts (plan.md, etc.)
│           └── plan.md
└── codebase-external.sqlite     # Codebase index (skip — no conversation knowledge)
```

Key data sources ranked by value:

1. Checkpoints (`session-store.db` `checkpoints` table + per-session `checkpoints/*.json`) — Pre-distilled summaries with `overview`, `work_done`, `technical_details`, `important_files`, `next_steps`. Gold.
2. Session summaries (`session-store.db` `sessions.summary` + `index.md`) — One-paragraph synopsis per session.
3. Turns (`session-store.db` `turns` table + `events.jsonl` / transcript JSONL) — Full conversation. Rich but verbose.
4. Memory artifacts (`memory-tool/memories/<id>/plan.md` etc.) — Pre-written plans and structured notes the user saved explicitly. Worth importing verbatim (or lightly summarised).
5. File access patterns (`session_files` table + `tool.execution_*` events) — Which files the agent repeatedly touched — reveals high-value project files.
6. Session refs (`session_refs` table) — Commits, PRs, and issues linked to sessions.
7. `vscode.metadata.json` — Workspace folder path, branch, `customTitle` (user-set session label). Useful for grouping and naming.

## Step 1: Survey and Compute Delta

Scan all three data locations and compare against `.manifest.json`:

```bash
# --- Source 1: per-session directories ---
# Find all session directories (each has workspace.yaml)
ls ~/.copilot/session-state/
# For each session, read workspace.yaml for id/cwd/updated_at
# and vscode.metadata.json for customTitle / repositoryProperties

# --- Source 2: global database ---
# Query session-store.db with sqlite3 (or Python sqlite3)
sqlite3 ~/.copilot/session-store.db "
  SELECT s.id, s.cwd, s.repository, s.branch, s.summary, s.updated_at,
         COUNT(DISTINCT t.turn_index) AS turn_count,
         COUNT(DISTINCT c.id) AS checkpoint_count
  FROM sessions s
  LEFT JOIN turns t ON t.session_id = s.id
  LEFT JOIN checkpoints c ON c.session_id = s.id
  GROUP BY s.id
  ORDER BY s.updated_at DESC;"

# --- Source 3: VS Code workspace storage ---
# For each <hash> directory under workspaceStorage, check for GitHub.copilot-chat/
# Find transcript files
ls <workspaceStorage>/<hash>/GitHub.copilot-chat/transcripts/
```

Build a unified inventory — one entry per session UUID — and classify:

- **New** — not in manifest → needs ingesting
- **Modified** — in manifest but `updated_at` is newer → needs re-ingesting
- **Unchanged** — in manifest and not modified → skip in append mode

Report to the user: "Found X sessions in session-state, Y in session-store.db, Z VS Code transcript files. Checkpoints: A. Delta: B new, C modified."
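The classification rules above can be sketched as a small helper; the manifest shape (a map keyed by session id with an `ingested_at` field) is an assumption for illustration:

```python
def classify_session(session_id, updated_at, manifest):
    """Classify one session against the manifest (assumed keyed by session id)."""
    entry = manifest.get(session_id)
    if entry is None:
        return "new"          # never ingested
    if updated_at > entry["ingested_at"]:
        return "modified"     # source changed since last ingest
    return "unchanged"        # skip in append mode

# ISO-8601 timestamps in a uniform timezone compare correctly as strings,
# so no datetime parsing is needed for the delta check.
```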

## Step 2: Ingest Checkpoints and Summaries First

Checkpoints are already distilled — process them before touching raw turns.

From `session-store.db`:

```sql
SELECT s.id, s.cwd, s.repository, s.branch, s.summary,
       c.checkpoint_number, c.title, c.overview, c.work_done,
       c.technical_details, c.important_files, c.next_steps,
       c.created_at
FROM checkpoints c
JOIN sessions s ON c.session_id = s.id
ORDER BY s.updated_at DESC, c.checkpoint_number ASC;
```

From per-session `checkpoints/*.json`:

Each checkpoint file has: `title`, `overview`, `history`, `work_done`, `technical_details`, `important_files`, `next_steps`.

Read `index.md` (if present) as a session-level summary — it's typically written at session end and is already concise.

What to extract:

- `overview` → high-level description of what the session accomplished
- `work_done` → concrete tasks completed (good for skills / project pages)
- `technical_details` → implementation specifics (good for concepts pages)
- `important_files` → high-value files in the project (good for project pages)
- `next_steps` → open threads (good for linking to ongoing project work)
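Reading the per-session checkpoint files might look like this — a minimal sketch assuming the seven field names listed above and UTF-8 JSON:

```python
import json
from pathlib import Path

FIELDS = ("title", "overview", "history", "work_done",
          "technical_details", "important_files", "next_steps")

def load_checkpoints(session_dir):
    """Yield one dict per checkpoint file; missing fields default to None."""
    for path in sorted(Path(session_dir).glob("checkpoints/*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        yield {field: data.get(field) for field in FIELDS}
```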

## Step 3: Parse Session Turns

Read turns from `session-store.db` (preferred — already parsed) or from `events.jsonl` / transcript JSONL.

From `session-store.db`:

```sql
SELECT turn_index, user_message, assistant_response, timestamp
FROM turns
WHERE session_id = '<uuid>'
ORDER BY turn_index ASC;
```

From `events.jsonl` / transcript JSONL:

Each file is one session. Each line is a JSON event. See `references/copilot-data-format.md` for the full schema.

Relevant event types:

| `type` | What it is | Worth reading? |
| --- | --- | --- |
| `session.start` | Session metadata (cwd, branch, version) | Yes — establishes project context |
| `user.message` | User turn | Yes — `data.content` |
| `assistant.message` | Assistant turn | Yes — `data.content` (text) + `data.toolRequests` |
| `tool.execution_start` | Tool call | Skim — reveals what files/commands were used |
| `tool.execution_end` | Tool result | No — usually noise |

Extraction strategy for `assistant.message`:

- `data.content` is the assistant's text response — extract this
- `data.reasoningText` is internal reasoning — skip (it's the unpacked `reasoningOpaque` field)
- `data.toolRequests` lists tool calls — skim tool names and arguments for file access patterns
- Skip `type: "tool.execution_end"` entirely
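The extraction strategy can be sketched as a JSONL filter; event and field names follow the table above, and unknown event types are dropped defensively:

```python
import json

KEEP = {"session.start", "user.message", "assistant.message", "tool.execution_start"}

def extract_events(jsonl_path):
    """Yield (type, data) pairs worth reading, dropping noise and reasoning."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            etype = event.get("type")
            if etype not in KEEP:
                continue  # drops tool.execution_end and anything unknown
            data = dict(event.get("data", {}))
            # Internal reasoning must never reach the wiki
            data.pop("reasoningText", None)
            data.pop("reasoningOpaque", None)
            yield etype, data
```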

## Step 3b: Process Memory Artifacts

For each session that has a `memory-tool/memories/<base64-id>/` directory in VS Code workspace storage, read any markdown files saved there (typically `plan.md`). These are documents the user explicitly saved — treat them as high-quality, user-authored content.

Decode the base64 directory name to get the session UUID:

```python
import base64
session_id = base64.b64decode(dir_name).decode('utf-8')
```

Memory artifacts map to project `skills/` or `concepts/` pages, depending on content type.

## Step 3c: Extract File and Ref Patterns

From `session-store.db` (note that `session_files` carries no `repository` column, so join through `sessions`):

```sql
-- Most-touched files per project
SELECT s.repository, f.file_path, COUNT(*) AS touch_count
FROM session_files f
JOIN sessions s ON f.session_id = s.id
GROUP BY s.repository, f.file_path
ORDER BY touch_count DESC;

-- Linked commits/PRs/issues per session
SELECT session_id, ref_type, ref_value, turn_index
FROM session_refs
ORDER BY session_id, turn_index;
```

File access patterns reveal which files are architecturally important — note them on project pages.

Session refs link Copilot sessions to git history — useful for connecting wiki knowledge to concrete code changes.

## Step 4: Cluster by Topic

Don't create one wiki page per session. Instead:

- Group extracted knowledge by topic across sessions
- A single session about "debugging auth + setting up CI" → two separate topics
- Three sessions across different days about "React performance" → one merged topic
- `cwd` / `repository` give you a natural first-level grouping; `vscode.metadata.json`'s `customTitle` gives a human-readable session label

## Step 5: Distill into Wiki Pages

Each Copilot project maps to a project directory in the vault. Derive the project name from `cwd` or `repository`:

```
C:\Users\name\git\my-project   → my-project
/Users/name/code/another-app   → another-app
```

Prefer `repository` (e.g., `owner/repo`) from `session-store.db` over raw `cwd` when available.
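The name derivation above might be sketched like this; the handling of trailing separators and the `owner/repo` split are illustrative assumptions:

```python
import re

def project_name(cwd=None, repository=None):
    """Derive a vault project name, preferring repository over raw cwd."""
    if repository:                      # e.g. "owner/repo" → "repo"
        return repository.split("/")[-1]
    # Normalise Windows and POSIX separators, take the last path component
    return re.split(r"[\\/]", cwd.rstrip("\\/"))[-1]
```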

### Project-specific vs. global knowledge

| What you found | Where it goes | Example |
| --- | --- | --- |
| Project architecture decisions | `projects/<name>/concepts/` | `projects/my-project/concepts/main-architecture.md` |
| Project-specific debugging patterns | `projects/<name>/skills/` | `projects/my-project/skills/api-rate-limiting.md` |
| General concept the user learned | `concepts/` (global) | `concepts/react-server-components.md` |
| Recurring problem across projects | `skills/` (global) | `skills/debugging-hydration-errors.md` |
| A tool/service used | `entities/` (global) | `entities/vercel-functions.md` |
| Patterns across many sessions | `synthesis/` (global) | `synthesis/common-debugging-patterns.md` |

For each project with content, create or update the project overview page at `projects/<name>/<name>.md` — named after the project, not `_project.md`. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
**Important:** Distill the _knowledge_, not the conversation. Don't write "In a session on March 15, the user asked about X." Write the knowledge itself, with the session as a source attribution.

Write a `summary:` frontmatter field on every new/updated page — 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.

Mark provenance per the convention in `llm-wiki` (Provenance Markers section):

- Checkpoints and index.md are pre-distilled by the system — treat checkpoint-derived claims as extracted (the system wrote them from observed actions).
- Memory artifacts are user-authored — treat as extracted.
- Conversation turn distillation is mostly inferred. You're synthesizing a coherent claim from many turns. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
- Use `^[ambiguous]` when the user changed direction mid-session or when the session ended unresolved.
- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
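A hypothetical frontmatter block showing the `summary:` and `provenance:` fields together — the wording here is invented for illustration, not a prescribed format:

```yaml
---
summary: How my-project throttles outbound API calls; retry/backoff pattern distilled from three Copilot sessions.
provenance: mostly extracted (checkpoints); cross-session synthesis marked ^[inferred]
---
```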

## Step 6: Update Manifest, Journal, and Special Files

### Update `.manifest.json`

For each session processed, add/update its entry with:

- `ingested_at`, `session_id`, `updated_at`
- `source_type`: one of `"copilot_session"`, `"copilot_checkpoint"`, `"copilot_transcript"`, `"copilot_memory_artifact"`
- `project`: the decoded project name
- `pages_created` and `pages_updated` lists

Also update the `projects` section of the manifest:

```json
{
  "project-name": {
    "repository": "owner/repo",
    "cwd": "C:\\Users\\name\\git\\project-name",
    "vault_path": "projects/project-name",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 5,
    "sessions_total": 8,
    "checkpoints_ingested": 12,
    "memory_artifacts_ingested": 3
  }
}
```
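A sketch of the per-session manifest update; the entry fields come from the list above, while the top-level `sessions` map and the temp-file-then-rename write are illustrative assumptions:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def record_session(manifest_path, session_id, updated_at, source_type,
                   project, pages_created, pages_updated):
    """Add/update one session entry; assumes a top-level 'sessions' map."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text()) if path.exists() else {}
    sessions = manifest.setdefault("sessions", {})
    sessions[session_id] = {
        "session_id": session_id,
        "updated_at": updated_at,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source_type": source_type,
        "project": project,
        "pages_created": pages_created,
        "pages_updated": pages_updated,
    }
    # Write via a temp file and atomic rename so a crash can't corrupt the manifest
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(manifest, indent=2))
    tmp.replace(path)
```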

### Create journal entry + update special files

Update `index.md` and `log.md` per the standard process:

```
- [TIMESTAMP] COPILOT_HISTORY_INGEST projects=N sessions=M checkpoints=C pages_updated=X pages_created=Y mode=append|full
```

`hot.md` — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update Recent Activity with a one-line summary — e.g. "Ingested 5 Copilot sessions across 2 projects; surfaced patterns in API design and testing strategy." Keep the last 3 operations. Update Active Threads if any ongoing project is now better understood. Update the `updated` timestamp.

## Privacy

- Distill and synthesize — don't copy raw conversation text verbatim
- Skip anything that looks like secrets, API keys, passwords, tokens
- `data.reasoningOpaque` / `data.reasoningText` in assistant events is internal reasoning — skip entirely, never copy to wiki
- If you encounter personal/sensitive content, ask the user before including it
- The user's conversations may reference other people — be thoughtful about what goes in the wiki
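Secret screening can be approximated with a few heuristic patterns — the regexes below are illustrative assumptions, not a complete scanner; anything flagged should go to the user for review rather than be silently included:

```python
import re

# Heuristic patterns only — a sketch, not an exhaustive secret detector
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),                        # GitHub token shapes
    re.compile(r"(?i)(api[_-]?key|password|secret|token)\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

def looks_sensitive(text):
    """Flag text for human review before it reaches the wiki."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```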

## Reference

See `references/copilot-data-format.md` for detailed data structure documentation.