LLM Wiki — Knowledge Distillation Pattern
You are maintaining a persistent, compounding knowledge base. The wiki is not a chatbot — it is a compiled artifact where knowledge is distilled once and kept current, not re-derived on every query.

Three-Layer Architecture


Layer 1: Raw Sources (immutable)


The user's original documents — articles, papers, notes, PDFs, conversation logs, bookmarks, and images (screenshots, whiteboard photos, diagrams, slide captures). These are never modified by the system. They live wherever the user keeps them (configured via `OBSIDIAN_SOURCES_DIR` in `.env`). Images are first-class sources: the ingest skills read them via the Read tool's vision support and treat their interpreted content as inferred unless it's verbatim transcribed text. Image ingestion requires a vision-capable model — models without vision support should skip image sources and report which files were skipped.
Think of raw sources as the "source code" — authoritative but hard to query directly.

Layer 2: The Wiki (LLM-maintained)


A collection of interconnected Obsidian-compatible markdown files organized by category. This is the compiled knowledge — synthesized, cross-referenced, and navigable. Each page has:
  • YAML frontmatter (title, category, tags, sources, timestamps)
  • Obsidian `[[wikilinks]]` connecting related concepts
  • Clear provenance — every claim traces back to a source
The wiki lives at the path configured via `OBSIDIAN_VAULT_PATH` in `.env`.

Layer 3: The Schema (this skill + config)

The rules governing how the wiki is structured — categories, conventions, page templates, and operational workflows. The schema tells the LLM how to maintain the wiki.

Wiki Organization

The vault has two levels of structure: categories (what kind of knowledge) and projects (where the knowledge came from).

Categories


Organize pages into these default categories (customizable in `.env`):

| Category | Purpose | Example |
| --- | --- | --- |
| `concepts/` | Ideas, theories, mental models | `concepts/transformer-architecture.md` |
| `entities/` | People, orgs, tools, projects | `entities/andrej-karpathy.md` |
| `skills/` | How-to knowledge, procedures | `skills/fine-tuning-llms.md` |
| `references/` | Summaries of specific sources | `references/attention-is-all-you-need.md` |
| `synthesis/` | Cross-cutting analysis across sources | `synthesis/scaling-laws-debate.md` |
| `journal/` | Timestamped observations, session logs | `journal/2024-03-15.md` |

Projects


Knowledge often belongs to a specific project. The `projects/` directory mirrors this:

```
$OBSIDIAN_VAULT_PATH/
├── projects/
│   ├── my-project/
│   │   ├── my-project.md      ← project overview (named after project)
│   │   ├── concepts/          ← project-scoped category pages
│   │   ├── skills/
│   │   └── ...
│   ├── another-project/
│   │   └── ...
│   └── side-project/
│       └── ...
├── concepts/                   ← global (cross-project) knowledge
├── entities/
├── skills/
└── ...
```

When knowledge is project-specific (a debugging technique that only applies to one codebase, a project-specific architecture decision), put it under `projects/<project-name>/<category>/`.
When knowledge is general (a concept like "React Server Components", a person like "Andrej Karpathy", a widely applicable skill), put it in the global category directory.
Cross-referencing: Project pages should `[[wikilink]]` to global pages and vice versa. A project's overview page should link to the key concept, skill, and entity pages relevant to that project — whether they live under the project or globally.
Naming rule: The project overview file must be named `<project-name>.md`, not `_project.md`. Obsidian's graph view uses the filename as the node label — `_project.md` makes every project appear as `_project` in the graph, making it unreadable. So `projects/my-project/my-project.md`, `projects/another-project/another-project.md`, etc.
Each project directory has an overview page structured like this:

```markdown
---
title: My Project
category: project
tags: [ai, web, backend]
source_path: ~/.claude/projects/-Users-name-Documents-projects-my-project
created: 2026-03-01T00:00:00Z
updated: 2026-04-06T00:00:00Z
---

# My Project

One-paragraph summary of what this project is.

## Key Concepts

- [[concepts/some-api]] — used for core functionality
- [[projects/my-project/concepts/main-architecture]] — project-specific architecture

## Related

- [[entities/some-service]] — deployment platform
```

Special Files

Every wiki has these files at its root:

index.md

A content-oriented catalog organized by category. Each entry has a one-line summary and tags. Rebuild this after every ingest operation. Format:

```markdown
# Wiki Index

## Concepts

- [[transformer-architecture]] — The dominant architecture for sequence modeling ( #ml #architecture)
- [[attention-mechanism]] — Core building block of transformers ( #ml #fundamentals)

## Entities

- [[andrej-karpathy]] — AI researcher, educator, former Tesla AI director ( #person #ml)
```

**Format rule**: Add a space between the opening `(` and the tags.
❌ Don't: `description (#tag)` — breaks tag parsing
✅ Do: `description ( #tag)` — proper spacing and tag parsing
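Rebuilding the index from page frontmatter is mechanical. A minimal Python sketch of one entry (the naive `key: value` parsing and the helper names are illustrative, not part of any skill's API; a real pass would use a YAML parser):

```python
import re
from pathlib import Path

def frontmatter(text: str) -> dict:
    """Parse top-level `key: value` lines of a page's YAML frontmatter (naive sketch)."""
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    fields = {}
    if m:
        for line in m.group(1).splitlines():
            if ":" in line and not line.startswith(" "):  # skip nested keys
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
    return fields

def index_entry(page: Path) -> str:
    """One index.md bullet: wikilink, one-line summary, then tags with the
    required space after the opening parenthesis."""
    fm = frontmatter(page.read_text())
    tags = fm.get("tags", "[]").strip("[]").replace(",", " ")
    tag_str = " ".join(f"#{t.strip()}" for t in tags.split() if t.strip())
    return f"- [[{page.stem}]] — {fm.get('summary', '')} ( {tag_str})"
```

Note that `index_entry` bakes in the format rule above: the tag group opens with `( #` so Obsidian parses the tags.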

log.md

Chronological append-only record tracking every operation. Each entry is parseable:

```markdown
# Log

- [2024-03-15T10:30:00Z] INGEST source="papers/attention.pdf" pages_updated=12 pages_created=3
- [2024-03-15T11:00:00Z] QUERY query="How do transformers handle long sequences?" result_pages=4
- [2024-03-16T09:00:00Z] LINT issues_found=2 orphans=1 contradictions=1
- [2024-03-17T10:00:00Z] ARCHIVE reason="rebuild" pages=87 destination="_archives/..."
- [2024-03-17T10:05:00Z] REBUILD archived_to="_archives/..." previous_pages=87
```
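The `key=value` shape makes entries trivially machine-writable. A sketch of how an ingest pass might append one — the quoting heuristic (quote anything non-numeric) is an assumption for illustration, not a spec:

```python
from datetime import datetime, timezone
from pathlib import Path

def log_entry(op: str, **fields) -> str:
    """Format one log.md line: UTC timestamp, operation name, key=value pairs."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = []
    for key, value in fields.items():
        value = str(value)
        if not value.isdigit():          # quote strings, leave counts bare
            value = f'"{value}"'
        parts.append(f"{key}={value}")
    return f"- [{ts}] {op} " + " ".join(parts)

def append_log(vault: Path, line: str) -> None:
    """Append, never rewrite — log.md is an append-only record."""
    with (vault / "log.md").open("a") as f:
        f.write(line + "\n")
```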

.manifest.json

Tracks every source file that has been ingested — path, timestamps, what wiki pages it produced. This is the backbone of the delta system. See the `wiki-status` skill for the full schema.
The manifest enables:
  • Delta computation — what's new or modified since last ingest
  • Append mode — only process the delta, not everything
  • Audit — which source produced which wiki page
  • Staleness detection — source changed but wiki page hasn't been updated
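With a manifest in place, delta computation is a single pass over the sources directory. A sketch comparing file mtimes against ingest timestamps — the field name `ingested_at` is illustrative; the actual manifest schema lives in the `wiki-status` skill:

```python
import json
from pathlib import Path

def compute_delta(vault: Path, sources: Path) -> dict:
    """Classify each source file as new, modified (stale), or unchanged."""
    manifest_path = vault / ".manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    delta = {"new": [], "modified": [], "unchanged": []}
    for src in sorted(sources.rglob("*")):
        if src.is_dir():
            continue
        rel = str(src.relative_to(sources))
        entry = manifest.get(rel)
        if entry is None:
            delta["new"].append(rel)
        elif src.stat().st_mtime > entry.get("ingested_at", 0):
            delta["modified"].append(rel)   # source changed after last ingest
        else:
            delta["unchanged"].append(rel)
    return delta
```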

Page Template

When creating a new wiki page, use this structure:

```markdown
---
title: Page Title
category: concepts
tags: [ml, architecture]
aliases: [alternate name]
sources: [papers/attention.pdf]
summary: One or two sentences, ≤200 chars, so a reader (or another skill) can preview this page without opening it.
provenance:
  extracted: 0.72
  inferred: 0.25
  ambiguous: 0.03
created: 2024-03-15T10:30:00Z
updated: 2024-03-15T10:30:00Z
---

# Page Title

One-paragraph summary of what this page covers.

## Key Ideas

- The source's central claim, paraphrased directly.
- A generalization the source implies but doesn't state outright. ^[inferred]
- A figure two sources disagree on. ^[ambiguous]

Use [[wikilinks]] to connect to related pages.

## Open Questions

Things that are unresolved or need more sources.

## Sources

- [[references/attention-is-all-you-need]] — Original paper
```
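Stamping a new page from this template can be equally mechanical. A minimal sketch (the function name and argument list are hypothetical; `aliases` and the `provenance` block are omitted for brevity, and `created` equals `updated` on first write):

```python
from datetime import datetime, timezone
from pathlib import Path

def new_page(vault: Path, category: str, slug: str, title: str,
             summary: str, sources: list, tags: list) -> Path:
    """Write a fresh wiki page with the core frontmatter fields of the template."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    page = vault / category / f"{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(
        "---\n"
        f"title: {title}\n"
        f"category: {category}\n"
        f"tags: [{', '.join(tags)}]\n"
        f"sources: [{', '.join(sources)}]\n"
        f"summary: {summary}\n"
        f"created: {now}\n"
        f"updated: {now}\n"
        "---\n\n"
        f"# {title}\n"
    )
    return page
```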

Provenance Markers

Every claim on a wiki page has one of three provenance states. Mark them inline so the reader (and future ingest passes) can tell signal from synthesis.

| State | Marker | Meaning |
| --- | --- | --- |
| Extracted | (no marker — default) | A paraphrase of something a source actually says. |
| Inferred | `^[inferred]` suffix | An LLM-synthesized claim — a connection, generalization, or implication the source doesn't state directly. |
| Ambiguous | `^[ambiguous]` suffix | Sources disagree, or the source is unclear. |

Example:

```markdown
- Transformers parallelize across positions, unlike RNNs.
- This is why they scale better on modern hardware. ^[inferred]
- GPT-4 was trained on roughly 13T tokens. ^[ambiguous]
```

Why this syntax:
  • `^[...]` is footnote-adjacent in Obsidian — renders cleanly and never collides with `[[wikilinks]]`.
  • Inline (suffix) so a single bullet stays a single bullet.
  • Default = extracted means existing pages without markers stay valid.
Frontmatter summary: Optionally surface the rough mix at the page level so the user can scan for speculation-heavy pages without reading them:

```yaml
provenance:
  extracted: 0.72   # rough fraction of sentences/bullets with no marker
  inferred: 0.25
  ambiguous: 0.03
```

These are best-effort numbers written by the ingest skill at create/update time. `wiki-lint` recomputes them and flags drift. The block is optional — pages without it are treated as fully extracted by convention.
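Recomputing the mix is a line-level count over the page body. A rough sketch of the kind of pass `wiki-lint` might run — the line-filtering heuristics here are assumptions for illustration, not the skill's actual logic:

```python
import re

def provenance_mix(body: str) -> dict:
    """Fraction of non-empty prose/bullet lines ending in ^[inferred] or
    ^[ambiguous]; unmarked lines count as extracted (the default)."""
    lines = [l for l in body.splitlines()
             if l.strip() and not l.strip().startswith(("#", "---"))]
    counts = {"extracted": 0, "inferred": 0, "ambiguous": 0}
    for line in lines:
        m = re.search(r"\^\[(inferred|ambiguous)\]\s*$", line)
        counts[m.group(1) if m else "extracted"] += 1
    total = len(lines) or 1
    return {k: round(v / total, 2) for k, v in counts.items()}
```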

Retrieval Primitives

Reading the vault is the dominant cost of every read-side skill. Use the cheapest primitive that can answer the question and escalate only when the cheaper one is insufficient. Any skill that needs content from the vault should follow this table rather than jumping straight to full-page reads.

| Need | Primitive | Relative cost |
| --- | --- | --- |
| Does a page exist? What's its title/category/tags? | Read `index.md`; Grep frontmatter blocks (scope with a pattern that targets `^---` blocks at file heads) | Cheapest |
| 1–2 sentence preview of a page | Read the `summary:` field in its frontmatter | Cheap |
| A specific claim or section inside a page | `Grep -A <n> -B <n> "<term>" <file>` — returns only the matching lines plus context | Medium |
| Whole-page content | `Read <file>` | Expensive — last resort |
| Relationships across pages | `Grep "\[\[.*?\]\]"` across the vault, or walk wikilinks from a known page | Case-by-case |

The rule: escalate only when the cheaper primitive can't answer the question. If you can answer from `summary:` fields alone, don't read page bodies. If a grepped section with `-A 10 -B 2` gives you the claim, don't read the whole page. A 500-line page opened to read 15 lines is 485 lines of wasted tokens.
Why this matters: a 20-page vault lets you get away with full-vault scans. A 200-page vault does not. The primitives above are how the skills framework scales to large vaults without a database.
Skills that consume this table: wiki-query, cross-linker, wiki-lint, wiki-status (insights mode). Any new skill that reads the vault should cite this section rather than reinvent the pattern.
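As a concrete instance of the "Cheap" row, a summary preview never needs the page body. A sketch in Python (assuming frontmatter sits at the top of the file, delimited by `---` lines, as in the page template):

```python
from pathlib import Path

def read_summary(page: Path, max_lines: int = 20):
    """Cheapest preview: scan only the frontmatter block for `summary:`
    instead of reading the whole page."""
    in_frontmatter = False
    for i, line in enumerate(page.read_text().splitlines()):
        if i >= max_lines:
            break
        if line.strip() == "---":
            if in_frontmatter:
                break                      # frontmatter closed, no summary found
            in_frontmatter = True
        elif in_frontmatter and line.startswith("summary:"):
            return line[len("summary:"):].strip()
    return None
```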

Core Principles

  1. Compile, don't retrieve. The wiki is pre-compiled knowledge. When you ingest a source, update every relevant page — don't just create a summary of the source.
  2. Compound over time. Each ingest should make the wiki smarter, not just bigger. Merge new information into existing pages, resolve contradictions, strengthen cross-references.
  3. Provenance matters. Every claim should trace to a source. When updating a page, note which source prompted the update.
  4. Mark inferences. Default sentences are extracted. Mark synthesized claims with `^[inferred]` and contested claims with `^[ambiguous]`. A wiki that hides its guessing rots silently; one that marks it stays trustworthy.
  5. Human curates, LLM maintains. The human decides what sources to add and what questions to ask. The LLM handles the bookkeeping — updating cross-references, maintaining consistency, noting contradictions.
  6. Obsidian is the IDE. The user browses and explores the wiki in Obsidian. Everything must be valid Obsidian markdown with working wikilinks.

Environment Variables

The wiki is configured through environment variables (see `.env.example`). The only required variable is the vault path — everything else has sensible defaults.
  • `OBSIDIAN_VAULT_PATH` — Where the wiki lives (required)
  • `OBSIDIAN_SOURCES_DIR` — Where raw source documents are
  • `OBSIDIAN_CATEGORIES` — Comma-separated list of categories
  • `CLAUDE_HISTORY_PATH` — Where to find Claude conversation data
No API keys are needed — the agent running these skills already has LLM access built in.
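Loading this configuration reduces to one required variable plus defaults. A sketch — the default category list here is an assumption mirroring the table above, not a documented constant:

```python
import os

DEFAULT_CATEGORIES = "concepts,entities,skills,references,synthesis,journal"

def load_config() -> dict:
    """Read wiki settings from the environment; only the vault path is required."""
    vault = os.environ.get("OBSIDIAN_VAULT_PATH")
    if not vault:
        raise SystemExit("OBSIDIAN_VAULT_PATH is required (see .env.example)")
    return {
        "vault": vault,
        "sources": os.environ.get("OBSIDIAN_SOURCES_DIR", ""),
        "categories": os.environ.get("OBSIDIAN_CATEGORIES", DEFAULT_CATEGORIES).split(","),
    }
```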

Modes of Operation

The wiki supports three ingest modes:

| Mode | When to use | What happens |
| --- | --- | --- |
| Append | Small delta, incremental updates | Compute delta via manifest, ingest only new/modified sources |
| Rebuild | Major drift, fresh start needed | Archive current wiki to `_archives/`, clear, reprocess all sources |
| Restore | Need to go back | Bring back a previous archive |

Use `wiki-status` to see the delta and get a recommendation. Use `wiki-rebuild` for archive/rebuild/restore operations.
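The append-vs-rebuild recommendation can be as simple as a ratio over the manifest delta. A toy heuristic, not `wiki-status`'s actual decision rule — the 50% threshold is invented purely for illustration:

```python
def recommend_mode(delta: dict, total_ingested: int) -> str:
    """Append for small deltas; rebuild when most of the corpus has drifted."""
    changed = len(delta.get("new", [])) + len(delta.get("modified", []))
    if changed == 0:
        return "up-to-date"
    if total_ingested and changed / total_ingested > 0.5:
        return "rebuild"
    return "append"
```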

Reference


For details on specific operations, see the companion skills:
  • wiki-status — Audit what's ingested, compute delta, recommend append vs rebuild
  • wiki-rebuild — Archive current wiki, rebuild from scratch, or restore from archive
  • wiki-ingest — Distill source documents into wiki pages
  • claude-history-ingest — Ingest Claude conversation history
  • codex-history-ingest — Ingest Codex CLI session history
  • data-ingest — Ingest any raw text data
  • wiki-query — Answer questions against the wiki
  • wiki-lint — Audit and maintain wiki health
  • wiki-setup — Initialize a new vault