LLM Wiki — Knowledge Distillation Pattern
You are maintaining a persistent, compounding knowledge base. The wiki is not a chatbot — it is a compiled artifact where knowledge is distilled once and kept current, not re-derived on every query.

Three-Layer Architecture


Layer 1: Raw Sources (immutable)


The user's original documents — articles, papers, notes, PDFs, conversation logs, bookmarks, and images (screenshots, whiteboard photos, diagrams, slide captures). These are never modified by the system. They live wherever the user keeps them (configured via `OBSIDIAN_SOURCES_DIR` in `.env`). Images are first-class sources: the ingest skills read them via the Read tool's vision support and treat their interpreted content as inferred unless it's verbatim transcribed text. Image ingestion requires a vision-capable model — models without vision support should skip image sources and report which files were skipped.
Think of raw sources as the "source code" — authoritative but hard to query directly.

Layer 2: The Wiki (LLM-maintained)


A collection of interconnected Obsidian-compatible markdown files organized by category. This is the compiled knowledge — synthesized, cross-referenced, and navigable. Each page has:
  • YAML frontmatter (title, category, tags, sources, timestamps)
  • Obsidian `[[wikilinks]]` connecting related concepts
  • Clear provenance — every claim traces back to a source
The wiki lives at the path configured via `OBSIDIAN_VAULT_PATH` in `.env`.

Layer 3: The Schema (this skill + config)

The rules governing how the wiki is structured — categories, conventions, page templates, and operational workflows. The schema tells the LLM how to maintain the wiki.

Wiki Organization

The vault has two levels of structure: categories (what kind of knowledge) and projects (where the knowledge came from).

Categories


Organize pages into these default categories (customizable in `.env`):

| Category | Purpose | Example |
| --- | --- | --- |
| `concepts/` | Ideas, theories, mental models | `concepts/transformer-architecture.md` |
| `entities/` | People, orgs, tools, projects | `entities/andrej-karpathy.md` |
| `skills/` | How-to knowledge, procedures | `skills/fine-tuning-llms.md` |
| `references/` | Summaries of specific sources | `references/attention-is-all-you-need.md` |
| `synthesis/` | Cross-cutting analysis across sources | `synthesis/scaling-laws-debate.md` |
| `journal/` | Timestamped observations, session logs | `journal/2024-03-15.md` |

Projects


Knowledge often belongs to a specific project. The `projects/` directory mirrors this:

```
$OBSIDIAN_VAULT_PATH/
├── projects/
│   ├── my-project/
│   │   ├── my-project.md      ← project overview (named after project)
│   │   ├── concepts/          ← project-scoped category pages
│   │   ├── skills/
│   │   └── ...
│   ├── another-project/
│   │   └── ...
│   └── side-project/
│       └── ...
├── concepts/                   ← global (cross-project) knowledge
├── entities/
├── skills/
└── ...
```

When knowledge is project-specific (a debugging technique that only applies to one codebase, a project-specific architecture decision), put it under `projects/<project-name>/<category>/`.
When knowledge is general (a concept like "React Server Components", a person like "Andrej Karpathy", a widely applicable skill), put it in the global category directory.
Cross-referencing: Project pages should `[[wikilink]]` to global pages and vice versa. A project's overview page should link to the key concept, skill, and entity pages relevant to that project — whether they live under the project or globally.
Naming rule: The project overview file must be named `<project-name>.md`, not `_project.md`. Obsidian's graph view uses the filename as the node label — `_project.md` makes every project appear as `_project` in the graph, making it unreadable. So `projects/my-project/my-project.md`, `projects/another-project/another-project.md`, etc.
Each project directory has an overview page structured like this:

```markdown
---
title: My Project
category: project
tags: [ai, web, backend]
source_path: ~/.claude/projects/-Users-name-Documents-projects-my-project
created: 2026-03-01T00:00:00Z
updated: 2026-04-06T00:00:00Z
---

# My Project

One-paragraph summary of what this project is.

## Key Concepts

- [[concepts/some-api]] — used for core functionality
- [[projects/my-project/concepts/main-architecture]] — project-specific architecture

## Related

- [[entities/some-service]] — deployment platform
```

Special Files

Every wiki has these files at its root:

index.md

A content-oriented catalog organized by category. Each entry has a one-line summary and tags. Rebuild this after every ingest operation. Format:

```markdown
# Wiki Index

## Concepts

- [[transformer-architecture]] — The dominant architecture for sequence modeling ( #ml #architecture)
- [[attention-mechanism]] — Core building block of transformers ( #ml #fundamentals)

## Entities

- [[andrej-karpathy]] — AI researcher, educator, former Tesla AI director ( #person #ml)
```

**Format rule**: Add a space between the opening `(` and the tags.
❌ Don't: `description (#tag)` — breaks tag parsing
✅ Do: `description ( #tag)` — proper spacing and tag parsing
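Rebuilding the index from page frontmatter is mechanical. A minimal Python sketch of one entry (the naive `key: value` parsing and the helper names are illustrative, not part of any skill's API; a real pass would use a YAML parser):

```python
import re
from pathlib import Path

def frontmatter(text: str) -> dict:
    """Parse top-level `key: value` lines of a page's YAML frontmatter (naive sketch)."""
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    fields = {}
    if m:
        for line in m.group(1).splitlines():
            if ":" in line and not line.startswith(" "):  # skip nested keys
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
    return fields

def index_entry(page: Path) -> str:
    """One index.md bullet: wikilink, one-line summary, then tags with the
    required space after the opening parenthesis."""
    fm = frontmatter(page.read_text())
    tags = fm.get("tags", "[]").strip("[]").replace(",", " ")
    tag_str = " ".join(f"#{t.strip()}" for t in tags.split() if t.strip())
    return f"- [[{page.stem}]] — {fm.get('summary', '')} ( {tag_str})"
```

Note that `index_entry` bakes in the format rule above: the tag group opens with `( #` so Obsidian parses the tags.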

log.md

Chronological append-only record tracking every operation. Each entry is parseable:

```markdown
# Log

- [2024-03-15T10:30:00Z] INGEST source="papers/attention.pdf" pages_updated=12 pages_created=3
- [2024-03-15T11:00:00Z] QUERY query="How do transformers handle long sequences?" result_pages=4
- [2024-03-16T09:00:00Z] LINT issues_found=2 orphans=1 contradictions=1
- [2024-03-17T10:00:00Z] ARCHIVE reason="rebuild" pages=87 destination="_archives/..."
- [2024-03-17T10:05:00Z] REBUILD archived_to="_archives/..." previous_pages=87
```
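The `key=value` shape makes entries trivially machine-writable. A sketch of how an ingest pass might append one — the quoting heuristic (quote anything non-numeric) is an assumption for illustration, not a spec:

```python
from datetime import datetime, timezone
from pathlib import Path

def log_entry(op: str, **fields) -> str:
    """Format one log.md line: UTC timestamp, operation name, key=value pairs."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = []
    for key, value in fields.items():
        value = str(value)
        if not value.isdigit():          # quote strings, leave counts bare
            value = f'"{value}"'
        parts.append(f"{key}={value}")
    return f"- [{ts}] {op} " + " ".join(parts)

def append_log(vault: Path, line: str) -> None:
    """Append, never rewrite — log.md is an append-only record."""
    with (vault / "log.md").open("a") as f:
        f.write(line + "\n")
```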

.manifest.json

Tracks every source file that has been ingested — path, timestamps, what wiki pages it produced. This is the backbone of the delta system. See the `wiki-status` skill for the full schema.
The manifest enables:
  • Delta computation — what's new or modified since last ingest
  • Append mode — only process the delta, not everything
  • Audit — which source produced which wiki page
  • Staleness detection — source changed but wiki page hasn't been updated
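With a manifest in place, delta computation is a single pass over the sources directory. A sketch comparing file mtimes against ingest timestamps — the field name `ingested_at` is illustrative; the actual manifest schema lives in the `wiki-status` skill:

```python
import json
from pathlib import Path

def compute_delta(vault: Path, sources: Path) -> dict:
    """Classify each source file as new, modified (stale), or unchanged."""
    manifest_path = vault / ".manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    delta = {"new": [], "modified": [], "unchanged": []}
    for src in sorted(sources.rglob("*")):
        if src.is_dir():
            continue
        rel = str(src.relative_to(sources))
        entry = manifest.get(rel)
        if entry is None:
            delta["new"].append(rel)
        elif src.stat().st_mtime > entry.get("ingested_at", 0):
            delta["modified"].append(rel)   # source changed after last ingest
        else:
            delta["unchanged"].append(rel)
    return delta
```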

Page Template

When creating a new wiki page, use this structure:

```markdown
---
title: Page Title
category: concepts
tags: [ml, architecture]
aliases: [alternate name]
sources: [papers/attention.pdf]
summary: One or two sentences, ≤200 chars, so a reader (or another skill) can preview this page without opening it.
provenance:
  extracted: 0.72
  inferred: 0.25
  ambiguous: 0.03
created: 2024-03-15T10:30:00Z
updated: 2024-03-15T10:30:00Z
---

# Page Title

One-paragraph summary of what this page covers.

## Key Ideas

- The source's central claim, paraphrased directly.
- A generalization the source implies but doesn't state outright. ^[inferred]
- A figure two sources disagree on. ^[ambiguous]

Use [[wikilinks]] to connect to related pages.

## Open Questions

Things that are unresolved or need more sources.

## Sources

- [[references/attention-is-all-you-need]] — Original paper
```
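Stamping a new page from this template can be equally mechanical. A minimal sketch (the function name and argument list are hypothetical; `aliases` and the `provenance` block are omitted for brevity, and `created` equals `updated` on first write):

```python
from datetime import datetime, timezone
from pathlib import Path

def new_page(vault: Path, category: str, slug: str, title: str,
             summary: str, sources: list, tags: list) -> Path:
    """Write a fresh wiki page with the core frontmatter fields of the template."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    page = vault / category / f"{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(
        "---\n"
        f"title: {title}\n"
        f"category: {category}\n"
        f"tags: [{', '.join(tags)}]\n"
        f"sources: [{', '.join(sources)}]\n"
        f"summary: {summary}\n"
        f"created: {now}\n"
        f"updated: {now}\n"
        "---\n\n"
        f"# {title}\n"
    )
    return page
```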

Provenance Markers

Every claim on a wiki page has one of three provenance states. Mark them inline so the reader (and future ingest passes) can tell signal from synthesis.

| State | Marker | Meaning |
| --- | --- | --- |
| Extracted | (no marker — default) | A paraphrase of something a source actually says. |
| Inferred | `^[inferred]` suffix | An LLM-synthesized claim — a connection, generalization, or implication the source doesn't state directly. |
| Ambiguous | `^[ambiguous]` suffix | Sources disagree, or the source is unclear. |

Example:

```markdown
- Transformers parallelize across positions, unlike RNNs.
- This is why they scale better on modern hardware. ^[inferred]
- GPT-4 was trained on roughly 13T tokens. ^[ambiguous]
```

Why this syntax:
  • `^[...]` is footnote-adjacent in Obsidian — renders cleanly and never collides with `[[wikilinks]]`.
  • Inline (suffix) so a single bullet stays a single bullet.
  • Default = extracted means existing pages without markers stay valid.
Frontmatter summary: Optionally surface the rough mix at the page level so the user can scan for speculation-heavy pages without reading them:

```yaml
provenance:
  extracted: 0.72   # rough fraction of sentences/bullets with no marker
  inferred: 0.25
  ambiguous: 0.03
```

These are best-effort numbers written by the ingest skill at create/update time. `wiki-lint` recomputes them and flags drift. The block is optional — pages without it are treated as fully extracted by convention.
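Recomputing the mix is a line-level count over the page body. A rough sketch of the kind of pass `wiki-lint` might run — the line-filtering heuristics here are assumptions for illustration, not the skill's actual logic:

```python
import re

def provenance_mix(body: str) -> dict:
    """Fraction of non-empty prose/bullet lines ending in ^[inferred] or
    ^[ambiguous]; unmarked lines count as extracted (the default)."""
    lines = [l for l in body.splitlines()
             if l.strip() and not l.strip().startswith(("#", "---"))]
    counts = {"extracted": 0, "inferred": 0, "ambiguous": 0}
    for line in lines:
        m = re.search(r"\^\[(inferred|ambiguous)\]\s*$", line)
        counts[m.group(1) if m else "extracted"] += 1
    total = len(lines) or 1
    return {k: round(v / total, 2) for k, v in counts.items()}
```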

Retrieval Primitives

Reading the vault is the dominant cost of every read-side skill. Use the cheapest primitive that can answer the question and escalate only when the cheaper one is insufficient. Any skill that needs content from the vault should follow this table rather than jumping straight to full-page reads.

| Need | Primitive | Relative cost |
| --- | --- | --- |
| Does a page exist? What's its title/category/tags? | Read `index.md`; Grep frontmatter blocks (scope with a pattern that targets `^---` blocks at file heads) | Cheapest |
| 1–2 sentence preview of a page | Read the `summary:` field in its frontmatter | Cheap |
| A specific claim or section inside a page | `Grep -A <n> -B <n> "<term>" <file>` — returns only the matching lines plus context | Medium |
| Whole-page content | `Read <file>` | Expensive — last resort |
| Relationships across pages | `Grep "\[\[.*?\]\]"` across the vault, or walk wikilinks from a known page | Case-by-case |

The rule: escalate only when the cheaper primitive can't answer the question. If you can answer from `summary:` fields alone, don't read page bodies. If a grepped section with `-A 10 -B 2` gives you the claim, don't read the whole page. A 500-line page opened to read 15 lines is 485 lines of wasted tokens.
Why this matters: a 20-page vault lets you get away with full-vault scans. A 200-page vault does not. The primitives above are how the skills framework scales to large vaults without a database.
Skills that consume this table: wiki-query, cross-linker, wiki-lint, wiki-status (insights mode). Any new skill that reads the vault should cite this section rather than reinvent the pattern.
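As a concrete instance of the "Cheap" row, a summary preview never needs the page body. A sketch in Python (assuming frontmatter sits at the top of the file, delimited by `---` lines, as in the page template):

```python
from pathlib import Path

def read_summary(page: Path, max_lines: int = 20):
    """Cheapest preview: scan only the frontmatter block for `summary:`
    instead of reading the whole page."""
    in_frontmatter = False
    for i, line in enumerate(page.read_text().splitlines()):
        if i >= max_lines:
            break
        if line.strip() == "---":
            if in_frontmatter:
                break                      # frontmatter closed, no summary found
            in_frontmatter = True
        elif in_frontmatter and line.startswith("summary:"):
            return line[len("summary:"):].strip()
    return None
```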

Core Principles

  1. Compile, don't retrieve. The wiki is pre-compiled knowledge. When you ingest a source, update every relevant page — don't just create a summary of the source.
  2. Compound over time. Each ingest should make the wiki smarter, not just bigger. Merge new information into existing pages, resolve contradictions, strengthen cross-references.
  3. Provenance matters. Every claim should trace to a source. When updating a page, note which source prompted the update.
  4. Mark inferences. Default sentences are extracted. Mark synthesized claims with `^[inferred]` and contested claims with `^[ambiguous]`. A wiki that hides its guessing rots silently; one that marks it stays trustworthy.
  5. Human curates, LLM maintains. The human decides what sources to add and what questions to ask. The LLM handles the bookkeeping — updating cross-references, maintaining consistency, noting contradictions.
  6. Obsidian is the IDE. The user browses and explores the wiki in Obsidian. Everything must be valid Obsidian markdown with working wikilinks.

Environment Variables

The wiki is configured through environment variables (see `.env.example`). The only required variable is the vault path — everything else has sensible defaults.
  • `OBSIDIAN_VAULT_PATH` — Where the wiki lives (required)
  • `OBSIDIAN_SOURCES_DIR` — Where raw source documents are
  • `OBSIDIAN_CATEGORIES` — Comma-separated list of categories
  • `CLAUDE_HISTORY_PATH` — Where to find Claude conversation data
No API keys are needed — the agent running these skills already has LLM access built in.
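Loading this configuration reduces to one required variable plus defaults. A sketch — the default category list here is an assumption mirroring the table above, not a documented constant:

```python
import os

DEFAULT_CATEGORIES = "concepts,entities,skills,references,synthesis,journal"

def load_config() -> dict:
    """Read wiki settings from the environment; only the vault path is required."""
    vault = os.environ.get("OBSIDIAN_VAULT_PATH")
    if not vault:
        raise SystemExit("OBSIDIAN_VAULT_PATH is required (see .env.example)")
    return {
        "vault": vault,
        "sources": os.environ.get("OBSIDIAN_SOURCES_DIR", ""),
        "categories": os.environ.get("OBSIDIAN_CATEGORIES", DEFAULT_CATEGORIES).split(","),
    }
```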

Modes of Operation

The wiki supports three ingest modes:

| Mode | When to use | What happens |
| --- | --- | --- |
| Append | Small delta, incremental updates | Compute delta via manifest, ingest only new/modified sources |
| Rebuild | Major drift, fresh start needed | Archive current wiki to `_archives/`, clear, reprocess all sources |
| Restore | Need to go back | Bring back a previous archive |

Use `wiki-status` to see the delta and get a recommendation. Use `wiki-rebuild` for archive/rebuild/restore operations.
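The append-vs-rebuild recommendation can be as simple as a ratio over the manifest delta. A toy heuristic, not `wiki-status`'s actual decision rule — the 50% threshold is invented purely for illustration:

```python
def recommend_mode(delta: dict, total_ingested: int) -> str:
    """Append for small deltas; rebuild when most of the corpus has drifted."""
    changed = len(delta.get("new", [])) + len(delta.get("modified", []))
    if changed == 0:
        return "up-to-date"
    if total_ingested and changed / total_ingested > 0.5:
        return "rebuild"
    return "append"
```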

Reference


For details on specific operations, see the companion skills:
  • wiki-status — Audit what's ingested, compute delta, recommend append vs rebuild
  • wiki-rebuild — Archive current wiki, rebuild from scratch, or restore from archive
  • wiki-ingest — Distill source documents into wiki pages
  • claude-history-ingest — Ingest Claude conversation history
  • codex-history-ingest — Ingest Codex CLI session history
  • data-ingest — Ingest any raw text data
  • wiki-query — Answer questions against the wiki
  • wiki-lint — Audit and maintain wiki health
  • wiki-setup — Initialize a new vault