/understand-knowledge
Analyzes a Karpathy-pattern LLM wiki — a three-layer knowledge base with raw sources, wiki markdown, and a schema file — and produces an interactive knowledge graph dashboard.
What It Detects
The Karpathy LLM wiki pattern (see https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f):
- Raw sources — immutable source documents (articles, papers, data files)
- Wiki — LLM-generated markdown files with wikilinks ([[target]] syntax)
- Schema — CLAUDE.md, AGENTS.md, or similar configuration file
- index.md — content catalog organized by categories
- log.md — chronological operation log
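For orientation, a hypothetical layout matching this pattern (file and directory names other than index.md, log.md, and raw/ are illustrative):

```
my-wiki/
├── CLAUDE.md          # schema file
├── index.md           # content catalog, organized by category
├── log.md             # chronological operation log
├── raw/               # immutable source documents
│   ├── some-paper.pdf
│   └── some-article.html
├── attention.md       # wiki articles with [[wikilinks]]
└── transformers.md
```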
Detection signals: has index.md + multiple .md files with wikilinks. May have a raw/ directory and a schema file.
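A minimal sketch of this detection heuristic, assuming only the signals described above; the bundled parse-knowledge-base.py's actual logic may differ, and the regex and threshold here are illustrative:

```python
import re
from pathlib import Path

# Matches [[target]] and [[target|label]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def looks_like_karpathy_wiki(root: Path) -> bool:
    """Heuristic: an index.md plus multiple .md files containing wikilinks."""
    if not (root / "index.md").exists():
        return False
    linked = [
        p for p in root.glob("*.md")
        if p.name != "index.md"
        and WIKILINK.search(p.read_text(encoding="utf-8", errors="ignore"))
    ]
    return len(linked) >= 2  # "multiple" files with wikilinks
```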
Instructions
Phase 1: DETECT
- Determine the target directory:
  - If the user provided a path argument, use that
  - Otherwise, use the current working directory
- Run the format detection script bundled with this skill:
  python3 <SKILL_DIR>/parse-knowledge-base.py <TARGET_DIR>
  - If the script exits with an error, tell the user this doesn't appear to be a Karpathy-pattern wiki and explain what was expected
  - If successful, proceed. The script writes scan-manifest.json to <TARGET_DIR>/.understand-anything/intermediate/
- Read the scan-manifest.json and announce the results:
  - "Detected Karpathy wiki: N articles, N sources, N topics, N wikilinks (N unresolved)"
  - List the categories found from index.md
Phase 2: SCAN (already done)
The parse script in Phase 1 already performed the deterministic scan. The scan-manifest.json contains:
- Article nodes (one per wiki .md file) with extracted wikilinks, headings, frontmatter
- Source nodes (one per raw/ file)
- Topic nodes (from index.md section headings)
- related edges (from wikilinks)
- categorized_under edges (from index.md sections)
No additional scanning is needed. Proceed to Phase 3.
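For orientation only, a sketch of how such a manifest might be shaped; the node and edge types are those listed above, but the exact field names and id scheme are hypothetical:

```json
{
  "nodes": [
    {"id": "article:attention", "type": "article", "name": "attention",
     "summary": "...", "wikilinks": ["transformers"], "category": "Architecture"},
    {"id": "source:some-paper.pdf", "type": "source", "name": "some-paper.pdf"},
    {"id": "topic:architecture", "type": "topic", "name": "Architecture"}
  ],
  "edges": [
    {"source": "article:attention", "target": "article:transformers", "type": "related"},
    {"source": "article:attention", "target": "topic:architecture", "type": "categorized_under"}
  ]
}
```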
Phase 3: ANALYZE
Dispatch article-analyzer subagents to extract implicit knowledge:

- Read the scan-manifest.json to get the article list
- Prepare batches of 10-15 articles each, grouped by category when possible (articles in the same category are more likely to have implicit cross-references); see the batching sketch after this list
- For each batch, dispatch an article-analyzer subagent with:
  - The batch of articles (id, name, summary, wikilinks, category, content from knowledgeMeta)
  - The full list of existing node IDs (so the agent can reference them)
  - The batch number for output file naming
  - The intermediate directory path:
    $INTERMEDIATE_DIR = <TARGET_DIR>/.understand-anything/intermediate

  The agent will write analysis-batch-{N}.json to the intermediate directory.
- Run up to 3 batches concurrently. Wait for all batches to complete.
- If any batch fails, log a warning but continue — the scan-manifest provides a solid base graph even without LLM analysis.
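A sketch of the batching step under these constraints; make_batches is a hypothetical helper, not part of this skill's scripts:

```python
from itertools import groupby

def make_batches(articles: list[dict], size: int = 12) -> list[list[dict]]:
    """Group articles by category, then split each group into batches of 10-15."""
    # Sort so groupby sees each category as one contiguous run.
    articles = sorted(articles, key=lambda a: a.get("category") or "")
    batches = []
    for _, group in groupby(articles, key=lambda a: a.get("category") or ""):
        chunk = list(group)
        for i in range(0, len(chunk), size):
            batches.append(chunk[i:i + size])
    return batches
```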
Phase 4: MERGE
- Run the merge script bundled with this skill:
  python3 <SKILL_DIR>/merge-knowledge-graph.py <TARGET_DIR>
- The script:
  - Combines scan-manifest.json + all analysis-batch-*.json files
  - Deduplicates entities (case-insensitive name matching); see the sketch after this list
  - Normalizes node/edge types via alias maps
  - Builds layers from index.md categories
  - Builds a tour from index.md section ordering
  - Writes assembled-graph.json to the intermediate directory
- Read the merge report from stderr and announce:
  - Total nodes, edges, layers, tour steps
  - How many entities/claims the LLM analysis added
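A minimal sketch of case-insensitive entity deduplication, assuming nodes carry id and name fields; dedupe_entities is hypothetical, not the merge script's actual code:

```python
def dedupe_entities(nodes: list[dict]) -> tuple[list[dict], dict[str, str]]:
    """Keep the first node per lowercased name; remap duplicate ids to the survivor."""
    survivors: dict[str, dict] = {}
    remap: dict[str, str] = {}
    for node in nodes:
        key = node["name"].strip().lower()
        if key in survivors:
            remap[node["id"]] = survivors[key]["id"]  # rewrite edges with this map
        else:
            survivors[key] = node
    return list(survivors.values()), remap
```

Edges that pointed at a removed duplicate would then be rewritten through the remap before the graph is assembled.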
Phase 5: SAVE
- Read the assembled-graph.json
- Run basic validation (see the sketch after this list):
  - Every edge source/target must reference an existing node
  - Every node must have: id, type, name, summary, tags, complexity
  - Remove any edges with dangling references
- Copy the validated graph to <TARGET_DIR>/.understand-anything/knowledge-graph.json
- Write metadata to <TARGET_DIR>/.understand-anything/meta.json:

  ```json
  {
    "lastAnalyzedAt": "<ISO timestamp>",
    "gitCommitHash": "<from git rev-parse HEAD or empty>",
    "version": "1.0.0",
    "analyzedFiles": <number of wiki articles>
  }
  ```

- Clean up intermediate files:
  rm -rf <TARGET_DIR>/.understand-anything/intermediate
- Report summary to the user:
  - "Knowledge graph saved: N articles, N entities, N topics, N claims, N sources"
  - "N edges (N wikilink, N categorized, N implicit)"
  - "N layers, N tour steps"
- Auto-trigger the dashboard:
  /understand-dashboard <TARGET_DIR>
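A sketch of the validation pass; the required fields come from the list above, while the function itself and its failure behavior are assumptions:

```python
REQUIRED_FIELDS = ("id", "type", "name", "summary", "tags", "complexity")

def validate_graph(graph: dict) -> dict:
    """Drop dangling edges; report nodes missing any required field."""
    node_ids = {n["id"] for n in graph["nodes"]}
    incomplete = [n.get("id", "<no id>") for n in graph["nodes"]
                  if any(field not in n for field in REQUIRED_FIELDS)]
    if incomplete:
        raise ValueError(f"nodes missing required fields: {incomplete}")
    graph["edges"] = [e for e in graph["edges"]
                      if e["source"] in node_ids and e["target"] in node_ids]
    return graph
```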
Notes
- The parse script handles ALL deterministic extraction (wikilinks, headings, frontmatter, categories from index.md). The LLM agents only add implicit knowledge that requires inference.
- Categories and taxonomy come from index.md section headings, NOT from filename prefixes. The Karpathy spec is intentionally abstract about naming conventions. (A sketch of heading-based category extraction follows these notes.)
- The graph uses kind: "knowledge" to signal the dashboard to use force-directed layout instead of hierarchical dagre.
- Source nodes from raw/ are lightweight (filename + size only) — we don't parse PDFs or binary files.
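A sketch of how categories might be read from index.md section headings; this is illustrative, not the parse script's actual parsing:

```python
import re

def categories_from_index(index_md: str) -> dict[str, list[str]]:
    """Map each '## Heading' section to the [[wikilink]] targets listed under it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in index_md.splitlines():
        if m := re.match(r"#{2,}\s+(.+)", line):
            current = m.group(1).strip()
            sections[current] = []
        elif current and (links := re.findall(r"\[\[([^\]|#]+)", line)):
            sections[current].extend(links)
    return sections
```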