/understand-knowledge
Analyzes a Karpathy-pattern LLM wiki — a three-layer knowledge base with raw sources, wiki markdown, and a schema file — and produces an interactive knowledge graph dashboard.
What It Detects
The Karpathy LLM wiki pattern (see https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f):
- Raw sources — immutable source documents (articles, papers, data files)
- Wiki — LLM-generated markdown files with wikilinks ([[target]] syntax)
- Schema — CLAUDE.md, AGENTS.md, or similar configuration file
- index.md — content catalog organized by categories
- log.md — chronological operation log
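For orientation, a hypothetical layout matching this pattern (file and directory names other than index.md, log.md, and raw/ are illustrative):

```
my-wiki/
├── CLAUDE.md          # schema file
├── index.md           # content catalog, organized by category
├── log.md             # chronological operation log
├── raw/               # immutable source documents
│   ├── some-paper.pdf
│   └── some-article.html
├── attention.md       # wiki articles with [[wikilinks]]
└── transformers.md
```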
Detection signals: has index.md + multiple .md files with wikilinks. May have a raw/ directory and a schema file.
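A minimal sketch of this detection heuristic, assuming only the signals described above; the bundled parse-knowledge-base.py's actual logic may differ, and the regex and threshold here are illustrative:

```python
import re
from pathlib import Path

# Matches [[target]] and [[target|label]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def looks_like_karpathy_wiki(root: Path) -> bool:
    """Heuristic: an index.md plus multiple .md files containing wikilinks."""
    if not (root / "index.md").exists():
        return False
    linked = [
        p for p in root.glob("*.md")
        if p.name != "index.md"
        and WIKILINK.search(p.read_text(encoding="utf-8", errors="ignore"))
    ]
    return len(linked) >= 2  # "multiple" files with wikilinks
```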
Instructions
Phase 1: DETECT
- Determine the target directory:
  - If the user provided a path argument, use that
  - Otherwise, use the current working directory
- Run the format detection script bundled with this skill:
  python3 <SKILL_DIR>/parse-knowledge-base.py <TARGET_DIR>
  - If the script exits with an error, tell the user this doesn't appear to be a Karpathy-pattern wiki and explain what was expected
  - If successful, proceed. The script writes scan-manifest.json to <TARGET_DIR>/.understand-anything/intermediate/
- Read the scan-manifest.json and announce the results:
  - "Detected Karpathy wiki: N articles, N sources, N topics, N wikilinks (N unresolved)"
  - List the categories found from index.md
Phase 2: SCAN (already done)
The parse script in Phase 1 already performed the deterministic scan. The scan-manifest.json contains:
- Article nodes (one per wiki .md file) with extracted wikilinks, headings, frontmatter
- Source nodes (one per raw/ file)
- Topic nodes (from index.md section headings)
- related edges (from wikilinks)
- categorized_under edges (from index.md sections)
No additional scanning is needed. Proceed to Phase 3.
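For orientation only, a sketch of how such a manifest might be shaped; the node and edge types are those listed above, but the exact field names and id scheme are hypothetical:

```json
{
  "nodes": [
    {"id": "article:attention", "type": "article", "name": "attention",
     "summary": "...", "wikilinks": ["transformers"], "category": "Architecture"},
    {"id": "source:some-paper.pdf", "type": "source", "name": "some-paper.pdf"},
    {"id": "topic:architecture", "type": "topic", "name": "Architecture"}
  ],
  "edges": [
    {"source": "article:attention", "target": "article:transformers", "type": "related"},
    {"source": "article:attention", "target": "topic:architecture", "type": "categorized_under"}
  ]
}
```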
Phase 3: ANALYZE
Dispatch article-analyzer subagents to extract implicit knowledge:

- Read the scan-manifest.json to get the article list
- Prepare batches of 10-15 articles each, grouped by category when possible (articles in the same category are more likely to have implicit cross-references); see the batching sketch after this list
- For each batch, dispatch an article-analyzer subagent with:
  - The batch of articles (id, name, summary, wikilinks, category, content from knowledgeMeta)
  - The full list of existing node IDs (so the agent can reference them)
  - The batch number for output file naming
  - The intermediate directory path:
    $INTERMEDIATE_DIR = <TARGET_DIR>/.understand-anything/intermediate

  The agent will write analysis-batch-{N}.json to the intermediate directory.
- Run up to 3 batches concurrently. Wait for all batches to complete.
- If any batch fails, log a warning but continue — the scan-manifest provides a solid base graph even without LLM analysis.
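A sketch of the batching step under these constraints; make_batches is a hypothetical helper, not part of this skill's scripts:

```python
from itertools import groupby

def make_batches(articles: list[dict], size: int = 12) -> list[list[dict]]:
    """Group articles by category, then split each group into batches of 10-15."""
    # Sort so groupby sees each category as one contiguous run.
    articles = sorted(articles, key=lambda a: a.get("category") or "")
    batches = []
    for _, group in groupby(articles, key=lambda a: a.get("category") or ""):
        chunk = list(group)
        for i in range(0, len(chunk), size):
            batches.append(chunk[i:i + size])
    return batches
```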
Phase 4: MERGE
- Run the merge script bundled with this skill:
  python3 <SKILL_DIR>/merge-knowledge-graph.py <TARGET_DIR>
- The script:
  - Combines scan-manifest.json + all analysis-batch-*.json files
  - Deduplicates entities (case-insensitive name matching); see the sketch after this list
  - Normalizes node/edge types via alias maps
  - Builds layers from index.md categories
  - Builds a tour from index.md section ordering
  - Writes assembled-graph.json to the intermediate directory
- Read the merge report from stderr and announce:
  - Total nodes, edges, layers, tour steps
  - How many entities/claims the LLM analysis added
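A minimal sketch of case-insensitive entity deduplication, assuming nodes carry id and name fields; dedupe_entities is hypothetical, not the merge script's actual code:

```python
def dedupe_entities(nodes: list[dict]) -> tuple[list[dict], dict[str, str]]:
    """Keep the first node per lowercased name; remap duplicate ids to the survivor."""
    survivors: dict[str, dict] = {}
    remap: dict[str, str] = {}
    for node in nodes:
        key = node["name"].strip().lower()
        if key in survivors:
            remap[node["id"]] = survivors[key]["id"]  # rewrite edges with this map
        else:
            survivors[key] = node
    return list(survivors.values()), remap
```

Edges that pointed at a removed duplicate would then be rewritten through the remap before the graph is assembled.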
Phase 5: SAVE
- Read the assembled-graph.json
- Run basic validation (see the sketch after this list):
  - Every edge source/target must reference an existing node
  - Every node must have: id, type, name, summary, tags, complexity
  - Remove any edges with dangling references
- Copy the validated graph to <TARGET_DIR>/.understand-anything/knowledge-graph.json
- Write metadata to <TARGET_DIR>/.understand-anything/meta.json:

  ```json
  {
    "lastAnalyzedAt": "<ISO timestamp>",
    "gitCommitHash": "<from git rev-parse HEAD or empty>",
    "version": "1.0.0",
    "analyzedFiles": <number of wiki articles>
  }
  ```

- Clean up intermediate files:
  rm -rf <TARGET_DIR>/.understand-anything/intermediate
- Report summary to the user:
  - "Knowledge graph saved: N articles, N entities, N topics, N claims, N sources"
  - "N edges (N wikilink, N categorized, N implicit)"
  - "N layers, N tour steps"
- Auto-trigger the dashboard:
  /understand-dashboard <TARGET_DIR>
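A sketch of the validation pass; the required fields come from the list above, while the function itself and its failure behavior are assumptions:

```python
REQUIRED_FIELDS = ("id", "type", "name", "summary", "tags", "complexity")

def validate_graph(graph: dict) -> dict:
    """Drop dangling edges; report nodes missing any required field."""
    node_ids = {n["id"] for n in graph["nodes"]}
    incomplete = [n.get("id", "<no id>") for n in graph["nodes"]
                  if any(field not in n for field in REQUIRED_FIELDS)]
    if incomplete:
        raise ValueError(f"nodes missing required fields: {incomplete}")
    graph["edges"] = [e for e in graph["edges"]
                      if e["source"] in node_ids and e["target"] in node_ids]
    return graph
```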
Notes
- The parse script handles ALL deterministic extraction (wikilinks, headings, frontmatter, categories from index.md). The LLM agents only add implicit knowledge that requires inference.
- Categories and taxonomy come from index.md section headings, NOT from filename prefixes. The Karpathy spec is intentionally abstract about naming conventions. (A sketch of heading-based category extraction follows these notes.)
- The graph uses kind: "knowledge" to signal the dashboard to use force-directed layout instead of hierarchical dagre.
- Source nodes from raw/ are lightweight (filename + size only) — we don't parse PDFs or binary files.
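A sketch of how categories might be read from index.md section headings; this is illustrative, not the parse script's actual parsing:

```python
import re

def categories_from_index(index_md: str) -> dict[str, list[str]]:
    """Map each '## Heading' section to the [[wikilink]] targets listed under it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in index_md.splitlines():
        if m := re.match(r"#{2,}\s+(.+)", line):
            current = m.group(1).strip()
            sections[current] = []
        elif current and (links := re.findall(r"\[\[([^\]|#]+)", line)):
            sections[current].extend(links)
    return sections
```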