cross-linker

Cross-Linker — Automated Wiki Cross-Referencing

You are weaving the wiki's knowledge graph tighter by finding and inserting missing `[[wikilinks]]` between pages that should reference each other but currently don't.

Follow the Retrieval Primitives table in `llm-wiki/SKILL.md`. Build the registry in Step 1 by grepping frontmatter only (not full pages). Reserve full `Read` for the unlinked-mention detection pass, and even there, only read pages whose summaries/titles make them plausible link targets. Blind full-vault reads are what this framework exists to avoid.

Before You Start

  1. Read `.env` to get `OBSIDIAN_VAULT_PATH`
  2. Read `index.md` to get the full inventory of pages and their one-line descriptions
  3. Skim `log.md` to see what was recently ingested (focus linking effort on new pages)
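
The first of these steps can be sketched as a few lines of Python. This is a minimal sketch, assuming `.env` uses plain `KEY=value` lines; the helper name `read_vault_path` is hypothetical and not part of the skill.

```python
from pathlib import Path

def read_vault_path(env_text: str) -> Path:
    """Return OBSIDIAN_VAULT_PATH from the text of a .env file."""
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "OBSIDIAN_VAULT_PATH":
            # Tolerate optional single or double quotes around the value.
            return Path(value.strip().strip('"').strip("'"))
    raise KeyError("OBSIDIAN_VAULT_PATH is not set in .env")
```

A caller would pass `Path(".env").read_text(encoding="utf-8")` and use the result as the vault root for every later step.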

Step 1: Build the Page Registry

Glob all `.md` files in the vault (excluding `_archives/`, `.obsidian/`). For each page, extract:
  • Filename (without `.md`): this is the wikilink target
  • Title from frontmatter
  • Aliases from frontmatter (if any)
  • Tags from frontmatter
  • Category from frontmatter or directory inference
  • One-line summary: first sentence or `title` field
Build a lookup table:
`page_name → { path, title, aliases, tags, summary }`
This is your "vocabulary": every entry in this table is a valid wikilink target.
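
The registry build might look like the sketch below. It reads only the leading `---` block of each file and stops before the body, in keeping with the frontmatter-only rule. Several things here are assumptions rather than part of the skill: the function names, the tiny hand-rolled parser (it handles only `key: value` and inline `[a, b]` lists, so a real vault would want a proper YAML parser), and the optional `summary` frontmatter field.

```python
from pathlib import Path

EXCLUDED_DIRS = {"_archives", ".obsidian"}

def read_frontmatter(path: Path) -> dict:
    """Read only the leading '---' block; stop before the page body."""
    meta = {}
    with path.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # page has no frontmatter
        for line in fh:
            if line.strip() == "---":
                break  # end of frontmatter: the body is never read
            key, sep, value = line.partition(":")
            value = value.strip()
            if not sep or not value or key.startswith((" ", "-")):
                continue  # nested or block-style YAML is out of scope here
            if value.startswith("[") and value.endswith("]"):
                items = value[1:-1].split(",")
                meta[key.strip()] = [v.strip(" '\"") for v in items if v.strip()]
            else:
                meta[key.strip()] = value.strip("'\"")
    return meta

def build_registry(vault: Path) -> dict:
    """Map page_name -> {path, title, aliases, tags, summary}."""
    registry = {}
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault)
        if EXCLUDED_DIRS & set(rel.parts):
            continue
        meta = read_frontmatter(path)
        title = meta.get("title", path.stem)
        registry[path.stem] = {
            "path": rel.as_posix(),
            "title": title,
            "aliases": meta.get("aliases", []),
            "tags": meta.get("tags", []),
            # 'summary' is an assumed field; fall back to the title.
            "summary": meta.get("summary", title),
        }
    return registry
```

Keying on the bare filename means two pages with the same name in different directories would collide; a fuller version would detect that and key on the relative path instead.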

Step 2: Scan for Missing Links

For each page in the vault:
  1. Read the full content
  2. Extract existing wikilinks: find all `[[...]]` references already present
  3. Search for unlinked mentions: check whether the page's text contains any of the following without being wrapped in `[[...]]`:
    • Page filenames (e.g., the word "MyProject" appears but `[[projects/my-project/my-project]]` is missing)
    • Page titles from frontmatter
    • Aliases from frontmatter
    • Entity names, project names, concept names from the registry
  4. Check for semantic connections: pages that share multiple tags or are in the same project directory but don't link to each other
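
Steps 2 and 3 of the scan can be sketched with two regular expressions. This is an illustrative sketch, not the skill's own implementation: the function names are hypothetical, the registry is assumed to have the `page_name → { title, aliases, ... }` shape from Step 1, and code blocks and frontmatter are not yet excluded here.

```python
import re

# Matches [[target]], [[target#heading]] and [[target|display text]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def existing_targets(text):
    """Page names already linked, reduced to the last path segment, lower-cased."""
    return {m.group(1).strip().split("/")[-1].lower() for m in WIKILINK.finditer(text)}

def unlinked_mentions(page_name, text, registry):
    """Return (target_page, matched_term) pairs for mentions that lack a link."""
    linked = existing_targets(text)
    plain = WIKILINK.sub(" ", text)  # drop existing links so their text cannot match
    found = []
    for name, entry in registry.items():
        if name == page_name or name.lower() in linked:
            continue  # skip self-references and pages already linked
        terms = [name, entry.get("title", ""), *entry.get("aliases", [])]
        for term in filter(None, terms):
            # Whole-word, case-insensitive match on filename, title, or alias.
            if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", plain, re.IGNORECASE):
                found.append((name, term))
                break
    return found
```

Each returned pair is only a candidate; it still has to be scored in Step 3 before anything is written to the page.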

Matching Rules

  • Case-insensitive matching for names (e.g., "my-project" matches page `MyProject`)
  • Skip self-references: a page shouldn't link to itself
  • Skip common words: don't link "the", "and", or generic terms. Only match on distinctive names
  • Prefer the shortest unambiguous wikilink path: use `[[page-name]]`, not `[[full/path/to/page-name]]`, when the name is unique across the vault
  • Don't link inside code blocks or frontmatter
  • Don't double-link: if `[[foo]]` already appears on the page, don't add another
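
Two of these rules are mechanical enough to sketch in code: excluding frontmatter and fenced code blocks from matching, and choosing the shortest unambiguous link path. The helper names are hypothetical, only triple-backtick fences are recognised, and `str.removesuffix` assumes Python 3.9 or later.

```python
from collections import Counter

def linkable_lines(text):
    """Yield (index, line) for lines outside frontmatter and fenced code blocks."""
    lines = text.splitlines()
    in_frontmatter = bool(lines) and lines[0].strip() == "---"
    in_fence = False
    for i, line in enumerate(lines):
        if in_frontmatter:
            if i > 0 and line.strip() == "---":
                in_frontmatter = False  # closing delimiter; body starts next line
            continue
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # the fence line itself is never linkable
            continue
        if not in_fence:
            yield i, line

def shortest_link(path, all_paths):
    """Use the bare page name when it is unique in the vault, else the full path."""
    def stem(p):
        return p.rsplit("/", 1)[-1].removesuffix(".md")
    counts = Counter(stem(p).lower() for p in all_paths)
    target = stem(path) if counts[stem(path).lower()] == 1 else path.removesuffix(".md")
    return f"[[{target}]]"
```

Running mention detection only over the lines that `linkable_lines` yields keeps examples inside code fences and YAML values untouched.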

Step 3: Score and Rank Suggestions

Not every possible link is worth adding. Score each candidate using a composite signal, then tag it with a confidence label.

Scoring

| Signal | Points | Example |
|---|---|---|
| Exact name match in text | +4 | "MyProject" appears in body text → link to `my-project.md` |
| Shared tags (2+) | +2 | Both tagged `#ai #agent` but no link between them |
| Same project, no link | +2 | Both under `projects/my-project/` but don't reference each other |
| Mentioned entity/concept | +2 | Page mentions "knowledge graphs" → link to `[[concepts/knowledge-graphs]]` |
| Cross-category connection | +2 | Source is in `concepts/`, target is in `entities/` (or `skills/`, `synthesis/`); different knowledge layers make this link more architecturally valuable |
| Peripheral→hub reach | +2 | Source page has ≤ 2 total links (peripheral) but target has ≥ 8 (hub); this connects a loose page to a load-bearing concept |
| Partial name match | +1 | "graph" appears but page is `knowledge-graphs`; plausible but ambiguous |
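
As a rough illustration of how the composite score adds up, the sketch below encodes the point values from the table and derives the three structural signals from registry entries. The signal keys, function names, the `link_counts` mapping (page path to total link count), and the exact set of "knowledge layer" directories are assumptions made for the example.

```python
SIGNAL_POINTS = {
    "exact_name_match": 4,
    "shared_tags": 2,
    "same_project": 2,
    "mentioned_entity": 2,
    "cross_category": 2,
    "peripheral_to_hub": 2,
    "partial_name_match": 1,
}

# Assumed set of top-level knowledge-layer directories, taken from the table's example.
KNOWLEDGE_LAYERS = {"concepts", "entities", "skills", "synthesis"}

def structural_signals(src, dst, link_counts):
    """Signals derivable from registry entries alone (no page text needed)."""
    signals = set()
    if len(set(src["tags"]) & set(dst["tags"])) >= 2:
        signals.add("shared_tags")
    s, d = src["path"].split("/"), dst["path"].split("/")
    # Same project: both live under projects/<name>/.
    if len(s) > 2 and len(d) > 2 and s[0] == d[0] == "projects" and s[1] == d[1]:
        signals.add("same_project")
    if s[0] != d[0] and {s[0], d[0]} <= KNOWLEDGE_LAYERS:
        signals.add("cross_category")
    if link_counts.get(src["path"], 0) <= 2 and link_counts.get(dst["path"], 0) >= 8:
        signals.add("peripheral_to_hub")
    return signals

def score(signals):
    """Sum the points of every distinct signal that fired."""
    return sum(SIGNAL_POINTS[name] for name in set(signals))
```

Text-based signals such as `exact_name_match` would be added to the same set by the mention scan before calling `score`.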

Confidence labels

Tag each candidate with a confidence label based on its score:
| Score | Label | Action |
|---|---|---|
| ≥ 6 | EXTRACTED | Link is effectively certain: exact mention or very strong match. Apply inline. |
| 3–5 | INFERRED | Link is a reasonable inference: shared context, cross-category, peripheral→hub. Apply inline or as Related section. |
| 1–2 | AMBIGUOUS | Weak or partial match. Skip unless user specifically asks to connect loose pages. |
Only act on EXTRACTED and INFERRED candidates. Include the confidence label in the Cross-Link Report so the user can review INFERRED links before trusting them.
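
The thresholds translate directly into a small lookup. The function names below are hypothetical, and candidates are assumed to be dicts carrying a numeric `score` key.

```python
def confidence_label(score):
    """Map a composite score to the label used in the Cross-Link Report."""
    if score >= 6:
        return "EXTRACTED"
    if score >= 3:
        return "INFERRED"
    if score >= 1:
        return "AMBIGUOUS"
    return None  # no signal fired, so this is not a candidate

def actionable(candidates):
    """Keep only the candidates the workflow acts on: EXTRACTED and INFERRED."""
    keep = ("EXTRACTED", "INFERRED")
    return [c for c in candidates if confidence_label(c["score"]) in keep]
```

Worth noting when reading the two tables together: an exact name match on its own scores 4 and therefore lands in INFERRED; it reaches EXTRACTED only when at least one further +2 signal also fires.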

Step 4: Apply Links

For each page with missing links:

4a: Inline linking (preferred)

Find the first natural mention of the term in the body text and wrap it in a wikilink.

Before:

    This project uses knowledge graphs to connect entities.

After:

    This project uses [[concepts/knowledge-graphs|knowledge graphs]] to connect entities.

Use the `[[path|display text]]` format when the wikilink path differs from the display text.
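
A minimal sketch of the inline rewrite follows. The function name is hypothetical, and the text passed in is assumed to be body text only, with frontmatter, code blocks, and existing wikilinks already excluded by the caller.

```python
import re

def link_first_mention(text, term, target):
    """Wrap only the first whole-word occurrence of `term` in a wikilink."""
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return text  # term not mentioned; fall back to a Related section instead
    shown = match.group(0)  # keep the author's original casing as display text
    link = f"[[{target}]]" if shown == target else f"[[{target}|{shown}]]"
    return text[:match.start()] + link + text[match.end():]
```

Later occurrences are left alone on purpose, so a page that mentions the term five times gains one link, not five.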

4b: Related section (fallback)

If the term isn't mentioned naturally in the body but the pages are semantically related (shared tags, same project), add a `## Related` section at the bottom of the page:

    ## Related

    - [[projects/my-project/my-project]] - Also uses AI agents for research automation
    - [[concepts/knowledge-graphs]] - Core technique used in this project

If a `## Related` section already exists, append to it. Don't duplicate existing entries.
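
The append-without-duplicating behaviour can be sketched as below. The function name and the `(target, note)` entry shape are assumptions; the sketch treats the section as running from the `## Related` heading to the next heading or the end of the page.

```python
import re

HEADING = re.compile(r"#{1,6} ")

def add_related(text, entries):
    """Append (target, note) entries to '## Related', creating it if needed."""
    if not entries:
        return text
    lines = text.rstrip("\n").split("\n")
    if not any(line.strip() == "## Related" for line in lines):
        lines += ["", "## Related"]
    start = next(i for i, line in enumerate(lines) if line.strip() == "## Related")
    # The section ends at the next heading, or at the end of the page.
    end = next((i for i in range(start + 1, len(lines)) if HEADING.match(lines[i])), len(lines))
    section = "\n".join(lines[start:end])
    new = [f"- [[{t}]] - {note}" for t, note in entries
           if f"[[{t}]]" not in section and f"[[{t}|" not in section]
    if not new:
        return text  # everything was already listed
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1  # step back over trailing blank lines inside the section
    if end == start + 1:
        new = [""] + new  # keep a blank line between the heading and the list
    lines[end:end] = new
    return "\n".join(lines).rstrip("\n") + "\n"
```

Calling it twice with the same entries leaves the page unchanged the second time, which makes the cross-linker safe to re-run after every ingest.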

Step 5: Report

Present a summary:

    ## Cross-Link Report

    ### Links Added: 23 across 12 pages

    | Page | Links Added | Confidence | Type |
    |---|---|---|---|
    | `projects/my-project/my-project.md` | 3 | EXTRACTED | 2 inline, 1 related |
    | `entities/jane-doe.md` | 5 | INFERRED | 3 inline, 2 related |
    | ... | | | |

    ### Orphan Pages Remaining: 2

    - `references/foo.md`: no incoming or outgoing links found
    - `concepts/bar.md`: could not find related pages

    ### Pages Skipped: 3

    - `index.md`, `log.md`: special files
    - `_archives/*`: archived content
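
The orphan list in the report can be derived from the link map collected during the scan. This is a small sketch under the assumption that `outgoing` maps every scanned page to the set of pages it links to; the function name is hypothetical.

```python
def find_orphans(outgoing):
    """Pages with no outgoing links that are also never linked to."""
    has_incoming = set().union(*outgoing.values())
    return sorted(page for page, targets in outgoing.items()
                  if not targets and page not in has_incoming)
```

Running this after Step 4 rather than before it gives the "remaining" count the report asks for, since links added in this pass have already rescued some pages.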

Step 6: Update Log

Append to `log.md`:

    - [TIMESTAMP] CROSS_LINK pages_scanned=N links_added=M pages_modified=P orphans_remaining=Q
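
Formatting and appending the entry might look like this. The source only says `TIMESTAMP`, so the ISO-8601 UTC format used here is an assumption, as are the function names; match whatever format existing `log.md` entries already use.

```python
from datetime import datetime, timezone

def log_entry(pages_scanned, links_added, pages_modified, orphans_remaining, now=None):
    """Build one CROSS_LINK log line; `now` is expected to be a UTC datetime."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f"- [{stamp}] CROSS_LINK pages_scanned={pages_scanned} "
            f"links_added={links_added} pages_modified={pages_modified} "
            f"orphans_remaining={orphans_remaining}")

def append_log(log_path, entry):
    """Append (never rewrite) so earlier log history is preserved."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
```

Opening the file in append mode keeps the operation safe even if another skill has written to the log since it was last read.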

Tips

  • Run after every ingest. New pages are almost always poorly connected. This is the fix.
  • Be conservative with inline links. Only link the first natural mention, not every occurrence.
  • Don't touch pages in `_archives/`. Those are frozen snapshots.
  • Respect existing structure. If a page carefully curates its links in a `## Key Concepts` section, add to that section rather than creating a separate `## Related`.
  • Entity pages are link magnets. An entity like `jane-doe` should be linked from almost every project page. Prioritize these.