Cross-Linker — Automated Wiki Cross-Referencing


You are weaving the wiki's knowledge graph tighter by finding and inserting missing `[[wikilinks]]` between pages that should reference each other but currently don't.

Follow the Retrieval Primitives table in `llm-wiki/SKILL.md`. Build the registry in Step 1 by grepping frontmatter only (not full pages). Reserve full `Read` for the unlinked-mention detection pass, and even there, only read pages whose summaries/titles make them plausible link targets. Blind full-vault reads are what this framework exists to avoid.


Before You Start


1. Read `.env` to get `OBSIDIAN_VAULT_PATH`
2. Read `index.md` to get the full inventory of pages and their one-line descriptions
3. Skim `log.md` to see what was recently ingested (focus linking effort on new pages)
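
The first step above could be sketched as follows. This is a minimal illustration that assumes `.env` uses plain `KEY=VALUE` lines; the function name `read_vault_path` is hypothetical, and `export` prefixes or multi-line values are not handled.

```python
from pathlib import Path


def read_vault_path(env_file: str = ".env") -> Path:
    """Return OBSIDIAN_VAULT_PATH from a simple KEY=VALUE file (assumed format)."""
    for raw in Path(env_file).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "OBSIDIAN_VAULT_PATH":
            # Drop optional surrounding quotes and expand a leading "~"
            return Path(value.strip().strip("'\"")).expanduser()
    raise KeyError(f"OBSIDIAN_VAULT_PATH not found in {env_file}")
```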


Step 1: Build the Page Registry


Glob all `.md` files in the vault (excluding `_archives/` and `.obsidian/`). For each page, extract:

- Filename (without `.md`): this is the wikilink target
- Title from frontmatter
- Aliases from frontmatter (if any)
- Tags from frontmatter
- Category from frontmatter or directory inference
- One-line summary: first sentence or `title` field

Build a lookup table:

page_name → { path, title, aliases, tags, summary }

This is your "vocabulary" — every entry in this table is a valid wikilink target.
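
A registry along these lines could be built as in the sketch below. It is an illustration, not a reference implementation. It assumes frontmatter is a leading `---` block with flat keys and lists written either inline (`[a, b]`) or as `- item` lines, and it stops reading each file at the closing `---`. The names `read_frontmatter` and `build_registry` are hypothetical. The summary falls back to the title so that no page body is read, and pages that share a filename overwrite each other in this simplified version.

```python
from pathlib import Path

EXCLUDED_DIRS = {"_archives", ".obsidian"}


def read_frontmatter(path: Path) -> dict:
    """Parse only the leading '---' block, stopping before the page body."""
    meta: dict = {}
    key = None
    with path.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # page has no frontmatter
        for raw in fh:
            line = raw.rstrip("\n")
            stripped = line.strip()
            if stripped == "---":
                break
            if stripped.startswith("- ") and key is not None:
                # Block-style list item belonging to the previous key
                if not isinstance(meta.get(key), list):
                    meta[key] = []
                meta[key].append(stripped[2:].strip().strip("'\""))
            elif ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                key, value = key.strip(), value.strip()
                if value.startswith("[") and value.endswith("]"):
                    items = value[1:-1].split(",")
                    meta[key] = [v.strip().strip("'\"") for v in items if v.strip()]
                else:
                    meta[key] = value.strip("'\"")
    return meta


def as_list(value) -> list:
    """Normalize a scalar, list, or missing value to a list."""
    if isinstance(value, list):
        return value
    return [value] if value else []


def build_registry(vault: Path) -> dict:
    """Map page_name -> {path, title, aliases, tags, summary}."""
    registry = {}
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault)
        if EXCLUDED_DIRS.intersection(rel.parts):
            continue
        meta = read_frontmatter(path)
        title = meta.get("title") or path.stem
        registry[path.stem] = {
            "path": rel.as_posix(),
            "title": title,
            "aliases": as_list(meta.get("aliases")),
            "tags": as_list(meta.get("tags")),
            "summary": title,  # assumption: reuse the title instead of reading the body
        }
    return registry
```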


Step 2: Scan for Missing Links


For each page in the vault:

1. Read the full content
2. Extract existing wikilinks: find all `[[...]]` references already present
3. Search for unlinked mentions: check if the page's text contains any of these, without being wrapped in `[[...]]`:
   - Page filenames (e.g., the word "MyProject" appears but `[[projects/my-project/my-project]]` is missing)
   - Page titles from frontmatter
   - Aliases from frontmatter
   - Entity names, project names, concept names from the registry
4. Check for semantic connections: pages that share multiple tags or are in the same project directory but don't link to each other
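
The wikilink extraction and unlinked-mention check might look roughly like this. The sketch assumes a registry shaped like the Step 1 lookup table; `existing_links` and `unlinked_mentions` are hypothetical names. It matches whole terms case-insensitively and does not yet skip code blocks, frontmatter, or common words.

```python
import re

# [[target]], [[target|display]], and [[target#heading]] forms
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def existing_links(text: str) -> set[str]:
    """Final path segment of every wikilink target on the page, lowercased."""
    return {m.group(1).strip().split("/")[-1].lower() for m in WIKILINK.finditer(text)}


def unlinked_mentions(page_name: str, text: str, registry: dict) -> list[str]:
    """Registry pages mentioned by filename, title, or alias but not yet linked."""
    linked = existing_links(text)
    # Blank out existing links so their display text is not counted as a mention
    plain = WIKILINK.sub(" ", text)
    found = []
    for name, entry in registry.items():
        if name == page_name or name.lower() in linked:
            continue  # skip self-references and pages that are already linked
        terms = {name, entry.get("title", ""), *entry.get("aliases", [])}
        for term in filter(None, terms):
            if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", plain, re.IGNORECASE):
                found.append(name)
                break
    return found
```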


Matching Rules


- Case-insensitive matching for names (e.g., "my-project" matches page `MyProject`)
- Skip self-references: a page shouldn't link to itself
- Skip common words: don't link "the", "and", or generic terms. Only match on distinctive names
- Prefer the shortest unambiguous wikilink path: use `[[page-name]]`, not `[[full/path/to/page-name]]`, when the name is unique across the vault
- Don't link inside code blocks or frontmatter
- Don't double-link: if `[[foo]]` already appears on the page, don't add another
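
Two of these rules are mechanical enough to sketch in code: keeping matches out of code and frontmatter, and choosing the shortest unambiguous path. The helpers below are hypothetical. `mask_protected` blanks protected regions with spaces so that character offsets in the masked text still line up with the original page, and it assumes fences are written with triple backticks.

```python
import re
from collections import Counter


def mask_protected(text: str) -> str:
    """Replace frontmatter, fenced code, and inline code with spaces (same length)."""
    out = []
    in_fence = False
    in_front = False
    for i, line in enumerate(text.split("\n")):
        stripped = line.strip()
        if i == 0 and stripped == "---":
            in_front = True
            out.append(" " * len(line))
        elif in_front:
            if stripped == "---":
                in_front = False
            out.append(" " * len(line))
        elif stripped.startswith("```"):
            in_fence = not in_fence
            out.append(" " * len(line))
        elif in_fence:
            out.append(" " * len(line))
        else:
            # Blank inline code spans but keep the rest of the line
            out.append(re.sub(r"`[^`\n]*`", lambda m: " " * len(m.group()), line))
    return "\n".join(out)


def shortest_target(name: str, rel_path: str, all_paths: list[str]) -> str:
    """Bare page name when unique in the vault, otherwise the path without '.md'."""
    stems = Counter(p.rsplit("/", 1)[-1].removesuffix(".md").lower() for p in all_paths)
    return name if stems[name.lower()] == 1 else rel_path.removesuffix(".md")
```

Searching the masked text instead of the raw page means a term that appears only inside a code sample is never reported as a mention.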


Step 3: Score and Rank Suggestions


Not every possible link is worth adding. Score each candidate using a composite signal, then tag it with a confidence label.


Scoring


| Signal | Points | Example |
|--------|--------|---------|
| Exact name match in text | +4 | "MyProject" appears in body text → link to `my-project.md` |
| Shared tags (2+) | +2 | Both tagged `#ai #agent` but no link between them |
| Same project, no link | +2 | Both under `projects/my-project/` but don't reference each other |
| Mentioned entity/concept | +2 | Page mentions "knowledge graphs" → link to `[[concepts/knowledge-graphs]]` |
| Cross-category connection | +2 | Source is in `concepts/`, target is in `entities/` (or `skills/` ↔ `synthesis/`); different knowledge layers make this link more architecturally valuable |
| Peripheral→hub reach | +2 | Source page has ≤ 2 total links (peripheral) but target has ≥ 8 (hub); this connects a loose page to a load-bearing concept |
| Partial name match | +1 | "graph" appears but page is `knowledge-graphs`: plausible but ambiguous |
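
One way to turn the table into a composite score is sketched below. The signal keys, the `detect_signals` helper, and the `category` and `link_count` fields are all hypothetical. Treating any two different categories as a cross-category connection is an assumption, and the sketch simply sums every signal that fires.

```python
POINTS = {
    "exact_name_match": 4,
    "shared_tags": 2,
    "same_project": 2,
    "mentioned_entity": 2,
    "cross_category": 2,
    "peripheral_to_hub": 2,
    "partial_name_match": 1,
}


def detect_signals(source: dict, target: dict) -> set[str]:
    """Structural signals only; text-match signals are added by the caller."""
    signals = set()
    if len(set(source["tags"]) & set(target["tags"])) >= 2:
        signals.add("shared_tags")
    if source["category"] != target["category"]:
        signals.add("cross_category")  # assumption: any differing category counts
    if source["link_count"] <= 2 and target["link_count"] >= 8:
        signals.add("peripheral_to_hub")
    return signals


def score_candidate(signals: set[str]) -> int:
    """Sum the points of every signal that fired for this candidate link."""
    return sum(POINTS[s] for s in signals)
```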


Confidence labels


Tag each candidate with a confidence label based on its score:

| Score | Label | Action |
|-------|-------|--------|
| ≥ 6 | EXTRACTED | Link is effectively certain: exact mention or very strong match. Apply inline. |
| 3–5 | INFERRED | Link is a reasonable inference: shared context, cross-category, peripheral→hub. Apply inline or as Related section. |
| 1–2 | AMBIGUOUS | Weak or partial match. Skip unless user specifically asks to connect loose pages. |

Only act on EXTRACTED and INFERRED candidates. Include the confidence label in the Cross-Link Report so the user can review INFERRED links before trusting them.
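
The thresholds map directly onto a small helper. In this sketch the function names and the `(target, score)` tuple shape are assumptions, and the `include_ambiguous` flag stands in for the case where the user asks to connect loose pages.

```python
from typing import Optional


def confidence_label(score: int) -> Optional[str]:
    """Map a composite score to its confidence label (None when below 1)."""
    if score >= 6:
        return "EXTRACTED"
    if score >= 3:
        return "INFERRED"
    if score >= 1:
        return "AMBIGUOUS"
    return None


def actionable(
    candidates: list[tuple[str, int]], include_ambiguous: bool = False
) -> list[tuple[str, int, str]]:
    """Keep EXTRACTED and INFERRED candidates, highest score first."""
    keep = {"EXTRACTED", "INFERRED"}
    if include_ambiguous:
        keep.add("AMBIGUOUS")
    labeled = [(target, score, confidence_label(score)) for target, score in candidates]
    return sorted((c for c in labeled if c[2] in keep), key=lambda c: -c[1])
```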


Step 4: Apply Links


For each page with missing links:


4a: Inline linking (preferred)


Find the first natural mention of the term in the body text and wrap it in wikilinks:

Before:

    This project uses knowledge graphs to connect entities.

After:

    This project uses [[concepts/knowledge-graphs|knowledge graphs]] to connect entities.

Use the `[[path|display text]]` format when the wikilink path differs from the display text.
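
Wrapping only the first mention could be done along these lines. `link_first_mention` is a hypothetical helper. It keeps the original wording as the display text, and it only guards against text directly adjacent to link brackets, so code blocks and frontmatter need to be excluded by the caller.

```python
import re


def link_first_mention(body: str, term: str, target: str) -> str:
    """Wrap the first whole-word mention of `term` in a wikilink to `target`."""
    pattern = re.compile(
        rf"(?<![\w\[|]){re.escape(term)}(?![\w\]|])", re.IGNORECASE
    )
    match = pattern.search(body)
    if match is None:
        return body  # nothing to link; the Related-section fallback applies instead
    shown = match.group(0)
    # Use the [[path|display text]] form only when path and text differ
    link = f"[[{target}]]" if shown == target else f"[[{target}|{shown}]]"
    return body[: match.start()] + link + body[match.end():]
```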


4b: Related section (fallback)


If the term isn't mentioned naturally in the body but the pages are semantically related (shared tags, same project), add a `## Related` section at the bottom of the page:

    ## Related

    - [[projects/my-project/my-project]] — Also uses AI agents for research automation
    - [[concepts/knowledge-graphs]] — Core technique used in this project

If a `## Related` section already exists, append to it. Don't duplicate existing entries.
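
The append-without-duplicating behavior might be implemented roughly as follows. `add_related` is a hypothetical helper. It assumes each entry string starts with its wikilink, and it treats the section as running until the next heading or the end of the page.

```python
import re


def add_related(page: str, entries: list[str]) -> str:
    """Append '- entry' lines under '## Related', creating the section if needed."""
    lines = page.rstrip("\n").split("\n")
    headings = [i for i, line in enumerate(lines) if line.strip() == "## Related"]
    if headings:
        start = headings[0]
    else:
        lines += ["", "## Related"]
        start = len(lines) - 1
    # The section ends at the next heading or at the end of the page
    end = next(
        (i for i in range(start + 1, len(lines)) if re.match(r"#{1,6}\s", lines[i])),
        len(lines),
    )
    section = "\n".join(lines[start:end])
    # Skip entries whose wikilink already appears in the section
    new = [f"- {e}" for e in entries if e.split("]]")[0] + "]]" not in section]
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1  # step back over trailing blank lines in the section
    if insert_at == start + 1 and new:
        new.insert(0, "")  # blank line between the heading and the first entry
    lines[insert_at:insert_at] = new
    return "\n".join(lines) + "\n"
```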


Step 5: Report


Present a summary:

    Cross-Link Report

    Links Added: 23 across 12 pages

    | Page | Links Added | Confidence | Type |
    |------|-------------|------------|------|
    | `projects/my-project/my-project.md` | 3 | EXTRACTED | 2 inline, 1 related |
    | `entities/jane-doe.md` | 5 | INFERRED | 3 inline, 2 related |
    | ... | | | |

    Orphan Pages Remaining: 2

    - `references/foo.md`: no incoming or outgoing links found
    - `concepts/bar.md`: could not find related pages

    Pages Skipped: 3

    - `index.md`, `log.md`: special files
    - `_archives/*`: archived content
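
The orphan count in the report can be derived from the link graph. The sketch below assumes a hypothetical `outgoing` mapping from each page name to the set of page names it links to, and it treats a page as an orphan when it has no incoming and no outgoing links to other known pages.

```python
def find_orphans(outgoing: dict[str, set[str]]) -> list[str]:
    """Pages with no incoming and no outgoing links to other known pages."""
    known = set(outgoing)
    incoming: dict[str, set[str]] = {page: set() for page in known}
    for source, targets in outgoing.items():
        for target in targets:
            if target in known and target != source:
                incoming[target].add(source)
    orphans = []
    for page, targets in outgoing.items():
        # Ignore self-links and links to pages that are not in the vault
        real_out = (targets & known) - {page}
        if not real_out and not incoming[page]:
            orphans.append(page)
    return sorted(orphans)
```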

Step 6: Update Log


Append to `log.md`:

    - [TIMESTAMP] CROSS_LINK pages_scanned=N links_added=M pages_modified=P orphans_remaining=Q
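
A log entry in that shape could be produced as sketched here. The source only says `TIMESTAMP`, so the ISO 8601 UTC format is an assumption, as are the helper names; match whatever format existing `log.md` entries use.

```python
from datetime import datetime, timezone
from typing import Optional


def cross_link_log_line(
    pages_scanned: int,
    links_added: int,
    pages_modified: int,
    orphans_remaining: int,
    now: Optional[datetime] = None,
) -> str:
    """Format one CROSS_LINK entry; `now` is expected to be a UTC datetime."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"- [{stamp}] CROSS_LINK pages_scanned={pages_scanned} "
        f"links_added={links_added} pages_modified={pages_modified} "
        f"orphans_remaining={orphans_remaining}"
    )


def append_to_log(log_path: str, line: str) -> None:
    """Append the entry without rewriting earlier log content."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```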


Tips


- Run after every ingest. New pages are almost always poorly connected. This is the fix.
- Be conservative with inline links. Only link the first natural mention, not every occurrence.
- Don't touch pages in `_archives/`. Those are frozen snapshots.
- Respect existing structure. If a page carefully curates its links in a `## Key Concepts` section, add to that section rather than creating a separate `## Related`.
- Entity pages are link magnets. An entity like `jane-doe` should be linked from almost every project page. Prioritize these.
