Link Auditor Skill

Operator Context

This skill acts as an operator for link health analysis on Hugo static sites, configuring Claude's behavior for comprehensive, non-destructive link auditing. It implements the Pipeline architectural pattern (Scan, Analyze, Validate, Report), with Domain Intelligence embedded in Hugo path resolution and SEO link graph metrics.

Hardcoded Behaviors (Always Apply)

CLAUDE.md Compliance: Read and follow repository CLAUDE.md before auditing
Non-Destructive: Never modify content files without explicit user request
Complete Output: Show all findings; never summarize or abbreviate issue lists
Issue Classification: Clearly distinguish critical issues (orphans, broken links) from suggestions (under-linked)
Hugo Path Awareness: Try multiple path resolutions before reporting a link as broken

Default Behaviors (ON unless disabled)

Full Scan: Analyze all markdown files in content/
Graph Analysis: Build and analyze internal link adjacency graph
Image Validation: Check all image paths exist in static/
Skip External Validation: Do not HTTP-check external URLs (enable with --check-external)
Issues-Only Output: Show only problems, not all valid links

Optional Behaviors (OFF unless enabled)

External Link Validation: HTTP HEAD check on external URLs (--check-external)
Verbose Mode: Show all links including valid ones (--verbose)
Custom Inbound Threshold: Flag pages with fewer than N inbound links (--min-inbound N)
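
The flags named in the default and optional behaviors could be exposed through a small command-line parser. The following is only a sketch: `scripts/link_scanner.py` is not implemented yet, so the program name, the positional `content_root` argument, and the help strings are assumptions. The flag names and the default inbound threshold of 2 come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the audit flags described above (hypothetical CLI)."""
    parser = argparse.ArgumentParser(prog="link_scanner")  # assumed program name
    parser.add_argument("content_root", help="Path to the Hugo content/ directory")
    parser.add_argument(
        "--check-external",
        action="store_true",
        help="HTTP HEAD check external URLs (off by default)",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Show all links, including valid ones",
    )
    parser.add_argument(
        "--min-inbound",
        type=int,
        default=2,
        metavar="N",
        help="Flag pages with fewer than N inbound links",
    )
    return parser
```

Keeping `--check-external` and `--verbose` as `store_true` switches mirrors the "OFF unless enabled" behavior: omitting them leaves the default scan unchanged.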

What This Skill CAN Do

Extract internal, external, and image links from Hugo markdown content
Build adjacency matrix of internal link relationships
Identify orphan pages (0 inbound internal links) and under-linked pages
Detect link sinks (receive links, no outbound) and hub pages (many outbound)
Validate internal link paths resolve to real content files
Validate image files exist in static/
Optionally validate external URLs via HTTP HEAD requests
Handle known false positives (LinkedIn, Twitter block bot requests)
Generate audit reports with actionable fix suggestions

What This Skill CANNOT Do

Validate external URLs by default (network latency, rate limiting concerns)
Guarantee external link accuracy (social media sites block bots)
Automatically fix broken links or add missing links
Analyze JavaScript-rendered content or Hugo shortcodes beyond standard patterns
Replace pre-publish-checker for single-post validation

Instructions

Phase 1: SCAN

Goal: Extract all links from markdown files and classify them by type.

Step 1: Identify content root

Locate the Hugo content directory and enumerate all markdown files:

```bash
# TODO: scripts/link_scanner.py not yet implemented
# Manual alternative, step 1: list every markdown file under the content root
find ~/your-blog/content -name '*.md' | sort
# Manual alternative, step 2: extract markdown links ([text](target)) with file and line number
grep -rn '\[.*\](.*)' ~/your-blog/content/ --include="*.md"
```

**Step 2: Extract links by type**

Parse each markdown file for three link categories:

Internal Links:
- `[text](/posts/slug/)` -- absolute internal path
- `[text](../other-post/)` -- relative path
- `[text](/categories/tech/)` -- taxonomy pages
- `{{< ref "posts/slug.md" >}}` -- Hugo ref shortcode

External Links:
- `[text](https://example.com/path)`
- `[text](http://example.com/path)`

Image Links:
- `![alt](/images/filename.png)` -- static path
- `![alt](images/filename.png)` -- relative path
- `{{< figure src="/images/file.png" >}}` -- Hugo shortcode

**Step 3: Tally link counts per file**

Record total internal, external, and image links per file for the summary.
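
As an illustration of Steps 2 and 3, the sketch below extracts the link patterns listed above with regular expressions and tallies them by type. It is not the planned `scripts/link_scanner.py`; the function names and regexes are assumptions. It treats every non-HTTP Markdown target as internal and does not special-case `#anchor` or `mailto:` targets.

```python
import re
from collections import Counter

# Markdown links and images: optional "!" followed by [text](target "optional title")
MD_LINK = re.compile(r'(!?)\[([^\]]*)\]\(([^)\s]+)[^)]*\)')
# Hugo ref/relref shortcode: {{< ref "posts/slug.md" >}}
REF_SHORTCODE = re.compile(r'\{\{<\s*(?:rel)?ref\s+"([^"]+)"\s*>\}\}')
# Hugo figure shortcode: {{< figure src="/images/file.png" >}}
FIGURE_SHORTCODE = re.compile(r'\{\{<\s*figure\b[^>]*?\bsrc="([^"]+)"')


def extract_links(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, target) for each link, kind being
    'internal', 'external', or 'image'."""
    found = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for bang, _text, target in MD_LINK.findall(line):
            if target.startswith("{{"):
                continue  # shortcode used as a link target; handled below
            if bang:
                kind = "image"
            elif target.startswith(("http://", "https://")):
                kind = "external"
            else:
                kind = "internal"
            found.append((lineno, kind, target))
        for target in REF_SHORTCODE.findall(line):
            found.append((lineno, "internal", target))
        for target in FIGURE_SHORTCODE.findall(line):
            found.append((lineno, "image", target))
    return found


def tally(links: list[tuple[int, str, str]]) -> Counter:
    """Count links per kind for the scan summary."""
    return Counter(kind for _lineno, kind, _target in links)
```

Recording the line number at extraction time means the later validation phase can report file and line for broken links without re-reading the file.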

**Gate**: All markdown files scanned. Link extraction complete with counts by type. Proceed only when gate passes.

Phase 2: ANALYZE

Goal: Build internal link graph and compute structural metrics.

Step 1: Build adjacency matrix

Map every internal link to its source and target:

Page A -> Page B (A links to B)
Page A -> Page C
Page B -> Page D
Page C -> (no outbound)
Page E -> (no outbound, no inbound = orphan)
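
A minimal sketch of this mapping, assuming pages are identified by plain strings and links arrive as `(source, target)` pairs; `build_graph` is a hypothetical helper name. It stores the graph as adjacency sets rather than a dense matrix, which holds the same information for a sparse link graph.

```python
def build_graph(
    pages: list[str], edges: list[tuple[str, str]]
) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (outbound, inbound) adjacency sets keyed by page."""
    outbound: dict[str, set[str]] = {page: set() for page in pages}
    inbound: dict[str, set[str]] = {page: set() for page in pages}
    for source, target in edges:
        # Skip self-links and links to unknown pages; broken targets are
        # a validation concern, not part of the graph.
        if source == target or source not in outbound or target not in outbound:
            continue
        outbound[source].add(target)
        inbound[target].add(source)
    return outbound, inbound


# The example above: A links to B and C, B links to D, E is isolated.
outbound, inbound = build_graph(
    ["A", "B", "C", "D", "E"], [("A", "B"), ("A", "C"), ("B", "D")]
)
```

Using sets means repeated links from one page to the same target count once, so inbound counts reflect distinct linking pages.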

Step 2: Compute graph metrics

| Metric | Definition | SEO Impact |
|---|---|---|
| Orphan Pages | 0 inbound internal links | Critical -- invisible to crawlers |
| Under-Linked | < N inbound links (default 2) | Missed SEO opportunity |
| Link Sinks | Receives links, no outbound | May indicate incomplete content |
| Hub Pages | Many outbound links | Good for navigation |
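
The metric definitions can be sketched as a single pass over per-page inbound and outbound counts. Two choices here are assumptions, not stated by this document: `hub_min_outbound` gives "many outbound links" a concrete cutoff, and orphans are kept out of the under-linked list so each page appears under its most severe label.

```python
def compute_metrics(
    inbound: dict[str, int],
    outbound: dict[str, int],
    min_inbound: int = 2,
    hub_min_outbound: int = 5,  # assumed cutoff for "many outbound links"
) -> dict[str, list[str]]:
    """Classify pages by inbound/outbound link counts."""
    pages = sorted(inbound)
    return {
        "orphans": [p for p in pages if inbound[p] == 0],
        # Orphans are reported separately, so start at 1 inbound link.
        "under_linked": [p for p in pages if 0 < inbound[p] < min_inbound],
        "sinks": [p for p in pages if inbound[p] > 0 and outbound.get(p, 0) == 0],
        "hubs": [p for p in pages if outbound.get(p, 0) >= hub_min_outbound],
    }
```

With counts from a five-page graph where A links to B and C and B links to D, both A and E have zero inbound links, so both are listed as orphans; a real audit may want to exempt entry points such as the home page.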

Step 3: Classify findings by severity

Critical: Orphan pages, broken internal links, missing images
Warning: Under-linked pages, link sinks
Info: Hub pages, external link stats

Gate: Adjacency matrix built. All pages classified with inbound/outbound counts. Proceed only when gate passes.

Phase 3: VALIDATE

Goal: Verify link targets resolve to real files or live URLs.

Step 1: Validate internal links

For each internal link target:

1. Parse the link target path
2. Try Hugo path resolutions: `content/posts/slug.md`, `content/posts/slug/index.md`, `content/posts/slug/_index.md`
3. Mark as broken only if ALL resolutions fail
4. Record source file and line number for broken links
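
The resolution order above might look like the following sketch. The helper names are hypothetical, and it only covers site-absolute paths such as `/posts/slug/`: relative links would first need resolving against the source file, and generated pages such as taxonomy listings have no backing content file, so they need separate handling.

```python
from pathlib import Path
from typing import Optional


def candidate_paths(content_root: Path, link: str) -> list[Path]:
    """List the content files that could back a site-absolute internal link."""
    # Drop any #fragment or ?query, then the surrounding slashes.
    slug = link.split("#", 1)[0].split("?", 1)[0].strip("/")
    if slug.endswith(".md"):
        slug = slug[: -len(".md")]
    return [
        content_root / f"{slug}.md",         # single-file page
        content_root / slug / "index.md",    # leaf bundle
        content_root / slug / "_index.md",   # branch bundle / section
    ]


def resolve_internal(content_root: Path, link: str) -> Optional[Path]:
    """Return the first existing candidate, or None if every resolution fails."""
    for candidate in candidate_paths(content_root, link):
        if candidate.is_file():
            return candidate
    return None
```

A link is reported as broken only when `resolve_internal` returns `None`, which matches the rule that all resolutions must fail first.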

Step 2: Validate image paths

For each image reference:

Parse image source path (absolute or relative)
Map to static/ directory
Check file exists
Record source file and line number for missing images
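
A sketch of the image check, under two assumptions: `static/` sits directly under the Hugo site root, and a relative path may also refer to a file next to the source Markdown file. `resolve_image` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def resolve_image(site_root: Path, source_file: Path, src: str) -> Optional[Path]:
    """Return the file backing an image reference, or None if it is missing."""
    # Both "/images/a.png" and "images/a.png" are first mapped into static/.
    candidates = [site_root / "static" / src.lstrip("/")]
    if not src.startswith("/"):
        # Assumed second interpretation: relative to the source file's directory.
        candidates.append(source_file.parent / src)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning the matched path, rather than a boolean, lets the report state which interpretation succeeded.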

Step 3: Validate external links (optional)

Only when `--check-external` is enabled:

HTTP HEAD request to URL
Follow redirects (up to 3)
Check response status code
Mark known false positives as "blocked (expected)" not broken

Known false positives: LinkedIn (403), Twitter/X (403/999), Facebook (varies).
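
One way to sketch the external check with only the Python standard library is shown below. The domain tuple is an assumption standing in for `references/false-positives.md`, the function names are hypothetical, and the redirect cap relies on `HTTPRedirectHandler.max_redirections`, so it is approximate rather than an exact count.

```python
import urllib.request
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlparse

# Assumed stand-in for the list kept in references/false-positives.md.
BOT_BLOCKING_DOMAINS = ("linkedin.com", "twitter.com", "x.com", "facebook.com")


class LimitedRedirects(urllib.request.HTTPRedirectHandler):
    max_redirections = 3  # cap the redirect chain (stdlib default is 10)


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request; return the status code, or None if unreachable."""
    opener = urllib.request.build_opener(LimitedRedirects)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return None  # DNS failure, refused connection, timeout


def classify(url: str, status: Optional[int]) -> str:
    """Map a URL and its status to 'ok', 'blocked (expected)', or 'broken'."""
    if status is not None and 200 <= status < 400:
        return "ok"
    host = (urlparse(url).hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in BOT_BLOCKING_DOMAINS):
        return "blocked (expected)"
    return "broken"
```

Matching on the exact domain or a dot-prefixed suffix avoids treating an unrelated host such as `notlinkedin.com` as a known false positive.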

Gate: All link targets checked. Broken links have file and line numbers. External results (if enabled) distinguish real failures from false positives. Proceed only when gate passes.

Phase 4: REPORT

Goal: Present findings in a structured, actionable audit report.

Step 1: Generate summary header

===============================================================
 LINK AUDIT: ~/your-blog/content/
===============================================================

 SCAN SUMMARY:
   Posts scanned: 15
   Internal links: 42
   External links: 28
   Image references: 12

Step 2: Report by severity

List critical issues first (orphans, broken links, missing images), then warnings (under-linked, sinks), then info (hubs, valid external counts).

Each issue must include:

File path
Line number (for broken links and missing images)
Specific suggestion for resolution
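
A sketch of how findings could be ordered and rendered, assuming a simple `Issue` record. The record fields and the one-line output format are illustrative choices, not a format this document prescribes.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Issue:
    severity: str          # "critical", "warning", or "info"
    kind: str              # e.g. "broken link", "orphan page"
    path: str              # file the issue was found in
    suggestion: str        # specific suggestion for resolution
    line: Optional[int] = None  # set for broken links and missing images


def format_issues(issues: list[Issue]) -> list[str]:
    """Render issues as report lines, critical first, then by path and line."""
    ordered = sorted(
        issues,
        key=lambda i: (SEVERITY_ORDER[i.severity], i.path, i.line or 0),
    )
    lines = []
    for issue in ordered:
        location = f"{issue.path}:{issue.line}" if issue.line else issue.path
        lines.append(
            f"[{issue.severity.upper()}] {issue.kind}: {location} -> {issue.suggestion}"
        )
    return lines
```

Every issue is emitted; nothing is truncated or summarized, in keeping with the Complete Output behavior.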

Step 3: Generate recommendations

Conclude with numbered, actionable recommendations ordered by impact:

===============================================================
 RECOMMENDATIONS:
   1. Add internal links to 2 orphan pages
   2. Fix 1 broken internal link in /posts/example.md line 45
   3. Update or remove 1 dead external link
   4. Add missing image or fix path in /posts/images.md line 12
===============================================================

Gate: Report generated with all findings. Every issue has a file path and actionable suggestion. Audit is complete.

Error Handling

Error: "No markdown files found"

Cause: Wrong directory path or empty content root

Solution:

Verify the content/ directory exists at the given path
Check that .md files exist (not just subdirectories)
Confirm the path is the Hugo content root, not the project root
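
These checks could be made explicit when enumerating files. The sketch below is an assumption about how a scanner might surface the error; `find_markdown_files` is a hypothetical helper name, and the hint about a nested `content/` directory is a heuristic for the project-root mistake.

```python
from pathlib import Path


def find_markdown_files(content_root: Path) -> list[Path]:
    """Return all .md files under content_root, or raise with a specific hint."""
    if not content_root.is_dir():
        raise FileNotFoundError(f"Content directory not found: {content_root}")
    files = sorted(content_root.rglob("*.md"))
    if not files:
        raise FileNotFoundError(f"No markdown files found under {content_root}")
    if (content_root / "content").is_dir():
        # Heuristic: a nested content/ directory suggests the project root was passed.
        print(f"Note: {content_root} looks like a project root; try {content_root / 'content'}")
    return files
```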

Error: "External validation timeout"

Cause: Target site is slow, blocking requests, or unreachable

Solution:

Check if the site is in the known false-positives list (LinkedIn, Twitter)
Add persistently failing sites to the false-positives list
Use shorter timeout with `--timeout 5` for slow sites

Error: "Image path ambiguous"

Cause: Path could be relative or absolute, unclear resolution

Solution:

The scanner checks both interpretations automatically
Report shows which interpretation was attempted
Verify the Hugo site's static directory structure matches expectations

Anti-Patterns

Anti-Pattern 1: Treating Bot-Blocked Sites as Broken

What it looks like: Reporting LinkedIn/Twitter links as broken when they return 403/999.

Why wrong: These sites actively block bot requests. Links work fine in browsers.

Do instead: Maintain false-positives list. Report as "blocked (expected)" not broken.

Anti-Pattern 2: Skipping Graph Analysis

What it looks like: Only checking for broken links without analyzing the link graph.

Why wrong: Orphan pages are invisible to search crawlers. This is often the highest-impact finding.

Do instead: Always build the adjacency matrix and compute inbound link counts.

Anti-Pattern 3: Literal Path Matching Without Hugo Resolution

What it looks like: Treating `/posts/slug/` as a literal file path and reporting it broken.

Why wrong: Hugo resolves paths through multiple conventions (slug.md, slug/index.md, slug/_index.md).

Do instead: Try all Hugo path resolutions before reporting a link as broken.

Anti-Pattern 4: Modifying Content Without User Consent

What it looks like: Automatically adding links to orphan pages or fixing broken paths.

Why wrong: This skill is non-destructive. Users must approve all content changes.

Do instead: Report findings with specific suggestions. Let the user decide which fixes to apply.

References

This skill uses these shared patterns:

Anti-Rationalization - Prevents shortcut rationalizations
Verification Checklist - Pre-completion checks

Domain-Specific Anti-Rationalization

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Only 3 broken links, not worth a full audit" | Orphan pages are invisible without graph analysis | Run full 4-phase audit |
| "External links probably still work" | Link rot is progressive and silent | Validate with --check-external periodically |
| "Hugo will resolve it somehow" | Hugo path resolution has specific rules | Test all resolution patterns explicitly |
| "Small site doesn't need link auditing" | Even 10 posts can have orphans | Run audit regardless of site size |

Reference Files

`${CLAUDE_SKILL_DIR}/references/link-graph-metrics.md`: Graph metrics definitions and SEO impact
`${CLAUDE_SKILL_DIR}/references/false-positives.md`: Sites known to block validation requests
`${CLAUDE_SKILL_DIR}/references/fix-strategies.md`: Resolution strategies for each issue type
