link-auditor
Link Auditor Skill
Operator Context
This skill acts as an operator for link health analysis on Hugo static sites, configuring Claude's behavior for comprehensive, non-destructive link auditing. It implements the Pipeline architectural pattern (Scan, Analyze, Validate, Report), with Domain Intelligence embedded in Hugo path resolution and SEO link graph metrics.
Hardcoded Behaviors (Always Apply)
- CLAUDE.md Compliance: Read and follow repository CLAUDE.md before auditing
- Non-Destructive: Never modify content files without explicit user request
- Complete Output: Show all findings; never summarize or abbreviate issue lists
- Issue Classification: Clearly distinguish critical issues (orphans, broken links) from suggestions (under-linked)
- Hugo Path Awareness: Try multiple path resolutions before reporting a link as broken
Default Behaviors (ON unless disabled)
- Full Scan: Analyze all markdown files in content/
- Graph Analysis: Build and analyze internal link adjacency graph
- Image Validation: Check all image paths exist in static/
- Skip External Validation: Do not HTTP-check external URLs (enable with --check-external)
- Issues-Only Output: Show only problems, not all valid links
Optional Behaviors (OFF unless enabled)
- External Link Validation: HTTP HEAD check on external URLs (--check-external)
- Verbose Mode: Show all links including valid ones (--verbose)
- Custom Inbound Threshold: Flag pages with fewer than N inbound links (--min-inbound N)
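Since scripts/link_scanner.py is not yet implemented, the flags named in this section could be wired up roughly as follows. This is only a sketch: the `build_parser` name and the positional `content_root` argument are assumptions, and the default of 2 for `--min-inbound` follows the Under-Linked default given in the graph metrics table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for a link scanner; flag names are taken from this skill."""
    parser = argparse.ArgumentParser(prog="link_scanner")
    # Assumption: the Hugo content root is passed as a positional argument.
    parser.add_argument("content_root", help="path to the Hugo content/ directory")
    parser.add_argument("--check-external", action="store_true",
                        help="HTTP HEAD check on external URLs (off by default)")
    parser.add_argument("--verbose", action="store_true",
                        help="show all links, including valid ones")
    parser.add_argument("--min-inbound", type=int, default=2, metavar="N",
                        help="flag pages with fewer than N inbound links")
    return parser


args = build_parser().parse_args(["content", "--min-inbound", "3"])
print(args.check_external, args.verbose, args.min_inbound)  # False False 3
```

Using `store_true` keeps the optional behaviors off unless the flag is passed.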
What This Skill CAN Do
- Extract internal, external, and image links from Hugo markdown content
- Build adjacency matrix of internal link relationships
- Identify orphan pages (0 inbound internal links) and under-linked pages
- Detect link sinks (receive links, no outbound) and hub pages (many outbound)
- Validate internal link paths resolve to real content files
- Validate image files exist in static/
- Optionally validate external URLs via HTTP HEAD requests
- Handle known false positives (LinkedIn, Twitter block bot requests)
- Generate audit reports with actionable fix suggestions
What This Skill CANNOT Do
- Validate external URLs by default (network latency, rate limiting concerns)
- Guarantee external link accuracy (social media sites block bots)
- Automatically fix broken links or add missing links
- Analyze JavaScript-rendered content or Hugo shortcodes beyond standard patterns
- Replace pre-publish-checker for single-post validation
Instructions
Phase 1: SCAN
Goal: Extract all links from markdown files and classify them by type.
Step 1: Identify content root
Locate the Hugo content directory and enumerate all markdown files:
```bash
# TODO: scripts/link_scanner.py not yet implemented
# Manual alternative: extract links from markdown files
grep -rn '\[.*\](.*)' ~/your-blog/content/ --include="*.md"
```
Step 2: Extract links by type
Parse each markdown file for three link categories:
Internal Links:
- `[text](/posts/slug/)` -- absolute internal path
- `[text](../other-post/)` -- relative path
- `[text](/categories/tech/)` -- taxonomy pages
- `{{< ref "posts/slug.md" >}}` -- Hugo ref shortcode
External Links:
- `[text](https://example.com/path)`
- `[text](http://example.com/path)`
Image Links:
- `![alt](/images/file.png)` -- static path
- `![alt](images/file.png)` -- relative path
- `{{< figure src="/images/file.png" >}}` -- Hugo shortcode
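The three categories above could be extracted with a few regular expressions. The sketch below is an assumption about how a scanner might do it, not the skill's implementation: the pattern names, `SKIP_PREFIXES`, and `classify_links` are hypothetical, and only the standard forms listed above are covered.

```python
import re

# Assumed patterns for the standard link forms listed above.
MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
REF_SHORTCODE = re.compile(r'\{\{<\s*ref\s+"([^"]+)"\s*>\}\}')
FIGURE_SHORTCODE = re.compile(r'\{\{<\s*figure\s+[^>]*?src="([^"]+)"')
# Targets that are neither page links nor files; shortcodes are handled separately.
SKIP_PREFIXES = ("#", "mailto:", "{{")


def classify_links(markdown: str) -> dict[str, list[tuple[int, str]]]:
    """Return internal, external, and image targets as (line_number, target) pairs."""
    found = {"internal": [], "external": [], "image": []}
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for bang, _text, target in MD_LINK.findall(line):
            if target.startswith(SKIP_PREFIXES):
                continue
            if bang:  # leading "!" marks a markdown image
                found["image"].append((lineno, target))
            elif target.startswith(("http://", "https://")):
                found["external"].append((lineno, target))
            else:
                found["internal"].append((lineno, target))
        for target in REF_SHORTCODE.findall(line):
            found["internal"].append((lineno, target))
        for target in FIGURE_SHORTCODE.findall(line):
            found["image"].append((lineno, target))
    return found


sample = "[a](/posts/slug/) and [b](https://example.com/path)\n![c](/images/file.png)"
print(classify_links(sample))
```

Keeping the line number with each target means later phases can report the source location without rescanning the file.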
Step 3: Tally link counts per file
Record total internal, external, and image links per file for the summary.
Gate: All markdown files scanned. Link extraction complete with counts by type. Proceed only when gate passes.
Phase 2: ANALYZE
Goal: Build internal link graph and compute structural metrics.
Step 1: Build adjacency matrix
Map every internal link to its source and target:
Page A -> Page B (A links to B)
Page A -> Page C
Page B -> Page D
Page C -> (no outbound)
Page E -> (no outbound, no inbound = orphan)
Step 2: Compute graph metrics
| Metric | Definition | SEO Impact |
|---|---|---|
| Orphan Pages | 0 inbound internal links | Critical -- invisible to crawlers |
| Under-Linked | < N inbound links (default 2) | Missed SEO opportunity |
| Link Sinks | Receives links, no outbound | May indicate incomplete content |
| Hub Pages | Many outbound links | Good for navigation |
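The metrics in this table can be computed from a list of (source, target) pairs. The following is a minimal sketch under stated assumptions: `graph_metrics` is a hypothetical name, the `hub_outbound` cutoff is invented for illustration because the skill only says "many outbound", and orphans are not also reported as under-linked.

```python
from collections import defaultdict


def graph_metrics(pages, links, min_inbound=2, hub_outbound=5):
    """Classify pages by inbound/outbound internal link counts.

    pages: iterable of page ids; links: iterable of (source, target) pairs.
    hub_outbound is an assumed cutoff, not a value defined by the skill.
    """
    inbound = defaultdict(set)
    outbound = defaultdict(set)
    for src, dst in links:
        if src != dst:  # a page linking to itself does not count
            outbound[src].add(dst)
            inbound[dst].add(src)

    result = {"orphans": [], "under_linked": [], "sinks": [], "hubs": []}
    for page in sorted(pages):
        n_in, n_out = len(inbound[page]), len(outbound[page])
        if n_in == 0:
            result["orphans"].append(page)
        elif n_in < min_inbound:
            result["under_linked"].append(page)
        if n_in > 0 and n_out == 0:
            result["sinks"].append(page)
        if n_out >= hub_outbound:
            result["hubs"].append(page)
    return result


# The five-page example from Step 1, with a low hub cutoff for demonstration.
links = [("A", "B"), ("A", "C"), ("B", "D")]
print(graph_metrics("ABCDE", links, hub_outbound=2))
# {'orphans': ['A', 'E'], 'under_linked': ['B', 'C', 'D'], 'sinks': ['C', 'D'], 'hubs': ['A']}
```

Page A appears as an orphan here because nothing in the toy graph links to it, which matches the "0 inbound internal links" definition.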
Step 3: Classify findings by severity
- Critical: Orphan pages, broken internal links, missing images
- Warning: Under-linked pages, link sinks
- Info: Hub pages, external link stats
Gate: Adjacency matrix built. All pages classified with inbound/outbound counts. Proceed only when gate passes.
Phase 3: VALIDATE
Goal: Verify link targets resolve to real files or live URLs.
Step 1: Validate internal links
For each internal link target:
- Parse the link target path
- Try Hugo path resolutions: content/posts/slug.md, content/posts/slug/index.md, content/posts/slug/_index.md
- Mark as broken only if ALL resolutions fail
- Record source file and line number for broken links
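The resolution order above could look like this in code. It is a sketch, not the skill's implementation: the function names are hypothetical, and it assumes absolute internal links on a site whose URLs mirror the content/ layout (no custom permalinks). Relative links would first need to be resolved against the linking page.

```python
from pathlib import Path
from typing import Optional


def candidate_paths(content_root: Path, link: str) -> list[Path]:
    """List the content files an absolute link such as /posts/slug/ may map to."""
    rel = link.split("#", 1)[0].split("?", 1)[0].strip("/")
    if rel.endswith(".md"):
        rel = rel[:-3]
    if not rel:
        # Assumption: the site root maps to the top-level _index.md.
        return [content_root / "_index.md"]
    base = content_root / rel
    return [
        base.with_name(base.name + ".md"),  # content/posts/slug.md
        base / "index.md",                  # content/posts/slug/index.md
        base / "_index.md",                 # content/posts/slug/_index.md
    ]


def resolve_internal(content_root: Path, link: str) -> Optional[Path]:
    """Return the first existing candidate, or None if every resolution fails."""
    for path in candidate_paths(content_root, link):
        if path.is_file():
            return path
    return None
```

A link is reported as broken only when `resolve_internal` returns `None`, in line with the "ALL resolutions fail" rule.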
Step 2: Validate image paths
For each image reference:
- Parse image source path (absolute or relative)
- Map to static/ directory
- Check file exists
- Record source file and line number for missing images
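A sketch of the image check, under assumptions: `image_candidates` and `find_missing_images` are hypothetical names, and for relative paths the page's own directory is tried in addition to static/ because the reference is ambiguous.

```python
from pathlib import Path


def image_candidates(site_root: Path, source_file: Path, src: str) -> list[Path]:
    """Return the locations to check for an image reference found in source_file."""
    src = src.split("?", 1)[0].split("#", 1)[0]
    if src.startswith("/"):
        # Absolute path: files under static/ are published at the site root.
        return [site_root / "static" / src.lstrip("/")]
    # Relative path: ambiguous, so try static/ and the page's own directory.
    return [site_root / "static" / src, source_file.parent / src]


def find_missing_images(site_root: Path, source_file: Path, refs):
    """refs: iterable of (line_number, src). Return refs with no file on disk."""
    missing = []
    for lineno, src in refs:
        candidates = image_candidates(site_root, source_file, src)
        if not any(path.is_file() for path in candidates):
            missing.append((str(source_file), lineno, src))
    return missing
```

Each result carries the source file and line number, which is what the report needs for missing images.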
Step 3: Validate external links (optional)
Only when --check-external is enabled:
- HTTP HEAD request to URL
- Follow redirects (up to 3)
- Check response status code
- Mark known false positives as "blocked (expected)" not broken
Known false positives: LinkedIn (403), Twitter/X (403/999), Facebook (varies).
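Once a status code is in hand, separating real failures from expected blocks is a small lookup. The sketch below leaves out the HEAD request itself and makes several assumptions: `KNOWN_BLOCKERS` and `classify_external` are hypothetical names, the dictionary shape is invented for illustration, the domain names for Twitter/X are inferred, and Facebook is omitted because its status codes vary.

```python
from urllib.parse import urlparse

# Assumed shape for the false-positives list; status codes are the ones named above.
KNOWN_BLOCKERS = {
    "linkedin.com": {403},
    "twitter.com": {403, 999},
    "x.com": {403, 999},
}


def classify_external(url: str, status: int) -> str:
    """Map a HEAD response status to 'ok', 'blocked (expected)', or 'broken'."""
    host = (urlparse(url).hostname or "").lower()
    for domain, codes in KNOWN_BLOCKERS.items():
        same_site = host == domain or host.endswith("." + domain)
        if same_site and status in codes:
            return "blocked (expected)"
    if 200 <= status < 400:
        return "ok"
    return "broken"


print(classify_external("https://www.linkedin.com/in/someone", 403))  # blocked (expected)
print(classify_external("https://example.com/path", 404))             # broken
```

Matching on the host suffix means subdomains such as `www.` are covered without listing each one.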
Gate: All link targets checked. Broken links have file and line numbers. External results (if enabled) distinguish real failures from false positives. Proceed only when gate passes.
Phase 4: REPORT
Goal: Present findings in a structured, actionable audit report.
Step 1: Generate summary header
===============================================================
LINK AUDIT: ~/your-blog/content/
===============================================================
SCAN SUMMARY:
Posts scanned: 15
Internal links: 42
External links: 28
Image references: 12
Step 2: Report by severity
List critical issues first (orphans, broken links, missing images), then warnings (under-linked, sinks), then info (hubs, valid external counts).
Each issue must include:
- File path
- Line number (for broken links and missing images)
- Specific suggestion for resolution
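One way to keep the severity ordering and the required fields consistent is to carry each issue in a small record. This is a sketch only: the `Finding` dataclass, `SEVERITY_ORDER`, and the one-line output format are assumptions, not the report format defined by this skill.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Finding:
    severity: str               # "critical", "warning", or "info"
    kind: str                   # e.g. "broken internal link"
    path: str                   # file path of the affected page
    suggestion: str             # specific suggestion for resolution
    line: Optional[int] = None  # set for broken links and missing images


def render_findings(findings: list[Finding]) -> str:
    """Sort findings critical-first and render one line per issue."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f.severity], f.path, f.line or 0),
    )
    rows = []
    for f in ordered:
        where = f"{f.path}:{f.line}" if f.line is not None else f.path
        rows.append(f"[{f.severity.upper()}] {f.kind}: {where} -> {f.suggestion}")
    return "\n".join(rows)
```

Because `suggestion` is a required field, a finding cannot be created without one.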
Step 3: Generate recommendations
Conclude with numbered, actionable recommendations ordered by impact:
===============================================================
RECOMMENDATIONS:
1. Add internal links to 2 orphan pages
2. Fix 1 broken internal link in /posts/example.md line 45
3. Update or remove 1 dead external link
4. Add missing image or fix path in /posts/images.md line 12
===============================================================
Gate: Report generated with all findings. Every issue has a file path and actionable suggestion. Audit is complete.
Error Handling
Error: "No markdown files found"
Cause: Wrong directory path or empty content root
Solution:
- Verify the content/ directory exists at the given path
- Check that .md files exist (not just subdirectories)
- Confirm the path is the Hugo content root, not the project root
Error: "External validation timeout"
Cause: Target site is slow, blocking requests, or unreachable
Solution:
- Check if the site is in the known false-positives list (LinkedIn, Twitter)
- Add persistently failing sites to the false-positives list
- Use a shorter timeout with --timeout 5 for slow sites
Error: "Image path ambiguous"
Cause: Path could be relative or absolute, unclear resolution
Solution:
- The scanner checks both interpretations automatically
- Report shows which interpretation was attempted
- Verify the Hugo site's static directory structure matches expectations
Anti-Patterns
Anti-Pattern 1: Treating Bot-Blocked Sites as Broken
What it looks like: Reporting LinkedIn/Twitter links as broken when they return 403/999.
Why wrong: These sites actively block bot requests. Links work fine in browsers.
Do instead: Maintain false-positives list. Report as "blocked (expected)" not broken.
Anti-Pattern 2: Skipping Graph Analysis
What it looks like: Only checking for broken links without analyzing the link graph.
Why wrong: Orphan pages are invisible to search crawlers. This is often the highest-impact finding.
Do instead: Always build the adjacency matrix and compute inbound link counts.
Anti-Pattern 3: Literal Path Matching Without Hugo Resolution
What it looks like: Treating /posts/slug/ as a literal file path and reporting it broken.
Why wrong: Hugo resolves paths through multiple conventions (slug.md, slug/index.md, slug/_index.md).
Do instead: Try all Hugo path resolutions before reporting a link as broken.
Anti-Pattern 4: Modifying Content Without User Consent
What it looks like: Automatically adding links to orphan pages or fixing broken paths.
Why wrong: This skill is non-destructive. Users must approve all content changes.
Do instead: Report findings with specific suggestions. Let the user decide which fixes to apply.
References
This skill uses these shared patterns:
- Anti-Rationalization - Prevents shortcut rationalizations
- Verification Checklist - Pre-completion checks
Domain-Specific Anti-Rationalization
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Only 3 broken links, not worth a full audit" | Orphan pages are invisible without graph analysis | Run full 4-phase audit |
| "External links probably still work" | Link rot is progressive and silent | Validate with --check-external periodically |
| "Hugo will resolve it somehow" | Hugo path resolution has specific rules | Test all resolution patterns explicitly |
| "Small site doesn't need link auditing" | Even 10 posts can have orphans | Run audit regardless of site size |
Reference Files
- ${CLAUDE_SKILL_DIR}/references/link-graph-metrics.md: Graph metrics definitions and SEO impact
- ${CLAUDE_SKILL_DIR}/references/false-positives.md: Sites known to block validation requests
- ${CLAUDE_SKILL_DIR}/references/fix-strategies.md: Resolution strategies for each issue type