site-crawlability

SEO Technical: Crawlability

Guides crawlability improvements: robots, X-Robots-Tag, site structure, and internal linking.
When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.

Scope (Technical SEO)

  • Redirect chains & loops: Fix multi-hop redirects; point links and redirects directly to the final URL
  • Broken links (4xx): Fix broken internal/external links by updating the target, 301-redirecting the dead URL, or removing the link
  • Site architecture: Logical hierarchy; important pages within 3–4 clicks of the homepage
  • Orphan pages: Add internal links to pages that have no incoming links
  • Pagination: Prefer pagination over infinite scroll for crawlability

Initial Assessment

Check for product marketing context first: if .claude/product-marketing-context.md or .cursor/product-marketing-context.md exists, read it for site structure.
Identify:
  1. Site structure: Flat vs. deep hierarchy
  2. Framework: Next.js, static, SPA, etc.
  3. Key paths: Sitemap, robots.txt, API, static assets
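
The context-file check can be sketched as a small helper. This assumes the skill runs with access to the repository root. `load_marketing_context` is a hypothetical name, and checking .claude before .cursor is an assumption. The skill only says to read whichever file exists.

```python
from pathlib import Path
from typing import Optional

# Candidate locations named in this skill; the order checked is an assumption.
CONTEXT_FILES = (
    ".claude/product-marketing-context.md",
    ".cursor/product-marketing-context.md",
)


def load_marketing_context(repo_root: str) -> Optional[str]:
    """Return the contents of the first context file that exists, or None."""
    root = Path(repo_root)
    for rel_path in CONTEXT_FILES:
        candidate = root / rel_path
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return None
```

If it returns None, fall back to asking about site structure, framework and key paths directly.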

Best Practices

Redirect Chains & Loops

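  • Fix multi-hop redirects (A → B → C); redirect and link directly to the final URL (A → C)
  • Loops: a URL that eventually redirects back to itself (e.g. A → B → A); break the cycle by pointing every URL in it at the real destination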
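
Chains and loops can be found offline from a redirect map. This sketch assumes the map is a plain dict of source path → target path, one hop per entry, for example exported from server config or a crawl. `resolve_redirect` and `audit_redirects` are hypothetical names and not part of any tool.

```python
from typing import Dict, List, Tuple


def resolve_redirect(start: str, redirects: Dict[str, str]) -> Tuple[List[str], bool]:
    """Follow single-hop redirects from `start`.

    Returns (path, is_loop). `path` lists every URL visited in order; when
    is_loop is False, its last element is the final destination.
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:  # revisited a URL: this chain never terminates
            return path, True
        seen.add(current)
    return path, False


def audit_redirects(redirects: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (source, issue, detail) for every multi-hop chain or loop."""
    findings = []
    for source in redirects:
        path, is_loop = resolve_redirect(source, redirects)
        if is_loop:
            findings.append((source, "loop", " -> ".join(path)))
        elif len(path) > 2:  # more than one hop before reaching the final URL
            findings.append((source, "chain", f"redirect directly to {path[-1]}"))
    return findings
```

A visited set guarantees termination on a finite map. When following live HTTP redirects, also cap the number of hops.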

Broken Links (4xx)

  • Fix broken internal/external links: update the link to a working URL, 301-redirect the dead URL to its replacement, or remove the link
  • Audit regularly, covering both internal and external links
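
A broken-link audit comes down to requesting each link and keeping the 4xx responses. This sketch makes three assumptions: you already have the list of link URLs (for example from a crawl), only the Python standard library is used, and `fetch_status` is passed in so the logic can be tested without the network. The helper names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of `url` using a HEAD request.

    Some servers reject HEAD with 405; switch to GET for those hosts.
    Network errors (DNS failure, timeout) are not caught and will propagate.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def find_broken_links(
    links: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Map each link that returns a 4xx status to that status code."""
    broken: Dict[str, int] = {}
    for url in dict.fromkeys(links):  # de-duplicate while keeping order
        status = fetch_status(url)
        if 400 <= status < 500:
            broken[url] = status
    return broken
```

5xx responses are deliberately left out. They indicate server errors rather than dead links and are usually handled separately.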

Site Architecture

Principle | Guideline
Depth | Important pages within 3–4 clicks of the homepage
Orphan pages | Add internal links to pages with no incoming links; see internal-links for link strategy
Hierarchy | Logical structure; hub pages link to the content beneath them
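
Click depth is the shortest path from the homepage over internal links, and a breadth-first search computes it directly. This sketch needs two inputs. The first is an internal link graph (page → list of linked pages, with paths already normalized). The second is an independent list of all pages, for example from the XML sitemap or the CMS. Orphans cannot be found from a crawl alone, because a crawl starting at the homepage never reaches them. The function names are assumptions, and so is the default `max_depth=3`, which takes the strict end of the 3–4 click guideline.

```python
from collections import deque
from typing import Dict, Iterable, List


def click_depths(links: Dict[str, List[str]], home: str = "/") -> Dict[str, int]:
    """Breadth-first search from the homepage: fewest clicks needed to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def architecture_report(
    links: Dict[str, List[str]],
    all_pages: Iterable[str],
    home: str = "/",
    max_depth: int = 3,
) -> Dict[str, List[str]]:
    """Flag orphan pages, pages deeper than max_depth, and pages unreachable from home."""
    pages = set(all_pages)
    depths = click_depths(links, home)
    # Incoming internal links, ignoring self-links (a page linking only to itself is still an orphan).
    linked_to = {t for src, targets in links.items() for t in targets if t != src}
    return {
        "orphans": sorted(p for p in pages if p != home and p not in linked_to),
        "too_deep": sorted(p for p, d in depths.items() if d > max_depth),
        "unreachable": sorted(p for p in pages if p not in depths),
    }
```

"unreachable" also catches pages that have incoming links only from other orphans, which a pure orphan check misses.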

Pagination vs Infinite Scroll

Problem: With infinite scroll, crawlers do not emulate user behavior (scrolling, clicking "Load more"), so content loaded after the initial page load may never be discovered. The same applies to masonry + infinite scroll, lazy-loaded lists, and similar patterns.
Solution: Prefer pagination for key content. If you keep infinite scroll, make it search-friendly following Google's recommendations:
Requirement | Practice
Component pages | Chunk content into paginated pages that are accessible without JavaScript
Full URLs | Each page has a unique URL (e.g. ?page=1, ?lastid=567); avoid fragment URLs such as #1
No overlap | Each item is listed once in the series; no duplication across pages
Direct access | The URL works when opened in a new tab; no dependence on cookies or browsing history
pushState/replaceState | Update the URL as the user scrolls; enables back/forward navigation and shareable links
404 for out-of-bounds | ?page=999 returns 404 when only 998 pages exist
Reference: Infinite scroll search-friendly recommendations (Google Search Central, 2014)
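
Three of the component-page requirements above can be sketched as one server-side function: unique ?page=N URLs, no overlap, and a 404 when out of bounds. It assumes 1-based page numbers and a page size of 20. `page_response` is a hypothetical name. Wiring the status code into your framework's router is not shown, and neither is the client-side pushState/replaceState part.

```python
import math
from typing import List, Sequence, Tuple


def page_response(items: Sequence[str], page: int, per_page: int = 20) -> Tuple[int, List[str]]:
    """Return (HTTP status, items) for the component page ?page=<page>.

    Pages are 1-based and cut from a stable ordering, so each item appears on
    exactly one page. Any page number outside 1..last_page returns 404.
    """
    last_page = max(1, math.ceil(len(items) / per_page))
    if page < 1 or page > last_page:
        return 404, []
    start = (page - 1) * per_page
    return 200, list(items[start:start + per_page])
```

With 998 full pages of 20 items, page 998 returns 200 and page 999 returns 404, matching the out-of-bounds row of the table.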

Pagination (Traditional)

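  • Link to the next and previous pages with plain <a href> links; add rel="prev" / rel="next" where applicable (Google has said it no longer uses these as an indexing signal, so the links themselves are what matter)
  • Avoid dynamic-only loading; make sure the pagination links are present in the server-rendered HTML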
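
Crawlable pagination means the previous and next pages are reachable through ordinary <a href> links in the HTML the server returns, not only through a JavaScript "Load more" handler. This is a minimal sketch assuming ?page=N URLs. `pagination_links` is a hypothetical helper, and the rel attributes are included only as optional hints.

```python
from html import escape
from typing import List


def pagination_links(base_path: str, page: int, last_page: int) -> str:
    """Render prev/next pagination as plain HTML anchors that crawlers can follow."""
    parts: List[str] = []
    if page > 1:
        parts.append(f'<a href="{escape(base_path)}?page={page - 1}" rel="prev">Previous</a>')
    if page < last_page:
        parts.append(f'<a href="{escape(base_path)}?page={page + 1}" rel="next">Next</a>')
    return "\n".join(parts)
```

For example, `pagination_links("/blog", 2, 3)` yields a Previous link to /blog?page=1 and a Next link to /blog?page=3, both present in the initial HTML.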

Common Issues

Issue | Fix
Redirect chains | Update links to point directly to the final URL
Broken links | Update the link, 301-redirect the dead URL, or remove the link; audit internal and external links
Orphan pages | Add internal links from hub pages or navigation; see internal-links for strategy
Infinite scroll | Provide paginated component pages, or replace infinite scroll with pagination for key content; see Pagination vs Infinite Scroll above

Output Format

  • Redirect audit: Chains and loops to fix
  • Broken link audit: 4xx links to fix
  • Site structure: Orphan pages, hierarchy
  • Pagination: Implementation for crawlable content

Related Skills

  • seo-strategy: SEO workflow; crawlability belongs to the Technical phase (P0)
  • website-structure: Plan which pages to build, page priority, structure planning; use before or alongside crawlability audit
  • robots-txt: robots.txt configuration
  • xml-sitemap: URL discovery
  • google-search-console: Index status, Page indexing report (formerly Coverage)
  • indexing: Fix indexing issues
  • internal-links: Internal linking best practices
  • masonry: Masonry + infinite scroll has the same crawl issue; the masonry layout skill points here for SEO