site-crawlability

SEO Technical: Crawlability

Guides crawlability improvements: robots, X-Robots-Tag, site structure, and internal linking.

When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.

Scope (Technical SEO)

Redirect chains & loops: Fix multi-hop redirects; point directly to the final URL
Broken links (4xx): Fix broken internal/external links; 301-redirect or remove
Site architecture: Logical hierarchy; important pages within 3–4 clicks of the homepage
Orphan pages: Add internal links to pages with no incoming links
Pagination: Prefer pagination over infinite scroll for crawlability

Initial Assessment

Check for product marketing context first: If `.claude/product-marketing-context.md` or `.cursor/product-marketing-context.md` exists, read it for site structure.

Identify:

Site structure: Flat vs. deep hierarchy
Framework: Next.js, static, SPA, etc.
Key paths: Sitemap, robots.txt, API, static assets

Best Practices

Redirect Chains & Loops

Chains: Fix multi-hop redirects (A → B → C); point links and redirects directly to the final URL
Loops: URLs that redirect back to themselves, directly or via other URLs; break the cycle
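
A minimal Python sketch of the chain/loop check, assuming you already have a redirect map (source URL → target URL), for example exported from a crawler. `resolve_redirects`, `plan_fixes`, and the input format are hypothetical, not part of any specific tool. The sketch follows each redirect to its end, flags cycles, and proposes a single-hop target for every chain.

```python
def resolve_redirects(redirects: dict[str, str]) -> dict[str, dict]:
    """Follow each redirecting URL to its end.

    `redirects` maps a source URL to the URL it redirects to (hypothetical
    input, e.g. exported from a crawler). For every source, returns the hop
    path, the final URL (None for loops), and whether a loop was found.
    """
    results = {}
    for start in redirects:
        path = [start]
        current = start
        while current in redirects:
            current = redirects[current]
            if current in path:
                # The chain revisits a URL it already passed through: a loop.
                results[start] = {"path": path + [current], "final": None, "loop": True}
                break
            path.append(current)
        else:
            # Reached a URL that does not redirect: the final destination.
            results[start] = {"path": path, "final": current, "loop": False}
    return results


def plan_fixes(redirects: dict[str, str]) -> tuple[dict[str, str], list[list[str]]]:
    """Return (fixes, loops): fixes map each multi-hop source straight to its
    final URL; loops list the cycles that must be broken by hand."""
    fixes, loops = {}, []
    for source, info in resolve_redirects(redirects).items():
        if info["loop"]:
            loops.append(info["path"])
        elif len(info["path"]) > 2:  # more than one hop means a chain
            fixes[source] = info["final"]
    return fixes, loops


if __name__ == "__main__":
    redirects = {
        "/old": "/interim",
        "/interim": "/new",  # /old -> /interim -> /new is a two-hop chain
        "/a": "/b",
        "/b": "/a",          # /a <-> /b is a loop
    }
    fixes, loops = plan_fixes(redirects)
    print(fixes)  # {'/old': '/new'}
    print(loops)
```

Apply `fixes` both to the server's redirect rules and to internal links that still point at old URLs. Loops have no valid destination, so the correct final URL has to be chosen manually.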

Broken Links (4xx)

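Fix broken internal and external links: 301-redirect removed pages on your own site to the closest relevant live URL, or update or remove the link
Audit internal and external links regularly; update or remove any that now return 4xx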
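
A minimal sketch of the internal half of the audit, assuming the site's rendered HTML is available as a hypothetical `pages` mapping (absolute URL → HTML), for example from a static build. It extracts `<a href>` targets with Python's standard `html.parser`, resolves them against the page URL, and reports internal targets that are not known pages. URLs are compared as exact strings (no trailing-slash or query normalization), and external links still need live HTTP status checks, which are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_broken_internal_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source_page, target_url) pairs for internal links whose target
    is not one of the known pages (and would therefore likely 404)."""
    known = set(pages)
    broken = []
    for source, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        host = urlparse(source).netloc
        for href in parser.hrefs:
            # Resolve relative hrefs against the page URL and drop #fragments.
            target, _fragment = urldefrag(urljoin(source, href))
            parsed = urlparse(target)
            if parsed.scheme not in ("http", "https") or parsed.netloc != host:
                continue  # external, mailto:, tel:, etc. are out of scope here
            if target not in known:
                broken.append((source, target))
    return broken


if __name__ == "__main__":
    site = {
        "https://example.com/": '<a href="/about">About</a> <a href="/pricing">Pricing</a>',
        "https://example.com/about": '<a href="/">Home</a> <a href="team#lead">Team</a>',
    }
    for source, target in find_broken_internal_links(site):
        print(f"{source} -> {target}")
```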

Site Architecture

Principle	Guideline
Depth	Important pages within 3–4 clicks of the homepage
Orphan pages	Add internal links to pages with no incoming links; see the internal-links skill for link strategy
Hierarchy	Logical structure; hub pages link to content
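
A sketch of the depth and orphan checks over a hypothetical internal link graph (page → pages it links to, e.g. from a crawl) and a page list such as the sitemap's URLs. Breadth-first search from the homepage gives each page's minimum click depth; pages with no incoming links from other pages are orphans. `max_depth=4` takes the upper end of the 3–4 click guideline and is an adjustable assumption.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage. BFS reaches each page first via
    a shortest path, so the recorded depth is the minimum number of clicks."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit_architecture(links, home, all_pages, max_depth=4):
    """Return (too_deep, orphans, unreachable) as sorted lists.

    links: internal link graph, page -> pages it links to (hypothetical input).
    all_pages: every page that should be indexable, e.g. from the sitemap.
    max_depth: click budget; 4 is the upper end of the 3-4 click guideline.
    """
    depths = click_depths(links, home)
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    # Incoming links from other pages (self-links do not count).
    linked_to = {t for src, targets in links.items() for t in targets if t != src}
    orphans = sorted(p for p in all_pages if p != home and p not in linked_to)
    # A page with incoming links is still unreachable if only orphans link to it.
    unreachable = sorted(p for p in all_pages if p not in depths)
    return too_deep, orphans, unreachable
```

Reporting `unreachable` alongside `orphans` catches pages that do have incoming links, but only from pages the homepage never reaches.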

Pagination vs Infinite Scroll

Problem: With infinite scroll, crawlers do not emulate user actions such as scrolling or clicking "Load more", so content that loads only after those actions is generally not discovered. The same applies to masonry + infinite scroll, lazy-loaded lists, and similar patterns.

Solution: Prefer pagination for key content. If keeping infinite scroll, make it search-friendly per Google's recommendations:

Requirement	Practice
Component pages	Chunk content into paginated pages that are accessible without JavaScript
Full URLs	Each page has a unique URL (e.g. `?page=1`, `?lastid=567`); avoid fragment URLs such as `#1`
No overlap	Each item appears once in the series; no duplication across pages
Direct access	Each URL works when opened in a new tab; no dependency on cookies or history
pushState/replaceState	Update the URL as the user scrolls; enables back/forward navigation and shareable links
404 for out-of-bounds	`?page=999` returns 404 when only 998 pages exist

Reference: Infinite scroll search-friendly recommendations (Google Search Central, 2014)
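
A framework-agnostic sketch of the component-page rules, assuming a `?page=N` parameter and 20 items per page (both hypothetical choices). Pages are consecutive slices, so no item appears twice, and any page number outside the series returns 404 instead of an empty 200. The caller is expected to map the returned status onto the real HTTP response.

```python
import math
from typing import Optional


def get_page(items: list, page_param: Optional[str], per_page: int = 20) -> tuple[int, list]:
    """Return (HTTP status, items) for a `?page=N` request.

    Pages are consecutive, non-overlapping slices, so every item appears on
    exactly one page. A missing `page` means page 1; anything that is not a
    positive integer within range yields 404.
    """
    if page_param is None:
        page = 1
    elif page_param.isascii() and page_param.isdigit():
        page = int(page_param)
    else:
        return 404, []  # e.g. "abc", "-1", ""
    total_pages = max(1, math.ceil(len(items) / per_page))
    if not 1 <= page <= total_pages:
        return 404, []  # out of bounds, e.g. page 0 or past the last page
    start = (page - 1) * per_page
    return 200, items[start:start + per_page]
```

With 45 items and 20 per page there are 3 pages, so `?page=4` returns 404 rather than an empty list.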

Pagination (Traditional)

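Link to the next and previous pages; add `rel="prev"` / `rel="next"` where applicable (Google has stated it no longer uses them as an indexing signal, though other search engines may still read them)
Avoid JavaScript-only loading; make sure pagination links are plain `<a href>` links present in the HTML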
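
A sketch that renders both the optional `rel="prev"`/`rel="next"` `<link>` tags and plain `<a href>` links as server-side HTML, so pagination works without JavaScript. The `?page=N` URL scheme and the function names are assumptions for illustration.

```python
from html import escape


def page_url(base_url: str, n: int) -> str:
    # Assumed URL scheme: ?page=N on a base path without an existing query string.
    return f"{base_url}?page={n}"


def pagination_markup(base_url: str, page: int, total_pages: int) -> tuple[str, str]:
    """Return (head_html, nav_html) for page `page` of `total_pages`.

    head_html holds optional rel="prev"/rel="next" <link> tags for <head>;
    nav_html holds plain <a href> links that crawlers can follow without JavaScript.
    """
    head, nav = [], []
    if page > 1:
        prev_url = escape(page_url(base_url, page - 1))
        head.append(f'<link rel="prev" href="{prev_url}">')
        nav.append(f'<a href="{prev_url}">Previous</a>')
    if page < total_pages:
        next_url = escape(page_url(base_url, page + 1))
        head.append(f'<link rel="next" href="{next_url}">')
        nav.append(f'<a href="{next_url}">Next</a>')
    return "\n".join(head), "\n".join(nav)


if __name__ == "__main__":
    head_html, nav_html = pagination_markup("/blog", 2, 5)
    print(head_html)
    print(nav_html)
```

The first page omits the "previous" link and the last page omits the "next" link, so the series has clear ends.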

Common Issues

Issue	Check
Redirect chains	Update links to point directly to final URL
Broken links	301 or remove; audit internal and external
Orphan pages	Add internal links from hub pages or navigation; see the internal-links skill for strategy
Infinite scroll	Provide paginated component pages, or replace infinite scroll with pagination for key content; see Pagination vs Infinite Scroll above
