site-crawlability
SEO Technical: Crawlability
Guides crawlability improvements: robots, X-Robots-Tag, site structure, and internal linking.
When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.
Scope (Technical SEO)
- Redirect chains & loops: Fix multi-hop redirects; point directly to final URL
- Broken links (4xx): Fix broken internal/external links; 301 or remove
- Site architecture: Logical hierarchy; important pages within 3–4 clicks from homepage
- Orphan pages: Add internal links to pages with no incoming links
- Pagination: Prefer pagination over infinite scroll for crawlability
Initial Assessment
Check for product marketing context first: if `.claude/product-marketing-context.md` or `.cursor/product-marketing-context.md` exists, read it for site structure.
Identify:
- Site structure: Flat vs. deep hierarchy
- Framework: Next.js, static, SPA, etc.
- Key paths: Sitemap, robots.txt, API, static assets
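If a script or agent performs the context-file check itself, a minimal Python sketch could look like this. It assumes both paths are relative to the project root and checks them in the order listed above (the ordering is an assumption); `find_marketing_context` is a hypothetical helper, not part of any existing tool.

```python
from pathlib import Path
from typing import Optional

# Candidate locations named in this skill; checking order is an assumption.
CONTEXT_FILES = (
    ".claude/product-marketing-context.md",
    ".cursor/product-marketing-context.md",
)


def find_marketing_context(root: str = ".") -> Optional[Path]:
    """Return the first product marketing context file found under `root`, or None."""
    base = Path(root)
    for rel in CONTEXT_FILES:
        candidate = base / rel
        if candidate.is_file():
            return candidate
    return None
```

When it returns `None`, fall back to asking about structure, framework, and key paths directly.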
Best Practices
Redirect Chains & Loops
- Fix multi-hop redirects; point directly to final URL
- Loops: URLs that redirect back to themselves, directly or through other URLs; break the cycle
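Both checks can be sketched as a walk over a redirect map. This is a minimal illustration, assuming you already have a dictionary of source URL to redirect target (for example, exported from a crawler or server config); `resolve_redirect` and `audit_redirects` are hypothetical names.

```python
from typing import Dict, List, Tuple


def resolve_redirect(start: str, redirects: Dict[str, str]) -> Tuple[List[str], bool]:
    """Follow redirects from `start`.

    Returns (path, is_loop). `path` lists every URL visited in order; if is_loop
    is True, the last entry is the URL where the cycle closed.
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:  # revisiting a URL means the redirects form a cycle
            return path, True
        seen.add(current)
    return path, False


def audit_redirects(redirects: Dict[str, str]) -> Tuple[Dict[str, str], List[List[str]]]:
    """Return ({start: final_url} for multi-hop chains, [loop paths])."""
    chains: Dict[str, str] = {}
    loops: List[List[str]] = []
    for start in redirects:
        path, is_loop = resolve_redirect(start, redirects)
        if is_loop:
            loops.append(path)
        elif len(path) > 2:  # start -> intermediate(s) -> final: more than one hop
            chains[start] = path[-1]
    return chains, loops
```

For example, with `/a → /b → /c` and `/x → /y → /x`, `audit_redirects` reports that `/a` should point straight to `/c` and flags the `/x`/`/y` cycle.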
Broken Links (4xx)
- Fix broken internal/external links; 301 or remove
- Audit regularly; update or remove broken links
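A broken-link audit amounts to joining each link with the status code its target returned. The sketch below assumes a crawl has already produced `(source_page, target_url)` pairs. `fetch_status` uses only the standard library's `urllib`: it sends a `HEAD` request and follows redirects, so it reports the final status. Some servers reject `HEAD` (e.g. with 405), in which case a `GET` fallback would be needed. Both helper names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Iterable, List, Tuple


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for `url` via a HEAD request (redirects are followed)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code


def find_broken_links(
    links: Iterable[Tuple[str, str]], status: Dict[str, int]
) -> List[Tuple[str, str, int]]:
    """Return (source, target, code) for each link whose target answered with a 4xx code.

    Targets missing from `status` (not fetched yet) are skipped.
    """
    broken = []
    for source, target in links:
        code = status.get(target)
        if code is not None and 400 <= code < 500:
            broken.append((source, target, code))
    return broken
```

Each reported `(source, target, code)` triple tells you which page to edit: update the link, 301 the target, or remove the link.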
Site Architecture
| Principle | Guideline |
|---|---|
| Depth | Important pages within 3–4 clicks from homepage |
| Orphan pages | Add internal links to pages with no incoming links; see internal-links for link strategy |
| Hierarchy | Logical structure; hub pages link to content |
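Click depth and orphan detection are both graph questions. A breadth-first search from the homepage gives each page's minimum click depth, and pages that never appear as a link target are orphans. This is a minimal sketch, assuming the internal link graph (page → linked pages) and the full page list (e.g. from the XML sitemap) are already available. The `max_depth=3` default reflects the 3–4 click guideline and can be raised to 4. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def click_depths(graph: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search from the homepage; returns the minimum click depth of each reachable page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depth:  # first visit in BFS = shortest click path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def structure_report(
    graph: Dict[str, List[str]], all_pages: Iterable[str], home: str, max_depth: int = 3
) -> Tuple[List[str], List[str], List[str]]:
    """Return (too_deep, orphans, unreachable) page lists."""
    pages = set(all_pages)
    depth = click_depths(graph, home)
    too_deep = sorted(p for p, d in depth.items() if d > max_depth)
    # Orphans: pages with no incoming internal link at all (the homepage is exempt).
    linked_to = {t for targets in graph.values() for t in targets}
    orphans = sorted(p for p in pages if p != home and p not in linked_to)
    # Unreachable: pages a crawler starting at the homepage never reaches.
    unreachable = sorted(p for p in pages if p not in depth)
    return too_deep, orphans, unreachable
```

Orphans and unreachable pages differ: a page linked only from an orphan has an incoming link, yet a crawler starting at the homepage still never reaches it.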
Pagination vs Infinite Scroll
Problem: With infinite scroll, crawlers cannot emulate user behavior (scrolling, clicking "Load more"), so content loaded after the initial page load is not discoverable. The same applies to masonry + infinite scroll, lazy-loaded lists, and similar patterns.
Solution: Prefer pagination for key content. If keeping infinite scroll, make it search-friendly per Google's recommendations:
| Requirement | Practice |
|---|---|
| Component pages | Chunk content into paginated pages accessible without JavaScript |
| Full URLs | Each component page has its own unique URL |
| No overlap | Each item listed once in series; no duplication across pages |
| Direct access | URL works in new tab; no cookie/history dependency |
| pushState/replaceState | Update URL as user scrolls; enables back/forward, shareable links |
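| 404 for out-of-bounds | Return 404 for page numbers past the end of the series (e.g. when only 998 pages exist, any request for a page beyond 998) |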
Reference: Infinite scroll search-friendly recommendations (Google Search Central, 2014)
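On the server side, the "No overlap" and "404 for out-of-bounds" rows can be handled by one function that slices a stable, ordered list into fixed-size pages. This sketch assumes 1-indexed page numbers and a `per_page` of 20 purely for illustration; `get_page` is a hypothetical name, and wiring it into a specific web framework is left out.

```python
import math
from typing import List, Sequence, Tuple


def get_page(items: Sequence[str], page: int, per_page: int = 20) -> Tuple[int, List[str]]:
    """Return (http_status, items_on_page) for a 1-indexed page number.

    Each item appears on exactly one page (no overlap); page numbers outside
    1..total_pages get a 404 with no items.
    """
    total_pages = max(1, math.ceil(len(items) / per_page))
    if page < 1 or page > total_pages:
        return 404, []
    start = (page - 1) * per_page
    return 200, list(items[start:start + per_page])
```

With 45 items and `per_page=20`, pages 1–3 return 200 (the last holding 5 items) and page 4 returns 404. The items must keep a stable order between requests, or the same item could appear on two pages.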
Pagination (Traditional)
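- Link to next/previous pages with plain `<a href>` links; add `rel="prev"` / `rel="next"` where applicable (Google no longer uses these as an indexing signal, so the crawlable links themselves matter most)
- Avoid dynamic-only loading; ensure the links are present in the HTML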
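A helper that emits both the `rel="prev"`/`rel="next"` hints and ordinary anchor links keeps pagination crawlable without JavaScript. The `?page=N` URL scheme below is a hypothetical choice (page 1 uses the bare base URL), and `page_url`/`pagination_links` are illustrative names; adapt them to the site's actual routes.

```python
from html import escape
from typing import Tuple


def page_url(base_url: str, n: int) -> str:
    # Hypothetical scheme: page 1 is the bare base URL, later pages use ?page=N.
    return base_url if n == 1 else f"{base_url}?page={n}"


def pagination_links(base_url: str, page: int, total_pages: int) -> Tuple[str, str]:
    """Return (head_html, body_html) for page `page` of a `total_pages` series.

    head_html holds rel="prev"/rel="next" <link> tags; body_html holds plain
    <a href> anchors that crawlers can follow without running JavaScript.
    """
    head, body = [], []
    if page > 1:
        prev_url = escape(page_url(base_url, page - 1))
        head.append(f'<link rel="prev" href="{prev_url}">')
        body.append(f'<a href="{prev_url}">Previous</a>')
    if page < total_pages:
        next_url = escape(page_url(base_url, page + 1))
        head.append(f'<link rel="next" href="{next_url}">')
        body.append(f'<a href="{next_url}">Next</a>')
    return "\n".join(head), "\n".join(body)
```

The first page gets no "prev" link and the last page gets no "next" link, so the series has clear boundaries.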
Common Issues
| Issue | Check |
|---|---|
| Redirect chains | Update links to point directly to final URL |
| Broken links | 301 or remove; audit internal and external |
| Orphan pages | Add internal links from hub or navigation; see internal-links for strategy |
| Infinite scroll | Provide paginated component pages; or replace with pagination for key content; see above |
Output Format
- Redirect audit: Chains and loops to fix
- Broken link audit: 4xx links to fix
- Site structure: Orphan pages, hierarchy
- Pagination: Implementation for crawlable content
Related Skills
- seo-strategy: SEO workflow; crawlability belongs to the Technical phase (P0)
- website-structure: Plan which pages to build, page priority, structure planning; use before or alongside crawlability audit
- robots-txt: robots.txt configuration
- xml-sitemap: URL discovery
- google-search-console: Index status, Coverage report
- indexing: Fix indexing issues
- internal-links: Internal linking best practices
- masonry: Masonry + infinite scroll has the same crawl issue; the masonry layout skill references this skill for SEO