seo-technical

Technical SEO Audit

Categories

1. Crawlability

  • robots.txt: exists, valid, not blocking important resources
  • XML sitemap: exists, referenced in robots.txt, valid format
  • Noindex tags: intentional vs accidental
  • Crawl depth: important pages within 3 clicks of homepage
  • JavaScript rendering: check if critical content requires JS execution
  • Crawl budget: for large sites (>10k pages), efficiency matters
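The crawl-depth check above can be sketched as a breadth-first search over an internal link graph. This is a minimal illustration, assuming the graph has already been crawled into a dictionary; the function names and the sample paths are hypothetical.

```python
from collections import deque


def crawl_depths(links, homepage):
    """Return the minimum click depth of every page reachable from homepage.

    `links` maps each URL to the list of URLs it links to (an assumed,
    pre-crawled internal link graph).
    """
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # in BFS the first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links, homepage, important, max_depth=3):
    """List important pages that are deeper than max_depth or unreachable."""
    depths = crawl_depths(links, homepage)
    return sorted(p for p in important if depths.get(p, float("inf")) > max_depth)


# Illustrative link graph; the paths are made up.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
    "/blog": [],
}
print(pages_too_deep(site, "/", ["/products/widgets/blue/specs", "/blog", "/orphan"]))
```

Pages that never appear in the graph are treated as infinitely deep, so orphaned pages are flagged alongside pages that are simply buried.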

AI Crawler Management

As of 2025-2026, AI companies actively crawl the web to train models and power AI search. Managing these crawlers via robots.txt is a critical technical SEO consideration.
Known AI crawlers:
| Crawler | Company | robots.txt token | Purpose |
|---|---|---|---|
| GPTBot | OpenAI | `GPTBot` | Model training |
| ChatGPT-User | OpenAI | `ChatGPT-User` | Real-time browsing |
| ClaudeBot | Anthropic | `ClaudeBot` | Model training |
| PerplexityBot | Perplexity | `PerplexityBot` | Search index + training |
| Bytespider | ByteDance | `Bytespider` | Model training |
| Google-Extended | Google | `Google-Extended` | Gemini training (NOT search) |
| CCBot | Common Crawl | `CCBot` | Open dataset |
Key distinctions:
  • Blocking `Google-Extended` prevents Gemini training use but does NOT affect Google Search indexing or AI Overviews (those use `Googlebot`)
  • Blocking `GPTBot` prevents OpenAI training but does NOT prevent ChatGPT from citing your content via browsing (`ChatGPT-User`)
  • ~3-5% of websites now use AI-specific robots.txt rules
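To audit how an existing robots.txt treats the crawlers listed above, one option is Python's standard-library `urllib.robotparser`. The sketch below parses robots.txt text that has already been fetched and reports, per token, whether a URL may be crawled. The `crawler_access` helper and the sample rules are illustrative assumptions, and the standard-library parser only approximates how each vendor actually matches user-agent tokens.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens of the AI crawlers discussed in this section.
AI_CRAWLER_TOKENS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Bytespider", "Google-Extended", "CCBot",
]


def crawler_access(robots_txt, url, tokens=AI_CRAWLER_TOKENS):
    """Map each crawler token to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return {token: parser.can_fetch(token, url) for token in tokens}


# Sample rules for illustration only.
SAMPLE_ROBOTS = """
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

for token, allowed in crawler_access(SAMPLE_ROBOTS, "https://example.com/page").items():
    print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

With these sample rules, `ChatGPT-User` remains allowed while `GPTBot` is blocked, which mirrors the training-versus-browsing distinction described above.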
Example — selective AI crawler blocking:

    # Allow search indexing, block AI training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    # Allow all other crawlers (including Googlebot for search)
    User-agent: *
    Allow: /

**Recommendation:** Consider your AI visibility strategy before blocking. Being cited by AI systems drives brand awareness and referral traffic. Cross-reference the `seo-geo` skill for full AI visibility optimization.

2. Indexability

  • Canonical tags: self-referencing, no conflicts with noindex
  • Duplicate content: near-duplicates, parameter URLs, www vs non-www
  • Thin content: pages below minimum word counts per type
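  • Pagination: crawlable links between paginated pages, or a load-more pattern with a crawlable fallback. Google has not used rel=next/prev as an indexing signal since 2019, though other search engines may still read it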
  • Hreflang: correct for multi-language/multi-region sites
  • Index bloat: unnecessary pages consuming crawl budget
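Parameter URLs and www versus non-www variants can be grouped by normalizing each URL and collecting the ones that collapse to the same key. The following is a rough sketch: the list of tracking parameters, the choice to force `https`, and the function names are all assumptions made for illustration, not rules taken from this checklist.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of parameters that do not change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalize(url):
    """Collapse scheme, www prefix, and tracking parameters into one key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    path = parts.path or "/"
    # Forcing https is an assumption: it treats http and https as one page.
    return urlunsplit(("https", host, path, urlencode(query), ""))


def duplicate_groups(urls):
    """Return only the normalized keys that more than one crawled URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "https://www.example.com/shoes?utm_source=news",
    "http://example.com/shoes",
    "https://example.com/shoes?color=red",
    "https://example.com/hats",
]
print(duplicate_groups(crawled))
```

Each group is a candidate for a single canonical URL plus redirects. Parameters that do change content, such as `color` here, are kept in the key so those pages are not merged by mistake.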

3. Security

  • HTTPS: enforced, valid SSL certificate, no mixed content
  • Security headers:
    • Content-Security-Policy (CSP)
    • Strict-Transport-Security (HSTS)
    • X-Frame-Options
    • X-Content-Type-Options
    • Referrer-Policy
  • HSTS preload: check preload list inclusion for high-security sites
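The header checks above can be expressed as a small function over response headers that have already been collected. This sketch assumes the headers arrive as a plain dictionary. The preload test reflects the commonly published HSTS preload requirements (a `max-age` of at least one year, `includeSubDomains`, and `preload`), which are worth confirming against the current preload list rules. Function names are hypothetical.

```python
REQUIRED_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]

ONE_YEAR_SECONDS = 31536000


def missing_security_headers(response_headers):
    """Return the checklist headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


def hsts_preload_ready(hsts_value):
    """Rough check that an HSTS header value looks eligible for preloading."""
    directives = [d.strip().lower() for d in hsts_value.split(";") if d.strip()]
    max_age = 0
    for directive in directives:
        if directive.startswith("max-age="):
            value = directive.split("=", 1)[1].strip('"')
            if value.isdigit():
                max_age = int(value)
    return (
        max_age >= ONE_YEAR_SECONDS
        and "includesubdomains" in directives
        and "preload" in directives
    )


headers = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
}
print(missing_security_headers(headers))
print(hsts_preload_ready(headers["strict-transport-security"]))
```

Presence of a header says nothing about the quality of its value, so a Content-Security-Policy that passes this check may still be too permissive.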

4. URL Structure

  • Clean URLs: descriptive, hyphenated, no query parameters for content
  • Hierarchy: logical folder structure reflecting site architecture
  • Redirects: no chains (max 1 hop), 301 for permanent moves
  • URL length: flag >100 characters
  • Trailing slashes: consistent usage
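Several of the URL rules above are mechanical enough to lint. The sketch below flags long URLs, query parameters, and underscores, and separately checks whether a set of URLs mixes trailing-slash styles. The function names and the decision to skip file-like paths are assumptions for illustration.

```python
from urllib.parse import urlsplit


def lint_url(url, max_length=100):
    """Return a list of URL-structure issues for a single URL."""
    issues = []
    parts = urlsplit(url)
    if len(url) > max_length:
        issues.append(f"longer than {max_length} characters")
    if parts.query:
        issues.append("uses query parameters")
    if "_" in parts.path:
        issues.append("uses underscores instead of hyphens")
    return issues


def mixed_trailing_slashes(urls):
    """Return True if the URLs use both trailing-slash and no-slash styles."""
    styles = set()
    for url in urls:
        path = urlsplit(url).path
        if path in ("", "/"):
            continue  # the site root is not informative for this check
        last_segment = path.rstrip("/").rsplit("/", 1)[-1]
        if "." in last_segment:
            continue  # assumption: skip file-like paths such as /feed.xml
        styles.add(path.endswith("/"))
    return len(styles) > 1


print(lint_url("https://example.com/blue_widgets?id=7"))
print(mixed_trailing_slashes(["https://example.com/a/", "https://example.com/b"]))
```

Redirect chains are not covered here because they need live HTTP responses rather than URL strings.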

5. Mobile Optimization

  • Responsive design: viewport meta tag, responsive CSS
  • Touch targets: minimum 48x48px with 8px spacing
  • Font size: minimum 16px base
  • No horizontal scroll
  • Mobile-first indexing: complete as of July 5, 2024. Google now crawls and indexes ALL websites with the mobile Googlebot user-agent, so the mobile version of a page is the version that gets indexed.

6. Core Web Vitals

  • LCP (Largest Contentful Paint): target <2.5s
  • INP (Interaction to Next Paint): target <200ms
    • INP replaced FID on March 12, 2024. FID was fully removed from all Chrome tools (CrUX API, PageSpeed Insights, Lighthouse) on September 9, 2024. Do NOT reference FID anywhere.
  • CLS (Cumulative Layout Shift): target <0.1
  • Evaluation uses 75th percentile of real user data
  • Use PageSpeed Insights API or CrUX data if MCP available
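The 75th-percentile evaluation can be illustrated with a few lines of arithmetic. This sketch assumes raw per-visit samples are available and uses a nearest-rank percentile, which is a simplification and not necessarily how CrUX aggregates data. The "good" limits match the targets above; the "poor" limits (4000 ms, 500 ms, 0.25) are Google's published upper thresholds and are not stated in this checklist.

```python
import math

# (good limit, poor limit) per metric; LCP and INP in milliseconds, CLS unitless.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def percentile_75(samples):
    """Nearest-rank 75th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def rate(metric, samples):
    """Return (p75 value, rating) for one Core Web Vitals metric."""
    good, poor = THRESHOLDS[metric]
    value = percentile_75(samples)
    if value <= good:
        return value, "good"
    if value <= poor:
        return value, "needs improvement"
    return value, "poor"


print(rate("LCP", [1800, 2100, 2400, 3900]))
print(rate("INP", [120, 180, 240, 650]))
print(rate("CLS", [0.02, 0.05, 0.3, 0.4]))
```

Because the rating follows the 75th percentile, a single slow visit (such as the 3900 ms LCP sample) does not by itself push a page out of the "good" band.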

7. Structured Data

  • Detection: JSON-LD (preferred), Microdata, RDFa
  • Validation against Google's supported types
  • See seo-schema skill for full analysis
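Detection of JSON-LD can be sketched with the standard-library HTML parser: collect every `application/ld+json` script block, try to parse it, and list the `@type` values found. This is a minimal illustration with hypothetical names. It does not look inside `@graph` containers, does not handle Microdata or RDFa, and does not validate against Google's supported types.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(html):
    """Return (list of top-level @type values, count of unparseable blocks)."""
    collector = JsonLdCollector()
    collector.feed(html)
    types, errors = [], 0
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            errors += 1
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
    return types, errors


page = (
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Product", "name": "Widget"}'
    "</script>"
    '<script type="application/ld+json">{broken</script>'
)
print(jsonld_types(page))
```

Counting unparseable blocks separately is useful in an audit, since malformed JSON-LD is silently ignored by consumers rather than reported.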

8. JavaScript Rendering

  • Check whether content is visible in the initial HTML or requires JS
  • Identify client-side rendered (CSR) vs server-side rendered (SSR)
  • Flag SPA frameworks (React, Vue, Angular) that may cause indexing issues
  • Verify dynamic rendering setup if applicable

JavaScript SEO — Canonical & Indexing Guidance (December 2025)

Google updated its JavaScript SEO documentation in December 2025 with critical clarifications:
  1. Canonical conflicts: If a canonical tag in raw HTML differs from one injected by JavaScript, Google may use EITHER one. Ensure canonical tags are identical between server-rendered HTML and JS-rendered output.
  2. noindex with JavaScript: If raw HTML contains <meta name="robots" content="noindex"> but JavaScript removes it, Google MAY still honor the noindex from raw HTML. Serve correct robots directives in the initial HTML response.
  3. Non-200 status codes: Google does NOT render JavaScript on pages returning non-200 HTTP status codes. Any content or meta tags injected via JS on error pages will be invisible to Googlebot.
  4. Structured data in JavaScript: Product, Article, and other structured data injected via JS may face delayed processing. For time-sensitive structured data (especially e-commerce Product markup), include it in the initial server-rendered HTML.
Best practice: Serve critical SEO elements (canonical, meta robots, structured data, title, meta description) in the initial server-rendered HTML rather than relying on JavaScript injection.
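The canonical and noindex points above lend themselves to a simple comparison between the raw HTML response and the JavaScript-rendered DOM. The sketch below assumes both are already available as strings (the rendered version would typically come from a headless browser, which is outside this example). Class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect canonical URLs and meta robots values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = {key: (value or "") for key, value in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonicals.append(attrs.get("href", ""))
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots.append(attrs.get("content", "").lower())


def signals(html):
    """Return (canonical URLs, whether any meta robots tag contains noindex)."""
    parser = HeadSignals()
    parser.feed(html)
    noindex = any("noindex" in value for value in parser.robots)
    return parser.canonicals, noindex


def compare_raw_and_rendered(raw_html, rendered_html):
    """List mismatches between the raw response and the rendered DOM."""
    raw_canonicals, raw_noindex = signals(raw_html)
    rendered_canonicals, rendered_noindex = signals(rendered_html)
    issues = []
    if set(raw_canonicals) != set(rendered_canonicals):
        issues.append("canonical differs between raw and rendered HTML")
    if raw_noindex and not rendered_noindex:
        issues.append("noindex in raw HTML is removed by JavaScript")
    if not raw_canonicals:
        issues.append("no canonical in raw HTML")
    return issues


raw = (
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex"></head>'
)
rendered = (
    '<head><link rel="canonical" href="https://example.com/b">'
    '<meta name="robots" content="index, follow"></head>'
)
print(compare_raw_and_rendered(raw, rendered))
```

An empty result means the two versions agree on these signals; it does not confirm that the signals themselves are correct.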

9. IndexNow Protocol

  • Check if site supports IndexNow for Bing, Yandex, Naver
  • Supported by search engines other than Google
  • Recommend implementation for faster indexing on non-Google engines
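For sites that do adopt IndexNow, a bulk submission is a JSON POST listing the changed URLs. The sketch below only builds the payload and the request object and never sends anything. The endpoint and field names (`host`, `key`, `keyLocation`, `urlList`) follow the published IndexNow protocol as understood here and should be verified against the current documentation; the key value and URLs are placeholders.

```python
import json
import urllib.request
from urllib.parse import urlsplit

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(urls, key, key_location=None):
    """Build the JSON body for a bulk IndexNow submission."""
    hosts = {urlsplit(url).netloc for url in urls}
    if len(hosts) != 1:
        raise ValueError("all URLs in one submission must share a single host")
    payload = {"host": hosts.pop(), "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return payload


def build_request(payload):
    """Prepare (but do not send) the POST request for a payload."""
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder key and URLs for illustration only.
payload = build_indexnow_payload(
    ["https://example.com/new-page", "https://example.com/updated-page"],
    key="0123456789abcdef",
    key_location="https://example.com/0123456789abcdef.txt",
)
request = build_request(payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Passing the request to `urllib.request.urlopen` would perform the submission. The key file must already be hosted at the stated location for the receiving engine to accept it.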

Output

Technical Score: XX/100

Category Breakdown

| Category | Status | Score |
|---|---|---|
| Crawlability | ✅/⚠️/❌ | XX/100 |
| Indexability | ✅/⚠️/❌ | XX/100 |
| Security | ✅/⚠️/❌ | XX/100 |
| URL Structure | ✅/⚠️/❌ | XX/100 |
| Mobile | ✅/⚠️/❌ | XX/100 |
| Core Web Vitals | ✅/⚠️/❌ | XX/100 |
| Structured Data | ✅/⚠️/❌ | XX/100 |
| JS Rendering | ✅/⚠️/❌ | XX/100 |

Critical Issues (fix immediately)

High Priority (fix within 1 week)

Medium Priority (fix within 1 month)

Low Priority (backlog)
