seo-technical

Technical SEO Audit

Categories

1. Crawlability

robots.txt: exists, valid, not blocking important resources
XML sitemap: exists, referenced in robots.txt, valid format
Noindex tags: intentional vs accidental
Crawl depth: important pages within 3 clicks of homepage
JavaScript rendering: check if critical content requires JS execution
Crawl budget: for large sites (>10k pages), efficiency matters
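
The crawl-depth check above can be sketched as a breadth-first search over the internal link graph. This is a minimal illustration, not a crawler: the `SITE` graph, the function names, and the way the graph is collected are all assumptions made for the example.

```python
from collections import deque


def crawl_depths(links, home):
    """Return the minimum click depth of every page reachable from `home`."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_depth=3):
    """List pages deeper than `max_depth` clicks, plus pages never reached."""
    depths = crawl_depths(links, home)
    deep = sorted(p for p, d in depths.items() if d > max_depth)
    orphans = sorted(set(links) - set(depths))
    return deep, orphans


# Hypothetical internal link graph: page -> pages it links to.
SITE = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/blog/old-post"],
    "/products": [],
    "/landing": ["/"],
}

deep, orphans = too_deep(SITE, "/")
print("Deeper than 3 clicks:", deep)
print("Not reachable from homepage:", orphans)
```

Pages reported as unreachable are only orphans relative to the graph that was collected, so a partial crawl will overstate them.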

AI Crawler Management

As of 2025-2026, AI companies actively crawl the web to train models and power AI search. Managing these crawlers via robots.txt is a critical technical SEO consideration.

Known AI crawlers:

| Crawler | Company | robots.txt token | Purpose |
|---|---|---|---|
| GPTBot | OpenAI | `GPTBot` | Model training |
| ChatGPT-User | OpenAI | `ChatGPT-User` | Real-time browsing |
| ClaudeBot | Anthropic | `ClaudeBot` | Model training |
| PerplexityBot | Perplexity | `PerplexityBot` | Search index + training |
| Bytespider | ByteDance | `Bytespider` | Model training |
| Google-Extended | Google | `Google-Extended` | Gemini training (NOT search) |
| CCBot | Common Crawl | `CCBot` | Open dataset |

Key distinctions:

Blocking `Google-Extended` prevents Gemini training use but does NOT affect Google Search indexing or AI Overviews (those use `Googlebot`).
Blocking `GPTBot` prevents OpenAI training but does NOT prevent ChatGPT from citing your content via browsing (`ChatGPT-User`).
~3-5% of websites now use AI-specific robots.txt rules.

Example — selective AI crawler blocking:

```
# Allow search indexing, block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all other crawlers (including Googlebot for search)
User-agent: *
Allow: /
```

**Recommendation:** Consider your AI visibility strategy before blocking. Being cited by AI systems can drive brand awareness and referral traffic. Cross-reference the `seo-geo` skill for full AI visibility optimization.
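
One way to confirm that a robots.txt file blocks only the intended AI crawlers is to evaluate it per token with Python's standard `urllib.robotparser`. The sketch below assumes the selective-blocking rules from this section and a placeholder `example.com` URL; the helper name `crawler_access` is hypothetical. Real crawlers may interpret edge cases differently from the standard-library parser, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# The selective-blocking rules from this section, inlined for the example.
ROBOTS_TXT = """\
# Allow search indexing, block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all other crawlers (including Googlebot for search)
User-agent: *
Allow: /
"""

AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Bytespider", "Google-Extended", "CCBot",
]


def crawler_access(robots_txt, tokens, url="https://example.com/"):
    """Map each user-agent token to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


access = crawler_access(ROBOTS_TXT, AI_TOKENS + ["Googlebot"])
blocked = sorted(token for token, allowed in access.items() if not allowed)
print("Blocked:", blocked)
print("Googlebot allowed:", access["Googlebot"])
```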

2. Indexability

Canonical tags: self-referencing, no conflicts with noindex
Duplicate content: near-duplicates, parameter URLs, www vs non-www
Thin content: pages below minimum word counts per type
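Pagination: rel=next/prev or load-more pattern; Google stopped using rel=next/prev as an indexing signal in 2019, so also verify that paginated pages are reachable through crawlable links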
Hreflang: correct for multi-language/multi-region sites
Index bloat: unnecessary pages consuming crawl budget
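
The canonical and noindex checks above can be illustrated with the standard-library HTML parser. This is a rough sketch: the class and function names are hypothetical, and treating "noindex plus a canonical that points to another URL" as the conflict case is an assumption about what the audit should flag.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the canonical URL and the robots meta content from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = ""

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.robots = (attr.get("content") or "").lower()


def indexability_flags(html, page_url):
    """Return a list of canonical/noindex findings for one page."""
    signals = HeadSignals()
    signals.feed(html)
    flags = []
    if signals.canonical is None:
        flags.append("missing canonical")
    elif signals.canonical != page_url:
        flags.append("canonical is not self-referencing")
        if "noindex" in signals.robots:
            # Assumed conflict case: noindex here, canonical pointing elsewhere.
            flags.append("noindex conflicts with canonical")
    return flags


PAGE = (
    '<html><head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex, follow"></head></html>'
)
print(indexability_flags(PAGE, "https://example.com/b"))
```

The comparison is an exact string match, so trailing-slash or protocol differences will also be reported as non-self-referencing.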

3. Security

HTTPS: enforced, valid SSL certificate, no mixed content
Security headers:
- Content-Security-Policy (CSP)
- Strict-Transport-Security (HSTS)
- X-Frame-Options
- X-Content-Type-Options
- Referrer-Policy
HSTS preload: check preload list inclusion for high-security sites
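
A header audit along these lines can be sketched as a lookup over the response headers. The example assumes the headers have already been fetched into a plain dict; the function names are hypothetical. The preload test follows the commonly published preload-list requirements (`max-age` of at least one year, `includeSubDomains`, `preload`) and does not query the preload list itself.

```python
REQUIRED_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]


def missing_security_headers(headers):
    """Return the required headers that are absent (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


def hsts_preload_ready(headers):
    """Check the HSTS value for max-age >= 1 year, includeSubDomains, preload."""
    value = next(
        (v for k, v in headers.items() if k.lower() == "strict-transport-security"),
        "",
    )
    directives = [d.strip().lower() for d in value.split(";") if d.strip()]
    max_age = 0
    for directive in directives:
        if directive.startswith("max-age="):
            digits = directive.split("=", 1)[1].strip('"')
            max_age = int(digits) if digits.isdigit() else 0
    return (
        max_age >= 31536000
        and "includesubdomains" in directives
        and "preload" in directives
    )


# Hypothetical response headers for illustration.
RESPONSE_HEADERS = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "X-Frame-Options": "DENY",
}
print(missing_security_headers(RESPONSE_HEADERS))
print(hsts_preload_ready(RESPONSE_HEADERS))
```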

4. URL Structure

Clean URLs: descriptive, hyphenated, no query parameters for content
Hierarchy: logical folder structure reflecting site architecture
Redirects: no chains (max 1 hop), 301 for permanent moves
URL length: flag >100 characters
Trailing slashes: consistent usage
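
The URL rules above lend themselves to simple string checks. The sketch below is illustrative only: the flag wording, the function names, and the choice to treat 302, 303, and 307 as temporary redirects are assumptions, and the redirect hops are expected to have been recorded by whatever tool followed the URL.

```python
from urllib.parse import urlsplit


def url_flags(url, max_length=100):
    """Flag URL-structure issues for a single URL."""
    parts = urlsplit(url)
    flags = []
    if len(url) > max_length:
        flags.append(f"longer than {max_length} characters")
    if parts.query:
        flags.append("query parameters")
    if "_" in parts.path:
        flags.append("underscores instead of hyphens")
    if parts.path != parts.path.lower():
        flags.append("uppercase characters")
    return flags


def redirect_flags(hops):
    """Flag redirect issues; `hops` is a list of (status_code, location) pairs."""
    flags = []
    if len(hops) > 1:
        flags.append(f"redirect chain of {len(hops)} hops")
    for status, _location in hops:
        if status in (302, 303, 307):
            flags.append(f"temporary redirect ({status})")
    return flags


print(url_flags("https://example.com/Blog/my_post?id=7"))
print(redirect_flags([(302, "https://example.com/a"), (301, "https://example.com/b")]))
```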

5. Mobile Optimization

Responsive design: viewport meta tag, responsive CSS
Touch targets: minimum 48x48px with 8px spacing
Font size: minimum 16px base
No horizontal scroll
Mobile-first indexing: Google indexes mobile version. Mobile-first indexing is 100% complete as of July 5, 2024. Google now crawls and indexes ALL websites exclusively with the mobile Googlebot user-agent.

6. Core Web Vitals

LCP (Largest Contentful Paint): target <2.5s
INP (Interaction to Next Paint): target <200ms
- INP replaced FID on March 12, 2024. FID was fully removed from all Chrome tools (CrUX API, PageSpeed Insights, Lighthouse) on September 9, 2024. Do NOT reference FID anywhere.
CLS (Cumulative Layout Shift): target <0.1
Evaluation uses 75th percentile of real user data
Use PageSpeed Insights API or CrUX data if MCP available
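
To make the 75th-percentile evaluation concrete, the sketch below computes p75 from raw samples and compares it with the targets listed above. The sample values are invented, the metric keys are hypothetical, and the nearest-rank percentile is an assumption; CrUX and PageSpeed Insights report their own precomputed p75 values, which should be preferred when available.

```python
import math

# Targets from this section: LCP < 2.5 s, INP < 200 ms, CLS < 0.1.
TARGETS = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}


def p75(samples):
    """Nearest-rank 75th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def assess(field_data):
    """Return {metric: (p75_value, meets_target)} for each metric."""
    return {
        metric: (p75(samples), p75(samples) < TARGETS[metric])
        for metric, samples in field_data.items()
    }


# Invented real-user samples for illustration.
FIELD_DATA = {
    "lcp_ms": [1800, 2100, 2400, 3900],
    "inp_ms": [90, 120, 180, 250, 400],
    "cls": [0.02, 0.05, 0.08, 0.3],
}
for metric, (value, ok) in assess(FIELD_DATA).items():
    print(metric, value, "pass" if ok else "fail")
```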

7. Structured Data

Detection: JSON-LD (preferred), Microdata, RDFa
Validation against Google's supported types
See seo-schema skill for full analysis

8. JavaScript Rendering

Check if content visible in initial HTML vs requires JS
Identify client-side rendered (CSR) vs server-side rendered (SSR)
Flag SPA frameworks (React, Vue, Angular) that may cause indexing issues
Verify dynamic rendering setup if applicable

JavaScript SEO — Canonical & Indexing Guidance (December 2025)

Google updated its JavaScript SEO documentation in December 2025 with critical clarifications:

Canonical conflicts: If a canonical tag in raw HTML differs from one injected by JavaScript, Google may use EITHER one. Ensure canonical tags are identical between server-rendered HTML and JS-rendered output.
noindex with JavaScript: If raw HTML contains `<meta name="robots" content="noindex">` but JavaScript removes it, Google MAY still honor the noindex from raw HTML. Serve correct robots directives in the initial HTML response.
Non-200 status codes: Google does NOT render JavaScript on pages returning non-200 HTTP status codes. Any content or meta tags injected via JS on error pages will be invisible to Googlebot.
Structured data in JavaScript: Product, Article, and other structured data injected via JS may face delayed processing. For time-sensitive structured data (especially e-commerce Product markup), include it in the initial server-rendered HTML.

Best practice: Serve critical SEO elements (canonical, meta robots, structured data, title, meta description) in the initial server-rendered HTML rather than relying on JavaScript injection.
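
A quick way to catch the canonical and noindex mismatches described above is to compare the tags in the raw HTML response with those in the JavaScript-rendered DOM. The sketch assumes both HTML strings have already been captured (for example, one from a plain HTTP fetch and one from a headless browser); the names `SeoTags`, `extract`, and `raw_vs_rendered` are hypothetical.

```python
from html.parser import HTMLParser


class SeoTags(HTMLParser):
    """Collect the canonical href and robots meta content from an HTML string."""

    def __init__(self):
        super().__init__()
        self.tags = {"canonical": None, "robots": ""}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.tags["canonical"] = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.tags["robots"] = (attr.get("content") or "").lower()


def extract(html):
    parser = SeoTags()
    parser.feed(html)
    return parser.tags


def raw_vs_rendered(raw_html, rendered_html):
    """Report differences between server HTML and JS-rendered HTML."""
    raw, rendered = extract(raw_html), extract(rendered_html)
    issues = []
    if raw["canonical"] != rendered["canonical"]:
        issues.append("canonical differs between raw and rendered HTML")
    if "noindex" in raw["robots"] and "noindex" not in rendered["robots"]:
        issues.append("noindex in raw HTML is removed by JavaScript")
    return issues


RAW = (
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex"></head>'
)
RENDERED = '<head><link rel="canonical" href="https://example.com/b"></head>'
print(raw_vs_rendered(RAW, RENDERED))
```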

9. IndexNow Protocol

Check if site supports IndexNow for Bing, Yandex, Naver
Google does not currently support IndexNow; the protocol is used by participating engines such as Bing, Yandex, and Naver
Recommend implementation for faster indexing on non-Google engines
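
For sites that do implement IndexNow, a bulk submission is a JSON POST carrying the host, the site's key, and a URL list. The sketch below only builds the request body and does not send it. The key, the URLs, and the helper name are placeholders, and the endpoint and field names reflect the public IndexNow documentation as understood here, so verify them against the current specification before use.

```python
import json
from urllib.parse import urlsplit

# Shared endpoint named in the IndexNow documentation (verify before use).
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(urls, key, key_location=None):
    """Build the JSON body for a bulk IndexNow submission (one host per request)."""
    hosts = {urlsplit(u).netloc for u in urls}
    if len(hosts) != 1:
        raise ValueError("all URLs in one submission must share a single host")
    payload = {"host": hosts.pop(), "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return payload


# Placeholder key and URLs for illustration only.
payload = build_indexnow_payload(
    ["https://www.example.com/new-page", "https://www.example.com/updated-page"],
    key="0123456789abcdef",
    key_location="https://www.example.com/0123456789abcdef.txt",
)
print("POST", INDEXNOW_ENDPOINT)
print(json.dumps(payload, indent=2))
```

The key file must be hosted on the submitting site so that the receiving engine can verify ownership.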

Output

Technical Score: XX/100

Category Breakdown

| Category | Status | Score |
|---|---|---|
| Crawlability | ✅/⚠️/❌ | XX/100 |
| Indexability | ✅/⚠️/❌ | XX/100 |
| Security | ✅/⚠️/❌ | XX/100 |
| URL Structure | ✅/⚠️/❌ | XX/100 |
| Mobile | ✅/⚠️/❌ | XX/100 |
| Core Web Vitals | ✅/⚠️/❌ | XX/100 |
| Structured Data | ✅/⚠️/❌ | XX/100 |
| JS Rendering | ✅/⚠️/❌ | XX/100 |
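
The source does not say how category scores roll up into the overall Technical Score or where the status icons change. As one possible reading, the sketch below uses an unweighted mean and hypothetical cut-offs (80 and 50); both are assumptions to be replaced by whatever weighting the audit actually uses.

```python
def status_icon(score, good=80, warn=50):
    """Map a 0-100 score to a status icon (cut-offs are assumed, not specified)."""
    if score >= good:
        return "✅"
    if score >= warn:
        return "⚠️"
    return "❌"


def technical_score(category_scores):
    """Overall score as the unweighted mean of category scores (an assumption)."""
    return round(sum(category_scores.values()) / len(category_scores))


# Invented category scores for illustration.
SCORES = {"Crawlability": 90, "Indexability": 70, "Security": 40}

print(f"Technical Score: {technical_score(SCORES)}/100")
for category, score in SCORES.items():
    print(f"| {category} | {status_icon(score)} | {score}/100 |")
```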

Critical Issues (fix immediately)

High Priority (fix within 1 week)

Medium Priority (fix within 1 month)

Low Priority (backlog)