seo-technical
Technical SEO Audit
Categories
1. Crawlability
- robots.txt: exists, valid, not blocking important resources
- XML sitemap: exists, referenced in robots.txt, valid format
- Noindex tags: intentional vs accidental
- Crawl depth: important pages within 3 clicks of homepage
- JavaScript rendering: check if critical content requires JS execution
- Crawl budget: for large sites (>10k pages), efficiency matters
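The crawl-depth check above can be sketched as a breadth-first search over the internal link graph. This is a minimal illustration, not a crawler: the `links` mapping, the function names, and the sample site are all hypothetical, and a real audit would build the graph from crawled pages.

```python
from collections import deque


def crawl_depths(links, homepage):
    """Breadth-first search over an internal link graph.

    `links` maps each URL to the URLs it links to. Returns the minimum
    number of clicks needed to reach each discovered page from `homepage`.
    """
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links, homepage, important, max_clicks=3):
    """Return important pages that are unreachable or deeper than max_clicks."""
    depths = crawl_depths(links, homepage)
    return sorted(p for p in important if depths.get(p, float("inf")) > max_clicks)


# Hypothetical link graph, for illustration only.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
}
flagged = pages_too_deep(site, "/", ["/products/widgets/blue/specs", "/blog", "/orphan"])
print(flagged)
```

Pages missing from the graph entirely (orphans) are flagged together with pages deeper than three clicks, since neither can be reached within the target depth.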
AI Crawler Management
As of 2025-2026, AI companies actively crawl the web to train models and power AI search. Managing these crawlers via robots.txt is a critical technical SEO consideration.
Known AI crawlers:
| Crawler | Company | robots.txt token | Purpose |
|---|---|---|---|
| GPTBot | OpenAI | `GPTBot` | Model training |
| ChatGPT-User | OpenAI | `ChatGPT-User` | Real-time browsing |
| ClaudeBot | Anthropic | `ClaudeBot` | Model training |
| PerplexityBot | Perplexity | `PerplexityBot` | Search index + training |
| Bytespider | ByteDance | `Bytespider` | Model training |
| Google-Extended | Google | `Google-Extended` | Gemini training (NOT search) |
| CCBot | Common Crawl | `CCBot` | Open dataset |
Key distinctions:
- Blocking `Google-Extended` prevents Gemini training use but does NOT affect Google Search indexing or AI Overviews (those use `Googlebot`)
- Blocking `GPTBot` prevents OpenAI training but does NOT prevent ChatGPT from citing your content via browsing (`ChatGPT-User`)
- ~3-5% of websites now use AI-specific robots.txt rules
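To see which of the crawlers listed above a given robots.txt actually blocks, one option is Python's standard `urllib.robotparser`. The sketch below is an assumption-laden illustration: the sample rules, the `ai_crawler_access` helper, and the `example.com` URL are hypothetical, and the parser's matching may differ in edge cases from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Tokens for the AI crawlers listed above.
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Bytespider",
    "Google-Extended",
    "CCBot",
]

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def ai_crawler_access(robots_txt, url="https://example.com/"):
    """Map each AI crawler token to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in AI_CRAWLER_TOKENS}


access = ai_crawler_access(SAMPLE_ROBOTS_TXT)
for token, allowed in access.items():
    print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

With the sample rules, only `GPTBot` and `Google-Extended` are reported as blocked, which matches the distinction above: `ChatGPT-User` can still fetch pages for browsing.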
Example — selective AI crawler blocking:
```
# Allow search indexing, block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all other crawlers (including Googlebot for search)
User-agent: *
Allow: /
```

**Recommendation:** Consider your AI visibility strategy before blocking. Being cited by AI systems drives brand awareness and referral traffic. Cross-reference the `seo-geo` skill for full AI visibility optimization.

2. Indexability
- Canonical tags: self-referencing, no conflicts with noindex
- Duplicate content: near-duplicates, parameter URLs, www vs non-www
- Thin content: pages below minimum word counts per type
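- Pagination: rel=next/prev or load-more pattern (note that Google announced in 2019 that it no longer uses rel=next/prev as an indexing signal, so paginated pages also need crawlable links)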
- Hreflang: correct for multi-language/multi-region sites
- Index bloat: unnecessary pages consuming crawl budget
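The duplicate-content check above (parameter URLs, www vs non-www) can be approximated by normalizing URLs and grouping the ones that collapse to the same key. This is a rough sketch under stated assumptions: the `TRACKING_PARAMS` set, the decision to treat http and https as the same page, and the sample URLs are all illustrative choices, not rules from the audit.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters that do not change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalize(url):
    """Collapse common duplicate variants of a URL into one comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same host
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS)
    path = parts.path.rstrip("/") or "/"  # ignore trailing-slash differences
    # Assumption: http and https variants serve the same content.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Return only the normalized keys that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical crawl output, for illustration only.
crawled = [
    "https://www.example.com/shoes/",
    "https://example.com/shoes?utm_source=news",
    "http://example.com/shoes",
    "https://example.com/hats",
]
dupes = duplicate_groups(crawled)
print(dupes)
```

Each group is a candidate for a single canonical URL plus redirects or canonical tags on the variants.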
3. Security
- HTTPS: enforced, valid SSL certificate, no mixed content
- Security headers:
  - Content-Security-Policy (CSP)
  - Strict-Transport-Security (HSTS)
  - X-Frame-Options
  - X-Content-Type-Options
  - Referrer-Policy
- HSTS preload: check preload list inclusion for high-security sites
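A header check along the lines of the list above might look like the following sketch. It works on a plain dictionary of response headers so it stays independent of any HTTP client; the function names and sample headers are hypothetical. The preload test assumes the commonly published preload-list requirements (a `max-age` of at least one year, `includeSubDomains`, and `preload`), and it only checks eligibility, not actual inclusion in the list.

```python
REQUIRED_HEADERS = [
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
]

ONE_YEAR_SECONDS = 31536000


def missing_security_headers(headers):
    """Return required headers absent from `headers` (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


def hsts_preload_ready(headers):
    """Check whether the HSTS header carries the directives assumed for preloading."""
    value = next(
        (v for k, v in headers.items() if k.lower() == "strict-transport-security"), ""
    )
    directives = [d.strip().lower() for d in value.split(";") if d.strip()]
    max_age = 0
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                max_age = int(directive.split("=", 1)[1].strip('"'))
            except ValueError:
                max_age = 0
    return (
        max_age >= ONE_YEAR_SECONDS
        and "includesubdomains" in directives
        and "preload" in directives
    )


# Hypothetical response headers, for illustration only.
sample = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "X-Frame-Options": "DENY",
}
print(missing_security_headers(sample))
print(hsts_preload_ready(sample))
```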
4. URL Structure
- Clean URLs: descriptive, hyphenated, no query parameters for content
- Hierarchy: logical folder structure reflecting site architecture
- Redirects: no chains (max 1 hop), 301 for permanent moves
- URL length: flag >100 characters
- Trailing slashes: consistent usage
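Several of the URL checks above are simple string tests. The sketch below shows one way to express them; the issue labels, the underscore and uppercase checks (used here as a proxy for "hyphenated, clean" URLs), and the representation of a redirect as a list of status codes are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def url_issues(url, max_length=100):
    """Return a list of URL-structure issues for a single URL."""
    issues = []
    parts = urlsplit(url)
    if len(url) > max_length:
        issues.append("too_long")
    if parts.query:
        issues.append("query_parameters")
    if "_" in parts.path:
        issues.append("underscores")  # hyphens are preferred as separators
    if parts.path != parts.path.lower():
        issues.append("uppercase")
    return issues


def redirect_issues(status_codes):
    """Check the status codes of the redirects followed before the final response."""
    issues = []
    if len(status_codes) > 1:
        issues.append("redirect_chain")  # more than 1 hop
    if any(code != 301 for code in status_codes):
        # Flagged for review only: a temporary redirect can be intentional.
        issues.append("non_301_redirect")
    return issues


print(url_issues("https://example.com/Blog/my_post?id=3"))
print(redirect_issues([302, 301]))
```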
5. Mobile Optimization
- Responsive design: viewport meta tag, responsive CSS
- Touch targets: minimum 48x48px with 8px spacing
- Font size: minimum 16px base
- No horizontal scroll
- Mobile-first indexing: 100% complete as of July 5, 2024. Google now crawls and indexes ALL websites exclusively with the mobile Googlebot user-agent, so the mobile version is the version that gets indexed.
6. Core Web Vitals
- LCP (Largest Contentful Paint): target <2.5s
- INP (Interaction to Next Paint): target <200ms
  - INP replaced FID on March 12, 2024. FID was fully removed from all Chrome tools (CrUX API, PageSpeed Insights, Lighthouse) on September 9, 2024. Do NOT reference FID anywhere.
- CLS (Cumulative Layout Shift): target <0.1
- Evaluation uses 75th percentile of real user data
- Use PageSpeed Insights API or CrUX data if MCP available
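When raw field samples are available, the 75th-percentile evaluation above can be illustrated as follows. This is only a sketch: it uses a nearest-rank percentile, which is an assumption and may not match how CrUX aggregates its data, and the metric keys and sample values are hypothetical.

```python
import math

# Targets from the list above (LCP in seconds, INP in milliseconds, CLS unitless).
TARGETS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}


def percentile_75(samples):
    """Nearest-rank 75th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def assess(field_data):
    """Map each metric to (p75 value, whether the p75 is under its target)."""
    results = {}
    for metric, samples in field_data.items():
        p75 = percentile_75(samples)
        results[metric] = (p75, p75 < TARGETS[metric])
    return results


# Hypothetical real-user samples, for illustration only.
field_data = {
    "lcp_s": [1.2, 1.8, 2.1, 3.4],
    "inp_ms": [80, 120, 250, 400],
    "cls": [0.01, 0.02, 0.05, 0.3],
}
report = assess(field_data)
print(report)
```

In this sample, a single slow LCP reading does not fail the page because the 75th percentile is still under target, while INP fails because a quarter or more of interactions are slow.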
7. Structured Data
- Detection: JSON-LD (preferred), Microdata, RDFa
- Validation against Google's supported types
- See seo-schema skill for full analysis
8. JavaScript Rendering
- Check if content visible in initial HTML vs requires JS
- Identify client-side rendered (CSR) vs server-side rendered (SSR)
- Flag SPA frameworks (React, Vue, Angular) that may cause indexing issues
- Verify dynamic rendering setup if applicable
JavaScript SEO — Canonical & Indexing Guidance (December 2025)
Google updated its JavaScript SEO documentation in December 2025 with critical clarifications:
- Canonical conflicts: If a canonical tag in raw HTML differs from one injected by JavaScript, Google may use EITHER one. Ensure canonical tags are identical between server-rendered HTML and JS-rendered output.
- noindex with JavaScript: If raw HTML contains `<meta name="robots" content="noindex">` but JavaScript removes it, Google MAY still honor the noindex from raw HTML. Serve correct robots directives in the initial HTML response.
- Non-200 status codes: Google does NOT render JavaScript on pages returning non-200 HTTP status codes. Any content or meta tags injected via JS on error pages will be invisible to Googlebot.
- Structured data in JavaScript: Product, Article, and other structured data injected via JS may face delayed processing. For time-sensitive structured data (especially e-commerce Product markup), include it in the initial server-rendered HTML.
Best practice: Serve critical SEO elements (canonical, meta robots, structured data, title, meta description) in the initial server-rendered HTML rather than relying on JavaScript injection.
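One way to catch the canonical and noindex conflicts described above is to extract both tags from the raw HTML and from the JS-rendered HTML and compare them. The sketch below uses the standard `html.parser` module and assumes both HTML strings have already been obtained (for example, the rendered version from a headless browser); the class name, function names, and issue labels are hypothetical.

```python
from html.parser import HTMLParser


class SeoTagParser(HTMLParser):
    """Collect the canonical URL and any noindex directive from an HTML document."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonical = attributes.get("href")
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = (attributes.get("content") or "").lower()
            self.noindex = self.noindex or "noindex" in content


def extract(html):
    """Return (canonical URL or None, noindex flag) for one HTML string."""
    parser = SeoTagParser()
    parser.feed(html)
    return parser.canonical, parser.noindex


def compare_raw_and_rendered(raw_html, rendered_html):
    """List mismatches between server-sent HTML and JS-rendered HTML."""
    raw_canonical, raw_noindex = extract(raw_html)
    js_canonical, js_noindex = extract(rendered_html)
    issues = []
    if raw_canonical != js_canonical:
        issues.append("canonical_mismatch")
    if raw_noindex and not js_noindex:
        issues.append("noindex_removed_by_js")
    return issues


# Hypothetical page snapshots, for illustration only.
raw = (
    '<html><head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex"></head></html>'
)
rendered = '<html><head><link rel="canonical" href="https://example.com/b"></head></html>'
print(compare_raw_and_rendered(raw, rendered))
```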
9. IndexNow Protocol
- Check if site supports IndexNow for Bing, Yandex, Naver
- Supported by search engines other than Google
- Recommend implementation for faster indexing on non-Google engines
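For sites that adopt IndexNow, a bulk submission is a JSON POST to an IndexNow endpoint. The sketch below only builds the request and does not send it; the key, the URLs, and the helper name are hypothetical, and the shared `api.indexnow.org` endpoint is used on the assumption that it forwards submissions to participating engines. The key also has to be hosted as a text file on the site for verification.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(urls, key, key_location=None):
    """Build (but do not send) a bulk IndexNow submission for URLs on one host."""
    host = urlsplit(urls[0]).netloc
    if any(urlsplit(u).netloc != host for u in urls):
        raise ValueError("all URLs in one submission must share a host")
    payload = {"host": host, "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical key and URLs, for illustration only.
request = build_indexnow_request(
    ["https://example.com/new-page", "https://example.com/updated-page"],
    key="0123456789abcdef0123456789abcdef",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would perform the actual submission.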
Output
Technical Score: XX/100
Category Breakdown
| Category | Status | Score |
|---|---|---|
| Crawlability | ✅/⚠️/❌ | XX/100 |
| Indexability | ✅/⚠️/❌ | XX/100 |
| Security | ✅/⚠️/❌ | XX/100 |
| URL Structure | ✅/⚠️/❌ | XX/100 |
| Mobile | ✅/⚠️/❌ | XX/100 |
| Core Web Vitals | ✅/⚠️/❌ | XX/100 |
| Structured Data | ✅/⚠️/❌ | XX/100 |
| JS Rendering | ✅/⚠️/❌ | XX/100 |
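The output format above can be generated from per-category scores. The audit does not specify how the overall score is derived or where the status thresholds sit, so the unweighted mean and the 80/50 cut-offs in this sketch are assumptions, as are the function names and sample scores.

```python
def status_icon(score):
    """Map a 0-100 score to a status icon (thresholds are assumed, not specified)."""
    if score >= 80:
        return "✅"
    if score >= 50:
        return "⚠️"
    return "❌"


def build_report(category_scores):
    """Render the overall score and category table as markdown text."""
    # Assumption: the overall score is the unweighted mean of category scores.
    overall = round(sum(category_scores.values()) / len(category_scores))
    lines = [
        f"Technical Score: {overall}/100",
        "",
        "| Category | Status | Score |",
        "|---|---|---|",
    ]
    for name, score in category_scores.items():
        lines.append(f"| {name} | {status_icon(score)} | {score}/100 |")
    return "\n".join(lines)


# Hypothetical scores, for illustration only.
scores = {"Crawlability": 90, "Indexability": 70, "Security": 40}
report_text = build_report(scores)
print(report_text)
```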