SEO Fundamentals for Web Applications

Overview

SEO (Search Engine Optimization) ensures search engines can discover, understand, and rank your content. Architecture decisions significantly impact SEO capability.

How Search Engines Work

The Three Phases

1. CRAWLING: Googlebot discovers URLs and downloads content
2. INDEXING: Google parses content and stores in search index
3. RANKING: Algorithm determines position in search results

What Crawlers See

Crawlers primarily process the initial HTML response. JavaScript execution is:

Delayed (crawl queue, separate rendering queue)
Resource-intensive (limited render budget)
Not guaranteed for all pages

Your Server Response          What Googlebot Sees First
─────────────────────         ──────────────────────────
<!DOCTYPE html>               Same HTML (good for SSR/SSG)
<html>
<body>                        OR
  <div id="root"></div>       Empty div (bad for CSR)
  <script src="app.js">       Script tag (won't execute immediately)
</body>
</html>
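
To make the difference concrete, the sketch below approximates a crawler's first pass: it drops `<script>` and `<style>` blocks, strips the remaining tags, and returns whatever text is left in the raw HTML. The function name `visibleText` and the sample markup are illustrative assumptions, and regex stripping is only a rough stand-in for a real HTML parser.

```javascript
// Rough approximation of the text available in raw HTML before any JavaScript runs.
// Regex stripping is a simplification; a real crawler uses a full HTML parser.
function visibleText(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ") // drop scripts
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")   // drop stylesheets
    .replace(/<[^>]+>/g, " ")                             // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

// Server-rendered response: content is already in the HTML.
const ssrHtml = `<!DOCTYPE html><html><body>
  <div id="root"><h1>Blue Running Shoes</h1><p>Lightweight trainers.</p></div>
  <script src="app.js"></script>
</body></html>`;

// Client-rendered shell: content only appears after app.js runs.
const csrHtml = `<!DOCTYPE html><html><body>
  <div id="root"></div>
  <script src="app.js"></script>
</body></html>`;

console.log(JSON.stringify(visibleText(ssrHtml)));
console.log(JSON.stringify(visibleText(csrHtml)));
```

An empty result for a public page is a strong hint that indexing depends entirely on the later rendering step.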

Architecture Impact on SEO

SEO by Rendering Pattern

| Pattern | Initial HTML | SEO Quality | Notes |
| --- | --- | --- | --- |
| SSG | Complete | Excellent | Best for SEO |
| SSR | Complete | Excellent | Dynamic content, good SEO |
| ISR | Complete | Excellent | Fresh + fast |
| CSR | Empty shell | Poor | Requires workarounds |
| Streaming | Progressive | Good | Shell + streamed content |

The CSR Problem

```html
<!-- What the server returns for a client-side rendered React app -->
<html>
  <head><title>My App</title></head>
  <body>
    <div id="root">
      <!-- JS renders content here AFTER page load -->
      <!-- Crawler may not wait for this -->
    </div>
  </body>
</html>
```

Solutions for CSR Apps

Server-Side Rendering (SSR): Render on server, hydrate on client
Pre-rendering: Generate static HTML at build time for key pages
Dynamic Rendering: Serve pre-rendered HTML to bots, SPA to users (not recommended by Google)
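
As a rough illustration of the first option, the sketch below renders a complete page on the server so the content is present in the initial response. The `renderProductPage` function, the `product` shape, and the `/app.js` bundle path are hypothetical; a real app would normally use its framework's server renderer rather than a template string.

```javascript
// Escape text before interpolating it into HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical server-side render step: returns complete HTML for a product,
// so crawlers get the title, description, and body text without running JS.
function renderProductPage(product) {
  const name = escapeHtml(product.name);
  const summary = escapeHtml(product.summary);
  return `<!DOCTYPE html>
<html>
<head>
  <title>${name} | Example</title>
  <meta name="description" content="${summary}">
</head>
<body>
  <div id="root"><h1>${name}</h1><p>${summary}</p></div>
  <script src="/app.js" defer></script>
</body>
</html>`;
}

const productHtml = renderProductPage({ name: "Trail Shoes", summary: "Grippy & light" });
console.log(productHtml);
```

The client bundle can then hydrate the existing markup inside `#root` instead of rendering it from scratch.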

Technical SEO Elements

Essential Meta Tags

```html
<head>
  <!-- Title: 50-60 characters, unique per page -->
  <title>Product Name - Category | Brand</title>

  <!-- Description: 150-160 characters -->
  <meta name="description" content="Compelling description with keywords">

  <!-- Canonical: Prevent duplicate content -->
  <link rel="canonical" href="https://example.com/page">

  <!-- Robots: Control indexing -->
  <meta name="robots" content="index, follow">

  <!-- Open Graph: Social sharing -->
  <meta property="og:title" content="Page Title">
  <meta property="og:description" content="Description">
  <meta property="og:image" content="https://example.com/image.jpg">

  <!-- Viewport: Mobile-friendliness -->
  <meta name="viewport" content="width=device-width, initial-scale=1">
</head>
```

Structured Data (JSON-LD)

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Product Name",
  "description": "Product description",
  "offers": {
    "@type": "Offer",
    "price": "29.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```

Common schema types: Product, Article, FAQPage, BreadcrumbList, Organization, LocalBusiness
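
When the markup is generated from application data, a small helper keeps the JSON-LD consistent across pages. The sketch below is one possible shape, not a required API: `productJsonLd` and the fields of its `product` argument are assumptions, and price, currency, and availability are placed on a nested `Offer`.

```javascript
// Build a Product JSON-LD script tag from a plain object.
// The `product` shape (name, description, price, currency, inStock) is hypothetical.
function productJsonLd(product) {
  const data = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: product.name,
    description: product.description,
    offers: {
      "@type": "Offer",
      price: product.price.toFixed(2),
      priceCurrency: product.currency,
      availability: product.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const jsonLdTag = productJsonLd({
  name: "Product Name",
  description: "Product description",
  price: 29.99,
  currency: "USD",
  inStock: true,
});
console.log(jsonLdTag);
```

The output can be checked with the Rich Results Test before shipping.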

Semantic HTML

```html
<!-- Good: Semantic structure -->
<main>
  <article>
    <header>
      <h1>Main Title</h1>
    </header>
    <section>
      <h2>Section Title</h2>
      <p>Content...</p>
    </section>
  </article>
  <aside>Related content</aside>
</main>

<!-- Bad: Div soup -->
<div class="main">
  <div class="title">Main Title</div>
  <div class="content">Content...</div>
</div>
```

Core Web Vitals

Core Web Vitals are Google's page experience metrics. They are used as a ranking signal alongside content relevance and quality.

The Three Metrics

| Metric | What It Measures | Good | Needs Improvement | Poor |
| --- | --- | --- | --- | --- |
| LCP (Largest Contentful Paint) | Loading performance | ≤2.5s | 2.5-4s | >4s |
| INP (Interaction to Next Paint) | Interactivity | ≤200ms | 200-500ms | >500ms |
| CLS (Cumulative Layout Shift) | Visual stability | ≤0.1 | 0.1-0.25 | >0.25 |

Common Issues by Architecture

SPA/CSR Issues:

LCP: Slow due to JS loading before content
CLS: Content shifts as JS loads and renders

SSR Issues:

INP: Hydration can block interactivity
TTFB: Slow server response times

SSG Issues:

Generally best for Core Web Vitals
CLS: Still possible with lazy-loaded images

Optimization Strategies

Improve LCP:

```html
<!-- Preload critical resources -->
<link rel="preload" href="hero-image.jpg" as="image">
<link rel="preload" href="critical.css" as="style">

<!-- Inline critical CSS -->
<style>/* Above-the-fold styles */</style>

<!-- Defer non-critical JS -->
<script defer src="app.js"></script>
```

Improve CLS:

```html
<!-- Reserve space for images -->
<img src="photo.jpg" width="800" height="600" alt="Description">

<!-- Reserve space for ads/embeds -->
<div style="min-height: 250px;">
  <!-- Ad will load here -->
</div>
```

Improve INP:

Break up long tasks
Minimize hydration cost
Use `requestIdleCallback` for non-critical work
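
One way to break up a long task is to process work in small batches and yield between them. The sketch below is a minimal version of that idea; `processInChunks`, `yieldToMain`, and the default `chunkSize` are illustrative names and values, and `setTimeout(0)` is used as the yield point because it is available everywhere.

```javascript
// Yield control back to the event loop so pending input can be handled.
function yieldToMain() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process `items` in small batches instead of one long task.
// `chunkSize` is an illustrative default; tune it by measuring.
async function processInChunks(items, handleItem, chunkSize = 50) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(handleItem(item));
    }
    await yieldToMain(); // pause between batches
  }
  return results;
}

processInChunks([1, 2, 3, 4, 5], (n) => n * 2, 2).then((doubled) => {
  console.log(doubled);
});
```

Each batch runs as its own task, so a click that arrives mid-way only waits for the current batch rather than the whole list.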

URL Structure

Best Practices

Good URLs:
/products/blue-running-shoes
/blog/2024/seo-guide
/category/electronics/phones

Poor URLs:
/products?id=12345
/page.php?cat=1&sub=2
/p/abc123xyz

URL Guidelines

Use hyphens, not underscores
Keep URLs short and descriptive
Include target keywords naturally
Use lowercase only
Avoid query parameters for indexable content
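
These guidelines are easy to enforce when slugs are generated from titles. The helper below is a small sketch of that; the name `slugify` is an assumption, and it handles Latin text only (other scripts would need transliteration or a different policy).

```javascript
// Turn a page title into a lowercase, hyphen-separated URL slug.
function slugify(title) {
  return title
    .normalize("NFKD")               // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "") // remove the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // any other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, "");        // no leading or trailing hyphen
}

console.log(slugify("Blue Running Shoes")); // blue-running-shoes
console.log(slugify("  Café_Menu 2024! ")); // cafe-menu-2024
```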

Sitemap and Robots.txt

XML Sitemap

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-01-15</lastmod>
    <!-- Google ignores changefreq and priority; other crawlers may still read them -->
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
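
For an app with many routes, the sitemap is usually generated rather than written by hand. The sketch below builds the XML from a route list; `buildSitemap`, `escapeXml`, and the `{ path, lastmod }` record shape are hypothetical, and only `<loc>` and `<lastmod>` are emitted to keep it short.

```javascript
// Escape the characters that are special in XML text nodes.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// `routes` is a hypothetical list of { path, lastmod } records from a router or CMS.
function buildSitemap(origin, routes) {
  const entries = routes.map(({ path, lastmod }) => {
    const loc = escapeXml(new URL(path, origin).href); // absolute URL required
    return `  <url>\n    <loc>${loc}</loc>\n    <lastmod>${lastmod}</lastmod>\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

const sitemapXml = buildSitemap("https://example.com", [
  { path: "/page", lastmod: "2024-01-15" },
  { path: "/blog/q&a", lastmod: "2024-01-16" },
]);
console.log(sitemapXml);
```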

Robots.txt

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml

SPA SEO Checklist

For SPAs that need SEO:

Implement SSR or pre-rendering for SEO-critical pages
Ensure each route has unique meta tags (title, description)
Use semantic HTML structure
Implement proper heading hierarchy (h1 → h2 → h3)
Add structured data (JSON-LD)
Generate XML sitemap with all routes
Handle redirects server-side (301/302), not client-side
Implement canonical URLs
Ensure internal links are crawlable: use `<a href>` tags
Test with Google Search Console's URL Inspection tool
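
The "unique meta tags per route" item can be checked automatically in a build or test step. The sketch below flags routes that share or lack a title or description; `findDuplicateMeta` and the `{ route, title, description }` record shape are assumptions about how the app exposes its route metadata.

```javascript
// Report routes that share a title or description, or that are missing one.
// `pages` is a hypothetical list of { route, title, description } records.
function findDuplicateMeta(pages) {
  const problems = [];
  for (const field of ["title", "description"]) {
    const seen = new Map(); // normalized value -> routes using it
    for (const page of pages) {
      const key = (page[field] || "").trim().toLowerCase();
      if (!seen.has(key)) seen.set(key, []);
      seen.get(key).push(page.route);
    }
    for (const [value, routes] of seen) {
      if (value === "") problems.push({ field, issue: "missing", routes });
      else if (routes.length > 1) problems.push({ field, issue: "duplicate", routes });
    }
  }
  return problems;
}

const metaProblems = findDuplicateMeta([
  { route: "/", title: "My App", description: "Home" },
  { route: "/pricing", title: "My App", description: "Plans and pricing" },
  { route: "/about", title: "About | My App", description: "" },
]);
console.log(metaProblems);
```

Here the default "My App" title shared by two routes and the empty description on `/about` are both reported.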

Common SEO Mistakes

| Mistake | Problem | Solution |
| --- | --- | --- |
| Client-side only meta tags | Crawler sees defaults | SSR or head management |
| JavaScript-only navigation | Links not crawlable | Use `<a href>` tags |
| Infinite scroll | Content not discoverable | Pagination or "Load More" with URLs |
| Hash-based routing | URLs not indexed | Use History API (`/path` not `/#/path`) |
| Duplicate content | Diluted rankings | Canonical tags |
| Slow loading | Poor rankings | Optimize Core Web Vitals |

Testing SEO

Tools

Google Search Console: Indexing status, issues
Lighthouse: Core Web Vitals, SEO audit
View Page Source: What crawler sees (not DevTools)
Google Rich Results Test: Structured data validation

Quick Checks

```bash
# See what Google sees
curl -A "Googlebot" https://example.com/page

# Check robots.txt
curl https://example.com/robots.txt

# View without JavaScript (approximate crawler view)
# Disable JavaScript in browser DevTools
```

---

Deep Dive: Understanding Search Engines From First Principles

How Googlebot Actually Works

Googlebot is not one crawler - it's a massive distributed system:

THE GOOGLE CRAWLING INFRASTRUCTURE:

┌──────────────────────────────────────────────────────────────┐
│                     URL FRONTIER                              │
│  (Priority queue of URLs to crawl - billions of entries)     │
│  Priority based on: PageRank, freshness, crawl budget        │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                  CRAWLER FLEET                                │
│  Thousands of servers making HTTP requests in parallel        │
│  Respects robots.txt (Googlebot ignores crawl-delay)         │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│               HTML PROCESSING QUEUE                           │
│  Parse HTML, extract text, links, metadata                    │
│  This is FAST - pure HTML parsing                             │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              RENDERING QUEUE (WRS)                            │
│  Web Rendering Service - headless Chrome                      │
│  Executes JavaScript for dynamic content                      │
│  This is SLOW and EXPENSIVE - limited capacity                │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│                    INDEXER                                    │
│  Processes content, builds inverted index                     │
│  Maps words → documents for fast retrieval                    │
└──────────────────────────────────────────────────────────────┘

The critical insight: JavaScript rendering is a SEPARATE phase that happens LATER, if at all.

What Happens When Googlebot Visits Your Page

STEP 1: URL Discovery
- Found via: sitemap, internal links, external links, Search Console
- Added to URL Frontier with priority score

STEP 2: HTTP Request
- Googlebot requests your URL
- Sends headers: User-Agent: Googlebot, Accept-Language, etc.
- Follows redirects (301, 302, 307, 308)

STEP 3: Response Analysis
- HTTP status: 200, 404, 500, etc.
- Content-Type: text/html, application/json, etc.
- Headers: X-Robots-Tag, Link (rel="canonical"), etc.

STEP 4: HTML Parsing (IMMEDIATE)
GET /products/shoes HTTP/1.1
Host: example.com
User-Agent: Googlebot

Response:
<html>
<head>
  <title>Running Shoes | Example</title>
  <meta name="description" content="...">
</head>
<body>
  <div id="root">Loading...</div>
  <script src="/app.js"></script>  ← NOT EXECUTED YET
</body>
</html>

EXTRACTED IMMEDIATELY:
- Title: "Running Shoes | Example"
- Meta description
- Any visible text: "Loading..."
- Links for further crawling
- Nothing from JavaScript!

STEP 5: Rendering Queue (DELAYED)
- If page seems JS-dependent, added to rendering queue
- Could be minutes, hours, or days later
- Headless Chrome executes JavaScript
- Final DOM captured for indexing
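
Step 4 can be imitated locally to see what is available before rendering. The sketch below pulls the title, meta description, and link targets out of a raw HTML string; `extractFromRawHtml` is a hypothetical helper, and the regexes assume double-quoted attributes in the order shown, so treat it as an approximation rather than how Googlebot parses pages.

```javascript
// First-pass extraction from raw HTML, with no JavaScript execution.
// Regex matching is a simplification that assumes double-quoted attributes.
function extractFromRawHtml(html) {
  const title = (html.match(/<title[^>]*>([\s\S]*?)<\/title>/i) || [])[1] || null;
  const description =
    (html.match(/<meta\s+name="description"\s+content="([^"]*)"/i) || [])[1] || null;
  const links = [...html.matchAll(/<a\s[^>]*href="([^"]+)"/gi)].map((m) => m[1]);
  return { title: title && title.trim(), description, links };
}

const rawResponse = `<html>
<head>
  <title>Running Shoes | Example</title>
  <meta name="description" content="Great shoes">
</head>
<body>
  <div id="root">Loading...</div>
  <a href="/products/boots">Boots</a>
  <script src="/app.js"></script>
</body>
</html>`;

console.log(extractFromRawHtml(rawResponse));
```

Anything the app adds to the DOM later, including links rendered by the router, is absent from this result.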

The Rendering Budget Problem

Google allocates finite resources to rendering:

GOOGLE'S RENDERING CONSTRAINTS:

Total pages to render: ~billions
Rendering capacity: ~millions per day (estimated)
Your site's share: depends on "crawl budget"

CRAWL BUDGET FACTORS:
1. Site authority (PageRank-like signals)
2. Update frequency (how often content changes)
3. Server response time (fast = more crawling)
4. Errors encountered (errors = less crawling)

IMPLICATIONS:
- Large sites: Not all pages get rendered
- Low-authority sites: Lower priority in queue
- Slow sites: Fewer resources allocated
- Error-prone sites: Crawl budget wasted

RECOMMENDATION:
Don't RELY on rendering - serve complete HTML

Understanding Indexing vs Ranking

Many developers confuse these:

INDEXING: Is your page in Google's database?
- Google knows the page exists
- Has parsed its content
- Stored in the index

Check: site:example.com/your-page
If it appears: indexed
If it doesn't: not indexed (or blocked)

RANKING: Where does your page appear in results?
- Page is indexed, now competing with millions of others
- Algorithm determines position
- 200+ ranking factors

A page can be:
✓ Indexed but ranking poorly (page 10+)
✓ Indexed but not ranking for your target keywords
✗ Not indexed at all (biggest problem for SPAs)

How Google Processes JavaScript SPAs

```html
<!-- YOUR REACT SPA: the server returns -->
<!DOCTYPE html>
<html>
<head>
  <title>My App</title>  <!-- Google sees this -->
</head>
<body>
  <div id="root"></div>  <!-- Google sees EMPTY DIV -->
  <script src="/bundle.js"></script>
</body>
</html>

<!--
PHASE 1: HTML PARSING (immediate)
Google extracts:
- Title: "My App"
- Body text: "" (empty)
- Links: none found in content

PHASE 2: RENDERING (delayed)
Hours or days later, if ever:
- Chrome loads page
- Executes bundle.js
- React renders into #root
- Final HTML captured:
-->

<div id="root">
  <header>...</header>
  <main>
    <h1>Welcome to My App</h1>
    <p>Content that was invisible before</p>
  </main>
</div>

<!--
NOW Google can index the real content
But this delay means:
- Time-sensitive content may be stale
- Pages might rank for "Loading..." text
- Some pages may never get rendered
-->
```

The Two-Wave Indexing Phenomenon

SPAs often show strange indexing behavior:

WAVE 1: HTML-only indexing
- Title and meta description captured
- Body appears empty or "Loading..."
- May rank for title keywords only
- Incomplete representation in SERPs

WAVE 2: Post-render indexing (if it happens)
- Full content now visible
- Rankings may change dramatically
- Could take days or weeks

OBSERVABLE SYMPTOMS:
- Search result shows "Loading..." as snippet
- Page ranks for title but not body content
- Search Console shows "Page is not indexed" then later "Indexed"
- Rankings fluctuate as rendering catches up

Core Web Vitals: The Technical Details

Understanding how metrics are measured:

```javascript
// LARGEST CONTENTFUL PAINT (LCP)
// Measures: When largest visible content renders
// Elements considered: images, videos, block-level text

// Browser tracks LCP candidates:
// t=0ms:    Navigation starts
// t=100ms:  First text paints (small heading) - LCP candidate 1
// t=500ms:  Hero image loads - LCP candidate 2 (larger, replaces)
// t=2500ms: No more updates - final LCP = 500ms ✓ GOOD

// LCP KILLERS:
// - Slow server response (TTFB)
// - Render-blocking JavaScript
// - Slow image loading
// - Client-side rendering (content waits for JS)


// INTERACTION TO NEXT PAINT (INP)
// Measures: Responsiveness to user input
// Captures: click, tap, keypress → visual update

// How it works:
// 1. User clicks button
// 2. Browser queues the "click" event until the main thread is free (input delay)
// 3. Your JavaScript handlers run, including any synchronous React re-render (processing time)
// 4. Browser recalculates style and layout, then paints the next frame (presentation delay)
// 5. Interaction latency = time from click to that paint
// 6. INP = roughly the slowest interaction latency observed during the page visit

// INP KILLERS:
// - Long JavaScript tasks (>50ms)
// - Hydration blocking main thread
// - Heavy re-renders
// - Too many event listeners


// CUMULATIVE LAYOUT SHIFT (CLS)
// Measures: Visual stability (unexpected movement)
// Calculated: impact fraction × distance fraction

// Example of bad CLS:
// t=0ms:    Heading renders at y=0
// t=500ms:  Ad loads above heading, pushes it to y=250px
// Impact: 100% of viewport affected
// Distance: 250px / the viewport's larger dimension (its height on a portrait screen)

// CLS KILLERS:
// - Images without dimensions
// - Ads/embeds without reserved space
// - Dynamically injected content
// - Web fonts causing text resize
```
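
The CLS formula is easier to follow with numbers. The sketch below scores a single layout shift for one element that moves vertically; `layoutShiftScore` and its argument shape are assumptions, and it simplifies the real metric by assuming one unstable element that stays fully inside the viewport.

```javascript
// Layout shift score for one element that moves vertically.
// Simplified: one unstable element, fully inside the viewport before and after.
function layoutShiftScore({ viewport, element, shiftY }) {
  const viewportArea = viewport.width * viewport.height;
  // Impact region = union of the element's old and new rectangles.
  const overlap = Math.max(0, element.height - Math.abs(shiftY));
  const unionHeight = 2 * element.height - overlap;
  const impactFraction = (element.width * unionHeight) / viewportArea;
  // Distance is measured against the viewport's larger dimension.
  const distanceFraction = Math.abs(shiftY) / Math.max(viewport.width, viewport.height);
  return impactFraction * distanceFraction;
}

// A 400x200 block pushed down 250px on a 400x800 viewport.
const shiftScore = layoutShiftScore({
  viewport: { width: 400, height: 800 },
  element: { width: 400, height: 200 },
  shiftY: 250,
});
console.log(shiftScore); // 0.5 impact × 0.3125 distance = 0.15625
```

A single shift like this already exceeds the 0.1 "good" threshold, which is why reserving space for late-loading content matters.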

How Google Evaluates Page Quality

Beyond technical SEO, Google assesses quality:

E-E-A-T SIGNALS (Experience, Expertise, Authoritativeness, Trust):

EXPERIENCE:
- Does content show first-hand experience?
- Product reviews: Did you actually use it?
- Travel guides: Did you actually visit?

EXPERTISE:
- Is the author qualified to write this?
- For medical content: Is author a doctor?
- For legal content: Is author a lawyer?

AUTHORITATIVENESS:
- Is this site a known authority?
- Do other sites link to it?
- Is it cited in the industry?

TRUSTWORTHINESS:
- Secure connection (HTTPS)?
- Clear contact information?
- No deceptive practices?

SIGNALS GOOGLE MAY USE (not officially confirmed):
- External links (authority)
- Author bios and credentials
- Site reputation
- User behavior signals
- Content accuracy (fact-checking)

Canonical URLs: Preventing Duplicate Content

Duplicate content confuses Google:

SCENARIO: Same product at multiple URLs

/products/shoes
/products/shoes?color=red
/products/shoes?color=red&size=10
/products/shoes?utm_source=facebook

PROBLEM:
- Google sees 4 different "pages"
- Splits ranking signals across them
- May pick wrong one as "canonical"

SOLUTION: Canonical tags

<link rel="canonical" href="https://example.com/products/shoes" />

EVERY variant should point to THE ONE canonical URL.

HOW GOOGLE USES CANONICAL:
1. Sees multiple URLs with same/similar content
2. Checks for canonical tag
3. Consolidates signals to canonical URL
4. Returns canonical URL in search results

CANONICAL RULES:
- Self-referencing canonicals are GOOD (each page points to itself)
- Cross-domain canonicals work (if you have duplicate on 2 domains)
- Canonical is a HINT, not directive (Google may ignore)
- Conflicting signals = Google chooses (may be wrong)
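
To keep every variant pointing at one URL, the canonical can be computed from the request URL instead of being hard-coded. The sketch below strips parameters that do not change the page's identity; `canonicalUrl` and the contents of `IGNORED_PARAMS` are assumptions, and each site has to decide which parameters really produce distinct content.

```javascript
// Parameters that change tracking or presentation but not the page's identity.
// This list is an assumption; decide per site which parameters create distinct content.
const IGNORED_PARAMS = new Set(["utm_source", "utm_medium", "utm_campaign", "color", "size"]);

function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first, because deleting while iterating would skip entries.
  for (const key of [...url.searchParams.keys()]) {
    if (IGNORED_PARAMS.has(key)) url.searchParams.delete(key);
  }
  url.hash = ""; // fragments never belong in a canonical URL
  return url.href;
}

console.log(canonicalUrl("https://example.com/products/shoes?color=red&size=10"));
console.log(canonicalUrl("https://example.com/products/shoes?utm_source=facebook"));
console.log(canonicalUrl("https://example.com/products?page=2&utm_source=facebook"));
```

The first two calls return the same URL, while the third keeps `page=2` because pagination is treated here as distinct content.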

Structured Data: How Machines Understand Content

Structured data helps Google understand meaning:

```html
<!--
WITHOUT STRUCTURED DATA:
Google sees text: "Nike Air Max - $129.99 - In Stock"
Google has to GUESS: Is this a product? What's the price?
-->

<!-- WITH STRUCTURED DATA: -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Nike Air Max",
  "offers": {
    "@type": "Offer",
    "price": "129.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>

<!--
NOW Google KNOWS:
- This is a Product (not an article, event, etc.)
- The name is "Nike Air Max"
- It costs $129.99 USD
- It's in stock

BENEFITS:
- Rich snippets in search results (stars, prices, availability)
- Product panels in shopping results
- Voice assistant answers
- Google Merchant Center integration
-->
```

The JavaScript SEO Testing Protocol

How to verify your SPA is SEO-ready:

bash

# STEP 1: View raw HTML (what the crawler sees first)
curl -s https://yoursite.com/page | head -100
# Look for:
# - Real content in the HTML
# - Not just <div id="root"></div>
# - A proper <title> and <meta name="description">

# STEP 2: Compare rendered vs raw
# In Chrome DevTools:
# - View Page Source (raw HTML)
# - Inspect Element (rendered DOM)
# - If they're very different = SEO risk

# STEP 3: Google's Mobile-Friendly Test
# https://search.google.com/test/mobile-friendly
# Shows the JavaScript-rendered version
# Reveals what Google actually sees
# (Google retired this tool in December 2023; the Rich Results Test at
#  https://search.google.com/test/rich-results and the live test in STEP 4
#  show the rendered HTML instead)

# STEP 4: URL Inspection Tool (Search Console)
# Shows exactly how Google indexed your page
# "View Crawled Page" shows the HTML Google has
# "View Tested Page" shows the rendered version

# STEP 5: Site search (type into Google, not the shell)
# site:yoursite.com/specific-page
# If it appears with the wrong snippet = indexing issue
# If it doesn't appear = not indexed
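
The Step 1 check can also be automated. The sketch below is only a heuristic: `auditRawHtml` is a hypothetical name, the 50-character threshold is an arbitrary assumption, and regexes stand in for a real HTML parser.

```javascript
// Hypothetical audit of the raw (unrendered) HTML a crawler receives first.
function auditRawHtml(html, minTextLength = 50) {
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  const description = metaTags.find((tag) => /\bname\s*=\s*["']description["']/i.test(tag));

  // Approximate the visible text: drop scripts, styles, and tags from the body.
  const body = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  const visibleText = (body ? body[1] : html)
    .replace(/<script\b[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style\b[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  return {
    hasTitle: Boolean(title && title[1].trim()),
    hasMetaDescription: Boolean(description),
    visibleTextLength: visibleText.length,
    // Assumption: very little text in the raw HTML suggests a client-rendered shell.
    looksLikeEmptyShell: visibleText.length < minTextLength,
  };
}
```

With Node 18 or later, the input could come from `await (await fetch(url)).text()`. A page that reports `looksLikeEmptyShell: true` depends on JavaScript rendering for its content.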

Real-World SEO Architecture Patterns

PATTERN 1: SSR/SSG FOR ALL (Safest)
- Every page server-rendered
- No JavaScript rendering dependencies
- Works for all crawlers
- Best for: content sites, e-commerce

PATTERN 2: HYBRID (Practical)
- Public pages: SSR/SSG (SEO critical)
- Authenticated pages: CSR (no SEO needed)
- Example:
  / → SSG
  /products/* → ISR
  /blog/* → SSG
  /dashboard/* → CSR (behind login)

PATTERN 3: EDGE RENDERING (Modern)
- Render at CDN edge for speed
- Still SSR, but geographically distributed
- Best for: global sites, performance critical

PATTERN 4: STREAMING SSR (Advanced)
- Stream HTML progressively
- Critical content first
- Non-critical streams later
- Best for: large pages, slow data sources

ANTI-PATTERN: CSR FOR PUBLIC CONTENT
- Hoping Google will render JavaScript
- Relying on "they support JS now"
- Risks inconsistent or delayed indexing
- May rank poorly vs competitors
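
The hybrid pattern's route example can be expressed as a small lookup table. This is a sketch only: `renderRules` and `resolveRenderMode` are hypothetical names rather than any framework's API, and the SSR fallback for unlisted routes is an assumption.

```javascript
// Hypothetical route table mirroring the hybrid example; the first matching rule wins.
const renderRules = [
  { pattern: /^\/$/, mode: 'SSG', indexable: true },
  { pattern: /^\/products\/.+/, mode: 'ISR', indexable: true },
  { pattern: /^\/blog(\/.*)?$/, mode: 'SSG', indexable: true },
  { pattern: /^\/dashboard(\/.*)?$/, mode: 'CSR', indexable: false }, // behind login
];

function resolveRenderMode(path, rules = renderRules) {
  const rule = rules.find((r) => r.pattern.test(path));
  // Assumption: unknown public routes fall back to SSR so crawlers still get full HTML.
  return rule
    ? { mode: rule.mode, indexable: rule.indexable }
    : { mode: 'SSR', indexable: true };
}
```

Keeping the `indexable` flag next to the rendering mode makes it harder to ship a public route as CSR by accident.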

For Framework Authors: Building SEO Systems

Implementation Note: The patterns and code examples below represent one proven approach to building SEO systems. Head management can be handled via components (React Helmet), framework APIs (Next.js Metadata), or universal libraries (unhead). The direction shown here provides core concepts—adapt based on your framework's rendering model, SSR implementation, and whether you need streaming support.

Implementing Head Management

javascript

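// DOCUMENT HEAD MANAGEMENT

import { createContext, useContext, useEffect, useId } from 'react';

// Escape text for safe interpolation into HTML text and attribute values
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

class HeadManager {
  constructor() {
    this.tags = new Map();
    this.order = [];
  }

  // Add or update a head tag
  setTag(id, tag) {
    if (!this.tags.has(id)) {
      this.order.push(id);
    }
    this.tags.set(id, tag);
  }

  // Remove a tag
  removeTag(id) {
    this.tags.delete(id);
    this.order = this.order.filter(i => i !== id);
  }

  // Render to string (for SSR)
  toString() {
    return this.order
      .map(id => this.renderTag(id, this.tags.get(id)))
      .join('\n');
  }

  renderTag(id, tag) {
    const { type, children, ...attrs } = tag;

    if (type === 'title') {
      return `<title>${escapeHtml(children ?? '')}</title>`;
    }

    // data-head-id lets applyToDOM() find and reuse server-rendered tags
    const attrStr = Object.entries({ ...attrs, 'data-head-id': id })
      .map(([k, v]) => `${k}="${escapeHtml(v)}"`)
      .join(' ');

    if (children) {
      // <script>/<style> bodies are raw text; the caller must make them safe
      const raw = type === 'script' || type === 'style';
      return `<${type} ${attrStr}>${raw ? children : escapeHtml(children)}</${type}>`;
    }

    return `<${type} ${attrStr}>`;
  }

  // Apply to DOM (for client-side)
  applyToDOM() {
    const head = document.head;

    // Drop elements whose tags were removed (e.g. after a component unmounts)
    for (const el of head.querySelectorAll('[data-head-id]')) {
      if (!this.tags.has(el.getAttribute('data-head-id'))) el.remove();
    }

    for (const [id, tag] of this.tags) {
      // <title> must stay unique, so update it instead of appending a second one
      if (tag.type === 'title') {
        document.title = tag.children ?? '';
        continue;
      }

      let existing = head.querySelector(`[data-head-id="${id}"]`);

      if (!existing) {
        existing = document.createElement(tag.type);
        existing.setAttribute('data-head-id', id);
        head.appendChild(existing);
      }

      // Update attributes
      for (const [key, value] of Object.entries(tag)) {
        if (key === 'type') continue;
        if (key === 'children') {
          existing.textContent = value;
        } else {
          existing.setAttribute(key, value);
        }
      }
    }
  }
}

// On the server, provide a fresh HeadManager per request via <HeadContext.Provider>
const HeadContext = createContext(new HeadManager());

// React hook for head management
function useHead(tags) {
  const headManager = useContext(HeadContext);
  const id = useId();
  const serialized = JSON.stringify(tags);

  // Effects don't run during SSR, so register tags during render on the server
  if (typeof document === 'undefined') {
    Object.entries(tags).forEach(([key, value]) => {
      headManager.setTag(`${id}-${key}`, value);
    });
  }

  useEffect(() => {
    const current = JSON.parse(serialized);

    // Apply on mount and whenever the tags change
    Object.entries(current).forEach(([key, value]) => {
      headManager.setTag(`${id}-${key}`, value);
    });
    headManager.applyToDOM();

    // Cleanup on unmount (applyToDOM also removes the orphaned elements)
    return () => {
      Object.keys(current).forEach(key => {
        headManager.removeTag(`${id}-${key}`);
      });
      headManager.applyToDOM();
    };
  }, [headManager, id, serialized]);
}

// Usage
function ProductPage({ product }) {
  useHead({
    title: { type: 'title', children: `${product.name} | Store` },
    description: { type: 'meta', name: 'description', content: product.summary },
    ogTitle: { type: 'meta', property: 'og:title', content: product.name },
    ogImage: { type: 'meta', property: 'og:image', content: product.image },
  });

  return <div>{/* ... */}</div>;
}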

Building Structured Data Injection

javascript

// STRUCTURED DATA (JSON-LD) SYSTEM

class StructuredDataManager {
  constructor() {
    this.schemas = new Map();
  }
  
  // Register schema for a route
  setSchema(id, schema) {
    this.schemas.set(id, schema);
  }
  
  // Build JSON-LD from data
  buildProductSchema(product) {
    return {
      '@context': 'https://schema.org',
      '@type': 'Product',
      name: product.name,
      description: product.description,
      image: product.images,
      sku: product.sku,
      brand: {
        '@type': 'Brand',
        name: product.brand,
      },
      offers: {
        '@type': 'Offer',
        url: product.url,
        priceCurrency: product.currency,
        price: product.price,
        availability: product.inStock 
          ? 'https://schema.org/InStock' 
          : 'https://schema.org/OutOfStock',
      },
      aggregateRating: product.rating ? {
        '@type': 'AggregateRating',
        ratingValue: product.rating,
        reviewCount: product.reviewCount,
      } : undefined,
    };
  }
  
  buildBreadcrumbSchema(breadcrumbs) {
    return {
      '@context': 'https://schema.org',
      '@type': 'BreadcrumbList',
      itemListElement: breadcrumbs.map((crumb, i) => ({
        '@type': 'ListItem',
        position: i + 1,
        item: {
          '@id': crumb.url,
          name: crumb.name,
        },
      })),
    };
  }
  
  buildArticleSchema(article) {
    return {
      '@context': 'https://schema.org',
      '@type': 'Article',
      headline: article.title,
      description: article.excerpt,
      image: article.image,
      author: {
        '@type': 'Person',
        name: article.author.name,
        url: article.author.url,
      },
      publisher: {
        '@type': 'Organization',
        name: article.publisher.name,
        logo: {
          '@type': 'ImageObject',
          url: article.publisher.logo,
        },
      },
      datePublished: article.publishedAt,
      dateModified: article.updatedAt,
    };
  }
  
  // Render all schemas
  toString() {
    const schemas = Array.from(this.schemas.values());
    if (schemas.length === 0) return '';
    
    const combined = schemas.length === 1 
      ? schemas[0] 
      : { '@context': 'https://schema.org', '@graph': schemas };
    
    return `<script type="application/ld+json">${
      JSON.stringify(combined).replace(/</g, '\\u003c')
    }</script>`;
  }
}
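
Two details of `toString()` above are easy to miss: `JSON.stringify` drops `undefined` properties (which is why an absent `aggregateRating` disappears from the output), and replacing `<` with `\u003c` keeps user-supplied text from closing the script element early. The standalone sketch below illustrates both; `serializeJsonLd` and `renderJsonLdScript` are hypothetical names used only for this demonstration.

```javascript
// Serialize JSON-LD so it is safe to embed inside a <script> element.
function serializeJsonLd(data) {
  // \u003c is a valid JSON escape for "<", so parsers still read the same value.
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

function renderJsonLdScript(data) {
  return `<script type="application/ld+json">${serializeJsonLd(data)}</script>`;
}

// A product name containing markup, as might arrive from user-generated content.
const hostileProduct = {
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: 'Shoe</script><script>alert(1)</script>',
  aggregateRating: undefined, // omitted from the output entirely
};

const safeScriptTag = renderJsonLdScript(hostileProduct);
```

The resulting tag contains exactly one closing `</script>`, and parsing the JSON payload yields the original product name unchanged.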

Sitemap Generation

javascript

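// SITEMAP GENERATOR

import { promises as fs } from 'node:fs';
import { dirname, join } from 'node:path';

const MAX_URLS_PER_SITEMAP = 50000; // per-file limit in the sitemap protocol

// Escape the five XML special characters
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Serialize one <urlset> document
function renderUrlset(urls) {
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls.map(url => `  <url>
    <loc>${escapeXml(url.loc)}</loc>
    <lastmod>${escapeXml(url.lastmod)}</lastmod>
    <changefreq>${escapeXml(url.changefreq)}</changefreq>
    <priority>${url.priority}</priority>
  </url>`).join('\n')}
</urlset>`;
}

async function generateSitemap(config) {
  const { baseUrl, routes, outputPath } = config;

  const urls = [];

  for (const route of routes) {
    // Static routes
    if (!route.isDynamic) {
      urls.push({
        loc: `${baseUrl}${route.path}`,
        // Prefer a real modification date; "now" is only a fallback
        lastmod: route.lastModified || new Date().toISOString(),
        changefreq: route.changefreq || 'weekly',
        priority: route.priority ?? 0.7, // ?? keeps an explicit 0
      });
      continue;
    }

    // Dynamic routes - get all paths
    if (route.getStaticPaths) {
      const paths = await route.getStaticPaths();
      for (const path of paths) {
        // Assumes params are raw (not yet URL-encoded) values
        const fullPath = route.path.replace(
          /\[(\w+)\]/g,
          (_, param) => encodeURIComponent(path.params[param])
        );
        urls.push({
          loc: `${baseUrl}${fullPath}`,
          lastmod: path.lastModified || new Date().toISOString(),
          changefreq: path.changefreq || 'weekly',
          priority: path.priority ?? 0.5,
        });
      }
    }
  }

  // Over the limit: split into several files plus an index instead of one oversized file
  if (urls.length > MAX_URLS_PER_SITEMAP) {
    return generateSitemapIndex(urls, config);
  }

  const xml = renderUrlset(urls);
  await fs.writeFile(outputPath, xml);
  return xml;
}

// Writes sitemap-1.xml, sitemap-2.xml, ... next to outputPath and the index at outputPath.
// Assumes the files are served from the site root (baseUrl + '/sitemap-N.xml').
async function generateSitemapIndex(urls, { baseUrl, outputPath }) {
  const entries = [];

  for (let i = 0; i < urls.length; i += MAX_URLS_PER_SITEMAP) {
    const name = `sitemap-${entries.length + 1}.xml`;
    const chunk = urls.slice(i, i + MAX_URLS_PER_SITEMAP);
    await fs.writeFile(join(dirname(outputPath), name), renderUrlset(chunk));
    entries.push(`  <sitemap><loc>${escapeXml(`${baseUrl}/${name}`)}</loc></sitemap>`);
  }

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries.join('\n')}
</sitemapindex>`;

  await fs.writeFile(outputPath, xml);
  return xml;
}

// Robots.txt generator
function generateRobotsTxt(config) {
  const { baseUrl, disallow = [], sitemap = true } = config;

  let content = `User-agent: *\n`;

  // With nothing to block, state the allow-all rule explicitly
  if (disallow.length === 0) {
    content += `Allow: /\n`;
  }

  for (const path of disallow) {
    content += `Disallow: ${path}\n`;
  }

  if (sitemap) {
    content += `\nSitemap: ${baseUrl}/sitemap.xml\n`;
  }

  return content;
}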

Canonical URL Management

javascript

// CANONICAL URL SYSTEM

class CanonicalManager {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
  }
  
  // Generate canonical for a route
  getCanonical(path, params = {}) {
    // Remove trailing slash
    let canonical = path.replace(/\/$/, '') || '/';
    
    // Normalize query params (only allowed ones, always in this sorted order)
    const allowedParams = ['category', 'page', 'sort'];
    const query = new URLSearchParams();
    
    for (const key of allowedParams) {
      const value = params[key];
      if (value) {
        query.set(key, String(value));
      }
    }
    
    const queryStr = query.toString();
    if (queryStr) {
      canonical += `?${queryStr}`;
    }
    
    return `${this.baseUrl}${canonical}`;
  }
  
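  // Handle pagination canonicals
  getPaginationCanonical(path, page, totalPages) {
    const pageNum = Number(page); // page often arrives as a query-string value
    
    // Page 1 should canonical to base URL. Invalid or out-of-range pages
    // fall back to it as well (such pages should normally return 404).
    if (!Number.isInteger(pageNum) || pageNum <= 1 || pageNum > totalPages) {
      return this.getCanonical(path);
    }
    
    return this.getCanonical(path, { page: pageNum });
  }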
  
  // Handle locale canonicals
  getLocaleCanonical(path, locale, defaultLocale) {
    if (locale === defaultLocale) {
      return this.getCanonical(path);
    }
    return this.getCanonical(`/${locale}${path}`);
  }
  
  // Generate hreflang tags
  getHreflangTags(path, locales, defaultLocale) {
    return locales.map(locale => ({
      type: 'link',
      rel: 'alternate',
      hreflang: locale,
      href: this.getLocaleCanonical(path, locale, defaultLocale),
    })).concat({
      type: 'link',
      rel: 'alternate',
      hreflang: 'x-default',
      href: this.getCanonical(path),
    });
  }
}
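
The same normalization can be applied to a full incoming URL (for example, the request URL on the server) rather than a path plus params. The sketch below uses the standard `URL` and `URLSearchParams` APIs; `canonicalizeUrl` is a hypothetical name, and the allowlist simply mirrors the one in `getCanonical` above.

```javascript
// Hypothetical helper: reduce a full request URL to its canonical form.
function canonicalizeUrl(rawUrl, allowedParams = ['category', 'page', 'sort']) {
  const url = new URL(rawUrl); // the parser already lowercases the hostname
  url.hash = '';

  // Remove trailing slashes, but keep the root path as "/"
  if (url.pathname.length > 1) {
    url.pathname = url.pathname.replace(/\/+$/, '');
  }

  // Keep only allowlisted params, in sorted order, so equivalent URLs compare equal.
  // Tracking params (utm_*, fbclid, ...) are dropped because they are not on the list.
  const kept = new URLSearchParams();
  for (const key of [...allowedParams].sort()) {
    const value = url.searchParams.get(key);
    if (value) kept.set(key, value);
  }
  url.search = kept.toString();

  return url.toString();
}
```

An allowlist is used instead of a blocklist so that new tracking parameters cannot create new canonical variants without a code change.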

Meta Tag Deduplication

javascript

// META TAG DEDUPLICATION

class MetaDeduplicator {
  constructor() {
    this.tags = [];
  }
  
  // Add tag with deduplication key
  add(tag) {
    const key = this.getDeduplicationKey(tag);
    
    // Remove existing tag with same key
    this.tags = this.tags.filter(t => 
      this.getDeduplicationKey(t) !== key
    );
    
    this.tags.push(tag);
  }
  
  getDeduplicationKey(tag) {
    if (tag.type === 'title') return 'title';
    if (tag.name) return `name:${tag.name}`;
    if (tag.property) return `property:${tag.property}`;
    if (tag.httpEquiv) return `http-equiv:${tag.httpEquiv}`;
    if (tag.rel === 'canonical') return 'canonical';
    return JSON.stringify(tag);
  }
  
  // Get final list (last added wins for duplicates)
  getTags() {
    return this.tags;
  }
}

// Integration with nested routes
function collectMetaTags(routeHierarchy) {
  const dedup = new MetaDeduplicator();
  
  // Apply from root to leaf (later overrides earlier)
  for (const route of routeHierarchy) {
    if (route.meta) {
      for (const tag of route.meta) {
        dedup.add(tag);
      }
    }
  }
  
  return dedup.getTags();
}
