seo-fundamentals
SEO Fundamentals for Web Applications
Overview
SEO (Search Engine Optimization) ensures search engines can discover, understand, and rank your content. Architecture decisions significantly impact SEO capability.
How Search Engines Work
The Three Phases
1. CRAWLING: Googlebot discovers URLs and downloads content
2. INDEXING: Google parses content and stores in search index
3. RANKING: Algorithm determines position in search results
What Crawlers See
Crawlers primarily process the initial HTML response. JavaScript execution is:
- Delayed (crawl queue, separate rendering queue)
- Resource-intensive (limited render budget)
- Not guaranteed for all pages
Your Server Response What Googlebot Sees First
───────────────────── ──────────────────────────
<!DOCTYPE html> Same HTML (good for SSR/SSG)
<html>
<body> OR
<div id="root"></div> Empty div (bad for CSR)
<script src="app.js"> Script tag (won't execute immediately)
</body>
</html>
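One rough way to tell the two cases apart is to measure how much readable text the initial HTML carries before any script runs. The sketch below is only a heuristic: `visibleTextLength`, `looksLikeEmptyShell`, and the 50-character threshold are illustrative assumptions, not part of any crawler or library.

```javascript
// Hypothetical helper: estimates how much readable text the initial HTML
// contains before any JavaScript runs. Regex-based, so only a rough heuristic.
function visibleTextLength(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop script elements
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // drop style elements
    .replace(/<[^>]+>/g, ' ')                    // drop remaining tags
    .replace(/\s+/g, ' ')                        // collapse whitespace
    .trim().length;
}

// Assumed threshold: fewer than 50 characters of text suggests a CSR shell.
function looksLikeEmptyShell(html, minChars = 50) {
  return visibleTextLength(html) < minChars;
}

const csrShell =
  '<!DOCTYPE html><html><body><div id="root"></div>' +
  '<script src="app.js"></script></body></html>';
const ssrPage =
  '<!DOCTYPE html><html><body><main><h1>Blue Running Shoes</h1>' +
  '<p>Lightweight trainers with a cushioned sole for daily road running.</p>' +
  '</main></body></html>';

console.log(looksLikeEmptyShell(csrShell)); // true
console.log(looksLikeEmptyShell(ssrPage));  // false
```

A check like this could run against the raw server response in CI to catch routes that accidentally ship an empty shell.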
Architecture Impact on SEO
SEO by Rendering Pattern
| Pattern | Initial HTML | SEO Quality | Notes |
|---|---|---|---|
| SSG | Complete | Excellent | Best for SEO |
| SSR | Complete | Excellent | Dynamic content, good SEO |
| ISR | Complete | Excellent | Fresh + fast |
| CSR | Empty shell | Poor | Requires workarounds |
| Streaming | Progressive | Good | Shell + streamed content |
The CSR Problem
html
<!-- What the server sends for a client-rendered React app -->
<html>
<head><title>My App</title></head>
<body>
<div id="root">
<!-- JS renders content here AFTER page load -->
<!-- Crawler may not wait for this -->
</div>
</body>
</html>
Solutions for CSR Apps
- Server-Side Rendering (SSR): Render on server, hydrate on client
- Pre-rendering: Generate static HTML at build time for key pages
- Dynamic Rendering: Serve pre-rendered HTML to bots, SPA to users (not recommended by Google)
Technical SEO Elements
Essential Meta Tags
html
<head>
<!-- Title: 50-60 characters, unique per page -->
<title>Product Name - Category | Brand</title>
<!-- Description: 150-160 characters -->
<meta name="description" content="Compelling description with keywords">
<!-- Canonical: Prevent duplicate content -->
<link rel="canonical" href="https://example.com/page">
<!-- Robots: Control indexing -->
<meta name="robots" content="index, follow">
<!-- Open Graph: Social sharing -->
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Description">
<meta property="og:image" content="https://example.com/image.jpg">
<!-- Viewport: Mobile-friendliness -->
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
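The length guidance in those comments is easy to check automatically. The following is a minimal sketch, assuming the 50-60 and 150-160 character ranges quoted above are treated as display guidelines rather than hard limits; `checkMetaLengths` and `LIMITS` are hypothetical names.

```javascript
// Length ranges taken from the comments in the snippet above; treat them as
// display guidelines rather than hard limits.
const LIMITS = { title: [50, 60], description: [150, 160] };

// Hypothetical helper: returns a list of human-readable warnings.
function checkMetaLengths({ title = '', description = '' }) {
  const warnings = [];
  for (const [field, value] of Object.entries({ title, description })) {
    const [min, max] = LIMITS[field];
    if (value.length < min) warnings.push(`${field} is short (${value.length} < ${min})`);
    if (value.length > max) warnings.push(`${field} is long (${value.length} > ${max})`);
  }
  return warnings;
}

// Both placeholder values from the snippet are shorter than the guideline.
console.log(checkMetaLengths({
  title: 'Product Name - Category | Brand',
  description: 'Compelling description with keywords',
}));
```

Running it per route during a build is one way to enforce "unique and well-sized" metadata without manual review.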
Structured Data (JSON-LD)
html
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Product Name",
"description": "Product description",
"offers": {
"@type": "Offer",
"price": "29.99",
"priceCurrency": "USD",
"availability": "https://schema.org/InStock"
}
}
</script>
Common schema types: Product, Article, FAQ, BreadcrumbList, Organization, LocalBusiness
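When product data lives in code, the JSON-LD tag can be generated rather than hand-written. This is a sketch under assumptions: `productJsonLd` and its parameter names are hypothetical, and price details are nested under an `Offer` because schema.org defines `price`, `priceCurrency`, and `availability` on `Offer` rather than on `Product`.

```javascript
// Hypothetical helper: builds a Product JSON-LD <script> tag from plain data.
function productJsonLd({ name, description, price, currency, inStock }) {
  const data = {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name,
    description,
    offers: {
      '@type': 'Offer',
      price: String(price),
      priceCurrency: currency,
      availability: inStock
        ? 'https://schema.org/InStock'
        : 'https://schema.org/OutOfStock',
    },
  };
  // Escape "<" so a value containing "</script>" cannot end the tag early.
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(productJsonLd({
  name: 'Product Name',
  description: 'Product description',
  price: 29.99,
  currency: 'USD',
  inStock: true,
}));
```

The output should be emitted in the server-rendered HTML so it is present in the initial response.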
Semantic HTML
html
<!-- Good: Semantic structure -->
<main>
<article>
<header>
<h1>Main Title</h1>
</header>
<section>
<h2>Section Title</h2>
<p>Content...</p>
</section>
</article>
<aside>Related content</aside>
</main>
<!-- Bad: Div soup -->
<div class="main">
<div class="title">Main Title</div>
<div class="content">Content...</div>
</div>
Core Web Vitals
Google's page experience metrics that affect ranking.
The Three Metrics
| Metric | What It Measures | Good | Needs Improvement | Poor |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | Loading performance | ≤2.5s | 2.5-4s | >4s |
| INP (Interaction to Next Paint) | Interactivity | ≤200ms | 200-500ms | >500ms |
| CLS (Cumulative Layout Shift) | Visual stability | ≤0.1 | 0.1-0.25 | >0.25 |
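The thresholds in the table translate directly into a small classifier. A minimal sketch, assuming LCP and INP are given in milliseconds; `THRESHOLDS` and `rateMetric` are illustrative names.

```javascript
// Thresholds copied from the table above: [good upper bound, poor lower bound].
const THRESHOLDS = {
  LCP: [2500, 4000], // milliseconds
  INP: [200, 500],   // milliseconds
  CLS: [0.1, 0.25],  // unitless score
};

function rateMetric(name, value) {
  const bounds = THRESHOLDS[name];
  if (!bounds) throw new Error(`Unknown metric: ${name}`);
  const [good, poor] = bounds;
  if (value <= good) return 'good';
  if (value <= poor) return 'needs improvement';
  return 'poor';
}

console.log(rateMetric('LCP', 1800)); // good
console.log(rateMetric('INP', 350));  // needs improvement
console.log(rateMetric('CLS', 0.3));  // poor
```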
Common Issues by Architecture
SPA/CSR Issues:
- LCP: Slow due to JS loading before content
- CLS: Content shifts as JS loads and renders
SSR Issues:
- INP: Hydration can block interactivity
- TTFB: Slow server response times
SSG Issues:
- Generally best for Core Web Vitals
- CLS: Still possible with lazy-loaded images
Optimization Strategies
Improve LCP:
html
<!-- Preload critical resources -->
<link rel="preload" href="hero-image.jpg" as="image">
<link rel="preload" href="critical.css" as="style">
<!-- Inline critical CSS -->
<style>/* Above-the-fold styles */</style>
<!-- Defer non-critical JS -->
<script defer src="app.js"></script>
Improve CLS:
html
<!-- Reserve space for images -->
<img src="photo.jpg" width="800" height="600" alt="Description">
<!-- Reserve space for ads/embeds -->
<div style="min-height: 250px;">
<!-- Ad will load here -->
</div>
Improve INP:
- Break up long tasks
- Minimize hydration cost
- Use requestIdleCallback for non-critical work
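"Break up long tasks" usually means yielding back to the event loop between slices of work so that pending input can be handled. The sketch below assumes `setTimeout(0)` as the yield mechanism because it runs in both browsers and Node; `yieldToMain` and `processInChunks` are hypothetical helpers, and the chunk size is arbitrary.

```javascript
// Yield control so pending input events can run between chunks.
// setTimeout(0) is used here because it works in both browsers and Node.
const yieldToMain = () => new Promise((resolve) => setTimeout(resolve, 0));

// Hypothetical helper: processes items in small chunks instead of one long task.
async function processInChunks(items, handle, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(handle(item));
    }
    await yieldToMain(); // let the event loop handle input before continuing
  }
  return results;
}

processInChunks([1, 2, 3, 4, 5], (n) => n * 2, 2).then((doubled) => {
  console.log(doubled); // [ 2, 4, 6, 8, 10 ]
});
```

The right chunk size depends on how expensive each item is; the aim is to keep each slice well under the 50ms long-task boundary.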
URL Structure
Best Practices
Good URLs:
/products/blue-running-shoes
/blog/2024/seo-guide
/category/electronics/phones
Poor URLs:
/products?id=12345
/page.php?cat=1&sub=2
/p/abc123xyz
URL Guidelines
- Use hyphens, not underscores
- Keep URLs short and descriptive
- Include target keywords naturally
- Use lowercase only
- Avoid query parameters for indexable content
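Most of these guidelines can be applied mechanically when generating URL slugs from titles. A minimal sketch; `slugify` is a hypothetical helper, and it assumes that dropping accents and all other non-ASCII characters is acceptable for the site.

```javascript
// Hypothetical helper: turns a title into a lowercase, hyphen-separated slug.
function slugify(text) {
  return text
    .normalize('NFKD')               // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '') // remove the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')     // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');        // trim leading and trailing hyphens
}

console.log(slugify('Blue Running_Shoes!')); // blue-running-shoes
console.log(slugify('  Café Guide 2024 ')); // cafe-guide-2024
```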
Sitemap and Robots.txt
XML Sitemap
xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/page</loc>
<lastmod>2024-01-15</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
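For applications with many routes, the sitemap is normally generated from route data. The sketch below assumes an array of `{ loc, lastmod }` objects as input; `buildSitemap` and `escapeXml` are hypothetical helpers, and the optional `changefreq` and `priority` elements are omitted for brevity.

```javascript
// Escape the characters that are special in XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: `pages` is an array of { loc, lastmod } objects.
function buildSitemap(pages) {
  const entries = pages.map(({ loc, lastmod }) => [
    '  <url>',
    `    <loc>${escapeXml(loc)}</loc>`,
    lastmod ? `    <lastmod>${escapeXml(lastmod)}</lastmod>` : null,
    '  </url>',
  ].filter(Boolean).join('\n'));

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

console.log(buildSitemap([
  { loc: 'https://example.com/page', lastmod: '2024-01-15' },
  { loc: 'https://example.com/products/blue-running-shoes' },
]));
```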
Robots.txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
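To see how such rules resolve for a given path, here is a deliberately simplified matcher. It assumes only the `User-agent: *` group matters, treats rule paths as plain prefixes (no `*` or `$` wildcards), and lets the longest matching rule win with Allow winning ties; `parseRules` and `isAllowed` are hypothetical names, and a real parser handles more cases.

```javascript
// Parse only the Allow/Disallow rules in the "User-agent: *" group.
function parseRules(robotsTxt) {
  const rules = [];
  let inStarGroup = false;
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      inStarGroup = value === '*';
    } else if (inStarGroup && (field === 'allow' || field === 'disallow') && value) {
      rules.push({ allow: field === 'allow', path: value });
    }
  }
  return rules;
}

function isAllowed(robotsTxt, path) {
  let best = { allow: true, path: '' }; // no matching rule means allowed
  for (const rule of parseRules(robotsTxt)) {
    if (!path.startsWith(rule.path)) continue;
    const longer = rule.path.length > best.path.length;
    const tieButAllow = rule.path.length === best.path.length && rule.allow;
    if (longer || tieButAllow) best = rule;
  }
  return best.allow;
}

const robots = [
  'User-agent: *',
  'Allow: /',
  'Disallow: /admin/',
  'Disallow: /api/',
  'Sitemap: https://example.com/sitemap.xml',
].join('\n');

console.log(isAllowed(robots, '/products/shoes')); // true
console.log(isAllowed(robots, '/admin/users'));    // false
```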
SPA SEO Checklist
For SPAs that need SEO:
- Implement SSR or pre-rendering for SEO-critical pages
- Ensure each route has unique meta tags (title, description)
- Use semantic HTML structure
- Implement proper heading hierarchy (h1 → h2 → h3)
- Add structured data (JSON-LD)
- Generate XML sitemap with all routes
- Handle redirects server-side (301/302), not client-side
- Implement canonical URLs
- Ensure internal links are crawlable <a href> tags
- Test with Google Search Console's URL Inspection tool
Common SEO Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Client-side only meta tags | Crawler sees defaults | SSR or head management |
| JavaScript-only navigation | Links not crawlable | Use `<a href>` tags |
| Infinite scroll | Content not discoverable | Pagination or "Load More" with URLs |
| Hash-based routing | URLs not indexed | Use History API |
| Duplicate content | Diluted rankings | Canonical tags |
| Slow loading | Poor rankings | Optimize Core Web Vitals |
Testing SEO
Tools
- Google Search Console: Indexing status, issues
- Lighthouse: Core Web Vitals, SEO audit
- View Page Source: What crawler sees (not DevTools)
- Google Rich Results Test: Structured data validation
Quick Checks
bash
# See what Google sees
curl -A "Googlebot" https://example.com/page
# Check robots.txt
curl https://example.com/robots.txt
# View without JavaScript (approximate crawler view)
# Disable JavaScript in browser DevTools
---
Deep Dive: Understanding Search Engines From First Principles
How Googlebot Actually Works
Googlebot is not one crawler - it's a massive distributed system:
THE GOOGLE CRAWLING INFRASTRUCTURE:
┌──────────────────────────────────────────────────────────────┐
│ URL FRONTIER │
│ (Priority queue of URLs to crawl - billions of entries) │
│ Priority based on: PageRank, freshness, crawl budget │
└─────────────────────────┬────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ CRAWLER FLEET │
│ Thousands of servers making HTTP requests in parallel │
│ Respects robots.txt and politeness policies (Googlebot ignores crawl-delay) │
└─────────────────────────┬────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ HTML PROCESSING QUEUE │
│ Parse HTML, extract text, links, metadata │
│ This is FAST - pure HTML parsing │
└─────────────────────────┬────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ RENDERING QUEUE (WRS) │
│ Web Rendering Service - headless Chrome │
│ Executes JavaScript for dynamic content │
│ This is SLOW and EXPENSIVE - limited capacity │
└─────────────────────────┬────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ INDEXER │
│ Processes content, builds inverted index │
│ Maps words → documents for fast retrieval │
└──────────────────────────────────────────────────────────────┘
The critical insight: JavaScript rendering is a SEPARATE phase that happens LATER, if at all.
What Happens When Googlebot Visits Your Page
STEP 1: URL Discovery
- Found via: sitemap, internal links, external links, Search Console
- Added to URL Frontier with priority score
STEP 2: HTTP Request
- Googlebot requests your URL
- Sends headers: User-Agent: Googlebot, Accept-Language, etc.
- Follows redirects (301, 302, 307, 308)
STEP 3: Response Analysis
- HTTP status: 200, 404, 500, etc.
- Content-Type: text/html, application/json, etc.
- Headers: X-Robots-Tag, Link (rel="canonical"), etc.
STEP 4: HTML Parsing (IMMEDIATE)
GET /products/shoes HTTP/1.1
Host: example.com
User-Agent: Googlebot
Response:
<html>
<head>
<title>Running Shoes | Example</title>
<meta name="description" content="...">
</head>
<body>
<div id="root">Loading...</div>
<script src="/app.js"></script> ← NOT EXECUTED YET
</body>
</html>
EXTRACTED IMMEDIATELY:
- Title: "Running Shoes | Example"
- Meta description
- Any visible text: "Loading..."
- Links for further crawling
- Nothing from JavaScript!
STEP 5: Rendering Queue (DELAYED)
- If page seems JS-dependent, added to rendering queue
- Could be minutes, hours, or days later
- Headless Chrome executes JavaScript
- Final DOM captured for indexing
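The "extracted immediately" part of STEP 4 can be approximated locally to preview what a first pass over the raw HTML yields. This is a regex-based sketch, not how Googlebot is implemented; `extractFromRawHtml` is a hypothetical helper, and it assumes the description meta tag lists `name` before `content`.

```javascript
// Regex-based sketch; a real crawler uses a full HTML parser.
function extractFromRawHtml(html) {
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const descMatch = html.match(
    /<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i
  );
  // Only <a href="..."> links are collected; nothing is executed.
  const links = [...html.matchAll(/<a\s[^>]*href=["']([^"']+)["']/gi)].map((m) => m[1]);
  return {
    title: titleMatch ? titleMatch[1].trim() : null,
    description: descMatch ? descMatch[1] : null,
    links,
  };
}

const rawHtml = [
  '<html><head>',
  '<title>Running Shoes | Example</title>',
  '<meta name="description" content="Shop running shoes">',
  '</head><body>',
  '<div id="root">Loading...</div>',
  '<a href="/products">Products</a>',
  '<script src="/app.js"></script>',
  '</body></html>',
].join('\n');

console.log(extractFromRawHtml(rawHtml));
```

Anything a route injects only through JavaScript will be missing from this result, which mirrors the gap described in STEP 4.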
The Rendering Budget Problem
Google allocates finite resources to rendering:
GOOGLE'S RENDERING CONSTRAINTS:
Total pages to render: ~billions
Rendering capacity: ~millions per day (estimated)
Your site's share: depends on "crawl budget"
CRAWL BUDGET FACTORS:
1. Site authority (PageRank-like signals)
2. Update frequency (how often content changes)
3. Server response time (fast = more crawling)
4. Errors encountered (errors = less crawling)
IMPLICATIONS:
- Large sites: Not all pages get rendered
- Low-authority sites: Lower priority in queue
- Slow sites: Fewer resources allocated
- Error-prone sites: Crawl budget wasted
RECOMMENDATION:
Don't RELY on rendering - serve complete HTML
Understanding Indexing vs Ranking
Many developers confuse these:
INDEXING: Is your page in Google's database?
- Google knows the page exists
- Has parsed its content
- Stored in the index
Check: site:example.com/your-page
If it appears: indexed
If it doesn't: not indexed (or blocked)
RANKING: Where does your page appear in results?
- Page is indexed, now competing with millions of others
- Algorithm determines position
- 200+ ranking factors
A page can be:
✓ Indexed but ranking poorly (page 10+)
✓ Indexed but not ranking for your target keywords
✗ Not indexed at all (biggest problem for SPAs)
How Google Processes JavaScript SPAs
javascript
// YOUR REACT SPA:
// Server returns:
<!DOCTYPE html>
<html>
<head>
<title>My App</title> <!-- Google sees this -->
</head>
<body>
<div id="root"></div> <!-- Google sees EMPTY DIV -->
<script src="/bundle.js"></script>
</body>
</html>
// PHASE 1: HTML PARSING (immediate)
// Google extracts:
// - Title: "My App"
// - Body text: "" (empty)
// - Links: none found in content
// PHASE 2: RENDERING (delayed)
// Hours or days later, if ever:
// - Chrome loads page
// - Executes bundle.js
// - React renders into #root
// - Final HTML captured:
<div id="root">
<header>...</header>
<main>
<h1>Welcome to My App</h1>
<p>Content that was invisible before</p>
</main>
</div>
// NOW Google can index the real content
// But this delay means:
// - Time-sensitive content may be stale
// - Pages might rank for "Loading..." text
// - Some pages may never get rendered
The Two-Wave Indexing Phenomenon
SPAs often show strange indexing behavior:
WAVE 1: HTML-only indexing
- Title and meta description captured
- Body appears empty or "Loading..."
- May rank for title keywords only
- Incomplete representation in SERPs
WAVE 2: Post-render indexing (if it happens)
- Full content now visible
- Rankings may change dramatically
- Could take days or weeks
OBSERVABLE SYMPTOMS:
- Search result shows "Loading..." as snippet
- Page ranks for title but not body content
- Search Console shows "Page is not indexed" then later "Indexed"
- Rankings fluctuate as rendering catches up
Core Web Vitals: The Technical Details
Understanding how metrics are measured:
javascript
// LARGEST CONTENTFUL PAINT (LCP)
// Measures: When largest visible content renders
// Elements considered: images, videos, block-level text
// Browser tracks LCP candidates:
// t=0ms: Navigation starts
// t=100ms: First text paints (small heading) - LCP candidate 1
// t=500ms: Hero image loads - LCP candidate 2 (larger, replaces)
// t=2500ms: No more updates - final LCP = 500ms ✓ GOOD
// LCP KILLERS:
// - Slow server response (TTFB)
// - Render-blocking JavaScript
// - Slow image loading
// - Client-side rendering (content waits for JS)
// INTERACTION TO NEXT PAINT (INP)
// Measures: Responsiveness to user input
// Captures: click, tap, keypress → visual update
// How it works:
// 1. User clicks button
// 2. Browser creates "click" event
// 3. Your JavaScript handler runs (event processing time)
// 4. React re-renders (presentation delay)
// 5. Browser paints the update
// 6. INP = time from click to paint complete
// INP KILLERS:
// - Long JavaScript tasks (>50ms)
// - Hydration blocking main thread
// - Heavy re-renders
// - Too many event listeners
// CUMULATIVE LAYOUT SHIFT (CLS)
// Measures: Visual stability (unexpected movement)
// Calculated: impact fraction × distance fraction
// Example of bad CLS:
// t=0ms: Heading renders at y=0
// t=500ms: Ad loads above heading, pushes it to y=250px
// Impact: 100% of viewport affected
// Distance: 250px / viewport height
// CLS KILLERS:
// - Images without dimensions
// - Ads/embeds without reserved space
// - Dynamically injected content
// - Web fonts causing text resize
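The CLS arithmetic in the comments above can be worked through with concrete numbers. The sketch scores a single layout shift, assuming a 400×800 viewport (not stated in the original) and measuring distance against the viewport's larger dimension; `layoutShiftScore` is a hypothetical helper, and a page's CLS aggregates such per-shift scores.

```javascript
// Layout shift score for one shift = impact fraction × distance fraction.
function layoutShiftScore({ impactFraction, moveDistancePx, viewportWidth, viewportHeight }) {
  // Distance is measured relative to the viewport's larger dimension.
  const distanceFraction = moveDistancePx / Math.max(viewportWidth, viewportHeight);
  return impactFraction * distanceFraction;
}

// The ad example from the comments, on an assumed 400x800 viewport:
const score = layoutShiftScore({
  impactFraction: 1.0, // 100% of the viewport affected
  moveDistancePx: 250, // heading pushed down by 250px
  viewportWidth: 400,
  viewportHeight: 800,
});

console.log(score); // 0.3125
```

A single shift of 0.3125 already exceeds the 0.25 "poor" threshold, which is why reserving space for ads and embeds matters so much.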
How Google Evaluates Page Quality
Beyond technical SEO, Google assesses quality:
E-E-A-T SIGNALS (Experience, Expertise, Authoritativeness, Trust):
EXPERIENCE:
- Does content show first-hand experience?
- Product reviews: Did you actually use it?
- Travel guides: Did you actually visit?
EXPERTISE:
- Is the author qualified to write this?
- For medical content: Is author a doctor?
- For legal content: Is author a lawyer?
AUTHORITATIVENESS:
- Is this site a known authority?
- Do other sites link to it?
- Is it cited in the industry?
TRUSTWORTHINESS:
- Secure connection (HTTPS)?
- Clear contact information?
- No deceptive practices?
HOW GOOGLE MEASURES:
- External links (authority)
- Author bios and credentials
- Site reputation
- User behavior signals
- Content accuracy (fact-checking)
Canonical URLs: Preventing Duplicate Content
Duplicate content confuses Google:
SCENARIO: Same product at multiple URLs
/products/shoes
/products/shoes?color=red
/products/shoes?color=red&size=10
/products/shoes?utm_source=facebook
PROBLEM:
- Google sees 4 different "pages"
- Splits ranking signals across them
- May pick wrong one as "canonical"
SOLUTION: Canonical tags
<link rel="canonical" href="https://example.com/products/shoes" />
EVERY variant should point to THE ONE canonical URL.
HOW GOOGLE USES CANONICAL:
1. Sees multiple URLs with same/similar content
2. Checks for canonical tag
3. Consolidates signals to canonical URL
4. Returns canonical URL in search results
CANONICAL RULES:
- Self-referencing canonicals are GOOD (each page points to itself)
- Cross-domain canonicals work (if you have duplicate on 2 domains)
- Canonical is a HINT, not directive (Google may ignore)
- Conflicting signals = Google chooses (may be wrong)
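Computing the canonical URL for each variant is usually a matter of dropping parameters that do not change the main content. A minimal sketch using the standard `URL` class; `canonicalUrl` and the `NON_CANONICAL_PARAMS` list are assumptions for this scenario, and which parameters are safe to drop is a per-site decision.

```javascript
// Assumption: these parameters never change the page's main content.
const NON_CANONICAL_PARAMS = new Set(['color', 'size']);

// Hypothetical helper: strips variant and tracking parameters from a URL.
function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first because the loop mutates searchParams.
  for (const key of [...url.searchParams.keys()]) {
    if (NON_CANONICAL_PARAMS.has(key) || key.startsWith('utm_')) {
      url.searchParams.delete(key);
    }
  }
  url.hash = ''; // fragments are not part of the canonical URL
  return url.toString();
}

const variants = [
  'https://example.com/products/shoes',
  'https://example.com/products/shoes?color=red',
  'https://example.com/products/shoes?color=red&size=10',
  'https://example.com/products/shoes?utm_source=facebook',
];

// Every variant maps to https://example.com/products/shoes
console.log(variants.map(canonicalUrl));
```

The result would go into the `href` of the canonical link tag rendered for each variant.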
Structured Data: How Machines Understand Content
Structured data helps Google understand meaning:
javascript
// WITHOUT STRUCTURED DATA:
// Google sees text: "Nike Air Max - $129.99 - In Stock"
// Google has to GUESS: Is this a product? What's the price?
// WITH STRUCTURED DATA:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Nike Air Max",
"offers": {
"@type": "Offer",
"price": "129.99",
"priceCurrency": "USD",
"availability": "https://schema.org/InStock"
}
}
</script>
// NOW Google KNOWS:
// - This is a Product (not an article, event, etc.)
// - The name is "Nike Air Max"
// - It costs $129.99 USD
// - It's in stock
// BENEFITS:
// - Rich snippets in search results (stars, prices, availability)
// - Product panels in shopping results
// - Voice assistant answers
// - Google Merchant Center integration
The JavaScript SEO Testing Protocol
How to verify your SPA is SEO-ready:
bash
undefined如何验证您的SPA是否做好了SEO准备:
bash
undefinedSTEP 1: View raw HTML (what crawler sees first)
STEP 1: View raw HTML (what crawler sees first)
curl -s https://yoursite.com/page | head -100
curl -s https://yoursite.com/page | head -100
Look for:
Look for:
- Real content in HTML
- Real content in HTML
- Not just <div id="root"></div>
- Not just <div id="root"></div>
- Proper <title> and <meta description>
- Proper <title> and <meta description>
STEP 2: Compare rendered vs raw
STEP 2: Compare rendered vs raw
In Chrome DevTools:
In Chrome DevTools:
- View Page Source (raw HTML)
- View Page Source (raw HTML)
- Inspect Element (rendered DOM)
- Inspect Element (rendered DOM)
- If they're very different = SEO risk
- If they're very different = SEO risk
STEP 3: Google's Mobile-Friendly Test
STEP 3: Google's Mobile-Friendly Test
Shows JavaScript-rendered version
Shows JavaScript-rendered version
Reveals what Google actually sees
Reveals what Google actually sees
STEP 4: URL Inspection Tool (Search Console)
STEP 4: URL Inspection Tool (Search Console)
Shows exactly how Google indexed your page
Shows exactly how Google indexed your page
"View Crawled Page" shows HTML Google has
"View Crawled Page" shows HTML Google has
"View Tested Page" shows rendered version
"View Tested Page" shows rendered version
STEP 5: Site search
STEP 5: Site search
site:yoursite.com/specific-page
site:yoursite.com/specific-page
If it appears with wrong snippet = indexing issue
If it appears with wrong snippet = indexing issue
If it doesn't appear = not indexed
If it doesn't appear = not indexed
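The STEP 1 checks can also be automated. The following is a minimal sketch, not a complete audit: `auditRawHtml` is a hypothetical helper name, the regular expressions are deliberately loose, and the mount-node ids `root` and `app` are assumptions about how the SPA is set up.

```javascript
// Hypothetical helper: flags common signs that a page depends on
// client-side rendering for its indexable content.
function auditRawHtml(html) {
  const issues = [];

  // A <title> with non-empty text should be present in the raw response.
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (!title || title[1].trim() === '') {
    issues.push('missing <title>');
  }

  // Look for <meta name="description" ...> regardless of attribute order.
  const hasDescription = (html.match(/<meta\b[^>]*>/gi) || []).some(
    (tag) => /\bname\s*=\s*["']description["']/i.test(tag)
  );
  if (!hasDescription) {
    issues.push('missing meta description');
  }

  // An empty mount node such as <div id="root"></div> suggests a CSR shell.
  if (/<div[^>]*\bid\s*=\s*["'](root|app)["'][^>]*>\s*<\/div>/i.test(html)) {
    issues.push('empty mount node (likely CSR shell)');
  }

  return issues;
}

const csrShell =
  '<!DOCTYPE html><html><head></head><body><div id="root"></div>' +
  '<script src="app.js"></script></body></html>';
console.log(auditRawHtml(csrShell));
```

The input would typically be the output of the `curl` command from STEP 1, or the body of a plain HTTP request made without executing JavaScript. An empty result only means none of these three heuristics fired; it does not prove the page is indexable.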
Real-World SEO Architecture Patterns
PATTERN 1: SSR/SSG FOR ALL (Safest)
- Every page server-rendered
- No JavaScript rendering dependencies
- Works for all crawlers
- Best for: content sites, e-commerce
PATTERN 2: HYBRID (Practical)
- Public pages: SSR/SSG (SEO critical)
- Authenticated pages: CSR (no SEO needed)
- Example:
    / → SSG
    /products/* → ISR
    /blog/* → SSG
    /dashboard/* → CSR (behind login)
PATTERN 3: EDGE RENDERING (Modern)
- Render at CDN edge for speed
- Still SSR, but geographically distributed
- Best for: global sites, performance critical
PATTERN 4: STREAMING SSR (Advanced)
- Stream HTML progressively
- Critical content first
- Non-critical streams later
- Best for: large pages, slow data sources
ANTI-PATTERN: CSR FOR PUBLIC CONTENT
- Hoping Google will render JavaScript
- Relying on "they support JS now"
- Will have inconsistent indexing
- May rank poorly vs competitors
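The hybrid pattern amounts to a lookup from URL path to rendering mode. The sketch below illustrates that idea with the routes from the Pattern 2 example; `renderingRules`, `resolveRendering`, the `indexable` flag, and the SSR fallback for unknown routes are all assumptions made for illustration, not part of any framework's API.

```javascript
// Hypothetical route table mirroring the hybrid example.
// Order matters: the first matching pattern wins.
const renderingRules = [
  { pattern: '/dashboard/*', mode: 'CSR', indexable: false },
  { pattern: '/products/*', mode: 'ISR', indexable: true },
  { pattern: '/blog/*', mode: 'SSG', indexable: true },
  { pattern: '/', mode: 'SSG', indexable: true },
];

// Supports exact paths and a single trailing "/*" wildcard.
function matches(pattern, path) {
  if (pattern.endsWith('/*')) {
    const prefix = pattern.slice(0, -1); // drop "*", keep the trailing slash
    return path.startsWith(prefix);
  }
  return path === pattern;
}

function resolveRendering(path, rules = renderingRules) {
  const rule = rules.find((r) => matches(r.pattern, path));
  // Assumption: unknown public routes fall back to SSR so crawlers still get HTML.
  return rule
    ? { mode: rule.mode, indexable: rule.indexable }
    : { mode: 'SSR', indexable: true };
}

console.log(resolveRendering('/products/air-max'));
console.log(resolveRendering('/dashboard/settings'));
```

Keeping the decision in one table makes it easy to review which public routes would ship an empty shell to crawlers.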
For Framework Authors: Building SEO Systems
Implementation Note: The patterns and code examples below represent one proven approach to building SEO systems. Head management can be handled via components (React Helmet), framework APIs (Next.js Metadata), or universal libraries (unhead). The direction shown here covers the core concepts; adapt it to your framework's rendering model, its SSR implementation, and whether you need streaming support.
Implementing Head Management
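javascript
// DOCUMENT HEAD MANAGEMENT
import { createContext, useContext, useEffect, useId } from 'react';

// Escape text for safe interpolation into HTML content and attribute values
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

class HeadManager {
  constructor() {
    this.tags = new Map();
    this.order = [];
  }

  // Add or update a head tag
  setTag(id, tag) {
    if (!this.tags.has(id)) {
      this.order.push(id);
    }
    this.tags.set(id, tag);
  }

  // Remove a tag
  removeTag(id) {
    this.tags.delete(id);
    this.order = this.order.filter(i => i !== id);
  }

  // Render to string (for SSR)
  toString() {
    return this.order
      .map(id => this.renderTag(this.tags.get(id), id))
      .join('\n');
  }

  renderTag(tag, id) {
    const { type, children, ...attrs } = tag;
    if (type === 'title') {
      return `<title>${escapeHtml(children)}</title>`;
    }
    // Emit the id so applyToDOM() can find and reuse server-rendered elements
    const attrStr = Object.entries({ ...attrs, 'data-head-id': id })
      .map(([k, v]) => `${k}="${escapeHtml(v)}"`)
      .join(' ');
    if (children) {
      // children is emitted unescaped (inline script or style): trusted input only
      return `<${type} ${attrStr}>${children}</${type}>`;
    }
    return `<${type} ${attrStr}>`;
  }

  // Apply to DOM (for client-side)
  applyToDOM() {
    const head = document.head;
    // Index managed elements and drop the ones whose tag was removed
    const existingById = new Map();
    for (const el of head.querySelectorAll('[data-head-id]')) {
      const id = el.getAttribute('data-head-id');
      if (this.tags.has(id)) {
        existingById.set(id, el);
      } else {
        el.remove();
      }
    }
    for (const id of this.order) {
      const tag = this.tags.get(id);
      if (tag.type === 'title') {
        // Reuse the document's single <title> instead of appending a second one
        document.title = tag.children;
        continue;
      }
      let existing = existingById.get(id);
      if (!existing) {
        existing = document.createElement(tag.type);
        existing.setAttribute('data-head-id', id);
        head.appendChild(existing);
      }
      // Update attributes
      for (const [key, value] of Object.entries(tag)) {
        if (key === 'type') continue;
        if (key === 'children') {
          existing.textContent = value;
        } else {
          existing.setAttribute(key, value);
        }
      }
    }
  }
}

// Shared default instance; on the server, provide one instance per request
const HeadContext = createContext(new HeadManager());

// React hook for head management.
// Effects do not run during server rendering, so this hook only updates the
// client; for SSR the tags must be registered while rendering.
function useHead(tags) {
  const headManager = useContext(HeadContext);
  const id = useId();
  // Serialize so the effect re-runs only when the tag values change
  const serialized = JSON.stringify(tags);
  useEffect(() => {
    const entries = Object.entries(JSON.parse(serialized));
    // Apply on mount and whenever the tags change
    entries.forEach(([key, value]) => {
      headManager.setTag(`${id}-${key}`, value);
    });
    headManager.applyToDOM();
    // Cleanup on unmount
    return () => {
      entries.forEach(([key]) => {
        headManager.removeTag(`${id}-${key}`);
      });
      headManager.applyToDOM();
    };
  }, [headManager, id, serialized]);
}

// Usage
function ProductPage({ product }) {
  useHead({
    title: { type: 'title', children: `${product.name} | Store` },
    description: { type: 'meta', name: 'description', content: product.summary },
    ogTitle: { type: 'meta', property: 'og:title', content: product.name },
    ogImage: { type: 'meta', property: 'og:image', content: product.image },
  });
  return <div>{/* ... */}</div>;
}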
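Because the hook registers tags inside an effect, and effects do not run during server rendering, the server needs its own path for getting head tags into the initial HTML. The sketch below shows only the final serialization step, assuming the tags were already collected during render. `renderHeadTag` and `renderDocument` are hypothetical names, and the tag descriptors reuse the `{ type, children, ...attributes }` shape from the usage example.

```javascript
// Escape text for HTML content and attribute values
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Serialize one tag descriptor ({ type, children, ...attributes })
function renderHeadTag({ type, children, ...attrs }) {
  if (type === 'title') {
    return `<title>${escapeHtml(children)}</title>`;
  }
  const attrStr = Object.entries(attrs)
    .map(([name, value]) => `${name}="${escapeHtml(value)}"`)
    .join(' ');
  return `<${type} ${attrStr}>`;
}

// Hypothetical server entry point: headTags were collected while rendering
function renderDocument({ headTags, bodyHtml }) {
  const head = headTags.map(renderHeadTag).join('\n    ');
  return `<!DOCTYPE html>
<html>
  <head>
    ${head}
  </head>
  <body>
    <div id="root">${bodyHtml}</div>
  </body>
</html>`;
}

const html = renderDocument({
  headTags: [
    { type: 'title', children: 'Air Max & More | Store' },
    { type: 'meta', name: 'description', content: 'Running shoes, 20% off' },
  ],
  bodyHtml: '<h1>Air Max</h1>',
});
console.log(html);
```

With this in place the title and description are part of the first response, which is what the raw-HTML check in the testing protocol looks for.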
Building Structured Data Injection
javascript
// STRUCTURED DATA (JSON-LD) SYSTEM
class StructuredDataManager {
  constructor() {
    this.schemas = new Map();
  }

  // Register schema for a route
  setSchema(id, schema) {
    this.schemas.set(id, schema);
  }

  // Build JSON-LD from data
  buildProductSchema(product) {
    return {
      '@context': 'https://schema.org',
      '@type': 'Product',
      name: product.name,
      description: product.description,
      image: product.images,
      sku: product.sku,
      brand: {
        '@type': 'Brand',
        name: product.brand,
      },
      offers: {
        '@type': 'Offer',
        url: product.url,
        priceCurrency: product.currency,
        price: product.price,
        availability: product.inStock
          ? 'https://schema.org/InStock'
          : 'https://schema.org/OutOfStock',
      },
      // undefined properties are omitted by JSON.stringify
      aggregateRating: product.rating ? {
        '@type': 'AggregateRating',
        ratingValue: product.rating,
        reviewCount: product.reviewCount,
      } : undefined,
    };
  }

  buildBreadcrumbSchema(breadcrumbs) {
    return {
      '@context': 'https://schema.org',
      '@type': 'BreadcrumbList',
      itemListElement: breadcrumbs.map((crumb, i) => ({
        '@type': 'ListItem',
        position: i + 1,
        item: {
          '@id': crumb.url,
          name: crumb.name,
        },
      })),
    };
  }

  buildArticleSchema(article) {
    return {
      '@context': 'https://schema.org',
      '@type': 'Article',
      headline: article.title,
      description: article.excerpt,
      image: article.image,
      author: {
        '@type': 'Person',
        name: article.author.name,
        url: article.author.url,
      },
      publisher: {
        '@type': 'Organization',
        name: article.publisher.name,
        logo: {
          '@type': 'ImageObject',
          url: article.publisher.logo,
        },
      },
      datePublished: article.publishedAt,
      dateModified: article.updatedAt,
    };
  }

  // Render all schemas
  toString() {
    const schemas = Array.from(this.schemas.values());
    if (schemas.length === 0) return '';
    const combined = schemas.length === 1
      ? schemas[0]
      : { '@context': 'https://schema.org', '@graph': schemas };
    // Escape "<" so a "</script>" inside a string cannot close the tag early
    return `<script type="application/ld+json">${
      JSON.stringify(combined).replace(/</g, '\\u003c')
    }</script>`;
  }
}
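Two details of the serialization step are easy to miss: `JSON.stringify` silently drops properties whose value is `undefined`, and the `<` escape keeps user-supplied strings from terminating the script element. The standalone sketch below isolates both behaviors; `toJsonLdScript` is a hypothetical helper name and the product data is made up.

```javascript
// Serialize a schema object into a JSON-LD script tag.
function toJsonLdScript(schema) {
  // JSON.stringify omits properties whose value is undefined;
  // escaping "<" prevents a "</script>" inside a string from closing the tag.
  const json = JSON.stringify(schema).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}

const productSchema = {
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: 'Trail Shoe </script><script>alert(1)</script>', // hostile input
  aggregateRating: undefined, // no reviews yet
};

const jsonLdTag = toJsonLdScript(productSchema);
console.log(jsonLdTag);
```

The escaped form is still valid JSON, so a parser reading the script content recovers the original string unchanged.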
Sitemap Generation
javascript
// SITEMAP GENERATOR
import { writeFile } from 'node:fs/promises';

// The sitemap protocol allows at most 50,000 URLs per file
const MAX_URLS_PER_SITEMAP = 50000;

// Escape the five XML special characters
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

async function generateSitemap(config) {
  const { baseUrl, routes, outputPath } = config;
  const urls = [];
  for (const route of routes) {
    // Static routes
    if (!route.isDynamic) {
      urls.push({
        loc: `${baseUrl}${route.path}`,
        // Falls back to the generation time; pass an accurate lastModified where possible
        lastmod: route.lastModified || new Date().toISOString(),
        changefreq: route.changefreq || 'weekly',
        priority: route.priority ?? 0.7,
      });
      continue;
    }
    // Dynamic routes - get all paths
    if (route.getStaticPaths) {
      const paths = await route.getStaticPaths();
      for (const path of paths) {
        // Substitute [param] segments, encoding each value for use in a URL
        const fullPath = route.path.replace(
          /\[(\w+)\]/g,
          (_, param) => encodeURIComponent(path.params[param])
        );
        urls.push({
          loc: `${baseUrl}${fullPath}`,
          lastmod: path.lastModified || new Date().toISOString(),
          changefreq: path.changefreq || 'weekly',
          priority: path.priority ?? 0.5,
        });
      }
    }
  }

  // Generate sitemap index if too large (checked before writing an oversized file).
  // generateSitemapIndex is not defined in this section; it is assumed to split
  // the URLs across several files and return the index XML.
  if (urls.length > MAX_URLS_PER_SITEMAP) {
    return generateSitemapIndex(urls, config);
  }

  // Generate XML
  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls.map(url => `  <url>
    <loc>${escapeXml(url.loc)}</loc>
    <lastmod>${url.lastmod}</lastmod>
    <changefreq>${url.changefreq}</changefreq>
    <priority>${url.priority}</priority>
  </url>`).join('\n')}
</urlset>`;
  await writeFile(outputPath, xml);
  return xml;
}

// Robots.txt generator
function generateRobotsTxt(config) {
  const { baseUrl, disallow = [], sitemap = true } = config;
  let content = `User-agent: *\n`;
  for (const path of disallow) {
    content += `Disallow: ${path}\n`;
  }
  if (sitemap) {
    content += `\nSitemap: ${baseUrl}/sitemap.xml\n`;
  }
  return content;
}
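The generator calls `generateSitemapIndex` without defining it. One possible shape for that step is sketched below, under stated assumptions: the function name `buildSitemapIndex`, the `sitemap-N.xml` file naming, and the choice to return data instead of writing files are all illustrative, and `baseUrl` is assumed to contain no characters that need XML escaping. The 50,000-URL limit per file comes from the sitemap protocol.

```javascript
const MAX_URLS_PER_SITEMAP = 50000; // limit from the sitemap protocol

// Split a list into consecutive chunks of at most `size` items
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical shape: returns the per-file URL lists and the index XML
// instead of writing to disk, so the caller decides where files go.
function buildSitemapIndex(urls, baseUrl, size = MAX_URLS_PER_SITEMAP) {
  const files = chunk(urls, size).map((part, i) => ({
    name: `sitemap-${i + 1}.xml`,
    urls: part,
  }));
  const indexXml = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${files.map((f) => `  <sitemap>
    <loc>${baseUrl}/${f.name}</loc>
  </sitemap>`).join('\n')}
</sitemapindex>`;
  return { files, indexXml };
}

const manyUrls = Array.from({ length: 120000 }, (_, i) => ({
  loc: `https://example.com/p/${i}`,
}));
const { files, indexXml } = buildSitemapIndex(manyUrls, 'https://example.com');
console.log(files.map((f) => `${f.name}: ${f.urls.length}`));
// → [ 'sitemap-1.xml: 50000', 'sitemap-2.xml: 50000', 'sitemap-3.xml: 20000' ]
```

Each entry in `files` would then be rendered with the same `<urlset>` template as a single sitemap, and robots.txt would point at the index file instead.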
Canonical URL Management
javascript
// CANONICAL URL SYSTEM
class CanonicalManager {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
  }

  // Generate canonical for a route
  getCanonical(path, params = {}) {
    // Remove trailing slash (the root path stays "/")
    let canonical = path.replace(/\/$/, '') || '/';
    // Normalize query params (sorted, only allowed ones)
    const allowedParams = ['page', 'category', 'sort'];
    const query = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
      if (allowedParams.includes(key) && value) {
        query.set(key, value);
      }
    }
    // Sort by name so equivalent URLs always serialize identically
    query.sort();
    const queryStr = query.toString();
    if (queryStr) {
      canonical += `?${queryStr}`;
    }
    return `${this.baseUrl}${canonical}`;
  }

  // Handle pagination canonicals
  // (totalPages is unused here; it is kept so callers can add range checks)
  getPaginationCanonical(path, page, totalPages) {
    // Page 1 should canonical to base URL
    if (page <= 1) {
      return this.getCanonical(path);
    }
    return this.getCanonical(path, { page });
  }

  // Handle locale canonicals
  getLocaleCanonical(path, locale, defaultLocale) {
    if (locale === defaultLocale) {
      return this.getCanonical(path);
    }
    return this.getCanonical(`/${locale}${path}`);
  }

  // Generate hreflang tags
  getHreflangTags(path, locales, defaultLocale) {
    return locales.map(locale => ({
      type: 'link',
      rel: 'alternate',
      hreflang: locale,
      href: this.getLocaleCanonical(path, locale, defaultLocale),
    })).concat({
      type: 'link',
      rel: 'alternate',
      hreflang: 'x-default',
      href: this.getCanonical(path),
    });
  }
}
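The same normalization can be applied to a full incoming URL with the standard `URL` and `URLSearchParams` APIs. This is a standalone sketch, not a drop-in replacement for the class: `canonicalize` and the `ALLOWED_PARAMS` allowlist are hypothetical, and which parameters count as content-changing depends on the site.

```javascript
// Hypothetical allowlist: parameters that change the page content
const ALLOWED_PARAMS = ['page', 'category'];

function canonicalize(rawUrl, allowed = ALLOWED_PARAMS) {
  const url = new URL(rawUrl);
  // Remove trailing slashes, but keep "/" for the site root
  const path = url.pathname.replace(/\/+$/, '') || '/';
  const query = new URLSearchParams();
  for (const [key, value] of url.searchParams) {
    if (allowed.includes(key) && value) query.set(key, value);
  }
  query.sort(); // stable order so equivalent URLs serialize identically
  const qs = query.toString();
  // The fragment (#...) is intentionally dropped
  return `${url.origin}${path}${qs ? `?${qs}` : ''}`;
}

console.log(
  canonicalize('https://example.com/shoes/?utm_source=mail&page=2&category=running')
);
// → https://example.com/shoes?category=running&page=2
```

Tracking parameters such as `utm_source` disappear, and two URLs that differ only in parameter order map to the same canonical string.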
Meta Tag Deduplication
javascript
// META TAG DEDUPLICATION
class MetaDeduplicator {
  constructor() {
    this.tags = [];
  }

  // Add tag with deduplication key
  add(tag) {
    const key = this.getDeduplicationKey(tag);
    // Remove existing tag with same key
    this.tags = this.tags.filter(t =>
      this.getDeduplicationKey(t) !== key
    );
    this.tags.push(tag);
  }

  getDeduplicationKey(tag) {
    if (tag.type === 'title') return 'title';
    if (tag.name) return `name:${tag.name}`;
    if (tag.property) return `property:${tag.property}`;
    if (tag.httpEquiv) return `http-equiv:${tag.httpEquiv}`;
    if (tag.rel === 'canonical') return 'canonical';
    // No natural key: only exact duplicates collapse
    return JSON.stringify(tag);
  }

  // Get final list (last added wins for duplicates)
  getTags() {
    return this.tags;
  }
}

// Integration with nested routes
function collectMetaTags(routeHierarchy) {
  const dedup = new MetaDeduplicator();
  // Apply from root to leaf (later overrides earlier)
  for (const route of routeHierarchy) {
    if (route.meta) {
      for (const tag of route.meta) {
        dedup.add(tag);
      }
    }
  }
  return dedup.getTags();
}
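To see the root-to-leaf override in action, here is a compact standalone variant of the same idea that uses a `Map` keyed by the deduplication key instead of filtering an array. `dedupeKey`, `mergeRouteMeta`, and the two-level route chain are hypothetical and only cover a subset of the key rules.

```javascript
// Subset of the key rules: title, name, property, then exact match
function dedupeKey(tag) {
  if (tag.type === 'title') return 'title';
  if (tag.name) return `name:${tag.name}`;
  if (tag.property) return `property:${tag.property}`;
  return JSON.stringify(tag);
}

function mergeRouteMeta(routeHierarchy) {
  const byKey = new Map();
  for (const route of routeHierarchy) {
    for (const tag of route.meta ?? []) {
      const key = dedupeKey(tag);
      byKey.delete(key); // re-insert so the overriding tag moves to the end
      byKey.set(key, tag);
    }
  }
  return [...byKey.values()];
}

// Hypothetical route chain: root layout, then a product page
const merged = mergeRouteMeta([
  {
    meta: [
      { type: 'title', children: 'Store' },
      { type: 'meta', name: 'description', content: 'Default description' },
    ],
  },
  {
    meta: [
      { type: 'title', children: 'Air Max | Store' },
      { type: 'meta', property: 'og:title', content: 'Air Max' },
    ],
  },
]);
console.log(merged);
```

The product page's title replaces the layout's title, while the layout's description survives because no deeper route overrides it.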
Related Skills
- See web-app-architectures for SPA vs MPA
- See rendering-patterns for SSR, SSG impact
- See meta-frameworks-overview for built-in SEO features