blog-cannibalization

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Blog Cannibalization - Keyword Overlap Detection

Detect when multiple blog posts compete for the same search keywords. Two modes: local-only analysis (default) and DataForSEO API mode for SERP-level data.

Two Modes

| Mode | Flag | Cost | Data Source |
|------|------|------|-------------|
| Local | (default) | Free | File content analysis via Grep/Read |
| API | `--api` | ~$0.01/call | DataForSEO Page Intersection + Ranked Keywords |

Local mode works without any API keys. API mode requires DataForSEO credentials set as environment variables: `DATAFORSEO_LOGIN` and `DATAFORSEO_PASSWORD`.

Local Mode Workflow

Step 1: Scan Blog Files

Use Glob to find all content files in the target directory:
  • Patterns: `**/*.md`, `**/*.mdx`, `**/*.html`
  • Skip files in `node_modules/`, `.git/`, `drafts/`
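The scan above can be sketched in Python; the helper name and the use of `pathlib` are illustrative (the skill itself uses Glob):

```python
from pathlib import Path

# Directories to skip and extensions to keep, per the patterns above
SKIP_DIRS = {"node_modules", ".git", "drafts"}
EXTENSIONS = {".md", ".mdx", ".html"}

def scan_blog_files(root):
    """Yield content files under root, skipping excluded directories."""
    for path in Path(root).rglob("*"):
        if path.suffix in EXTENSIONS and not SKIP_DIRS & set(path.parts):
            yield path
```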

Step 2: Extract Primary Keywords

For each file, read and extract keyword signals from:
  • Title tag or H1 heading (highest weight)
  • H2 headings (medium weight)
  • First paragraph (supporting signal)
  • Meta description if present in frontmatter
Primary keyword extraction method:
  1. Tokenize title and H1 into 1-gram, 2-gram, and 3-gram phrases
  2. Score each phrase by frequency across title + H2s + first paragraph
  3. Select the top-scoring 2-3 word phrase as the primary keyword
  4. Record secondary keywords from H2 headings
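A simplified sketch of steps 1-3; raw substring counts stand in for the weighted scoring described above, and the function names are illustrative:

```python
import re
from collections import Counter

def ngrams(text, n):
    """Split text into lowercase n-word phrases."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def primary_keyword(title, h2s, first_paragraph):
    """Pick the 2-3 word phrase from the title that recurs most often
    across the title, H2 headings, and first paragraph."""
    corpus = " ".join([title] + h2s + [first_paragraph]).lower()
    scores = Counter()
    for phrase in ngrams(title, 2) + ngrams(title, 3):
        scores[phrase] = corpus.count(phrase)
    return scores.most_common(1)[0][0] if scores else None
```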

Step 3: Cluster by Similarity

Group posts into clusters using these matching rules (in priority order):
  1. Exact match - identical primary keyword across 2+ posts
  2. Stem match - same root word (e.g., "optimize" vs "optimization")
  3. Semantic overlap - Claude determines that two keywords target the same search intent (e.g., "best CRM software" vs "top CRM tools 2026")
  4. Subset match - one keyword contains another (e.g., "email marketing" vs "email marketing for startups")
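Rules 1, 2, and 4 can be expressed mechanically; rule 3 (semantic overlap) requires judgment and is not shown. A sketch, with a deliberately crude suffix stripper standing in for a real stemmer:

```python
def stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ization", "ation", "ize", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def match_type(kw_a, kw_b):
    """Classify keyword overlap per rules 1, 2, and 4 above."""
    if kw_a == kw_b:
        return "exact"
    if [stem(w) for w in kw_a.split()] == [stem(w) for w in kw_b.split()]:
        return "stem"
    if kw_a in kw_b or kw_b in kw_a:
        return "subset"
    return None
```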

Step 4: Score and Flag

For each cluster with 2+ posts, assess severity and generate a recommendation.

Step 5: Output Report

Display the results table and per-cluster recommendations.

API Mode Workflow (DataForSEO)

Requires the `--api` flag. Uses WebFetch to call DataForSEO endpoints.

Endpoints Used

Page Intersection - find keywords where multiple URLs rank:
POST https://api.dataforseo.com/v3/dataforseo_labs/google/page_intersection/live
Authorization: Basic <base64(login:password)>

{
  "pages": {
    "1": "https://example.com/post-a",
    "2": "https://example.com/post-b"
  },
  "language_code": "en",
  "location_code": 2840
}
Cost: ~$0.01 per call. Returns overlapping keywords with position, volume, CPC.
Ranked Keywords - get all keywords a single URL ranks for:
POST https://api.dataforseo.com/v3/dataforseo_labs/google/ranked_keywords/live

{
  "target": "https://example.com/post-a",
  "language_code": "en",
  "location_code": 2840
}
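Both endpoints can be called the same way. A stdlib sketch is below; note that DataForSEO's live endpoints expect the JSON body wrapped in a task array (an assumption from their API convention, not shown in the snippets above), and the helper name is illustrative:

```python
import base64
import json
import urllib.request

ENDPOINT = "https://api.dataforseo.com/v3/dataforseo_labs/google/page_intersection/live"

def build_request(url_a, url_b, login, password):
    """Build an authenticated POST for the Page Intersection endpoint.
    login/password come from DATAFORSEO_LOGIN / DATAFORSEO_PASSWORD."""
    payload = json.dumps([{
        "pages": {"1": url_a, "2": url_b},
        "language_code": "en",
        "location_code": 2840,
    }]).encode()
    token = base64.b64encode(f"{login}:{password}".encode()).decode()
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# To send: json.load(urllib.request.urlopen(build_request(...)))
```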

API Analysis Steps

  1. Collect all published URLs from the user (or sitemap)
  2. Run Ranked Keywords for each URL to build keyword profiles
  3. Run Page Intersection for URL pairs that share keyword clusters
  4. Calculate severity using the formula below
  5. Output enriched report with search volume and position data

Severity Scoring

Four severity levels based on overlap signals:
| Level | Criteria | Action Urgency |
|-------|----------|----------------|
| Critical | Same exact keyword, both pages in top 20 | Immediate |
| High | Same keyword cluster, one page outranks the other | This week |
| Medium | Related keywords with partial SERP overlap | This month |
| Low | Semantic similarity but different confirmed intents | Monitor |

Severity Formula (API Mode)

severity_score = overlap_count x avg_search_volume x (1 / position_gap)
Where:
  • `overlap_count` = number of shared ranking keywords
  • `avg_search_volume` = mean monthly volume of shared keywords
  • `position_gap` = absolute difference in average ranking position (min 1)
Higher score = more urgent cannibalization problem.
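As a worked example of the formula, using `max()` to enforce the minimum gap of 1:

```python
def severity_score(overlap_count, avg_search_volume, position_gap):
    """severity_score = overlap_count x avg_search_volume x (1 / position_gap)."""
    return overlap_count * avg_search_volume / max(position_gap, 1)

# e.g. 5 shared keywords, 1,200 avg monthly searches, positions 3 apart:
# 5 x 1200 / 3 = 2000.0
```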

Severity Heuristic (Local Mode)

Without SERP data, use a simplified scoring:
  • Critical: Exact primary keyword match between posts
  • High: Stem match on primary keyword, or 3+ shared H2 keywords
  • Medium: Semantic overlap on primary keyword
  • Low: Subset match only, or shared secondary keywords
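This simplified scoring is a straightforward cascade; the helper below and its argument names are hypothetical:

```python
def local_severity(match_kind, shared_h2_count=0):
    """Map local-mode overlap signals to a severity level."""
    if match_kind == "exact":
        return "Critical"
    if match_kind == "stem" or shared_h2_count >= 3:
        return "High"
    if match_kind == "semantic":
        return "Medium"
    if match_kind == "subset" or shared_h2_count > 0:
        return "Low"
    return None
```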

Output Format

Summary Table

| Post A | Post B | Shared Keywords | Severity | Recommendation |
|--------|--------|-----------------|----------|----------------|
| /best-crm-tools | /top-crm-software | best crm, crm tools, crm software | Critical | MERGE |
| /email-tips | /email-marketing-guide | email marketing | High | DIFFERENTIATE |
| /seo-basics | /seo-for-beginners | seo basics, beginner seo | Critical | CANONICAL |
| /react-hooks | /react-state-mgmt | react, state | Low | NO ACTION |

Per-Cluster Detail

For each flagged cluster, provide:
  • Both post titles and URLs
  • Full list of overlapping keywords (with volume if API mode)
  • Which post is stronger (more comprehensive, better structured)
  • Specific recommendation with rationale

Recommendations

Four possible actions for each cannibalization cluster:

MERGE

When both pages are thin or cover the same intent with similar depth.
  • Combine the best content from both into one comprehensive post
  • 301 redirect the weaker URL to the merged post
  • Preserve all internal links pointing to either URL

DIFFERENTIATE

When pages serve different intents but keyword targeting overlaps.
  • Shift the primary keyword of the weaker post to a related long-tail
  • Update the title, H1, and meta description to reflect the new focus
  • Add internal links between the two posts to signal distinct topics

CANONICAL

When one post is clearly the authority and the other is a lesser duplicate.
  • Add `rel="canonical"` on the weaker page pointing to the authority
  • Consider noindexing the weaker page if it adds no unique value
  • Link from the weaker page to the authority page

NO ACTION

When intent is genuinely different despite surface-level keyword similarity.
  • Document the reasoning for future audits
  • Monitor rankings quarterly for any position changes
  • Re-evaluate if either post drops in rankings

Error Handling

  • No blog files found: If the directory contains no .md, .mdx, or .html files, report "No blog files found in [directory]" and suggest checking the path
  • DataForSEO credentials missing: In API mode, if credentials are not configured, fall back to local mode automatically and notify the user
  • API rate limits: DataForSEO has per-minute rate limits. If a 429 response is received, wait and retry once. If it persists, switch to local mode for remaining URLs
  • WebFetch failures: If a source URL is unreachable, skip it and note "Unable to verify - source unavailable" in the report
  • Single-post directory: If only one blog post exists, report "Cannibalization analysis requires at least 2 posts" and exit gracefully
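The credentials-missing fallback can be sketched as follows (function name and return shape are illustrative):

```python
import os

def resolve_mode(api_requested):
    """Return (mode, warning): drop to local mode if credentials are absent."""
    if not api_requested:
        return "local", None
    if os.environ.get("DATAFORSEO_LOGIN") and os.environ.get("DATAFORSEO_PASSWORD"):
        return "api", None
    return "local", "DataForSEO credentials missing; falling back to local mode"
```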