Blog Cannibalization - Keyword Overlap Detection
Detect when multiple blog posts compete for the same search keywords. Two modes:
local-only analysis (default) and DataForSEO API mode for SERP-level data.
Two Modes
| Mode | Flag | Cost | Data Source |
|---|---|---|---|
| Local | (default) | Free | File content analysis via Grep/Read |
| API | `--api` | ~$0.01/call | DataForSEO Page Intersection + Ranked Keywords |
Local mode works without any API keys. API mode requires DataForSEO credentials set as environment variables: `DATAFORSEO_LOGIN` and `DATAFORSEO_PASSWORD`.
Local Mode Workflow
Step 1: Scan Blog Files
Use Glob to find all content files in the target directory:
- Patterns: `**/*.md`, `**/*.mdx`, `**/*.html`
- Skip files in `node_modules/`, `.git/`, `drafts/`
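The skill performs this step through the Glob tool, but the include/exclude logic can be sketched in Python (a minimal illustration; `scan_blog_files` and the constants are hypothetical names, not part of the skill):

```python
from pathlib import Path

# Illustrative sketch of Step 1's filtering rules: match content files,
# skip anything under an excluded directory.
PATTERNS = ("*.md", "*.mdx", "*.html")
EXCLUDED_DIRS = {"node_modules", ".git", "drafts"}

def scan_blog_files(root: str) -> list[Path]:
    files = []
    for pattern in PATTERNS:
        for path in Path(root).rglob(pattern):
            # Keep the file only if no path component is an excluded directory
            if EXCLUDED_DIRS.isdisjoint(path.parts):
                files.append(path)
    return sorted(files)
```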
Step 2: Extract Primary Keywords
For each file, read and extract keyword signals from:
- Title tag or H1 heading (highest weight)
- H2 headings (medium weight)
- First paragraph (supporting signal)
- Meta description if present in frontmatter
Primary keyword extraction method:
- Tokenize title and H1 into 1-gram, 2-gram, and 3-gram phrases
- Score each phrase by frequency across title + H2s + first paragraph
- Select the top-scoring 2-3 word phrase as the primary keyword
- Record secondary keywords from H2 headings
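The extraction method above can be sketched as follows (a rough illustration under assumed inputs; the function names and the simple regex tokenizer are not the skill's actual implementation):

```python
from collections import Counter
import re

def ngrams(text: str, n: int) -> list[str]:
    # Lowercase, strip punctuation, emit contiguous n-word phrases
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def primary_keyword(title: str, h2s: list[str], first_para: str) -> str:
    # Candidate phrases come from the title/H1 (the highest-weight signal)
    candidates = ngrams(title, 2) + ngrams(title, 3)
    # Score each candidate by its frequency across title + H2s + first paragraph
    corpus = " ".join([title] + h2s + [first_para]).lower()
    scores = Counter({c: corpus.count(c) for c in candidates})
    return scores.most_common(1)[0][0] if scores else ""
```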
Step 3: Cluster by Similarity
Group posts into clusters using these matching rules (in priority order):
- Exact match - identical primary keyword across 2+ posts
- Stem match - same root word (e.g., "optimize" vs "optimization")
- Semantic overlap - Claude determines that two keywords target the same search intent (e.g., "best CRM software" vs "top CRM tools 2026")
- Subset match - one keyword contains another (e.g., "email marketing" vs "email marketing for startups")
Step 4: Score and Flag
For each cluster with 2+ posts, assess severity and generate a recommendation.
Step 5: Output Report
Display the results table and per-cluster recommendations.
API Mode Workflow (DataForSEO)
Requires the `--api` flag. Uses WebFetch to call DataForSEO endpoints.
Endpoints Used
Page Intersection - find keywords where multiple URLs rank:

```http
POST https://api.dataforseo.com/v3/dataforseo_labs/google/page_intersection/live
Authorization: Basic <base64(login:password)>

{
  "pages": {
    "1": "https://example.com/post-a",
    "2": "https://example.com/post-b"
  },
  "language_code": "en",
  "location_code": 2840
}
```

Cost: ~$0.01 per call. Returns overlapping keywords with position, volume, CPC.
Ranked Keywords - get all keywords a single URL ranks for:

```http
POST https://api.dataforseo.com/v3/dataforseo_labs/google/ranked_keywords/live

{
  "target": "https://example.com/post-a",
  "language_code": "en",
  "location_code": 2840
}
```

API Analysis Steps
- Collect all published URLs from the user (or sitemap)
- Run Ranked Keywords for each URL to build keyword profiles
- Run Page Intersection for URL pairs that share keyword clusters
- Calculate severity using the formula below
- Output enriched report with search volume and position data
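A Page Intersection call from the steps above can be sketched with only the Python standard library. The endpoint, payload fields, and environment variable names come from this document; note as an assumption that DataForSEO's live endpoints generally expect an array of task objects, so the body is wrapped in a list here:

```python
import base64
import json
import os
import urllib.request

def basic_auth_header(login: str, password: str) -> str:
    # HTTP Basic auth: base64("login:password")
    token = base64.b64encode(f"{login}:{password}".encode()).decode()
    return f"Basic {token}"

def build_task(url_a: str, url_b: str) -> dict:
    # Request body fields as shown in the Page Intersection example above
    return {
        "pages": {"1": url_a, "2": url_b},
        "language_code": "en",
        "location_code": 2840,
    }

def page_intersection(url_a: str, url_b: str) -> dict:
    header = basic_auth_header(os.environ["DATAFORSEO_LOGIN"],
                               os.environ["DATAFORSEO_PASSWORD"])
    body = json.dumps([build_task(url_a, url_b)]).encode()
    req = urllib.request.Request(
        "https://api.dataforseo.com/v3/dataforseo_labs/google/page_intersection/live",
        data=body,
        headers={"Authorization": header, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```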
Severity Scoring
Four severity levels based on overlap signals:
| Level | Criteria | Action Urgency |
|---|---|---|
| Critical | Same exact keyword, both pages in top 20 | Immediate |
| High | Same keyword cluster, one page outranks the other | This week |
| Medium | Related keywords with partial SERP overlap | This month |
| Low | Semantic similarity but different confirmed intents | Monitor |
Severity Formula (API Mode)
```
severity_score = overlap_count x avg_search_volume x (1 / position_gap)
```

Where:
- `overlap_count` = number of shared ranking keywords
- `avg_search_volume` = mean monthly volume of shared keywords
- `position_gap` = absolute difference in average ranking position (min 1)

Higher score = more urgent cannibalization problem.
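The formula translates directly to code (the input shape, a list of keyword/volume pairs plus each page's average position, is an assumption for illustration):

```python
def severity_score(shared_keywords: list[tuple[str, int]],
                   avg_pos_a: float, avg_pos_b: float) -> float:
    # shared_keywords: (keyword, monthly_search_volume) pairs
    overlap_count = len(shared_keywords)
    if overlap_count == 0:
        return 0.0
    avg_search_volume = sum(vol for _, vol in shared_keywords) / overlap_count
    # Clamp the gap to a minimum of 1, per the formula definition
    position_gap = max(1.0, abs(avg_pos_a - avg_pos_b))
    return overlap_count * avg_search_volume * (1 / position_gap)
```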
Severity Heuristic (Local Mode)
Without SERP data, use a simplified scoring:
- Critical: Exact primary keyword match between posts
- High: Stem match on primary keyword, or 3+ shared H2 keywords
- Medium: Semantic overlap on primary keyword
- Low: Subset match only, or shared secondary keywords
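The heuristic above maps to a short decision ladder (a sketch; the `match` labels mirror the Step 3 rules, and the function name is illustrative):

```python
def local_severity(match: str, shared_h2_keywords: int) -> str:
    # match is one of: "exact", "stem", "semantic", "subset" (see Step 3)
    if match == "exact":
        return "Critical"
    if match == "stem" or shared_h2_keywords >= 3:
        return "High"
    if match == "semantic":
        return "Medium"
    return "Low"
```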
Output Format
Summary Table
| Post A | Post B | Shared Keywords | Severity | Recommendation |
|--------|--------|-----------------|----------|----------------|
| /best-crm-tools | /top-crm-software | best crm, crm tools, crm software | Critical | MERGE |
| /email-tips | /email-marketing-guide | email marketing | High | DIFFERENTIATE |
| /seo-basics | /seo-for-beginners | seo basics, beginner seo | Critical | CANONICAL |
| /react-hooks | /react-state-mgmt | react, state | Low | NO ACTION |
Per-Cluster Detail
For each flagged cluster, provide:
- Both post titles and URLs
- Full list of overlapping keywords (with volume if API mode)
- Which post is stronger (more comprehensive, better structured)
- Specific recommendation with rationale
Recommendations
Four possible actions for each cannibalization cluster:
MERGE
When both pages are thin or cover the same intent with similar depth.
- Combine the best content from both into one comprehensive post
- 301 redirect the weaker URL to the merged post
- Preserve all internal links pointing to either URL
DIFFERENTIATE
When pages serve different intents but keyword targeting overlaps.
- Shift the primary keyword of the weaker post to a related long-tail
- Update the title, H1, and meta description to reflect the new focus
- Add internal links between the two posts to signal distinct topics
CANONICAL
When one post is clearly the authority and the other is a lesser duplicate.
- Add `rel="canonical"` on the weaker page pointing to the authority
- Consider noindexing the weaker page if it adds no unique value
- Link from the weaker page to the authority page
NO ACTION
When intent is genuinely different despite surface-level keyword similarity.
- Document the reasoning for future audits
- Monitor rankings quarterly for any position changes
- Re-evaluate if either post drops in rankings
Error Handling
- No blog files found: If the directory contains no .md, .mdx, or .html files, report "No blog files found in [directory]" and suggest checking the path
- DataForSEO credentials missing: In API mode, if credentials are not configured, fall back to local mode automatically and notify the user
- API rate limits: DataForSEO has per-minute rate limits. If a 429 response is received, wait and retry once. If it persists, switch to local mode for remaining URLs
- WebFetch failures: If a source URL is unreachable, skip it and note "Unable to verify - source unavailable" in the report
- Single-post directory: If only one blog post exists, report "Cannibalization analysis requires at least 2 posts" and exit gracefully
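The 429 handling above (wait, retry once, then give up so the caller can fall back to local mode) can be sketched as a small wrapper. This is a hypothetical helper, written to take the fetch call as a parameter so the retry logic stays testable:

```python
import time
import urllib.error

def fetch_with_retry(fetch, wait_seconds: float = 0.0):
    """Call fetch(); on a 429, back off once and retry. Any persistent
    error propagates so the caller can fall back to local mode."""
    for attempt in (1, 2):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt == 1:
                time.sleep(wait_seconds)  # respect the per-minute rate limit
                continue
            raise
```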