blog-factcheck

Blog Fact-Check

Verify statistics, claims, and source attributions in blog posts. Pure Claude pipeline with no external NLP dependencies.

Workflow

Step 1: Read the Blog Post

Read the target file and identify all sections containing data claims.

Step 2: Extract Statistical Claims

Scan the full text for every claim that includes a number, percentage, dollar amount, or named source. Build a claims list with these fields:
| Field | Description |
|-------|-------------|
| claim_text | The exact sentence or phrase containing the statistic |
| value | The numeric value (e.g., "42%", "$1.2M", "3x") |
| attribution | Named source if present (e.g., "HubSpot", "Gartner 2025") |
| url | Cited URL if present (from markdown link or parenthetical) |
| location | Heading or line number where the claim appears |
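
As a rough illustration, the claims list could be held in a small record type like the one below. This is only a sketch: the `Claim` class, the `is_cited` helper, and the two sample claims are made up for the example, and only the field names come from the list of fields above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One extracted claim; field names mirror the claims list fields."""
    claim_text: str                    # exact sentence or phrase containing the statistic
    value: str                         # e.g. "42%", "$1.2M", "3x"
    location: str                      # heading or line number where the claim appears
    attribution: Optional[str] = None  # named source, if present
    url: Optional[str] = None          # cited URL, if present

    @property
    def is_cited(self) -> bool:
        # Hypothetical helper: a claim counts as cited only when it carries a URL.
        return bool(self.url)


# Invented sample claims, for illustration only.
claims = [
    Claim("42% of teams ship weekly (HubSpot)", "42%", "Intro",
          attribution="HubSpot", url="https://example.com/report"),
    Claim("Costs fell 3x", "3x", "line 18"),
]
print([c.is_cited for c in claims])
```

Keeping `url` optional makes the later split between cited claims (Step 3) and uncited claims (Step 4) a simple check on one field.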

Step 3: Verify Cited Claims

For each claim that includes a URL:
  1. Fetch the source page via WebFetch
  2. Search the returned content for the specific numeric value
  3. If exact value found, check surrounding context matches the claim topic
  4. Assign a confidence score (see Verification Scoring below)
Process claims sequentially to avoid rate-limiting source sites.
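
The search-and-context check in items 2 and 3 can be pictured with a small function that operates on page text already returned by WebFetch. This is a sketch under assumptions, not how the skill itself works: `find_value_in_page`, `topic_terms`, and the 200-character `window` are hypothetical, and the skill relies on Claude's own reading of the page rather than on string matching.

```python
import re


def find_value_in_page(page_text: str, value: str, topic_terms: list[str],
                       window: int = 200) -> dict:
    """Look for the exact value, then check the nearby text for topic terms."""
    # Match the value only when it is not part of a longer number,
    # so "42%" is not reported as found inside "142%" or "3.42%".
    pattern = re.compile(r"(?<![\d.])" + re.escape(value) + r"(?!\d)")
    for match in pattern.finditer(page_text):
        start = max(0, match.start() - window)
        context = page_text[start:match.end() + window].lower()
        if any(term.lower() in context for term in topic_terms):
            return {"found": True, "context_match": True}
    # No occurrence had matching context; report whether the value appeared at all.
    return {"found": pattern.search(page_text) is not None, "context_match": False}


# Invented page text, for illustration only.
page = "In our survey, 42% of marketers said they publish weekly."
print(find_value_in_page(page, "42%", ["marketers"]))
```

A result with `found` true but `context_match` false is the case item 3 guards against: the number is on the page, but it may describe something other than what the blog post claims.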

Step 4: Flag Uncited Claims

For claims without a URL:
  • Mark status as UNVERIFIED
  • Suggest a search query the user can run to find a source
  • If the attribution names a specific organization, suggest their domain

Step 5: Generate Verification Report

Output the full results table, summary statistics, and recommended actions.

Claim Extraction Patterns

Identify claims matching these structures:
Fully cited (highest priority):
  • [Number]% [claim] ([Source], [Year]) - parenthetical citation
  • [claim] [Number]% ... [markdown link to source] - inline link
  • According to [Source], [Number]... - attribution lead
Uncited statistics (flag for sourcing):
  • [Number]% of [noun phrase] - standalone percentage
  • [Number]x more/less/higher/lower - multiplier claims
  • $[Number] [claim] - dollar figures without attribution
Weak signals (check context before extracting):
  • "studies show", "research indicates", "data suggests" + nearby number
  • "survey found", "report reveals", "analysis shows" + nearby number
  • Round numbers in isolation (e.g., "millions of users") - skip unless specific
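
To make these structures concrete, here is a rough regex approximation of the three groups. It is an illustration only: the skill describes a pure Claude pipeline, so the regexes, the `classify_sentence` function, and its return labels are all assumptions made for this sketch, and the sample sentences are invented.

```python
import re

# Approximate shapes for the statistic patterns listed above.
PERCENT = re.compile(r"\b\d+(?:\.\d+)?%")
MULTIPLIER = re.compile(r"\b\d+(?:\.\d+)?x\s+(?:more|less|higher|lower)\b", re.IGNORECASE)
DOLLAR = re.compile(r"\$\d[\d,]*(?:\.\d+)?[KMB]?")
WEAK_SIGNAL = re.compile(
    r"\b(?:studies show|research indicates|data suggests|"
    r"survey found|report reveals|analysis shows)\b",
    re.IGNORECASE,
)
MARKDOWN_LINK = re.compile(r"\[[^\]]+\]\(https?://[^)\s]+\)")
PAREN_CITATION = re.compile(r"\([^()]*\b(?:19|20)\d{2}\)")  # e.g. "(Source, 2025)"
ATTRIBUTION_LEAD = re.compile(r"\bAccording to\s+[A-Z]")


def classify_sentence(sentence: str) -> str:
    """Return "fully_cited", "uncited", "weak_signal", or "none"."""
    has_stat = bool(PERCENT.search(sentence) or MULTIPLIER.search(sentence)
                    or DOLLAR.search(sentence))
    has_source = bool(MARKDOWN_LINK.search(sentence) or PAREN_CITATION.search(sentence)
                      or ATTRIBUTION_LEAD.search(sentence))
    if has_stat and has_source:
        return "fully_cited"
    if has_stat:
        return "uncited"
    # Weak signals: a hedging phrase plus any digit in the same sentence.
    # "Nearby" is simplified to "same sentence" in this sketch.
    if WEAK_SIGNAL.search(sentence) and re.search(r"\d", sentence):
        return "weak_signal"
    return "none"


print(classify_sentence("73% of marketers publish weekly (HubSpot, 2025)"))
print(classify_sentence("60% of users prefer video"))
```

Patterns like these miss plenty of phrasings (spelled-out numbers, ranges, footnote citations), which is one reason to treat them as a mental model of the categories rather than as the extraction step itself.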

Verification Scoring

| Score | Status | Criteria |
|-------|--------|----------|
| 1.0 | VERIFIED | Exact number found on cited page in matching context |
| 0.7-0.9 | PARAPHRASE | Similar data found but with different wording, rounding, or timeframe |
| 0.3-0.6 | WEAK | Source page exists and covers the topic but the specific statistic is not visible |
| 0.0 | NOT FOUND | Cited page does not contain the claimed data anywhere |
| N/A | UNVERIFIED | No source URL provided for the claim |
Scoring guidance:
  • A claim of "43%" when the source says "nearly half" scores 0.8
  • A claim of "2024" data when the source only has "2023" scores 0.7
  • A claim citing a homepage when the stat lives on a subpage scores 0.3
  • A 404 or unreachable URL scores 0.0
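
The score bands map onto statuses roughly as in the sketch below. The function name `status_for_score` is hypothetical, and using `None` to stand for the N/A score is an assumption of this example. The bands as written leave gaps (for instance 0.95 or 0.65), so the sketch rejects such values rather than guessing a status for them.

```python
from typing import Optional


def status_for_score(score: Optional[float]) -> str:
    """Map a confidence score to its status label, following the scoring bands."""
    if score is None:
        return "UNVERIFIED"   # no source URL, so no score (N/A)
    if score == 1.0:
        return "VERIFIED"
    if 0.7 <= score <= 0.9:
        return "PARAPHRASE"
    if 0.3 <= score <= 0.6:
        return "WEAK"
    if score == 0.0:
        return "NOT FOUND"
    # Values between the listed bands are not defined by the scoring rules.
    raise ValueError(f"score {score} falls outside the defined bands")


for s in (1.0, 0.8, 0.3, 0.0, None):
    print(s, status_for_score(s))
```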

Output Format

Verification Report: [Post Title]

File: [path]
Claims found: [total]
Verified: [count] | Paraphrase: [count] | Weak: [count] | Not Found: [count] | Unverified: [count]

| # | Claim | Source URL | Score | Status | Notes |
|---|-------|------------|-------|--------|-------|
| 1 | "73% of marketers..." | https://example.com/report | 1.0 | VERIFIED | Exact match found in section 3 |
| 2 | "5x ROI improvement" | https://example.com/study | 0.8 | PARAPHRASE | Source says "nearly 5x" |
| 3 | "60% prefer video" | (none) | N/A | UNVERIFIED | Try: "video preference statistics 2025" |
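
A report in this shape could be assembled from a list of per-claim results along the following lines. This is a minimal sketch: `render_report`, the dictionary keys, the post title, and the file path are hypothetical, and the pipe-table layout is one plausible rendering of the columns shown above.

```python
from collections import Counter

STATUSES = ["VERIFIED", "PARAPHRASE", "WEAK", "NOT FOUND", "UNVERIFIED"]


def render_report(title, path, rows):
    """Build the header, summary counts, and results table as one string."""
    counts = Counter(row["status"] for row in rows)
    lines = [
        f"Verification Report: {title}",
        f"File: {path}",
        f"Claims found: {len(rows)}",
        " | ".join(f"{status.title()}: {counts[status]}" for status in STATUSES),
        "",
        "| # | Claim | Source URL | Score | Status | Notes |",
        "|---|-------|------------|-------|--------|-------|",
    ]
    for number, row in enumerate(rows, start=1):
        claim = '"' + row["claim"] + '"'
        url = row["url"] or "(none)"
        score = "N/A" if row["score"] is None else format(row["score"], ".1f")
        lines.append(f"| {number} | {claim} | {url} | {score} | {row['status']} | {row['notes']} |")
    return "\n".join(lines)


# Rows taken from the example table; title and path are placeholders.
rows = [
    {"claim": "73% of marketers...", "url": "https://example.com/report", "score": 1.0,
     "status": "VERIFIED", "notes": "Exact match found in section 3"},
    {"claim": "5x ROI improvement", "url": "https://example.com/study", "score": 0.8,
     "status": "PARAPHRASE", "notes": 'Source says "nearly 5x"'},
    {"claim": "60% prefer video", "url": None, "score": None,
     "status": "UNVERIFIED", "notes": 'Try: "video preference statistics 2025"'},
]
report = render_report("Example Post", "path/to/post.md", rows)
print(report)
```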

Recommended Actions

  • [List claims that need source URLs]
  • [List claims with weak or not-found scores that need replacement sources]
  • [List claims where the source data may be outdated]

Integration

This skill can be called from blog-analyze as an optional deep-verification step. When invoked from the analyzer, only claims scoring below 0.7 are flagged in the analysis report.
Standalone usage:
/blog factcheck path/to/post.md
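
When the skill is invoked from the analyzer, the flagging rule amounts to a threshold filter. The sketch below assumes results are dictionaries with a `score` key; `claims_to_flag` is a hypothetical name, and leaving out unscored (UNVERIFIED) claims is an assumption, since the text does not say how the analyzer treats them.

```python
FLAG_THRESHOLD = 0.7  # threshold stated in the integration note


def claims_to_flag(results):
    """Keep only results whose numeric score is below the threshold."""
    # Assumption: UNVERIFIED results (score None) are left out here.
    return [r for r in results
            if r["score"] is not None and r["score"] < FLAG_THRESHOLD]


# Invented sample results, for illustration only.
results = [
    {"claim": "A", "score": 1.0},
    {"claim": "B", "score": 0.7},
    {"claim": "C", "score": 0.5},
    {"claim": "D", "score": None},
]
print([r["claim"] for r in claims_to_flag(results)])
```

Note that a score of exactly 0.7 is not flagged, because the rule says "below 0.7".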

Limitations

  • Paywalled content: WebFetch cannot access content behind login walls. Score these claims as WEAK (0.5) and note that a paywall was detected.
  • Dynamic pages: JavaScript-rendered content may not be available via WebFetch. If the page returns minimal content, note this in the status.
  • PDF sources: WebFetch may not extract PDF text reliably. Flag PDF URLs for manual verification.
  • Archived pages: If a URL returns 404, suggest checking web.archive.org.
  • Rate limits: Process no more than 10 URLs per run to avoid overwhelming source servers. If a post has more than 10 cited URLs, verify the first 10 and list the remainder as SKIPPED.
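
Two of these limits are mechanical enough to sketch: the 10-URL cap and the PDF flag. The helper names `split_for_run` and `is_pdf_url` are hypothetical, and detecting PDFs by a `.pdf` path suffix is an assumption that will miss PDFs served from other paths.

```python
from urllib.parse import urlparse

MAX_URLS_PER_RUN = 10  # cap stated in the rate-limits item


def split_for_run(cited_urls, limit=MAX_URLS_PER_RUN):
    """Return (urls to verify now, urls to list as SKIPPED), preserving order."""
    return cited_urls[:limit], cited_urls[limit:]


def is_pdf_url(url: str) -> bool:
    """Heuristic: treat URLs whose path ends in .pdf as needing manual verification."""
    return urlparse(url).path.lower().endswith(".pdf")


# Placeholder URLs, for illustration only.
urls = [f"https://example.com/source-{n}" for n in range(1, 13)]
to_verify, skipped = split_for_run(urls)
print(len(to_verify), len(skipped))
print(is_pdf_url("https://example.com/report.pdf?download=1"))
```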