blog-factcheck
Blog Fact-Check
Verify statistics, claims, and source attributions in blog posts. Pure Claude
pipeline with no external NLP dependencies.
Workflow
Step 1: Read the Blog Post
Read the target file and identify all sections containing data claims.
Step 2: Extract Statistical Claims
Scan the full text for every claim that includes a number, percentage, dollar
amount, or named source. Build a claims list with these fields:
| Field | Description |
|---|---|
| claim_text | The exact sentence or phrase containing the statistic |
| value | The numeric value (e.g., "42%", "$1.2M", "3x") |
| attribution | Named source if present (e.g., "HubSpot", "Gartner 2025") |
| url | Cited URL if present (from markdown link or parenthetical) |
| location | Heading or line number where the claim appears |
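One way to hold the claims list in code is a small record type with the five fields from the table above. This is a minimal sketch, not part of the skill itself: the `Claim` class name, the `is_cited` helper, and the `location` values are illustrative assumptions, while the sample claim texts and URL are the ones used in this document's example report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One extracted claim, mirroring the fields in the table above."""
    claim_text: str                    # exact sentence or phrase containing the statistic
    value: str                         # numeric value as written, e.g. "42%", "$1.2M", "3x"
    attribution: Optional[str] = None  # named source, if present
    url: Optional[str] = None          # cited URL, if present
    location: Optional[str] = None     # heading or line number where the claim appears

    @property
    def is_cited(self) -> bool:
        # Hypothetical helper: claims with a URL go to Step 3, the rest to Step 4.
        return bool(self.url)


claims = [
    Claim("73% of marketers...", "73%", url="https://example.com/report",
          location="Introduction"),  # location is a made-up example
    Claim("60% prefer video", "60%", location="Introduction"),
]

cited = [c for c in claims if c.is_cited]
uncited = [c for c in claims if not c.is_cited]
```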
Step 3: Verify Cited Claims
For each claim that includes a URL:
- Fetch the source page via WebFetch
- Search the returned content for the specific numeric value
- If the exact value is found, check that the surrounding context matches the claim topic
- Assign a confidence score (see Verification Scoring below)
Process claims sequentially to avoid rate-limiting source sites.
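The search-and-context check can be sketched as a plain string operation on page text that has already been fetched (for example via WebFetch). The function name `locate_value`, the `topic_terms` argument, the 200-character window, and the three return labels are assumptions made for illustration; the skill does not prescribe them.

```python
import re


def locate_value(page_text: str, value: str, topic_terms: list[str],
                 window: int = 200) -> str:
    """Return 'match', 'value_only', or 'missing' for a value in fetched text.

    'match'      : value found with a topic term within `window` characters
    'value_only' : value found, but no topic term nearby
    'missing'    : value not found at all
    """
    lowered = page_text.lower()
    terms = [t.lower() for t in topic_terms]
    # Reject hits embedded in a longer number, e.g. "73%" inside "173%".
    pattern = re.compile(r"(?<![\d.])" + re.escape(value.lower()))
    found = False
    for m in pattern.finditer(lowered):
        found = True
        context = lowered[max(0, m.start() - window): m.end() + window]
        if any(t in context for t in terms):
            return "match"
    return "value_only" if found else "missing"
```

A 'match' result corresponds to the exact-value-in-context case; the other two outcomes still need a judgment about paraphrase or rounding before a score is assigned.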
Step 4: Flag Uncited Claims
For claims without a URL:
- Mark status as UNVERIFIED
- Suggest a search query the user can run to find a source
- If the attribution names a specific organization, suggest their domain
Step 5: Generate Verification Report
Output the full results table, summary statistics, and recommended actions.
Claim Extraction Patterns
Identify claims matching these structures:
Fully cited (highest priority):
- `[Number]% [claim] ([Source], [Year])` - parenthetical citation
- `[claim] [Number]% ... [markdown link to source]` - inline link
- `According to [Source], [Number]...` - attribution lead
Uncited statistics (flag for sourcing):
- `[Number]% of [noun phrase]` - standalone percentage
- `[Number]x more/less/higher/lower` - multiplier claims
- `$[Number] [claim]` - dollar figures without attribution
Weak signals (check context before extracting):
- `studies show`, `research indicates`, `data suggests` + nearby number
- `survey found`, `report reveals`, `analysis shows` + nearby number
- Round numbers in isolation (e.g., "millions of users") - skip unless specific
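Although the skill is described as a pure Claude pipeline, the strong patterns above can be approximated with regular expressions, which may help when spot-checking an extraction. The sketch below is one possible reading of those patterns; the `PATTERNS` names, the exact expressions, and the `classify` helper are assumptions, and the weak-signal phrases are deliberately left out because they need context.

```python
import re

# Hypothetical regex approximations of the claim structures listed above.
PATTERNS = {
    # [Number]% [claim] ([Source], [Year])
    "parenthetical": re.compile(r"\d+(?:\.\d+)?%[^.()]*\([^()]+,\s*\d{4}\)"),
    # [claim] [Number]% ... [markdown link to source]
    "inline_link": re.compile(r"\d+(?:\.\d+)?%.*?\[[^\]]+\]\(https?://[^)\s]+\)"),
    # According to [Source], [Number]...
    "attribution_lead": re.compile(r"According to [A-Z][^,]*,[^.]*?\d"),
    # [Number]% (standalone percentage)
    "percentage": re.compile(r"\d+(?:\.\d+)?%"),
    # [Number]x more/less/higher/lower
    "multiplier": re.compile(r"\d+(?:\.\d+)?x\s+(?:more|less|higher|lower)\b",
                             re.IGNORECASE),
    # $[Number] [claim]
    "dollar": re.compile(r"\$\d[\d,]*(?:\.\d+)?[KMB]?"),
}


def classify(sentence: str) -> list[str]:
    """Return the names of every pattern that matches the sentence."""
    return [name for name, pat in PATTERNS.items() if pat.search(sentence)]


for s in ["73% of marketers use video (HubSpot, 2025)",
          "Video is 5x more engaging",
          "The tool saved $1.2M last year"]:
    print(s, "->", classify(s))
```

A sentence that matches a cited pattern will usually also match `percentage`; treating the cited match as taking precedence follows the priority order given above.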
Verification Scoring
| Score | Status | Criteria |
|---|---|---|
| 1.0 | VERIFIED | Exact number found on cited page in matching context |
| 0.7-0.9 | PARAPHRASE | Similar data found but with different wording, rounding, or timeframe |
| 0.3-0.6 | WEAK | Source page exists and covers the topic but the specific statistic is not visible |
| 0.0 | NOT FOUND | Cited page does not contain the claimed data anywhere |
| N/A | UNVERIFIED | No source URL provided for the claim |
Scoring guidance:
- A claim of "43%" when the source says "nearly half" scores 0.8
- A claim of "2024" data when the source only has "2023" scores 0.7
- A claim citing a homepage when the stat lives on a subpage scores 0.3
- A 404 or unreachable URL scores 0.0
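The score-to-status mapping in the table can be written as a small lookup. This is a sketch under one assumption worth flagging: the table leaves the ranges between its bands (for example 0.95 or 0.65) undefined, so the hypothetical `status_for` function below raises an error for them rather than guessing a status.

```python
from typing import Optional


def status_for(score: Optional[float]) -> str:
    """Map a confidence score to the status labels in the scoring table."""
    if score is None:          # no source URL was provided
        return "UNVERIFIED"
    if score == 1.0:
        return "VERIFIED"
    if 0.7 <= score <= 0.9:
        return "PARAPHRASE"
    if 0.3 <= score <= 0.6:
        return "WEAK"
    if score == 0.0:
        return "NOT FOUND"
    # Assumption: scores outside the documented bands are treated as errors.
    raise ValueError(f"score {score} falls outside the documented bands")


print(status_for(0.8))   # the "nearly half" example from the guidance above
print(status_for(None))
```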
Output Format
Verification Report: [Post Title]
File: [path]
Claims found: [total]
Verified: [count] | Paraphrase: [count] | Weak: [count] | Not Found: [count] | Unverified: [count]
| # | Claim | Source URL | Score | Status | Notes |
|---|---|---|---|---|---|
| 1 | "73% of marketers..." | https://example.com/report | 1.0 | VERIFIED | Exact match found in section 3 |
| 2 | "5x ROI improvement" | https://example.com/study | 0.8 | PARAPHRASE | Source says "nearly 5x" |
| 3 | "60% prefer video" | (none) | N/A | UNVERIFIED | Try: "video preference statistics 2025" |
Recommended Actions
- [List claims that need source URLs]
- [List claims with weak or not-found scores that need replacement sources]
- [List claims where the source data may be outdated]
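For illustration, the header and results table of this report format could be assembled from a list of result rows as follows. The `render_report` function, the row dictionary keys, and the file path in the usage example are assumptions; the three sample rows are the ones shown in the format above. The recommended-actions list is omitted from the sketch.

```python
from collections import Counter

# Summary labels paired with the status values they count.
SUMMARY = [("Verified", "VERIFIED"), ("Paraphrase", "PARAPHRASE"),
           ("Weak", "WEAK"), ("Not Found", "NOT FOUND"),
           ("Unverified", "UNVERIFIED")]


def render_report(title: str, path: str, rows: list[dict]) -> str:
    """Build the report header, summary line, and results table as text."""
    counts = Counter(r["status"] for r in rows)
    lines = [
        f"Verification Report: {title}",
        f"File: {path}",
        f"Claims found: {len(rows)}",
        " | ".join(f"{label}: {counts[key]}" for label, key in SUMMARY),
        "| # | Claim | Source URL | Score | Status | Notes |",
        "|---|---|---|---|---|---|",
    ]
    for i, r in enumerate(rows, start=1):
        score = "N/A" if r["score"] is None else f"{r['score']:.1f}"
        url = r["url"] or "(none)"
        claim, status, notes = r["claim"], r["status"], r["notes"]
        lines.append(f'| {i} | "{claim}" | {url} | {score} | {status} | {notes} |')
    return "\n".join(lines)


rows = [
    {"claim": "73% of marketers...", "url": "https://example.com/report",
     "score": 1.0, "status": "VERIFIED", "notes": "Exact match found in section 3"},
    {"claim": "5x ROI improvement", "url": "https://example.com/study",
     "score": 0.8, "status": "PARAPHRASE", "notes": 'Source says "nearly 5x"'},
    {"claim": "60% prefer video", "url": None,
     "score": None, "status": "UNVERIFIED",
     "notes": 'Try: "video preference statistics 2025"'},
]
report = render_report("[Post Title]", "path/to/post.md", rows)
print(report)
```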
Integration
This skill can be called from `blog-analyze` as an optional deep-verification step.
When invoked from the analyzer, only claims scoring below 0.7 are flagged in the
analysis report.
Standalone usage:
`/blog factcheck path/to/post.md`
Limitations
- Paywalled content: WebFetch cannot access content behind login walls. These score as WEAK (0.5) with a note about paywall detection.
- Dynamic pages: JavaScript-rendered content may not be available via WebFetch. If the page returns minimal content, note this in the status.
- PDF sources: WebFetch may not extract PDF text reliably. Flag PDF URLs for manual verification.
- Archived pages: If a URL returns 404, suggest checking web.archive.org.
- Rate limits: Process no more than 10 URLs per run to avoid overwhelming source servers. If a post has more than 10 cited URLs, verify the first 10 and list the remainder as SKIPPED.
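The 10-URL cap in the rate-limit rule can be expressed as a simple split of the cited URLs into a batch to verify and a remainder to report as SKIPPED. This is a minimal sketch: `split_for_run` is a hypothetical helper, and de-duplicating repeated URLs before counting is an assumption the rule above does not state.

```python
MAX_URLS_PER_RUN = 10  # cap stated in the rate-limit rule


def split_for_run(urls: list[str], limit: int = MAX_URLS_PER_RUN):
    """Return (to_verify, skipped), keeping the order in which URLs were cited."""
    # Assumption: the same URL cited twice counts once toward the limit.
    unique = list(dict.fromkeys(urls))
    return unique[:limit], unique[limit:]


urls = [f"https://example.com/source-{i}" for i in range(12)]
to_verify, skipped = split_for_run(urls)
print(len(to_verify), "to verify;", len(skipped), "listed as SKIPPED")
```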