blog-factcheck

Blog Fact-Check

Verify statistics, claims, and source attributions in blog posts. Pure Claude pipeline with no external NLP dependencies.

Workflow

Step 1: Read the Blog Post

Read the target file and identify all sections containing data claims.

Step 2: Extract Statistical Claims

Scan the full text for every claim that includes a number, percentage, dollar amount, or named source. Build a claims list with these fields:

| Field | Description |
|---|---|
| `claim_text` | The exact sentence or phrase containing the statistic |
| `value` | The numeric value (e.g., "42%", "$1.2M", "3x") |
| `attribution` | Named source if present (e.g., "HubSpot", "Gartner 2025") |
| `url` | Cited URL if present (from markdown link or parenthetical) |
| `location` | Heading or line number where the claim appears |
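
As a rough illustration, one claims-list entry could be held in a small record like the one below. The `Claim` class, the `is_cited` helper, and the sample values are hypothetical; only the five field names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One extracted claim (hypothetical container; field names as listed above)."""
    claim_text: str                    # exact sentence or phrase containing the statistic
    value: str                         # numeric value as written, e.g. "42%"
    location: str                      # heading or line number where the claim appears
    attribution: Optional[str] = None  # named source, if present
    url: Optional[str] = None          # cited URL, if present

    @property
    def is_cited(self) -> bool:
        # Claims with a URL go to Step 3; claims without one go to Step 4.
        return self.url is not None


# Made-up sample entries for illustration only.
claims = [
    Claim("42% of teams ship weekly (Gartner 2025)", "42%", "line 12",
          attribution="Gartner 2025", url="https://example.com/report"),
    Claim("3x higher engagement", "3x", "## Results"),
]
```

Keeping `url` optional lets the same record type flow through both the cited and uncited branches of the workflow.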

Step 3: Verify Cited Claims

For each claim that includes a URL:

1. Fetch the source page via WebFetch.
2. Search the returned content for the specific numeric value.
3. If the exact value is found, check that the surrounding context matches the claim topic.
4. Assign a confidence score (see Verification Scoring below).

Process claims sequentially to avoid rate-limiting source sites.
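
The value search and context check in this step might look roughly like the sketch below. It works on page text that has already been fetched, so it makes no assumption about how WebFetch is called. The function name, the `topic_terms` argument, the 200-character window, and the three return labels are all hypothetical choices, not part of the skill.

```python
import re
from typing import List


def find_value_in_page(page_text: str, value: str,
                       topic_terms: List[str], window: int = 200) -> str:
    """Return 'match', 'value_only', or 'absent' for one claim (hypothetical labels)."""
    # Match the value literally, but not as the tail of a longer number
    # (so "42%" does not match inside "142%").
    pattern = re.compile(r"(?<![\d.])" + re.escape(value))
    found = False
    for m in pattern.finditer(page_text):
        found = True
        start = max(0, m.start() - window)
        context = page_text[start:m.end() + window].lower()
        # Context check: at least one topic term appears near the value.
        if any(term.lower() in context for term in topic_terms):
            return "match"
    return "value_only" if found else "absent"
```

For example, `find_value_in_page(text, "42%", ["marketers"])` reports `"match"` only when "42%" appears within 200 characters of the word "marketers". How these labels map onto a score is left to the Verification Scoring rules.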

Step 4: Flag Uncited Claims

For claims without a URL:

- Mark the status as UNVERIFIED
- Suggest a search query the user can run to find a source
- If the attribution names a specific organization, suggest that organization's domain

Step 5: Generate Verification Report

Output the full results table, summary statistics, and recommended actions.

Claim Extraction Patterns

Identify claims matching these structures:

Fully cited (highest priority):

- `[Number]% [claim] ([Source], [Year])` - parenthetical citation
- `[claim] [Number]% ... [markdown link to source]` - inline link
- `According to [Source], [Number]...` - attribution lead

Uncited statistics (flag for sourcing):

- `[Number]% of [noun phrase]` - standalone percentage
- `[Number]x more/less/higher/lower` - multiplier claims
- `$[Number] [claim]` - dollar figures without attribution

Weak signals (check context before extracting):

- `studies show`, `research indicates`, `data suggests` + nearby number
- `survey found`, `report reveals`, `analysis shows` + nearby number
- Round numbers in isolation (e.g., "millions of users") - skip unless specific
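
Since the skill is described as a pure Claude pipeline, these structures are presumably matched by the model reading the text. For readers who want a concrete approximation, the sketch below expresses them as regular expressions. Every pattern, the `classify` function, and its return labels are hypothetical, and "nearby number" is simplified to "a digit in the same sentence".

```python
import re

NUM = r"\d+(?:\.\d+)?"

CITED_PATTERNS = [
    # 42% ... (Source, 2025): parenthetical citation
    re.compile(NUM + r"%[^.()]*\([^()]+,\s*\d{4}\)"),
    # 42% ... [text](https://...): inline markdown link
    re.compile(NUM + r"%[^.\[\]]*\[[^\]]+\]\(https?://[^)\s]+\)"),
    # According to Source, ...42: attribution lead
    re.compile(r"\bAccording to [A-Z][\w&.-]*(?: [A-Z][\w&.-]*)*, [^.]*\d"),
]
UNCITED_PATTERNS = [
    re.compile(NUM + r"% of \w+"),                           # standalone percentage
    re.compile(NUM + r"x (?:more|less|higher|lower)\b"),     # multiplier claim
    re.compile(r"\$\d[\d,]*(?:\.\d+)?[KMB]?\b"),             # dollar figure
]
WEAK_PHRASES = re.compile(
    r"\b(?:studies show|research indicates|data suggests|"
    r"survey found|report reveals|analysis shows)\b",
    re.IGNORECASE,
)


def classify(sentence: str) -> str:
    """Return 'cited', 'uncited', 'weak', or 'none' for a single sentence."""
    if any(p.search(sentence) for p in CITED_PATTERNS):
        return "cited"
    if any(p.search(sentence) for p in UNCITED_PATTERNS):
        return "uncited"
    # Simplification: any digit in the same sentence counts as "nearby".
    if WEAK_PHRASES.search(sentence) and re.search(r"\d", sentence):
        return "weak"
    return "none"
```

Checking the cited patterns first mirrors the priority order above: a sentence that is both a standalone percentage and carries a citation is treated as cited.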

Verification Scoring

| Score | Status | Criteria |
|---|---|---|
| 1.0 | VERIFIED | Exact number found on cited page in matching context |
| 0.7-0.9 | PARAPHRASE | Similar data found but with different wording, rounding, or timeframe |
| 0.3-0.6 | WEAK | Source page exists and covers the topic but the specific statistic is not visible |
| 0.0 | NOT FOUND | Cited page does not contain the claimed data anywhere |
| N/A | UNVERIFIED | No source URL provided for the claim |

Scoring guidance:

- A claim of "43%" when the source says "nearly half" scores 0.8
- A claim of "2024" data when the source only has "2023" scores 0.7
- A claim citing a homepage when the stat lives on a subpage scores 0.3
- A 404 or unreachable URL scores 0.0
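
The score-to-status mapping can be sketched as a small function. The scoring rules list bands with gaps between them (for example between 0.9 and 1.0), so the boundaries below are an assumption: anything from 0.7 up to but not including 1.0 is PARAPHRASE, anything from 0.3 up to 0.7 is WEAK, and anything lower is NOT FOUND. The function name and the use of `None` for "N/A" are also hypothetical.

```python
from typing import Optional


def status_for(score: Optional[float]) -> str:
    """Map a confidence score to a status label (band edges are assumed)."""
    if score is None:        # no source URL was provided
        return "UNVERIFIED"
    if score >= 1.0:
        return "VERIFIED"
    if score >= 0.7:
        return "PARAPHRASE"
    if score >= 0.3:
        return "WEAK"
    return "NOT FOUND"
```

Under these assumptions, the "nearly half" example (0.8) maps to PARAPHRASE and the homepage example (0.3) maps to WEAK.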

Output Format

Verification Report: [Post Title]

File: [path]
Claims found: [total]
Verified: [count] | Paraphrase: [count] | Weak: [count] | Not Found: [count] | Unverified: [count]

| # | Claim | Source URL | Score | Status | Notes |
|---|---|---|---|---|---|
| 1 | "73% of marketers..." | https://example.com/report | 1.0 | VERIFIED | Exact match found in section 3 |
| 2 | "5x ROI improvement" | https://example.com/study | 0.8 | PARAPHRASE | Source says "nearly 5x" |
| 3 | "60% prefer video" | (none) | N/A | UNVERIFIED | Try: "video preference statistics 2025" |
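
For illustration, the summary line and the table rows could be assembled as in the sketch below. The helper names are hypothetical, and rendering rows as pipe-delimited text is an assumption about the desired output rather than something the skill specifies.

```python
from collections import Counter
from typing import List, Optional

# Status order and display labels as they appear in the summary line.
STATUS_LABELS = [
    ("VERIFIED", "Verified"),
    ("PARAPHRASE", "Paraphrase"),
    ("WEAK", "Weak"),
    ("NOT FOUND", "Not Found"),
    ("UNVERIFIED", "Unverified"),
]


def summary_line(statuses: List[str]) -> str:
    """Count each status and join the counts in report order."""
    counts = Counter(statuses)  # missing statuses count as 0
    return " | ".join(f"{label}: {counts[key]}" for key, label in STATUS_LABELS)


def table_row(index: int, claim: str, url: Optional[str],
              score: Optional[float], status: str, notes: str) -> str:
    """Render one results row; absent URL and score use the report's placeholders."""
    score_text = "N/A" if score is None else f"{score:.1f}"
    return f"| {index} | {claim} | {url or '(none)'} | {score_text} | {status} | {notes} |"
```

Calling `summary_line(["VERIFIED", "PARAPHRASE", "UNVERIFIED"])` yields `Verified: 1 | Paraphrase: 1 | Weak: 0 | Not Found: 0 | Unverified: 1`.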

Recommended Actions

[List claims that need source URLs]
[List claims with weak or not-found scores that need replacement sources]
[List claims where the source data may be outdated]

Integration

This skill can be called from `blog-analyze` as an optional deep-verification step. When invoked from the analyzer, only claims scoring below 0.7 are flagged in the analysis report.

Standalone usage:

/blog factcheck path/to/post.md

Limitations

- Paywalled content: WebFetch cannot access content behind login walls. These score as WEAK (0.5) with a note about paywall detection.
- Dynamic pages: JavaScript-rendered content may not be available via WebFetch. If the page returns minimal content, note this in the status.
- PDF sources: WebFetch may not extract PDF text reliably. Flag PDF URLs for manual verification.
- Archived pages: If a URL returns 404, suggest checking web.archive.org.
- Rate limits: Process no more than 10 URLs per run to avoid overwhelming source servers. If a post has more than 10 cited URLs, verify the first 10 and list the remainder as SKIPPED.
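
The 10-URL cap and the PDF flag can be sketched as two small helpers. Both function names are hypothetical. Detecting PDFs by a `.pdf` path suffix is an assumption, and whether flagged PDFs should count toward the cap is not specified here, so the sketch keeps the two checks separate.

```python
from typing import List, Tuple
from urllib.parse import urlparse

MAX_URLS_PER_RUN = 10  # limit stated in the rate-limit note above


def split_for_run(cited_urls: List[str],
                  limit: int = MAX_URLS_PER_RUN) -> Tuple[List[str], List[str]]:
    """Return (urls to verify this run, urls to list as SKIPPED), keeping order."""
    return cited_urls[:limit], cited_urls[limit:]


def looks_like_pdf(url: str) -> bool:
    """Assumed heuristic: the URL path ends in .pdf (query strings are ignored)."""
    return urlparse(url).path.lower().endswith(".pdf")
```

With 13 cited URLs, `split_for_run` returns the first 10 for verification and the last 3 for the SKIPPED list.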
