Blog Fact-Check
Verify statistics, claims, and source attributions in blog posts. The pipeline
runs entirely in Claude, with no external NLP dependencies.
Workflow
Step 1: Read the Blog Post
Read the target file and identify all sections containing data claims.
Step 2: Extract Statistical Claims
Scan the full text for every claim that includes a number, percentage, dollar
amount, or named source. Build a claims list with these fields:
| Field | Description |
|---|---|
| claim_text | The exact sentence or phrase containing the statistic |
| value | The numeric value (e.g., "42%", "$1.2M", "3x") |
| attribution | Named source if present (e.g., "HubSpot", "Gartner 2025") |
| url | Cited URL if present (from markdown link or parenthetical) |
| location | Heading or line number where the claim appears |
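
As a rough illustration, one entry in the claims list could be modeled as a small record. The class name `Claim` and the `is_cited` helper are hypothetical; only the five field names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One extracted claim; field names mirror the claims table."""
    claim_text: str                    # exact sentence or phrase containing the statistic
    value: str                         # e.g. "42%", "$1.2M", "3x"
    attribution: Optional[str] = None  # named source, if present
    url: Optional[str] = None          # cited URL, if present
    location: str = ""                 # heading or line number where the claim appears

    @property
    def is_cited(self) -> bool:
        # Hypothetical helper: cited claims go to Step 3, uncited ones to Step 4.
        return bool(self.url)


example = Claim(
    claim_text="73% of marketers...",
    value="73%",
    url="https://example.com/report",
    location="Introduction",
)
```

Keeping `url` optional makes the split between Step 3 and Step 4 a single check on each record.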
Step 3: Verify Cited Claims
For each claim that includes a URL:
- Fetch the source page via WebFetch
- Search the returned content for the specific numeric value
- If the exact value is found, check that the surrounding context matches the claim topic
- Assign a confidence score (see Verification Scoring below)
Process claims sequentially to avoid rate-limiting source sites.
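
The value search and context check can be sketched as follows. This assumes the page text has already been fetched; the function name, the `topic_terms` keyword list, and the 200-character window are illustrative assumptions, not part of the skill.

```python
import re


def find_value_in_context(page_text, value, topic_terms, window=200):
    """Return 'exact', 'value_only', or 'missing' for one claim.

    page_text: text already fetched from the cited URL
    value: the claimed figure, e.g. "73%"
    topic_terms: hypothetical keywords describing the claim topic
    window: characters of context inspected on each side of a hit
    """
    # Lookarounds stop "73%" from matching inside "173%" or "1.73%".
    pattern = re.compile(r"(?<![\d.])" + re.escape(value) + r"(?!\d)")
    hits = list(pattern.finditer(page_text))
    if not hits:
        return "missing"
    for hit in hits:
        start = max(0, hit.start() - window)
        context = page_text[start:hit.end() + window].lower()
        if any(term.lower() in context for term in topic_terms):
            return "exact"
    # The number appears, but not near any topic keyword.
    return "value_only"
```

A `value_only` result is a prompt to read the page more closely rather than a final verdict, since the same number can appear on a page in an unrelated context.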
Step 4: Flag Uncited Claims
For claims without a URL:
- Mark status as UNVERIFIED
- Suggest a search query the user can run to find a source
- If the attribution names a specific organization, suggest their domain
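
A minimal sketch of the record produced for an uncited claim is shown below. The function name and dictionary keys are hypothetical, and the domain suggestion is left out because it depends on knowing the organization's real site.

```python
def flag_uncited(claim_text, attribution=None):
    """Build the Step 4 record for a claim that has no URL."""
    # Quote the claim so a search engine looks for the exact phrase.
    query = f'"{claim_text}"'
    if attribution:
        # Adding the named source narrows the search to that publisher.
        query += f" {attribution}"
    return {"status": "UNVERIFIED", "score": None, "suggested_query": query}
```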
Step 5: Generate Verification Report
Output the full results table, summary statistics, and recommended actions.
Claim Extraction Patterns
Identify claims matching these structures:
Fully cited (highest priority):
- [Number]% [claim] ([Source], [Year]) - parenthetical citation
- [claim] [Number]% ... [markdown link to source] - inline link
- According to [Source], [Number]... - attribution lead
Uncited statistics (flag for sourcing):
- [Number]% of [noun phrase] - standalone percentage
- [Number]x more/less/higher/lower - multiplier claims
- Dollar figures without attribution
Weak signals (check context before extracting):
- , , + nearby number
- , , + nearby number
- Round numbers in isolation (e.g., "millions of users") - skip unless specific
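
For illustration, the numeric parts of these structures could be approximated with regular expressions. The patterns below are assumptions that cover only percentages, multipliers, and dollar amounts; they do not detect attributions or the weak signals, and real posts will need broader coverage.

```python
import re

# Illustrative patterns only, not an exhaustive grammar.
PERCENT = re.compile(r"\b\d+(?:\.\d+)?%")
MULTIPLIER = re.compile(r"\b\d+(?:\.\d+)?x\b", re.IGNORECASE)
DOLLAR = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:[KMB]\b)?")

PATTERNS = (("percent", PERCENT), ("multiplier", MULTIPLIER), ("dollar", DOLLAR))


def extract_values(text):
    """Return (kind, matched_value) pairs in order of appearance."""
    found = []
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), kind, match.group()))
    # Sort by position so the output follows reading order.
    return [(kind, value) for _, kind, value in sorted(found)]
```

Each hit still needs its surrounding sentence checked for a source name or link before it is classed as fully cited or uncited.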
Verification Scoring
| Score | Status | Criteria |
|---|---|---|
| 1.0 | VERIFIED | Exact number found on cited page in matching context |
| 0.7-0.9 | PARAPHRASE | Similar data found but with different wording, rounding, or timeframe |
| 0.3-0.6 | WEAK | Source page exists and covers the topic but the specific statistic is not visible |
| 0.0 | NOT FOUND | Cited page does not contain the claimed data anywhere |
| N/A | UNVERIFIED | No source URL provided for the claim |
Scoring guidance:
- A claim of "43%" when the source says "nearly half" scores 0.8
- A claim of "2024" data when the source only has "2023" scores 0.7
- A claim citing a homepage when the stat lives on a subpage scores 0.3
- A 404 or unreachable URL scores 0.0
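
The score bands can be expressed as a small lookup. This is a sketch: the function name is hypothetical, and raising an error for scores that fall between the listed bands (for example 0.65) is an assumption, since the table does not say how to treat them.

```python
def status_for(score):
    """Map a confidence score to the status labels in the scoring table."""
    if score is None:
        return "UNVERIFIED"  # no source URL was provided
    if score == 1.0:
        return "VERIFIED"
    if 0.7 <= score <= 0.9:
        return "PARAPHRASE"
    if 0.3 <= score <= 0.6:
        return "WEAK"
    if score == 0.0:
        return "NOT FOUND"
    # Assumption: scores outside the listed bands are treated as errors.
    raise ValueError(f"score {score} falls outside the defined bands")
```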
Output Format
Verification Report: [Post Title]
File: [path]
Claims found: [total]
Verified: [count] | Paraphrase: [count] | Weak: [count] | Not Found: [count] | Unverified: [count]
| # | Claim | Source URL | Score | Status | Notes |
|---|---|---|---|---|---|
| 1 | "73% of marketers..." | https://example.com/report | 1.0 | VERIFIED | Exact match found in section 3 |
| 2 | "5x ROI improvement" | https://example.com/study | 0.8 | PARAPHRASE | Source says "nearly 5x" |
| 3 | "60% prefer video" | (none) | N/A | UNVERIFIED | Try: "video preference statistics 2025" |
Recommended Actions
- [List claims that need source URLs]
- [List claims with weak or not-found scores that need replacement sources]
- [List claims where the source data may be outdated]
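
One way to assemble the summary line and table rows from verification results is sketched below. The result dictionaries and their keys (`claim`, `url`, `score`, `status`, `notes`) are assumed shapes chosen for this example.

```python
from collections import Counter

# Display label for each status, in the order used by the summary line.
LABELS = [
    ("VERIFIED", "Verified"),
    ("PARAPHRASE", "Paraphrase"),
    ("WEAK", "Weak"),
    ("NOT FOUND", "Not Found"),
    ("UNVERIFIED", "Unverified"),
]


def summary_line(results):
    """Count results per status and format them as the summary line."""
    counts = Counter(r["status"] for r in results)
    return " | ".join(f"{label}: {counts[status]}" for status, label in LABELS)


def table_row(index, r):
    """Format one result as a row of the report table."""
    score = "N/A" if r["score"] is None else f"{r['score']:.1f}"
    url = r["url"] or "(none)"
    return f'| {index} | "{r["claim"]}" | {url} | {score} | {r["status"]} | {r["notes"]} |'
```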
Integration
This skill can be called from the analyzer as an optional deep-verification step.
When invoked from the analyzer, only claims scoring below 0.7 are flagged in the
analysis report.
Standalone usage:
/blog factcheck path/to/post.md
Limitations
- Paywalled content: WebFetch cannot access content behind login walls. These
score as WEAK (0.5) with a note about paywall detection.
- Dynamic pages: JavaScript-rendered content may not be available via WebFetch.
If the page returns minimal content, note this in the status.
- PDF sources: WebFetch may not extract PDF text reliably. Flag PDF URLs for
manual verification.
- Archived pages: If a URL returns 404, suggest checking web.archive.org.
- Rate limits: Process no more than 10 URLs per run to avoid overwhelming
source servers. If a post has more than 10 cited URLs, verify the first 10 and
list the remainder as SKIPPED.
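
The 10-URL cap could be applied before any fetching begins, as in this sketch. The function name is hypothetical, and counting a repeated URL only once is an assumption the text does not state.

```python
MAX_URLS_PER_RUN = 10


def plan_run(cited_urls, limit=MAX_URLS_PER_RUN):
    """Split cited URLs into those to verify now and those to list as SKIPPED."""
    # Assumption: duplicates count once; dict.fromkeys keeps first-seen order.
    unique = list(dict.fromkeys(cited_urls))
    return unique[:limit], unique[limit:]
```

For a post with 12 distinct cited URLs, this returns the first 10 for verification and the last 2 for the SKIPPED list.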