citation-verifier

Citation Verifier

Detect and verify citations in scientific documents to identify hallucinated, broken, or invalid references.

Purpose

AI-generated content sometimes includes plausible-looking but fake citations. This skill systematically extracts all citation identifiers from a document and verifies each one against authoritative sources, producing a detailed report with verification status and suggestions for fixing invalid citations.

When to Use

This skill should be invoked when:
  • User asks to "verify citations" or "check references" in a document
  • User suspects hallucinated citations in AI-generated content
  • User wants to validate DOIs, URLs, or other identifiers in a paper
  • User asks to audit a document for broken links or fake references
  • User mentions "citation verification", "reference checking", or "DOI validation"

Supported Document Formats

  1. Markdown (.md): Inline links `[text](url)`, reference links `[text][ref]`, bare URLs, DOIs
  2. LaTeX/BibTeX (.tex, .bib): `\cite{}`, `@article{}`, DOI fields, URL fields
  3. Org-mode (.org): `[[url][text]]` links, `#+BIBLIOGRAPHY`, cite links
  4. Plain text (.txt): Bare URLs, DOIs, arXiv IDs, author-year patterns

Citation Identifiers Detected

DOIs (Digital Object Identifiers)

  • Pattern: `10\.\d{4,}/[^\s]+` or `doi\.org/10\.\d{4,}/[^\s]+`
  • Example: `10.1038/nature12373`, `https://doi.org/10.1126/science.abc1234`
  • Verification: CrossRef API at `https://api.crossref.org/works/{doi}`
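
A minimal sketch of DOI detection in Python, assuming the DOI patterns from this section with the dots escaped and matching done case-insensitively. The helper name `find_dois`, the lowercasing, and the trimming of trailing punctuation are illustrative assumptions, not part of the skill.

```python
import re

# Optional doi.org or "doi:" prefix, followed by the DOI itself (captured).
DOI_RE = re.compile(
    r"(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/[^\s\"'<>\]]+)",
    re.IGNORECASE,
)

def find_dois(text: str) -> list:
    """Return unique DOIs in order of first appearance, lowercased."""
    seen = {}
    for match in DOI_RE.finditer(text):
        # Assumption: trailing sentence punctuation is not part of the DOI.
        doi = match.group(1).rstrip(".,;)")
        seen.setdefault(doi.lower(), None)
    return list(seen)

if __name__ == "__main__":
    sample = "See https://doi.org/10.1126/science.abc1234, and 10.1038/nature12373."
    print(find_dois(sample))
```

Trimming a trailing `)` is a trade-off: it cleans up DOIs wrapped in Markdown links, but a DOI that legitimately ends in a parenthesis would be cut short.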

URLs to Papers

  • Patterns: Links to known publishers and repositories
  • Domains: nature.com, science.org, sciencedirect.com, springer.com, wiley.com, acs.org, rsc.org, pnas.org, cell.com, plos.org, mdpi.com, frontiersin.org, academic.oup.com, tandfonline.com
  • Verification: HTTP HEAD/GET request, check for 200 status and paper metadata
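
The domain filter could be sketched as follows. The function name `is_academic_url` is hypothetical, and treating subdomains (such as `www.nature.com`) as matches is an assumption about how the domain list is meant to be applied.

```python
from urllib.parse import urlparse

# Domain list copied from the section above.
ACADEMIC_DOMAINS = {
    "nature.com", "science.org", "sciencedirect.com", "springer.com",
    "wiley.com", "acs.org", "rsc.org", "pnas.org", "cell.com", "plos.org",
    "mdpi.com", "frontiersin.org", "academic.oup.com", "tandfonline.com",
}

def is_academic_url(url: str) -> bool:
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ACADEMIC_DOMAINS)
```

Comparing against the parsed hostname, rather than searching the whole URL string, avoids false positives such as a listed domain appearing only in the path.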

arXiv IDs

  • Pattern: `arXiv:\d{4}\.\d{4,5}(v\d+)?` or `arxiv\.org/abs/\d{4}\.\d{4,5}(v\d+)?`
  • Example: `arXiv:2301.07041`, `https://arxiv.org/abs/2301.07041v2`
  • Verification: arXiv API or direct URL check
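
As an illustration, the two arXiv forms can be parsed with one expression that keeps the optional version suffix separate from the base ID. The names `parse_arxiv` and `arxiv_abs_url` are hypothetical helpers for this sketch.

```python
import re
from typing import List, Optional, Tuple

ARXIV_RE = re.compile(
    r"(?:arXiv:|arxiv\.org/abs/)(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE
)

def parse_arxiv(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (base_id, version) pairs; version is None when absent."""
    return [(m.group(1), m.group(2)) for m in ARXIV_RE.finditer(text)]

def arxiv_abs_url(base_id: str, version: Optional[str] = None) -> str:
    """Build the abstract-page URL used for the direct URL check."""
    return f"https://arxiv.org/abs/{base_id}{version or ''}"
```

Keeping the version apart makes it possible to treat `2301.07041` and `2301.07041v2` as the same paper when deduplicating.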

PubMed IDs (PMIDs)

  • Pattern: `PMID:\s*\d+` or `pubmed\.ncbi\.nlm\.nih\.gov/\d+`
  • Example: `PMID: 12345678`
  • Verification: PubMed URL `https://pubmed.ncbi.nlm.nih.gov/{pmid}/`

ISBNs

  • Pattern: `ISBN[:\s]*[\dXx-]{10,17}` (ISBN-10 or ISBN-13; the ISBN-10 check digit may be `X`)
  • Example: `ISBN: 978-0-13-468599-1`
  • Verification: Open Library API `https://openlibrary.org/isbn/{isbn}.json`

Author-Year Citations

  • Pattern: `\(([A-Z][a-z]+(?:\s+et\s+al\.?|\s+(?:and|&)\s+[A-Z][a-z]+)?,?\s*\d{4})\)`
  • Example: `(Smith et al., 2023)`, `(Johnson and Lee, 2022)`
  • Verification: WebSearch to find a matching paper (lower confidence)
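
A hedged sketch of author-year detection and of building the search query used for verification. The expression here is a variant that treats "et al." and a second author as separate alternatives and matches the literal parentheses. `find_author_year` and `build_search_query` are hypothetical names.

```python
import re

AUTHOR_YEAR_RE = re.compile(
    r"\(([A-Z][a-z]+)"                                # first author surname
    r"(?:\s+et\s+al\.?|\s+(?:and|&)\s+[A-Z][a-z]+)?"  # "et al." or a second author
    r",?\s*(\d{4})\)"                                 # optional comma, then the year
)

def find_author_year(text: str) -> list:
    """Return (first_author, year) tuples for each parenthetical citation."""
    return AUTHOR_YEAR_RE.findall(text)

def build_search_query(author: str, year: str) -> str:
    """Query string in the form "{author}" "{year}" paper."""
    return f'"{author}" "{year}" paper'
```

Surnames with hyphens, particles, or non-ASCII letters are not covered by `[A-Z][a-z]+`, which is one reason this identifier type carries the lowest confidence.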

Verification Procedure

Step 1: Read and Parse Document

Use the Read tool to load the document. Extract all citation identifiers using pattern matching:

DOI patterns:
- https?://(?:dx\.)?doi\.org/(10\.\d{4,}/[^\s\])"'>]+)
- doi:\s*(10\.\d{4,}/[^\s\])"'>]+)
- (10\.\d{4,9}/[-._;()/:A-Z0-9]+)  (bare DOI; match case-insensitively)

arXiv patterns:
- arXiv:(\d{4}\.\d{4,5}(?:v\d+)?)
- arxiv\.org/abs/(\d{4}\.\d{4,5}(?:v\d+)?)

PubMed patterns:
- PMID:\s*(\d+)
- pubmed\.ncbi\.nlm\.nih\.gov/(\d+)

URL patterns:
- https?://[^\s\])"'<>]+  (filter for academic domains)

ISBN patterns:
- ISBN[:\s-]*((?:\d[-\s]?){12}\d|(?:\d[-\s]?){9}[\dXx])  (ISBN-13 alternative first, so a 13-digit ISBN is not cut off after 10 digits)

Step 2: Deduplicate and Categorize

Create a list of unique identifiers, categorized by type:
  • DOIs
  • arXiv IDs
  • PubMed IDs
  • ISBNs
  • URLs (academic)
  • Author-year citations (text-based)
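
Steps 1 and 2 can be sketched together: run each pattern over the text, normalize the captured value, and drop duplicates while keeping first-seen order. This is an illustrative sketch, not the skill's actual implementation. `extract_identifiers` and `normalize` are hypothetical names, the normalization rules (lowercased DOIs, ISBNs without separators) are assumptions, and the ISBN pattern tries the 13-digit alternative before the 10-digit one.

```python
import re

PATTERNS = {
    "doi": re.compile(
        r"(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/[^\s\])\"'<>]+)", re.I
    ),
    "arxiv": re.compile(r"(?:arXiv:|arxiv\.org/abs/)(\d{4}\.\d{4,5}(?:v\d+)?)", re.I),
    "pmid": re.compile(r"(?:PMID:\s*|pubmed\.ncbi\.nlm\.nih\.gov/)(\d+)", re.I),
    "isbn": re.compile(r"ISBN[:\s-]*((?:\d[-\s]?){12}\d|(?:\d[-\s]?){9}[\dXx])"),
}

def normalize(kind: str, value: str) -> str:
    """Canonical form used for deduplication (assumed rules)."""
    if kind == "doi":
        return value.rstrip(".,;").lower()
    if kind == "isbn":
        return re.sub(r"[-\s]", "", value).upper()
    return value

def extract_identifiers(text: str) -> dict:
    """Map each identifier type to its unique values in first-seen order."""
    found = {}
    for kind, pattern in PATTERNS.items():
        values = (normalize(kind, m.group(1)) for m in pattern.finditer(text))
        found[kind] = list(dict.fromkeys(values))  # dedupe, keep order
    return found
```

URLs and author-year citations are left out here to keep the sketch short; they would be further entries in the same mapping.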

Step 3: Verify Each Identifier

For each identifier, perform verification in order of reliability:

DOI Verification

  1. Construct CrossRef API URL: `https://api.crossref.org/works/{doi}`
  2. Use WebFetch to check the API
  3. If successful, extract: title, authors, journal, year
  4. If 404: mark as INVALID. On any other error (timeout, rate limit, server error): mark as SUSPICIOUS
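
Outside an agent environment, the DOI check could look roughly like the sketch below, which uses only the Python standard library in place of WebFetch. The function names are hypothetical. The JSON field names (`message`, `title`, `author`, `container-title`, `issued.date-parts`) follow the CrossRef works response format. Mapping non-404 failures to SUSPICIOUS rather than INVALID is an assumption in line with the status definitions.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

def crossref_url(doi: str) -> str:
    """Build the CrossRef works URL, percent-encoding unusual characters."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")

def parse_crossref(payload: dict) -> dict:
    """Pull title, authors, journal, and year out of a works response."""
    msg = payload.get("message", {})
    authors = [
        " ".join(filter(None, (a.get("given"), a.get("family"))))
        for a in msg.get("author", [])
    ]
    date_parts = msg.get("issued", {}).get("date-parts", [[None]])
    return {
        "title": (msg.get("title") or [None])[0],
        "authors": authors,
        "journal": (msg.get("container-title") or [None])[0],
        "year": date_parts[0][0] if date_parts and date_parts[0] else None,
    }

def verify_doi(doi: str, timeout: float = 10.0):
    """Return (status, metadata). Performs a network request."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
            return "VALID", parse_crossref(json.load(resp))
    except urllib.error.HTTPError as err:
        # 404 means CrossRef has no record; anything else is inconclusive.
        return ("INVALID" if err.code == 404 else "SUSPICIOUS"), {}
    except OSError:
        # URLError, timeouts, and connection resets all land here.
        return "SUSPICIOUS", {}
```

One caveat worth keeping in mind: DOIs registered with agencies other than CrossRef (for example DataCite) can also return 404 from this endpoint, so a 404 is strong but not conclusive evidence of a fabricated DOI.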

arXiv Verification

  1. Construct URL: `https://arxiv.org/abs/{arxiv_id}`
  2. Use WebFetch to verify page exists
  3. Extract: title, authors, abstract snippet
  4. If 404: mark as INVALID

PubMed Verification

  1. Construct URL: `https://pubmed.ncbi.nlm.nih.gov/{pmid}/`
  2. Use WebFetch to verify
  3. Extract: title, authors, journal
  4. If 404: mark as INVALID

ISBN Verification

  1. Construct URL: `https://openlibrary.org/isbn/{isbn}.json`
  2. Use WebFetch to check
  3. Extract: title, authors, publisher
  4. If 404: mark as INVALID

URL Verification

  1. Use WebFetch to access the URL
  2. Check for HTTP 200 and academic content indicators
  3. Look for: paper title, authors, DOI on page
  4. If unreachable or non-academic: mark as SUSPICIOUS

Author-Year Verification (lowest confidence)

  1. Use WebSearch with query: `"{author}" "{year}" paper`
  2. Look for matching papers in results
  3. If found: mark as LIKELY VALID with source
  4. If not found: mark as UNVERIFIED

Step 4: Generate Report

Produce a structured verification report:

Citation Verification Report

Document: [filename]
Date: [date]
Total citations found: [count]

Summary

  • Valid: [count]
  • Invalid: [count]
  • Suspicious: [count]
  • Unverified: [count]

Detailed Results

Valid Citations

| ID | Type | Title | Source |
| --- | --- | --- | --- |
| 10.1038/xxx | DOI | Paper Title | CrossRef |

Invalid Citations (HALLUCINATED)

| ID | Type | Error | Suggestion |
| --- | --- | --- | --- |
| 10.9999/fake | DOI | 404 Not Found | Remove or find correct DOI |

Suspicious Citations

| ID | Type | Issue | Recommendation |
| --- | --- | --- | --- |
| https://... | URL | Timeout | Verify manually |

Unverified Citations

| Citation | Type | Notes |
| --- | --- | --- |
| (Smith, 2023) | Author-year | No matching paper found via search |
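
A report in this shape could be rendered from a list of result records, as in the sketch below. The record keys (`id`, `type`, `status`, `detail`), the function name `render_report`, the single three-column table layout, and the `#`/`##` heading levels are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date
from typing import Optional

STATUSES = ["VALID", "INVALID", "SUSPICIOUS", "UNVERIFIED"]

def render_report(filename: str, results: list, today: Optional[str] = None) -> str:
    """Render verification results as a Markdown report string."""
    counts = Counter(r["status"] for r in results)
    lines = [
        "# Citation Verification Report",
        "",
        f"Document: {filename}",
        f"Date: {today or date.today().isoformat()}",
        f"Total citations found: {len(results)}",
        "",
        "## Summary",
        "",
    ]
    lines += [f"- {s.capitalize()}: {counts.get(s, 0)}" for s in STATUSES]
    for status in STATUSES:
        rows = [r for r in results if r["status"] == status]
        if not rows:
            continue  # skip empty sections
        lines += [
            "",
            f"## {status.capitalize()} Citations",
            "",
            "| ID | Type | Notes |",
            "| --- | --- | --- |",
        ]
        lines += [f"| {r['id']} | {r['type']} | {r['detail']} |" for r in rows]
    return "\n".join(lines)
```

Passing `today` explicitly keeps the output reproducible, which helps when comparing reports across runs.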

Verification Status Definitions

  • VALID: Identifier resolves to a real paper with matching metadata
  • LIKELY VALID: Author-year citation matched to a paper via WebSearch, without a resolvable identifier (lower confidence than VALID)
  • INVALID: Identifier does not exist or returns 404 (likely hallucinated)
  • SUSPICIOUS: Could not fully verify; may be rate-limited, paywalled, or temporarily unavailable
  • UNVERIFIED: Text-based citation that couldn't be confirmed (conservative approach)
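
For identifier-based checks, these definitions amount to a small decision rule. The sketch below assumes a fetch outcome is summarized as an HTTP status (or `None` when no response arrived) plus a flag saying whether paper metadata was found. `classify` is a hypothetical helper, and UNVERIFIED is left out because it applies only to text-based citations.

```python
from typing import Optional

def classify(http_status: Optional[int], metadata_found: bool = False) -> str:
    """Map a fetch outcome to a report status.

    http_status is None when the request failed before any response
    arrived (timeout, DNS failure). That is "could not verify",
    not "confirmed invalid".
    """
    if http_status == 404:
        return "INVALID"
    if http_status == 200 and metadata_found:
        return "VALID"
    # Rate limits, paywalls, server errors, timeouts, or a 200 page
    # without recognizable paper metadata.
    return "SUSPICIOUS"
```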

Best Practices

  1. Batch similar requests: Group DOI checks together to minimize API calls
  2. Respect rate limits: Add delays between requests if hitting rate limits
  3. Cross-reference: If a URL contains a DOI, verify the DOI directly
  4. Context matters: Note where citations appear (methods vs. claims)
  5. Report uncertainty: Always distinguish between "confirmed invalid" and "could not verify"
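
The cross-reference practice (verify the DOI directly when a URL contains one) could be sketched like this. `doi_from_url` is a hypothetical helper, and the assumption that the DOI runs to the end of the URL path does not hold for every publisher.

```python
import re
from urllib.parse import unquote, urlparse

DOI_IN_PATH = re.compile(r"(10\.\d{4,9}/[^\s?#]+)")

def doi_from_url(url: str):
    """Return the DOI embedded in a URL path (lowercased), or None."""
    # Decode percent-escapes first so "10.1038%2Fnature12373" is recognized.
    match = DOI_IN_PATH.search(unquote(urlparse(url).path))
    return match.group(1).lower() if match else None
```

If the path continues after the DOI (for example a trailing `/full` segment), that suffix is captured too, so a failed lookup on an extracted DOI is better treated as inconclusive than as invalid.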

Output Suggestions for Invalid Citations

For each invalid citation, provide actionable suggestions:
  • Wrong DOI format: "DOI appears malformed. Check for typos or extra characters."
  • Non-existent DOI: "No paper found. This may be hallucinated. Search for the actual paper title."
  • Dead URL: "URL returns 404. Try searching for the paper title on Google Scholar."
  • Suspicious journal: "Publisher not recognized. Verify this is a legitimate source."
  • Author-year not found: "Could not verify. Add DOI or URL for confirmation."
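
One way to choose between the first two suggestions is to check the DOI's shape before blaming the lookup. In this sketch the dictionary keys, the helper name `suggest_for_doi`, and the fallback text are assumptions. The two suggestion strings are quoted from the list above.

```python
import re

DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+")

SUGGESTIONS = {
    "malformed_doi": "DOI appears malformed. Check for typos or extra characters.",
    "missing_doi": "No paper found. This may be hallucinated. Search for the actual paper title.",
}

def suggest_for_doi(doi: str, http_status: int) -> str:
    """Pick a suggestion for a DOI that failed verification."""
    if not DOI_SHAPE.fullmatch(doi):
        return SUGGESTIONS["malformed_doi"]
    if http_status == 404:
        return SUGGESTIONS["missing_doi"]
    return "Verify manually."  # inconclusive failure (assumed fallback)
```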

Example Verification Session

User request: "Verify the citations in my-paper.md"
Expected behavior:
  1. Read my-paper.md
  2. Extract all DOIs, URLs, arXiv IDs, etc.
  3. Report: "Found 15 citations: 8 DOIs, 5 URLs, 2 arXiv IDs"
  4. Verify each identifier using appropriate API/fetch
  5. Generate report showing:
    • 10 valid citations with metadata
    • 3 invalid citations (404 errors) marked as likely hallucinated
    • 2 suspicious citations (timeouts) requiring manual check
  6. Provide suggestions for fixing invalid citations

Limitations

  • Rate limits: CrossRef and other APIs may rate-limit requests
  • Paywalled content: Cannot verify full content behind paywalls
  • New papers: Very recent papers may not be indexed yet
  • Author-year citations: Low confidence without additional identifiers
  • Non-English sources: Limited support for non-English citation formats
  • Private/institutional URLs: Cannot access authenticated content

Related Skills

  • literature-review: For conducting systematic literature searches
  • scientific-reviewer: For reviewing scientific document quality
  • scientific-writing: For writing with proper citations