citation-verifier
Citation Verifier
Detect and verify citations in scientific documents to identify hallucinated, broken, or invalid references.
Purpose
AI-generated content sometimes includes plausible-looking but fake citations. This skill systematically extracts all citation identifiers from a document and verifies each one against authoritative sources, producing a detailed report with verification status and suggestions for fixing invalid citations.
When to Use
This skill should be invoked when:
- User asks to "verify citations" or "check references" in a document
- User suspects hallucinated citations in AI-generated content
- User wants to validate DOIs, URLs, or other identifiers in a paper
- User asks to audit a document for broken links or fake references
- User mentions "citation verification", "reference checking", or "DOI validation"
Supported Document Formats
- Markdown (.md): Inline links `[text](url)`, reference links `[text][ref]`, bare URLs, DOIs
- LaTeX/BibTeX (.tex, .bib): `\cite{}`, `@article{}`, DOI fields, URL fields
- Org-mode (.org): `[[url][text]]` links, cite links, `#+BIBLIOGRAPHY`
- Plain text (.txt): Bare URLs, DOIs, arXiv IDs, author-year patterns
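The list above names what to look for in each format, not how to find it. The following is a rough Python sketch for pulling out the format-specific constructs before the identifier patterns are applied. The regexes are assumptions and deliberately approximate. They are not full Markdown, Org, or BibTeX parsers, so nested braces or link text containing brackets will defeat them.

```python
import re

# Approximate, format-specific patterns (assumptions, not full parsers).
MD_INLINE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")                  # [text](url)
MD_REFDEF = re.compile(r"^\s*\[[^\]]+\]:\s*(\S+)", re.MULTILINE)     # [ref]: url
ORG_LINK = re.compile(r"\[\[([^\]]+)\](?:\[[^\]]*\])?\]")            # [[url][text]] or [[url]]
LATEX_CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")  # \cite{a,b}, \citep[..]{c}
BIB_FIELD = re.compile(r"^\s*(doi|url)\s*=\s*[{\"]([^}\"]+)[}\"]",
                       re.IGNORECASE | re.MULTILINE)                 # doi = {...}, url = "..."


def markdown_links(text):
    """URLs from inline links and reference-link definitions."""
    return MD_INLINE.findall(text) + MD_REFDEF.findall(text)


def org_links(text):
    """Targets of Org-mode bracket links (URLs or cite: links)."""
    return ORG_LINK.findall(text)


def latex_cite_keys(text):
    """Individual citation keys from cite-family commands."""
    keys = []
    for group in LATEX_CITE.findall(text):
        keys.extend(k.strip() for k in group.split(",") if k.strip())
    return keys


def bib_fields(text):
    """(field, value) pairs for DOI and URL fields in BibTeX entries."""
    return [(field.lower(), value.strip()) for field, value in BIB_FIELD.findall(text)]
```

The cite keys themselves are not verifiable identifiers. They are useful for locating the matching `.bib` entry, whose `doi` and `url` fields can then be checked.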
Citation Identifiers Detected
DOIs (Digital Object Identifiers)
- Pattern: `10\.\d{4,}/[^\s]+` or `doi\.org/10\.\d{4,}/[^\s]+`
- Example: `10.1038/nature12373`, `https://doi.org/10.1126/science.abc1234`
- Verification: CrossRef API at `https://api.crossref.org/works/{doi}`
URLs to Papers
- Patterns: Links to known publishers and repositories
- Domains: nature.com, science.org, sciencedirect.com, springer.com, wiley.com, acs.org, rsc.org, pnas.org, cell.com, plos.org, mdpi.com, frontiersin.org, academic.oup.com, tandfonline.com
- Verification: HTTP HEAD/GET request, check for 200 status and paper metadata
arXiv IDs
- Pattern: `arXiv:\d{4}\.\d{4,5}(v\d+)?` or `arxiv\.org/abs/\d{4}\.\d{4,5}(v\d+)?`
- Example: `arXiv:2301.07041`, `https://arxiv.org/abs/2301.07041v2`
- Verification: arXiv API or direct URL check
PubMed IDs (PMIDs)
- Pattern: `PMID:\s*\d+` or `pubmed\.ncbi\.nlm\.nih\.gov/\d+`
- Example: `PMID: 12345678`
- Verification: PubMed URL `https://pubmed.ncbi.nlm.nih.gov/{pmid}/`
ISBNs
- Pattern: `ISBN[:\s]*[\dXx-]{10,17}` (ISBN-10 or ISBN-13; an ISBN-10 may end in `X`)
- Example: `ISBN: 978-0-13-468599-1`
- Verification: Open Library API `https://openlibrary.org/isbn/{isbn}.json`
Author-Year Citations
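- Pattern: `\(([A-Z][a-z]+(?:\s+et\s+al\.?|\s+(?:and|&)\s+[A-Z][a-z]+)?,?\s*\d{4})\)` (`et al.` is not followed by a second surname, so it is an alternative to `and`/`&` plus a name)
- Example: `(Smith et al., 2023)`, `(Johnson and Lee, 2022)`
- Verification: WebSearch to find matching paper (lower confidence)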
Verification Procedure
Step 1: Read and Parse Document
Use the Read tool to load the document. Extract all citation identifiers using pattern matching:
DOI patterns:
- `https?://(?:dx\.)?doi\.org/(10\.\d{4,}/[^\s\])"'>]+)`
- `doi:\s*(10\.\d{4,}/[^\s\])"'>]+)`
- `(10\.\d{4,9}/[-._;()/:A-Z0-9]+)` (bare DOI; match case-insensitively)
arXiv patterns:
- `arXiv:(\d{4}\.\d{4,5}(?:v\d+)?)`
- `arxiv\.org/abs/(\d{4}\.\d{4,5}(?:v\d+)?)`
PubMed patterns:
- `PMID:\s*(\d+)`
- `pubmed\.ncbi\.nlm\.nih\.gov/(\d+)`
URL patterns:
- `https?://[^\s\])"'<>]+` (filter for academic domains)
ISBN patterns:
- `ISBN(?:-1[03])?[:\s]*((?:\d[-\s]?){12}\d|(?:\d[-\s]?){9}[\dXx])` (the ISBN-13 alternative comes first so that a 13-digit ISBN is not truncated to its first 10 digits; the optional `-10`/`-13` suffix accepts labels such as `ISBN-13:`)
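The following is a minimal Python sketch of Step 1. It assumes the document text is already loaded into a string. It applies the DOI, arXiv, PubMed, and ISBN patterns with a few choices that are assumptions rather than part of the patterns as listed:

- a `\b` word boundary before the bare-DOI pattern;
- an ISBN pattern that tries the 13-digit form first;
- a hypothetical `clean` helper that strips sentence punctuation glued to the end of a match. A trailing `)` is dropped only when it is unbalanced, so DOIs that contain parentheses survive.

Duplicates are kept on purpose. A DOI inside a `doi.org` URL is matched by both the URL pattern and the bare pattern, and Step 2 collapses them.

```python
import re

# Identifier patterns for Step 1. Capture group 1 is the identifier in every pattern.
PATTERNS = {
    "doi": [
        re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\d{4,}/[^\s\])\"'>]+)"),
        re.compile(r"doi:\s*(10\.\d{4,}/[^\s\])\"'>]+)", re.IGNORECASE),
        re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE),  # bare DOI
    ],
    "arxiv": [
        re.compile(r"arXiv:(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE),
        re.compile(r"arxiv\.org/abs/(\d{4}\.\d{4,5}(?:v\d+)?)"),
    ],
    "pmid": [
        re.compile(r"PMID:\s*(\d+)"),
        re.compile(r"pubmed\.ncbi\.nlm\.nih\.gov/(\d+)"),
    ],
    "isbn": [
        # ISBN-13 is tried before ISBN-10 so 13-digit ISBNs are not truncated.
        re.compile(r"ISBN(?:-1[03])?[:\s]*((?:\d[-\s]?){12}\d|(?:\d[-\s]?){9}[\dXx])"),
    ],
}


def clean(ident):
    """Drop trailing sentence punctuation; drop a trailing ')' only if unbalanced."""
    while ident:
        if ident[-1] in ".,;:":
            ident = ident[:-1]
        elif ident[-1] == ")" and ident.count(")") > ident.count("("):
            ident = ident[:-1]
        else:
            break
    return ident


def extract(text):
    """Return {kind: [identifier, ...]} in pattern order; duplicates are kept for Step 2."""
    found = {kind: [] for kind in PATTERNS}
    for kind, regexes in PATTERNS.items():
        for rx in regexes:
            found[kind].extend(clean(m.group(1)) for m in rx.finditer(text))
    return found
```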
Step 2: Deduplicate and Categorize
Create a list of unique identifiers, categorized by type:
- DOIs
- arXiv IDs
- PubMed IDs
- ISBNs
- URLs (academic)
- Author-year citations (text-based)
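Here is one way to implement the deduplication, sketched in Python. It assumes Step 1 produced a `{type: [identifier, ...]}` dictionary. Each identifier is normalized to a canonical key, and only the first occurrence is kept.

The normalization rules are this sketch's own choices:

- DOIs are lowercased, since DOI names are case-insensitive.
- ISBNs are reduced to digits plus a final `X`.
- Other types, including arXiv IDs with different version suffixes, are left as extracted.

```python
def normalize(kind, ident):
    """Canonical form used as the dedup key (an assumption; adjust per policy)."""
    if kind == "doi":
        return ident.lower()  # DOI names are case-insensitive
    if kind == "isbn":
        return "".join(ch for ch in ident if ch.isdigit() or ch in "Xx").upper()
    return ident


def dedupe(found):
    """Keep the first occurrence of each normalized identifier, per category."""
    return {
        kind: list(dict.fromkeys(normalize(kind, ident) for ident in idents))
        for kind, idents in found.items()
    }
```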
Step 3: Verify Each Identifier
For each identifier, perform verification in order of reliability:
DOI Verification
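- Construct the CrossRef API URL: `https://api.crossref.org/works/{doi}`
- Use WebFetch to query the API
- If successful, extract: title, authors, journal, year
- If 404: mark as INVALID; if the request times out or fails for another reason (for example, rate limiting): mark as SUSPICIOUS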
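The DOI check can be sketched with Python's standard library alone. The JSON field names (`message`, `title`, `author`, `container-title`, `issued.date-parts`) follow CrossRef's `/works/{doi}` response format. The helper names and the 10-second timeout are illustrative assumptions. HTTP status codes map onto the statuses in Verification Status Definitions:

- 404 means the DOI is not registered.
- Timeouts, rate limits, and server errors only mean the DOI could not be verified.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def crossref_url(doi):
    # Keep "/" literal; percent-encode anything else that is not URL-safe.
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")


def status_for_http(code):
    """Map an HTTP status code to a verification status."""
    if code == 200:
        return "VALID"       # still compare the returned metadata with the citation
    if code == 404:
        return "INVALID"     # the DOI is not registered with CrossRef
    return "SUSPICIOUS"      # e.g. 429 rate limit or 5xx: could not verify


def parse_crossref(payload):
    """Pull title, authors, journal, and year out of a /works/{doi} response."""
    msg = payload["message"]
    authors = [" ".join(p for p in (a.get("given"), a.get("family")) if p)
               for a in msg.get("author", [])]
    date_parts = msg.get("issued", {}).get("date-parts") or [[None]]
    return {
        "title": (msg.get("title") or [""])[0],
        "authors": authors,
        "journal": (msg.get("container-title") or [""])[0],
        "year": (date_parts[0] or [None])[0],
    }


def verify_doi(doi, timeout=10):
    """Return (status, metadata_or_None) for one DOI."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
            return status_for_http(resp.status), parse_crossref(json.load(resp))
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return status_for_http(err.code), None
    except OSError:                         # DNS failure, refused connection, timeout
        return "SUSPICIOUS", None
```

A 200 response only proves the DOI exists. Compare the returned title and authors with the citation text before reporting it as VALID, because a hallucinated citation can pair a real DOI with the wrong paper. CrossRef also asks clients to identify themselves, for example with a `mailto:` address in the User-Agent, to use its polite pool.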
arXiv Verification
- Construct URL: `https://arxiv.org/abs/{arxiv_id}`
- Use WebFetch to verify page exists
- Extract: title, authors, abstract snippet
- If 404: mark as INVALID
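Before fetching, a cheap syntactic check can already catch some fabricated arXiv IDs. The Python sketch below assumes new-style identifiers only. Those IDs:

- start at `0704` (April 2007);
- use a month from 01 to 12;
- cannot carry a date in the future;
- use a four-digit sequence number through `1412` and a five-digit one from `1501` on.

Old-style IDs such as `hep-th/9901001` are out of scope here, as they are for the arXiv patterns. An ID that passes still needs the page fetch.

```python
import datetime
import re

NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")


def arxiv_id_plausible(arxiv_id, today=None):
    """Cheap syntactic check for new-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN)."""
    m = NEW_STYLE_ID.match(arxiv_id)
    if not m:
        return False
    yy, mm, seq = int(m.group(1)), int(m.group(2)), m.group(3)
    today = today or datetime.date.today()
    if not 1 <= mm <= 12:
        return False
    if (yy, mm) < (7, 4) or (2000 + yy, mm) > (today.year, today.month):
        return False  # new-style IDs start at 0704 and cannot be in the future
    return len(seq) == (5 if yy >= 15 else 4)  # 5-digit sequence numbers since 1501


def arxiv_abs_url(arxiv_id):
    return f"https://arxiv.org/abs/{arxiv_id}"
```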
PubMed Verification
- Construct URL: `https://pubmed.ncbi.nlm.nih.gov/{pmid}/`
- Use WebFetch to verify
- Extract: title, authors, journal
- If 404: mark as INVALID
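As a swapped-in alternative to fetching each PubMed page, NCBI's E-utilities `esummary` endpoint can check several PMIDs in one JSON request. This also fits the batching advice under Best Practices.

The sketch below (Python) covers only URL construction and response parsing. The `result`, `title`, `authors[].name`, and `fulljournalname` fields reflect the `esummary` JSON format as it is commonly returned. How unknown PMIDs are reported is an assumption, so check both against NCBI's documentation before relying on them.

```python
import urllib.parse

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmids):
    """One batched E-utilities request for several PMIDs."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "json"})
    return f"{ESUMMARY}?{query}"


def parse_esummary(payload, pmids):
    """Map each requested PMID to (status, metadata_or_None)."""
    if "result" not in payload:  # malformed or failed response: cannot tell
        return {p: ("SUSPICIOUS", None) for p in pmids}
    result = payload["result"]
    out = {}
    for pmid in pmids:
        rec = result.get(pmid)
        # Assumption: unknown PMIDs come back with an "error" field or not at all.
        if not rec or "error" in rec or not rec.get("title"):
            out[pmid] = ("INVALID", None)
        else:
            out[pmid] = ("VALID", {
                "title": rec["title"],
                "authors": [a.get("name", "") for a in rec.get("authors", [])],
                "journal": rec.get("fulljournalname") or rec.get("source", ""),
            })
    return out
```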
ISBN Verification
- Construct URL: `https://openlibrary.org/isbn/{isbn}.json` (with hyphens and spaces removed from the ISBN)
- Use WebFetch to check
- Extract: title, authors, publisher
- If 404: mark as INVALID
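An ISBN's last character is a check digit, so a malformed or invented ISBN can often be flagged without any network request. The Python sketch below implements the standard checksum tests:

- ISBN-10: weights 10 down to 1, weighted sum divisible by 11, with `X` standing for 10.
- ISBN-13: alternating weights 1 and 3, weighted sum divisible by 10.

It also builds the Open Library URL from the normalized digits. The function names are illustrative. A passing checksum does not prove the book exists, so the Open Library lookup is still needed.

```python
def normalize_isbn(isbn):
    """Digits (and a final X) only, e.g. '978-0-13-468599-1' -> '9780134685991'."""
    return "".join(ch for ch in isbn if ch.isdigit() or ch in "Xx").upper()


def isbn_checksum_ok(isbn):
    s = normalize_isbn(isbn)
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10: weights 10..1, weighted sum divisible by 11
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(s) == 13 and s.isdigit():
        # ISBN-13: alternating weights 1, 3, weighted sum divisible by 10
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def openlibrary_url(isbn):
    return f"https://openlibrary.org/isbn/{normalize_isbn(isbn)}.json"
```

A checksum failure maps naturally to the "DOI appears malformed"-style suggestion (a typo or extra characters) rather than to "hallucinated", since a single mistyped digit is enough to fail it.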
URL Verification
- Use WebFetch to access the URL
- Check for HTTP 200 and academic content indicators
- Look for: paper title, authors, DOI on page
- If unreachable or non-academic: mark as SUSPICIOUS
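To find the paper title, authors, and DOI on a fetched page, one option is to read the `citation_*` meta tags (`citation_title`, `citation_author`, `citation_doi`). Many publisher and repository pages embed these for indexing services such as Google Scholar.

The Python sketch below uses the standard-library `html.parser`. Treating the presence of `citation_title` as the academic-content signal is an assumption, and a page without these tags is not necessarily non-academic. A `citation_doi` value found this way can be verified directly through CrossRef, in line with the cross-reference advice under Best Practices.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags into {name: [values]}."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content")
        if name.startswith("citation_") and content:
            self.meta.setdefault(name, []).append(content)


def citation_meta(html):
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def looks_academic(meta):
    # Assumption: a citation_title tag is a reasonable sign of a paper landing page.
    return "citation_title" in meta
```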
Author-Year Verification (lowest confidence)
- Use WebSearch with query: `"{author}" "{year}" paper`
- Look for matching papers in results
- If found: mark as LIKELY VALID with source
- If not found: mark as UNVERIFIED
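Here is a Python sketch of turning author-year citations into the `"{author}" "{year}" paper` query. The regex treats `et al.` and `and`/`&` plus a second surname as alternatives, so all the common parenthetical forms match. It handles only parenthetical citations with ASCII surnames, so `(García, 2020)` is skipped, which is consistent with the non-English limitation. Using only the first author in the query is this sketch's simplification.

```python
import re

# Parenthetical author-year citations: (Smith, 2023), (Smith et al., 2023),
# (Johnson and Lee, 2022), (Johnson & Lee, 2022). ASCII surnames only.
AUTHOR_YEAR = re.compile(
    r"\(([A-Z][a-z]+)(?:\s+et\s+al\.?|\s+(?:and|&)\s+[A-Z][a-z]+)?,?\s*(\d{4})\)"
)


def author_year_queries(text):
    """Build one '"{author}" "{year}" paper' query per citation, first author only."""
    return [f'"{m.group(1)}" "{m.group(2)}" paper' for m in AUTHOR_YEAR.finditer(text)]
```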
Step 4: Generate Report
Produce a structured verification report:

```markdown
# Citation Verification Report

Document: [filename]
Date: [date]
Total citations found: [count]

## Summary

- Valid: [count]
- Invalid: [count]
- Suspicious: [count]
- Unverified: [count]

## Detailed Results

### Valid Citations

| ID | Type | Title | Source |
|---|---|---|---|
| 10.1038/xxx | DOI | Paper Title | CrossRef |

### Invalid Citations (HALLUCINATED)

| ID | Type | Error | Suggestion |
|---|---|---|---|
| 10.9999/fake | DOI | 404 Not Found | Remove or find correct DOI |

### Suspicious Citations

| ID | Type | Issue | Recommendation |
|---|---|---|---|
| https://... | URL | Timeout | Verify manually |

### Unverified Citations

| Citation | Type | Notes |
|---|---|---|
| (Smith, 2023) | Author-year | No matching paper found via search |
```
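Assembling the report is mostly bookkeeping. The Python sketch below assumes each result is an `(identifier, type, status)` tuple that uses the status names VALID, INVALID, SUSPICIOUS, and UNVERIFIED. It produces the Summary bullets and renders any of the result tables. Escaping `|` inside cells keeps a title that contains a pipe from breaking the table. The function names are illustrative.

```python
from collections import Counter

STATUSES = ["VALID", "INVALID", "SUSPICIOUS", "UNVERIFIED"]


def summary_lines(results):
    """results: iterable of (identifier, type, status); returns the Summary bullets."""
    counts = Counter(status for _, _, status in results)
    return [f"- {s.capitalize()}: {counts[s]}" for s in STATUSES]


def markdown_table(headers, rows):
    """Render one results table; '|' inside cells is escaped so columns stay aligned."""
    lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
    for row in rows:
        lines.append("| " + " | ".join(str(c).replace("|", "\\|") for c in row) + " |")
    return "\n".join(lines)
```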
Verification Status Definitions
- VALID: Identifier resolves to a real paper with matching metadata
- INVALID: Identifier does not exist or returns 404 (likely hallucinated)
- SUSPICIOUS: Could not fully verify; may be rate-limited, paywalled, or temporarily unavailable
- UNVERIFIED: Text-based citation that couldn't be confirmed (conservative approach)
Best Practices
- Batch similar requests: Group DOI checks together to minimize API calls
- Respect rate limits: Add delays between requests if hitting rate limits
- Cross-reference: If a URL contains a DOI, verify the DOI directly
- Context matters: Note where citations appear (methods vs. claims)
- Report uncertainty: Always distinguish between "confirmed invalid" and "could not verify"
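For the rate-limit advice, one common approach is exponential backoff. The Python sketch below is transport-agnostic. `do_request` is a hypothetical callable that returns `(status, body)`, and the retryable status set and delays are assumptions. A fuller version might also honor a `Retry-After` header.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}  # rate limited or temporarily unavailable


def with_backoff(do_request, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call do_request() -> (status, body); on a retryable status wait
    base_delay * 2**attempt seconds and try again, up to `retries` more times."""
    status, body = do_request()
    for attempt in range(retries):
        if status not in RETRYABLE:
            break
        sleep(base_delay * 2 ** attempt)
        status, body = do_request()
    return status, body
```

If the final status is still retryable, report the citation as SUSPICIOUS rather than INVALID. That keeps "confirmed invalid" separate from "could not verify".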
Output Suggestions for Invalid Citations
For each invalid citation, provide actionable suggestions:
- Wrong DOI format: "DOI appears malformed. Check for typos or extra characters."
- Non-existent DOI: "No paper found. This may be hallucinated. Search for the actual paper title."
- Dead URL: "URL returns 404. Try searching for the paper title on Google Scholar."
- Suspicious journal: "Publisher not recognized. Verify this is a legitimate source."
- Author-year not found: "Could not verify. Add DOI or URL for confirmation."
Example Verification Session
User request: "Verify the citations in my-paper.md"
Expected behavior:
- Read my-paper.md
- Extract all DOIs, URLs, arXiv IDs, etc.
- Report: "Found 15 citations: 8 DOIs, 5 URLs, 2 arXiv IDs"
- Verify each identifier using appropriate API/fetch
- Generate report showing:
  - 10 valid citations with metadata
  - 3 invalid citations (404 errors) marked as likely hallucinated
  - 2 suspicious citations (timeouts) requiring manual check
- Provide suggestions for fixing invalid citations
Limitations
- Rate limits: CrossRef and other APIs may rate-limit requests
- Paywalled content: Cannot verify full content behind paywalls
- New papers: Very recent papers may not be indexed yet
- Author-year citations: Low confidence without additional identifiers
- Non-English sources: Limited support for non-English citation formats
- Private/institutional URLs: Cannot access authenticated content
Related Skills
- literature-review: For conducting systematic literature searches
- scientific-reviewer: For reviewing scientific document quality
- scientific-writing: For writing with proper citations