paper-lookup

Paper Lookup

You have access to 10 academic paper databases through their REST APIs. Your job is to figure out which database(s) best serve the user's query, call them, and return the results.

Core Workflow

  1. Understand the query -- What is the user looking for? A specific paper by DOI? Papers on a topic? An author's publications? Open access PDFs? Full text? This determines which database(s) to hit.
  2. Select database(s) -- Use the database selection guide below. Many queries benefit from hitting multiple databases -- for example, searching PubMed for papers and then checking Unpaywall for open access copies.
  3. Read the reference file -- Each database has a reference file in `references/` with endpoint details, query formats, and example calls. Read the relevant file(s) before making API calls.
  4. Make the API call(s) -- See the Making API Calls section below for which HTTP fetch tool to use on your platform.
  5. Return results -- Always return:
    • The raw JSON (or parsed XML for arXiv) response from each database
    • A list of databases queried with the specific endpoints used
    • If a query returned no results, say so explicitly rather than omitting it

Database Selection Guide

Match the user's intent to the right database(s).

By Use Case

| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Papers on a biomedical topic | PubMed | Semantic Scholar, OpenAlex |
| Full text of a biomedical article | PMC | CORE |
| Biology preprints | bioRxiv | Semantic Scholar, OpenAlex |
| Health/medical preprints | medRxiv | Semantic Scholar, OpenAlex |
| Physics, math, or CS preprints | arXiv | Semantic Scholar, OpenAlex |
| Papers across all fields | OpenAlex | Semantic Scholar, Crossref |
| A specific paper by DOI | Crossref | Unpaywall, Semantic Scholar |
| Open access PDF for a paper | Unpaywall | CORE, PMC |
| Citation graph (who cites whom) | Semantic Scholar | OpenAlex |
| Author's publications | Semantic Scholar | OpenAlex |
| Paper recommendations | Semantic Scholar | -- |
| Full text (any field) | CORE | PMC (biomedical only) |
| Journal/publisher metadata | Crossref | OpenAlex |
| Funder information | Crossref | OpenAlex |
| Convert between PMID/PMCID/DOI | PMC (ID Converter) | Crossref |
| Recent preprints by date | bioRxiv, medRxiv | arXiv |
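
The use-case table above can be encoded as a lookup structure. An agent then tries the primary databases first and keeps the "also consider" column as fallbacks. This is a minimal sketch: the dictionary keys are hypothetical short labels for a few rows, not names defined by any API. The fallback for unknown intents is an assumption borrowed from the "Papers across all fields" row.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for some rows of the use-case table:
# label -> (primary databases, databases to also consider)
SELECTION_GUIDE: Dict[str, Tuple[List[str], List[str]]] = {
    "biomedical_topic": (["PubMed"], ["Semantic Scholar", "OpenAlex"]),
    "biomedical_full_text": (["PMC"], ["CORE"]),
    "cs_physics_math_preprints": (["arXiv"], ["Semantic Scholar", "OpenAlex"]),
    "paper_by_doi": (["Crossref"], ["Unpaywall", "Semantic Scholar"]),
    "open_access_pdf": (["Unpaywall"], ["CORE", "PMC"]),
    "citation_graph": (["Semantic Scholar"], ["OpenAlex"]),
    "recommendations": (["Semantic Scholar"], []),
    "recent_preprints": (["bioRxiv", "medRxiv"], ["arXiv"]),
}


def databases_for(use_case: str, include_secondary: bool = True) -> List[str]:
    """Return the databases to query for a use case, primary ones first."""
    if use_case not in SELECTION_GUIDE:
        # Assumption: unknown intents fall back to the "all fields" row.
        return ["OpenAlex", "Semantic Scholar", "Crossref"]
    primary, secondary = SELECTION_GUIDE[use_case]
    return primary + secondary if include_secondary else list(primary)
```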

Cross-Database Queries

| User is asking about... | Databases to query |
|---|---|
| Everything about a paper (metadata + citations + OA) | Crossref + Semantic Scholar + Unpaywall |
| Comprehensive literature search | PubMed + OpenAlex + Semantic Scholar |
| Find and read a paper | PubMed (find) + Unpaywall (OA link) + PMC or CORE (full text) |
| Preprint and its published version | bioRxiv/medRxiv + Crossref |
| Author overview with citation metrics | Semantic Scholar + OpenAlex |
When a query spans multiple needs (e.g., "find papers about CRISPR and get me the PDFs"), query the relevant databases in parallel.
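
Parallel querying of independent databases can be sketched with a thread pool. This minimal Python sketch assumes each database call is wrapped in a zero-argument function; the `fetchers` mapping and the result keys are hypothetical. Every database gets an entry in the result, so empty or failed queries are reported rather than omitted. Several requests to the same rate-limited API (NCBI, arXiv) should still be spaced out rather than spread across threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def query_in_parallel(fetchers: Dict[str, Callable[[], Any]]) -> Dict[str, Dict[str, Any]]:
    """Run one fetch callable per database concurrently and record each outcome."""
    results: Dict[str, Dict[str, Any]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        for name, future in futures.items():
            try:
                data = future.result()
                # Keep empty responses too, flagged, so they can be reported.
                results[name] = {"ok": True, "data": data, "empty": not data}
            except Exception as exc:  # one failing database must not hide the others
                results[name] = {"ok": False, "error": repr(exc)}
    return results
```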

Common Identifier Formats

Different databases use different identifier systems. If a query fails, the identifier format may be wrong.
| Identifier | Format | Example | Used by |
|---|---|---|---|
| DOI | `10.xxxx/xxxxx` | `10.1038/nature12373` | All databases |
| PMID | Integer | `34567890` | PubMed, PMC, Semantic Scholar |
| PMCID | `PMC` + digits | `PMC7029759` | PMC, Europe PMC |
| arXiv ID | `YYMM.NNNNN` | `2103.15348` | arXiv, Semantic Scholar |
| OpenAlex ID | `W` + digits | `W2741809807` | OpenAlex |
| Semantic Scholar ID | 40-char hex | `649def34f8be...` | Semantic Scholar |
| ORCID | `0000-XXXX-XXXX-XXXX` | `0000-0001-6187-6610` | OpenAlex, Crossref |
| ISSN | `XXXX-XXXX` | `0028-0836` | Crossref, OpenAlex |
Cross-referencing IDs: Semantic Scholar accepts DOI, PMID, PMCID, and arXiv ID via prefixes (e.g., `DOI:10.1038/nature12373`, `PMID:34567890`, `ARXIV:2103.15348`). OpenAlex accepts DOI and PMID via prefixes (`doi:10.1038/...`, `pmid:34567890`). Use the PMC ID Converter to translate between PMID, PMCID, and DOI.
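
A rough classifier built from the identifier table can pick the right prefix before a lookup. This is a sketch only. The regular expressions are deliberately loose approximations of the formats above, not full validators, and the function names are hypothetical. The digits-only PMCID form for Semantic Scholar (`PMCID:2323736`-style) is an assumption to check against `references/semantic-scholar.md`.

```python
import re
from typing import Optional

# Loose patterns derived from the Common Identifier Formats table; order matters
# (a bare integer is only treated as a PMID after the other checks fail).
PATTERNS = [
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("PMCID", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("ARXIV", re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")),
    ("OPENALEX", re.compile(r"^W\d+$")),
    ("S2", re.compile(r"^[0-9a-f]{40}$")),
    ("PMID", re.compile(r"^\d+$")),
]


def classify_identifier(raw: str) -> Optional[str]:
    """Guess which identifier system a string belongs to (None if unknown)."""
    value = raw.strip()
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return None


def semantic_scholar_id(raw: str) -> Optional[str]:
    """Prefix an identifier for Semantic Scholar's paper lookups."""
    kind, value = classify_identifier(raw), raw.strip()
    if kind == "PMCID":
        # Assumption: Semantic Scholar expects the PMCID digits without "PMC".
        return "PMCID:" + re.sub(r"(?i)^pmc", "", value)
    if kind in ("DOI", "PMID", "ARXIV"):
        return f"{kind}:{value}"
    if kind == "S2":
        return value  # native paper IDs need no prefix
    return None


def openalex_id(raw: str) -> Optional[str]:
    """Prefix an identifier for OpenAlex work lookups (DOI and PMID only)."""
    kind, value = classify_identifier(raw), raw.strip()
    if kind == "DOI":
        return f"doi:{value}"
    if kind == "PMID":
        return f"pmid:{value}"
    return value if kind == "OPENALEX" else None
```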

API Keys and Access

Most of these databases are fully open. A few benefit from API keys for higher rate limits.

Databases requiring or benefiting from API keys

| Database | Env variable | Required? | Registration |
|---|---|---|---|
| NCBI (PubMed, PMC) | `NCBI_API_KEY` | No (3 req/s without, 10 with) | https://www.ncbi.nlm.nih.gov/account/settings/ |
| CORE | `CORE_API_KEY` | Yes for full text | https://core.ac.uk/services/api |
| Semantic Scholar | `S2_API_KEY` | No (shared pool without) | https://www.semanticscholar.org/product/api#api-key-form |
| OpenAlex | `OPENALEX_API_KEY` | Recommended | https://openalex.org/settings/api |

Fully open databases (no key needed)

| Database | Notes |
|---|---|
| bioRxiv / medRxiv | No auth, no documented rate limits |
| arXiv | No auth, max 1 request per 3 seconds |
| Crossref | No auth; add a `mailto` parameter for the polite pool (2x rate limit) |
| Unpaywall | No auth; requires an `email` parameter |
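
The two tables above translate into a per-database choice between a query parameter and a header. The sketch below shows that mapping. The NCBI `api_key` parameter and the Semantic Scholar `x-api-key` header follow those providers' public docs. The CORE `Authorization: Bearer` header and the OpenAlex `api_key` parameter are assumptions to confirm in `references/core.md` and `references/openalex.md`. `auth_for` is a hypothetical helper name.

```python
from typing import Dict, Optional, Tuple


def auth_for(database: str, key: Optional[str] = None,
             email: Optional[str] = None) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (query_params, headers) to add to a request for one database."""
    params: Dict[str, str] = {}
    headers: Dict[str, str] = {}
    if database in ("PubMed", "PMC") and key:
        params["api_key"] = key                    # NCBI E-utilities
    elif database == "Semantic Scholar" and key:
        headers["x-api-key"] = key
    elif database == "CORE" and key:
        headers["Authorization"] = f"Bearer {key}"  # assumption: bearer token
    elif database == "OpenAlex" and key:
        params["api_key"] = key                    # assumption: query parameter
    elif database == "Crossref" and email:
        params["mailto"] = email                   # polite pool
    elif database == "Unpaywall":
        if not email:
            raise ValueError("Unpaywall requires an email parameter")
        params["email"] = email
    return params, headers                         # arXiv, bioRxiv, medRxiv: nothing
```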

Loading API keys

  1. Check the environment first -- the key may already be exported (e.g., `$NCBI_API_KEY`).
  2. Fall back to `.env` -- check `.env` in the current working directory.
  3. Proceed without -- most APIs still work at lower rate limits. Tell the user which key is missing and how to get one.
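
The three steps can be sketched as a small lookup function. This assumes a plain `KEY=VALUE` `.env` file (optionally with `export` and quotes) and does not implement full dotenv syntax. `read_dotenv` and `load_api_key` are hypothetical names. A `None` result is the caller's cue to tell the user which key is missing.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks, comments and 'export '."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(name: str, env: Optional[Mapping[str, str]] = None,
                 cwd: Optional[Path] = None) -> Optional[str]:
    """Step 1: environment. Step 2: ./.env. Step 3: None (proceed without)."""
    env = os.environ if env is None else env
    if env.get(name):
        return env[name]
    return read_dotenv((cwd or Path.cwd()) / ".env").get(name) or None
```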

Making API Calls

Use your environment's HTTP fetch tool to call REST endpoints:
| Platform | HTTP fetch tool | Fallback |
|---|---|---|
| Claude Code | `WebFetch` | `curl` via Bash |
| Gemini CLI | `web_fetch` | `curl` via shell |
| Windsurf | `read_url_content` | `curl` via terminal |
| Cursor | No dedicated fetch tool | `curl` via `run_terminal_cmd` |
| Codex CLI | No dedicated fetch tool | `curl` via `shell` |
| Cline | No dedicated fetch tool | `curl` via `execute_command` |
If the fetch tool fails, fall back to `curl` via whatever shell tool is available.
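
The "fetch tool first, then `curl`" pattern looks like this in a standalone script. Python's `urllib` stands in for the platform fetch tool, which a script cannot call. The curl flags are standard options (`--fail` makes HTTP errors a non-zero exit). `curl_argv` and `fetch_text` are hypothetical helper names.

```python
import subprocess
import urllib.request
from typing import Dict, List, Optional


def curl_argv(url: str, headers: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a curl command equivalent to a simple GET request."""
    argv = ["curl", "--silent", "--show-error", "--location", "--fail"]
    for name, value in (headers or {}).items():
        argv += ["-H", f"{name}: {value}"]
    argv.append(url)
    return argv


def fetch_text(url: str, headers: Optional[Dict[str, str]] = None,
               timeout: float = 30) -> str:
    """GET with urllib; on a network or HTTP error, retry the same request via curl."""
    try:
        request = urllib.request.Request(url, headers=headers or {})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        done = subprocess.run(curl_argv(url, headers), capture_output=True,
                              text=True, timeout=timeout, check=True)
        return done.stdout
```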

Special cases

  • arXiv returns Atom XML, not JSON. Parse the feed with an XML parser if one is available, or fetch it with `curl` and extract the relevant fields.
  • PMC eFetch returns JATS XML for full text. This is expected -- full-text articles are delivered as XML.
  • Crossref accepts a `mailto` parameter that routes requests to the polite (faster) pool; Unpaywall requires an `email` parameter on every request.
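
An arXiv Atom feed can be reduced to a JSON-like list with the standard library. The sketch uses the standard Atom namespace `http://www.w3.org/2005/Atom`. It assumes arXiv marks the PDF link with `title="pdf"`, as its feeds do. arXiv-specific extras in the `http://arxiv.org/schemas/atom` namespace (e.g. primary category) are not parsed here. `parse_arxiv_feed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_arxiv_feed(xml_text: str) -> List[Dict[str, object]]:
    """Extract the commonly needed fields from an arXiv API Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", NS):
        pdf = None
        for link in entry.findall("atom:link", NS):
            if link.get("title") == "pdf":
                pdf = link.get("href")
        title = entry.findtext("atom:title", default="", namespaces=NS)
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=NS).strip(),
            "title": " ".join(title.split()),  # titles often contain line breaks
            "summary": entry.findtext("atom:summary", default="", namespaces=NS).strip(),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
            "authors": [a.findtext("atom:name", default="", namespaces=NS)
                        for a in entry.findall("atom:author", NS)],
            "pdf": pdf,
        })
    return papers
```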

Request guidelines

  • For NCBI APIs (PubMed, PMC): max 3 req/sec without a key, 10 with a key. Make requests sequentially.
  • For arXiv: max 1 request every 3 seconds. Be patient.
  • For Crossref: 5 req/sec (public), 10 req/sec (polite pool with `mailto`).
  • For other APIs with no strict limits, you can query multiple databases in parallel.
  • If you get HTTP 429 (rate limit), wait briefly and retry once.
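
These guidelines amount to a minimum spacing per API plus a single retry on HTTP 429. This is a minimal sketch. The intervals are derived from the limits listed above. The 2-second backoff is an assumed value for "wait briefly". `Throttle` and `call_with_retry` are hypothetical names. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional, Tuple

# Minimum seconds between consecutive requests to one API (from the limits above).
MIN_INTERVAL_SECONDS = {
    "ncbi_no_key": 1 / 3,
    "ncbi_with_key": 1 / 10,
    "arxiv": 3.0,
    "crossref_public": 1 / 5,
    "crossref_polite": 1 / 10,
}


class Throttle:
    """Blocks just long enough to keep one API under its request rate."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Call immediately before each request to this API."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def call_with_retry(request: Callable[[], Tuple[int, str]],
                    backoff_seconds: float = 2.0,  # assumed "brief" wait
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Send a request; on HTTP 429 wait and retry exactly once."""
    status, body = request()
    if status == 429:
        sleep(backoff_seconds)
        status, body = request()
    return status, body
```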

Error recovery

  1. Check the identifier format -- use the Common Identifier Formats table. A PMID won't work in arXiv, and an arXiv ID won't work directly in PubMed.
  2. Try alternative identifiers -- if a DOI fails in one database, try the title or PMID instead.
  3. Try a different database -- if PubMed returns nothing for a CS paper, try Semantic Scholar or OpenAlex.
  4. Report the failure -- tell the user which database failed, the error, and what you tried instead.
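
The recovery steps form an ordered chain of attempts that stops at the first non-empty result and remembers everything that failed along the way, which is what step 4 needs. This is a sketch: each attempt is a hypothetical `(database, description, call)` tuple, and real calls would wrap the HTTP requests described earlier.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# (database name, what was tried, zero-argument call returning parsed results)
Attempt = Tuple[str, str, Callable[[], Any]]


def lookup_with_fallbacks(attempts: Sequence[Attempt]) -> Tuple[Optional[Any], List[str]]:
    """Try each attempt in order; return the first non-empty result plus a failure log."""
    log: List[str] = []
    for database, description, call in attempts:
        try:
            result = call()
        except Exception as exc:
            log.append(f"{database} ({description}): {type(exc).__name__}: {exc}")
            continue
        if result:
            return result, log
        log.append(f"{database} ({description}): no results")
    return None, log  # every attempt failed; report the whole log to the user
```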

Output Format

Structure your response like this:

    Databases Queried
      • PubMed -- esearch + esummary for "CRISPR gene therapy"
      • Unpaywall -- DOI lookup for 10.1038/...

    Results

      PubMed
        [raw JSON response or formatted results]

      Unpaywall
        [raw JSON response]

If results are very large, present the most relevant portion and note that more data is available. But default to showing the full raw JSON -- the user asked for it.
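
Assembling that structure can be automated. This sketch follows the output template above: it lists the databases queried, prints each result as pretty JSON, says explicitly when a database returned nothing, and truncates very large payloads with a note. Truncating to the first `max_chars` characters is a simplification of "the most relevant portion". The 20,000-character default is an assumption. `render_report` is a hypothetical name.

```python
import json
from typing import Any, Dict, List, Tuple


def render_report(queried: List[Tuple[str, str]], results: Dict[str, Any],
                  max_chars: int = 20000) -> str:
    """queried: (database, endpoint description) pairs; results: database -> parsed JSON."""
    lines = ["Databases Queried"]
    lines += [f"  • {db} -- {endpoint}" for db, endpoint in queried]
    lines += ["", "Results"]
    for db, _ in queried:
        lines += ["", db]
        data = results.get(db)
        if not data:
            lines.append("No results returned.")  # never silently omit a database
            continue
        text = json.dumps(data, indent=2, ensure_ascii=False)
        if len(text) > max_chars:
            hidden = len(text) - max_chars
            text = text[:max_chars] + f"\n... truncated; {hidden} more characters available"
        lines.append(text)
    return "\n".join(lines)
```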

Available Databases

Read the relevant reference file before making any API call.

Biomedical Literature

| Database | Reference file | What it covers |
|---|---|---|
| PubMed | `references/pubmed.md` | 37M+ biomedical citations, abstracts, MeSH terms |
| PMC | `references/pmc.md` | 10M+ full-text biomedical articles (JATS XML), ID conversion |
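
A typical PubMed lookup is two E-utilities calls: `esearch` returns PMIDs (under `esearchresult.idlist` in the JSON), and `esummary` returns metadata for them. The sketch below assumes the standard E-utilities base URL and the `db`, `term`, `retmode`, `retmax`, `id` and `api_key` parameters. `references/pubmed.md` remains the authority for what this skill uses. The function names are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20, api_key: Optional[str] = None) -> str:
    """Search PubMed; the JSON response lists matching PMIDs."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    if api_key:
        params["api_key"] = api_key  # raises the limit from 3 to 10 req/s
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esummary_url(pmids: Iterable[str], api_key: Optional[str] = None) -> str:
    """Fetch summaries (title, journal, date, ...) for a batch of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"
```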

Preprint Servers

| Database | Reference file | What it covers |
|---|---|---|
| bioRxiv | `references/biorxiv.md` | Biology preprints (browse by date/DOI, no keyword search) |
| medRxiv | `references/medrxiv.md` | Health sciences preprints (browse by date/DOI, no keyword search) |
| arXiv | `references/arxiv.md` | Physics, math, CS, biology, economics preprints (keyword search, Atom XML) |
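
The split between date/DOI browsing (bioRxiv, medRxiv) and keyword search (arXiv) shows up directly in the request URLs. This sketch assumes that the `https://api.biorxiv.org/details/{server}/...` path layout serves both bioRxiv and medRxiv. It also assumes arXiv's `export.arxiv.org/api/query` endpoint with `search_query`, `start`, `max_results`, `sortBy` and `sortOrder`. Check both against the reference files. The function names are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def rxiv_details_url(server: str, start: date, end: date, cursor: int = 0) -> str:
    """bioRxiv/medRxiv: preprints posted in a date window, paged with a cursor."""
    if server not in ("biorxiv", "medrxiv"):
        raise ValueError("server must be 'biorxiv' or 'medrxiv'")
    return (f"https://api.biorxiv.org/details/{server}/"
            f"{start.isoformat()}/{end.isoformat()}/{cursor}")


def rxiv_doi_url(server: str, doi: str) -> str:
    """bioRxiv/medRxiv: metadata for a single preprint DOI."""
    return f"https://api.biorxiv.org/details/{server}/{doi}"


def arxiv_search_url(query: str, start: int = 0, max_results: int = 10) -> str:
    """arXiv: keyword search, newest first; the response is an Atom XML feed."""
    params = {"search_query": query, "start": start, "max_results": max_results,
              "sortBy": "submittedDate", "sortOrder": "descending"}
    return "https://export.arxiv.org/api/query?" + urlencode(params)
```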

Multidisciplinary Indexes

| Database | Reference file | What it covers |
|---|---|---|
| OpenAlex | `references/openalex.md` | 250M+ works, authors, institutions, topics, citation data |
| Crossref | `references/crossref.md` | 150M+ DOI metadata, journals, funders, references |
| Semantic Scholar | `references/semantic-scholar.md` | 200M+ papers, citation graphs, AI-generated TLDRs, recommendations |
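
For the "everything about a paper" case, the three indexes can be hit with the same DOI in their own URL forms. The sketch assumes the public base URLs `api.openalex.org/works/`, `api.crossref.org/works/` and `api.semanticscholar.org/graph/v1/paper/`. The Semantic Scholar `fields` list is an illustrative choice, not a required set. `doi_lookup_urls` is a hypothetical name.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def doi_lookup_urls(doi: str, mailto: Optional[str] = None) -> Dict[str, str]:
    """One metadata URL per multidisciplinary index for the same DOI."""
    path_doi = quote(doi, safe="/")  # keep the DOI's slash, escape unusual characters
    crossref = f"https://api.crossref.org/works/{path_doi}"
    if mailto:
        crossref += "?" + urlencode({"mailto": mailto})  # polite pool
    fields = "title,year,citationCount,externalIds"      # illustrative field list
    return {
        "OpenAlex": f"https://api.openalex.org/works/doi:{path_doi}",
        "Crossref": crossref,
        "Semantic Scholar": (f"https://api.semanticscholar.org/graph/v1/paper/"
                             f"DOI:{path_doi}?fields={fields}"),
    }
```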

Open Access & Full Text

| Database | Reference file | What it covers |
|---|---|---|
| CORE | `references/core.md` | 37M+ full texts from OA repositories worldwide |
| Unpaywall | `references/unpaywall.md` | OA status and PDF links for any DOI |
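
An Unpaywall lookup is one GET per DOI with a mandatory `email` parameter. A parsed response can then be reduced to a single link. The sketch relies on the `is_oa` and `best_oa_location` fields, with `url_for_pdf` and `url` inside the latter. It falls back to the landing page when no direct PDF is listed. When it returns `None`, the selection guide points to CORE or PMC next. The function names are hypothetical.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode


def unpaywall_url(doi: str, email: str) -> str:
    """Unpaywall v2 lookup URL; the email parameter is required."""
    path_doi = quote(doi, safe="/")
    query = urlencode({"email": email})
    return f"https://api.unpaywall.org/v2/{path_doi}?{query}"


def best_pdf_link(record: Dict[str, Any]) -> Optional[str]:
    """Pick a PDF URL (or OA landing page) from a parsed Unpaywall response."""
    if not record.get("is_oa"):
        return None
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf") or best.get("url")
```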