paper-lookup
Paper Lookup
You have access to 10 academic paper databases through their REST APIs. Your job is to figure out which database(s) best serve the user's query, call them, and return the results.
Core Workflow
- Understand the query -- What is the user looking for? A specific paper by DOI? Papers on a topic? An author's publications? Open access PDFs? Full text? This determines which database(s) to hit.
- Select database(s) -- Use the database selection guide below. Many queries benefit from hitting multiple databases -- for example, searching PubMed for papers and then checking Unpaywall for open access copies.
- Read the reference file -- Each database has a reference file in `references/` with endpoint details, query formats, and example calls. Read the relevant file(s) before making API calls.
- Make the API call(s) -- See the Making API Calls section below for which HTTP fetch tool to use on your platform.
- Return results -- Always return:
  - The raw JSON (or parsed XML for arXiv) response from each database
  - A list of databases queried with the specific endpoints used
  - If a query returned no results, say so explicitly rather than omitting it
Database Selection Guide
Match the user's intent to the right database(s).
By Use Case
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Papers on a biomedical topic | PubMed | Semantic Scholar, OpenAlex |
| Full text of a biomedical article | PMC | CORE |
| Biology preprints | bioRxiv | Semantic Scholar, OpenAlex |
| Health/medical preprints | medRxiv | Semantic Scholar, OpenAlex |
| Physics, math, or CS preprints | arXiv | Semantic Scholar, OpenAlex |
| Papers across all fields | OpenAlex | Semantic Scholar, Crossref |
| A specific paper by DOI | Crossref | Unpaywall, Semantic Scholar |
| Open access PDF for a paper | Unpaywall | CORE, PMC |
| Citation graph (who cites whom) | Semantic Scholar | OpenAlex |
| Author's publications | Semantic Scholar | OpenAlex |
| Paper recommendations | Semantic Scholar | -- |
| Full text (any field) | CORE | PMC (biomedical only) |
| Journal/publisher metadata | Crossref | OpenAlex |
| Funder information | Crossref | OpenAlex |
| Convert between PMID/PMCID/DOI | PMC (ID Converter) | Crossref |
| Recent preprints by date | bioRxiv, medRxiv | arXiv |
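The use-case table can be encoded as a small lookup structure. The sketch below is one hypothetical way to do that, covering only a few rows; the intent keys (such as `paper_by_doi`) and the function name `select_databases` are illustrative assumptions, not names defined by this skill.

```python
# Hypothetical routing table: intent label -> (primary databases, also consider).
# The intent keys are made-up labels; the database choices mirror the table above.
ROUTING = {
    "biomedical_topic": (["PubMed"], ["Semantic Scholar", "OpenAlex"]),
    "biomedical_full_text": (["PMC"], ["CORE"]),
    "paper_by_doi": (["Crossref"], ["Unpaywall", "Semantic Scholar"]),
    "open_access_pdf": (["Unpaywall"], ["CORE", "PMC"]),
    "citation_graph": (["Semantic Scholar"], ["OpenAlex"]),
    "recent_preprints": (["bioRxiv", "medRxiv"], ["arXiv"]),
}


def select_databases(intent, include_secondary=False):
    """Return the databases to query for an intent, primary ones first."""
    primary, secondary = ROUTING[intent]
    return primary + secondary if include_secondary else list(primary)


print(select_databases("paper_by_doi", include_secondary=True))
```

Keeping the primary and secondary lists separate lets a caller start with the primary database and widen the search only when it comes back empty.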
Cross-Database Queries
| User is asking about... | Databases to query |
|---|---|
| Everything about a paper (metadata + citations + OA) | Crossref + Semantic Scholar + Unpaywall |
| Comprehensive literature search | PubMed + OpenAlex + Semantic Scholar |
| Find and read a paper | PubMed (find) + Unpaywall (OA link) + PMC or CORE (full text) |
| Preprint and its published version | bioRxiv/medRxiv + Crossref |
| Author overview with citation metrics | Semantic Scholar + OpenAlex |
When a query spans multiple needs (e.g., "find papers about CRISPR and get me the PDFs"), query the relevant databases in parallel.
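As a rough illustration of querying in parallel, the sketch below runs one callable per database in a thread pool and records failures instead of dropping them. The helper name `query_in_parallel` and the two stand-in callables are assumptions for the example; real callables would hit the endpoints described in the reference files.

```python
from concurrent.futures import ThreadPoolExecutor


def query_in_parallel(tasks):
    """Run each zero-argument callable in `tasks` (name -> callable) concurrently.

    Returns name -> ("ok", result) or ("error", message), so a failing
    database is reported instead of silently dropped.
    """
    if not tasks:
        return {}
    results = {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:  # keep the error so it can be reported
                results[name] = ("error", str(exc))
    return results


# Stand-in callables (hypothetical); they do not touch the network.
def fake_pubmed():
    return {"count": 2}


def fake_unpaywall():
    raise RuntimeError("HTTP 404")


print(query_in_parallel({"PubMed": fake_pubmed, "Unpaywall": fake_unpaywall}))
```

This pattern suits calls to different databases. Repeated calls to a single rate-limited API, such as NCBI or arXiv, should still be made one after another.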
Common Identifier Formats
Different databases use different identifier systems. If a query fails, the identifier format may be wrong.
| Identifier | Format | Example | Used by |
|---|---|---|---|
| DOI | | | All databases |
| PMID | Integer | | PubMed, PMC, Semantic Scholar |
| PMCID | | | PMC, Europe PMC |
| arXiv ID | | | arXiv, Semantic Scholar |
| OpenAlex ID | | | OpenAlex |
| Semantic Scholar ID | 40-char hex | | Semantic Scholar |
| ORCID | | | OpenAlex, Crossref |
| ISSN | | | Crossref, OpenAlex |
Cross-referencing IDs: Semantic Scholar accepts DOI, PMID, PMCID, and arXiv ID via prefixes (e.g., `DOI:10.1038/nature12373`, `PMID:34567890`, `ARXIV:2103.15348`). OpenAlex accepts DOI and PMID via prefixes (`doi:10.1038/...`, `pmid:34567890`). Use the PMC ID Converter to translate between PMID, PMCID, and DOI.
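A minimal sketch of applying these prefixes follows. It guesses the identifier type from its shape, then prepends the `DOI:`/`PMID:`/`ARXIV:` prefix for Semantic Scholar or the `doi:`/`pmid:` prefix for OpenAlex. The regular expressions and helper names are assumptions made for illustration. They are deliberately loose, and a bare integer is treated as a PMID even though other systems also use integers.

```python
import re

# Loose shape checks, assumed for illustration only (not authoritative).
PATTERNS = [
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("pmcid", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("arxiv", re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")),
    ("pmid", re.compile(r"^\d+$")),
]

# Only the prefixes quoted in the text are listed here.
SEMANTIC_SCHOLAR_PREFIX = {"doi": "DOI:", "pmid": "PMID:", "arxiv": "ARXIV:"}
OPENALEX_PREFIX = {"doi": "doi:", "pmid": "pmid:"}


def classify(identifier):
    """Return the first identifier kind whose pattern matches, else None."""
    value = identifier.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(value):
            return kind
    return None


def with_prefix(identifier, prefixes):
    """Prefix an identifier for a target API, or return None if unsupported."""
    kind = classify(identifier)
    if kind not in prefixes:
        return None
    return prefixes[kind] + identifier.strip()


print(with_prefix("2103.15348", SEMANTIC_SCHOLAR_PREFIX))
print(with_prefix("2103.15348", OPENALEX_PREFIX))
```

A `None` result signals that the identifier should first be translated, for example with the PMC ID Converter, before querying that database.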
API Keys and Access
Most of these databases are fully open. A few benefit from API keys for higher rate limits.
Databases requiring or benefiting from API keys
| Database | Env Variable | Required? | Registration |
|---|---|---|---|
| NCBI (PubMed, PMC) | | No (3 req/s without, 10 with) | https://www.ncbi.nlm.nih.gov/account/settings/ |
| CORE | | Yes for full text | https://core.ac.uk/services/api |
| Semantic Scholar | | No (shared pool without) | https://www.semanticscholar.org/product/api#api-key-form |
| OpenAlex | | Recommended | https://openalex.org/settings/api |
Fully open databases (no key needed)
| Database | Notes |
|---|---|
| bioRxiv / medRxiv | No auth, no documented rate limits |
| arXiv | No auth, max 1 request per 3 seconds |
| Crossref | No auth; add `mailto` for the polite pool |
| Unpaywall | No auth; requires an `email` parameter |
Loading API keys
- Check the environment first -- the key may already be exported (e.g., `$NCBI_API_KEY`).
- Fall back to `.env` -- check `.env` in the current working directory.
- Proceed without -- most APIs still work at lower rate limits. Tell the user which key is missing and how to get one.
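The lookup order above can be sketched as follows. This is a minimal example that assumes `.env` holds plain `KEY=VALUE` lines; the helper name `load_api_key` is hypothetical, and lines using `export KEY=VALUE` or shell interpolation are not handled.

```python
import os
from pathlib import Path


def load_api_key(name, env_file=".env"):
    """Look up `name` in the environment, then in a simple KEY=VALUE .env file.

    Returns None when the key is not found, so the caller can proceed
    without it and tell the user which key is missing.
    """
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, raw = line.partition("=")
        if key.strip() == name:
            return raw.strip().strip("'\"") or None
    return None


key = load_api_key("NCBI_API_KEY")
if key is None:
    print("NCBI_API_KEY not set; continuing at the lower rate limit.")
```

Returning `None` rather than raising matches the third step: the request still goes ahead, just at the slower rate.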
Making API Calls
Use your environment's HTTP fetch tool to call REST endpoints:
| Platform | HTTP Fetch Tool | Fallback |
|---|---|---|
| Claude Code | | `curl` via Bash |
| Gemini CLI | | `curl` via shell |
| Windsurf | | `curl` via terminal |
| Cursor | No dedicated fetch tool | |
| Codex CLI | No dedicated fetch tool | |
| Cline | No dedicated fetch tool | |
If the fetch tool fails, fall back to `curl` via whatever shell tool is available.
Special cases
- arXiv returns Atom XML, not JSON. Parse it, or use `curl` and extract the relevant fields. Consider piping the output through a simple parser if one is available.
- PMC eFetch returns JATS XML for full text. This is expected -- full-text articles are in XML format.
- Crossref and Unpaywall benefit from including a `mailto` parameter or email for the polite/fast pool.
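For the arXiv case, one possible "simple parser" is Python's standard `xml.etree.ElementTree`. The sketch below pulls a few fields out of an Atom feed. The function name `parse_arxiv_feed` and the sample feed are placeholders invented for the example, and the fields extracted are only the generic Atom ones (`id`, `title`, `summary`, `author/name`).

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def _text(entry, tag):
    """Return the whitespace-normalised text of a child element, or ''."""
    raw = entry.findtext(tag, default="", namespaces=ATOM)
    return " ".join(raw.split())


def parse_arxiv_feed(xml_text):
    """Extract id, title, summary, and author names from an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "summary": _text(entry, "atom:summary"),
            "authors": [
                name.text.strip()
                for name in entry.findall("atom:author/atom:name", ATOM)
                if name.text
            ],
        })
    return entries


# Placeholder feed, not a real arXiv record.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A placeholder
      title</title>
    <summary>Placeholder abstract.</summary>
    <author><name>A. Author</name></author>
  </entry>
</feed>"""

print(parse_arxiv_feed(SAMPLE))
```

Titles and abstracts in Atom feeds are often wrapped across lines, which is why the helper collapses runs of whitespace.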
Request guidelines
- For NCBI APIs (PubMed, PMC): max 3 req/sec without key, 10 with key. Make requests sequentially.
- For arXiv: max 1 request every 3 seconds. Be patient.
- For Crossref: 5 req/sec (public), 10 req/sec (polite pool with `mailto`).
- For other APIs with no strict limits, you can query multiple databases in parallel.
- If you get HTTP 429 (rate limit), wait briefly and retry once.
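These guidelines could be enforced with two small helpers: one that spaces out sequential calls and one that retries exactly once on HTTP 429. This is a sketch under assumptions: the names `Throttle` and `call_with_retry` are hypothetical, `send` stands for any zero-argument callable returning `(status_code, body)`, and the 3-second default wait is an arbitrary choice for "briefly".

```python
import time


def call_with_retry(send, wait_seconds=3.0, sleep=time.sleep):
    """Call `send()` and retry exactly once if it reports HTTP 429."""
    status, body = send()
    if status == 429:
        sleep(wait_seconds)  # wait briefly before the single retry
        status, body = send()
    return status, body


class Throttle:
    """Keep sequential calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


arxiv_throttle = Throttle(3.0)       # arXiv: 1 request every 3 seconds
ncbi_throttle = Throttle(1.0 / 3.0)  # NCBI without a key: 3 requests per second
```

Calling `arxiv_throttle.wait()` before each arXiv request keeps the spacing in one place, and wrapping the request in `call_with_retry` covers the 429 case.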
Error recovery
- Check the identifier format -- use the Common Identifier Formats table. A PMID won't work in arXiv, an arXiv ID won't work in PubMed directly.
- Try alternative identifiers -- if a DOI fails in one database, try the title or PMID instead.
- Try a different database -- if PubMed returns nothing for a CS paper, try Semantic Scholar or OpenAlex.
- Report the failure -- tell the user which database failed, the error, and what you tried instead.
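One way to keep track of what was tried is to run the alternatives as an ordered chain and collect every failure along the way. The sketch below assumes each attempt is a labelled zero-argument callable; `lookup_with_fallbacks` and the stand-in attempts are hypothetical names used only for illustration.

```python
def lookup_with_fallbacks(attempts):
    """Try (label, callable) pairs in order; return the first non-empty result.

    Returns (label, result, failures), where `failures` lists every attempt
    that raised or returned nothing, so it can be reported to the user.
    """
    failures = []
    for label, attempt in attempts:
        try:
            result = attempt()
        except Exception as exc:
            failures.append((label, str(exc)))
            continue
        if result:
            return label, result, failures
        failures.append((label, "no results"))
    return None, None, failures


# Stand-in attempts (hypothetical); real ones would call the databases.
def pubmed_by_doi():
    return []


def semantic_scholar_by_title():
    return [{"title": "Example"}]


print(lookup_with_fallbacks([
    ("PubMed (DOI)", pubmed_by_doi),
    ("Semantic Scholar (title)", semantic_scholar_by_title),
]))
```

The `failures` list carries what the final step asks for: which database failed, why, and what was tried next.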
Output Format
Structure your response like this:
Databases Queried
- PubMed -- esearch + esummary for "CRISPR gene therapy"
- Unpaywall -- DOI lookup for 10.1038/...
Results
PubMed
[raw JSON response or formatted results]
Unpaywall
[raw JSON response]
If results are very large, present the most relevant portion and note that more data is available. But default to showing the full raw JSON -- the user asked for it.
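As an illustration of this layout, the sketch below assembles the two sections from a list of queried databases and their parsed responses. The function name `format_response` and its argument shapes are assumptions for the example; the wording of the "no results" line is also a placeholder.

```python
import json


def format_response(queried, results):
    """Render the response layout described above as plain text.

    `queried` is a list of (database, description) pairs; `results` maps a
    database name to its parsed JSON response, or None when nothing came back.
    """
    lines = ["Databases Queried"]
    lines += [f"- {name} -- {description}" for name, description in queried]
    lines.append("Results")
    for name, _ in queried:
        lines.append(name)
        payload = results.get(name)
        if payload is None:
            lines.append("No results returned.")  # say so explicitly
        else:
            lines.append(json.dumps(payload, indent=2, ensure_ascii=False))
    return "\n".join(lines)


print(format_response(
    [("PubMed", 'esearch + esummary for "CRISPR gene therapy"'),
     ("Unpaywall", "DOI lookup for 10.1038/...")],
    {"PubMed": {"count": 1}},
))
```

Every queried database gets a section, even when nothing came back, so an empty result is never silently omitted.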
Available Databases
Read the relevant reference file before making any API call.
Biomedical Literature
| Database | Reference File | What it covers |
|---|---|---|
| PubMed | | 37M+ biomedical citations, abstracts, MeSH terms |
| PMC | | 10M+ full-text biomedical articles (JATS XML), ID conversion |
Preprint Servers
| Database | Reference File | What it covers |
|---|---|---|
| bioRxiv | | Biology preprints (browse by date/DOI, no keyword search) |
| medRxiv | | Health sciences preprints (browse by date/DOI, no keyword search) |
| arXiv | | Physics, math, CS, biology, economics preprints (keyword search, Atom XML) |
Multidisciplinary Indexes
| Database | Reference File | What it covers |
|---|---|---|
| OpenAlex | | 250M+ works, authors, institutions, topics, citation data |
| Crossref | | 150M+ DOI metadata, journals, funders, references |
| Semantic Scholar | | 200M+ papers, citation graphs, AI-generated TLDRs, recommendations |
Open Access & Full Text
| Database | Reference File | What it covers |
|---|---|---|
| CORE | | 37M+ full texts from OA repositories worldwide |
| Unpaywall | | OA status and PDF links for any DOI |