literature-search-europepmc
Europe PMC Database
A skill for searching, downloading, and exploring open-access papers from
Europe PMC — a comprehensive, free life-science
literature database with over 43 million abstracts and 9 million full-text
articles.
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://europepmc.org/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
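The notification step can be sketched as a small shell helper. The function name `ensure_license_notice` and the exact wording of the notice are illustrative assumptions; only the file name and the notify-then-record behavior come from the prerequisite above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the license notice once and record it.
# Argument 1 is the skill directory (assumed to default to the current directory).
ensure_license_notice() {
  local dir=${1:-.}
  local file="$dir/LICENSE_NOTIFICATION.txt"
  local notice="NOTICE: Check the terms at https://europepmc.org/ and always check the license of each retrieved paper for restrictions."

  # Nothing to do if the notification has already been recorded.
  if [ -e "$file" ]; then
    return 0
  fi

  # (1) Show the notice, then (2) record a UTC timestamp and the notice text.
  printf '%s\n' "$notice"
  printf '%s\n%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$notice" > "$file"
}
```

Calling `ensure_license_notice .` from the skill directory prints the notice on the first run and stays silent afterwards.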
Core Rules
- Open Access Only: This skill exclusively searches open-access content. The script automatically appends `OPEN_ACCESS:y` to every search query. Do NOT remove or override this filter.
- NEVER run `python3` or `python3 -c` directly: the system Python does not necessarily have all key dependencies. Do not attempt to `pip install` or create new venvs.
- Use the Wrapper: ALWAYS use the provided script rather than calling the API directly. The script handles rate limiting (1 req/s) and errors.
- Output Files: All subcommands require `--output` to write results to a file. Read the output file separately to avoid context overflow.
- List Sources: If this skill is used, ensure this is mentioned in the output AND list the URLs of all papers that were used in producing the output.
Utility Scripts
All commands are subcommands of `scripts/europepmc_api.py`. Rate limiting and
retries are handled automatically.
1. Search (`search`)
Search Europe PMC by query. Supports DOI lookup, keyword search, author search,
PMID lookup, and the full
Europe PMC search syntax.
```bash
# Look up a paper by DOI
uv run scripts/europepmc_api.py search "DOI:10.1038/s41586-021-03819-2" --output result.json

# Keyword search
uv run scripts/europepmc_api.py search "CRISPR cancer" --max_results 5 --output results.json

# Author search
uv run scripts/europepmc_api.py search "AUTH:Jumper J" --max_results 10 --output results.json

# PMID lookup
uv run scripts/europepmc_api.py search "EXT_ID:34265844 AND SRC:MED" --output result.json

# Sorted by citations
uv run scripts/europepmc_api.py search "machine learning" \
  --sort "CITED desc" --max_results 20 --output results.json
```
**Arguments:**
- `query` (str, required) — search query using Europe PMC syntax
- `--output` (str, required) — output JSON file path
- `--max_results` (int, default 10) — maximum results per page (max 1000)
- `--result_type` (str, default `core`) — `core` (full metadata) or `lite`
- `--cursor` (str, default `*`) — cursor mark for pagination; pass the
`nextCursorMark` value from a previous response to get the next page
- `--sort` (str) — sort order, e.g. `CITED desc`, `P_PDATE_D desc`
(publication date descending), `P_PDATE_D asc`
**Output:** JSON file with three fields:
- `hitCount` (int) — total number of matching articles
- `nextCursorMark` (str) — cursor for next page; empty string if no more pages
- `results` (list) — array of article metadata objects
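
Assuming `jq` is installed (the workflows in this document already rely on it), the output file can be inspected without reading all of it into context. The helper names below are illustrative, and the `title` key inside each `results` entry is an assumption about the article metadata, which the field list above does not enumerate.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a search output file.

# Print the total number of matching articles.
hit_count() {
  jq -r '.hitCount' "$1"
}

# Print one tab-separated line per result: PMCID (or "-" if absent) and title.
# The `title` key is assumed to be present in each metadata object.
list_results() {
  jq -r '.results[] | [(.pmcid // "-"), (.title // "")] | @tsv' "$1"
}
```

For example, `hit_count results.json` followed by `list_results results.json | head` gives a quick overview before deciding which papers to download.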
**Search Syntax Quick Reference:**
- `DOI:10.xxxx/yyyy` — look up by DOI
- `EXT_ID:12345678 AND SRC:MED` — look up by PMID
- `AUTH:surname initials` — author search
- `TITLE:keyword` — search in title only
- `JOURNAL:name` — search by journal
- `PUB_YEAR:2024` or `(FIRST_PDATE:[2023-01-01 TO 2023-12-31])` — date filter
- `HAS_FT:y` — restrict to articles with full text in Europe PMC
- Boolean operators: `AND`, `OR`, `NOT`
> **Note**: `OPEN_ACCESS:y` is automatically appended to all queries. You do not
> need to add it manually.

2. Download PDF (`download_pdf`)
Download an open-access PDF from Europe PMC by PMCID.
```bash
uv run scripts/europepmc_api.py download_pdf PMC8371605 --output alphafold.pdf
```
**Arguments:**
- `pmcid` (str, required) — PubMed Central ID (e.g., `PMC8371605`)
- `--output` (str, required) — filepath to save the PDF
**Output:** Saves the PDF to the specified file. Exits with an error if the
PMCID is not found or the response is not a valid PDF. Whenever you download a
PDF, check that the downloaded file is not empty or corrupted.
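
One lightweight way to run that check is sketched below. The function name `is_valid_pdf` is hypothetical, and the test only confirms that the file is non-empty and begins with the `%PDF-` signature of a well-formed PDF; it will not detect a truncated download.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file is non-empty and starts with
# the "%PDF-" signature.
is_valid_pdf() {
  local file=$1
  # -s is true only for an existing file with size greater than zero.
  [ -s "$file" ] || return 1
  # Compare the first five bytes with the PDF header signature.
  [ "$(head -c 5 "$file")" = "%PDF-" ]
}
```

A typical use would be `is_valid_pdf alphafold.pdf || echo "download looks broken" >&2` right after `download_pdf` returns.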
3. Get Full Text (`get_fulltext`)
Retrieve the full text of an open-access article and save it to a file. Returns
plain text (XML tags stripped) by default, or raw XML with `--format xml`.
```bash
# Get plain text (default)
uv run scripts/europepmc_api.py get_fulltext PMC8371605 --output fulltext.txt

# Get raw XML
uv run scripts/europepmc_api.py get_fulltext PMC8371605 --format xml --output fulltext.xml
```
**Arguments:**
- `pmcid` (str, required) — PubMed Central ID
- `--output` (str, required) — output file path
- `--format` (str, default `text`) — `text` (plain text) or `xml` (raw JATS
XML)
**Output:** Full text written to the specified file. Exits with an error if the
article is not in the Europe PMC open-access subset.
> **Important**: Only articles in the PMC Open Access Subset have full text
> available. If retrieval fails, use `search` to check the `isOpenAccess` field
> and fall back to the abstract.
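
The fallback described in the note might look like the sketch below. The helper names are hypothetical, and `abstractText` is an assumed field name for the abstract in `core` results; only `isOpenAccess` is named in the text above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for falling back to the abstract of the first result.

# Print the isOpenAccess flag of the first result (empty if absent).
open_access_flag() {
  jq -r '.results[0].isOpenAccess // empty' "$1"
}

# Save the abstract of the first result to a file.
# `abstractText` is an assumed field name; fails if no abstract is present.
save_abstract() {
  local result_file=$1 out_file=$2
  local abstract
  abstract=$(jq -r '.results[0].abstractText // empty' "$result_file")
  if [ -z "$abstract" ]; then
    echo "No abstract found in $result_file" >&2
    return 1
  fi
  printf '%s\n' "$abstract" > "$out_file"
}

# Usage (commented out because it needs the wrapper script and network access):
# uv run scripts/europepmc_api.py get_fulltext "$PMCID" --output fulltext.txt \
#   || save_abstract result.json abstract.txt
```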

4. Get Citations (`get_citations`)
Retrieve articles that cite a given paper.
```bash
# Get citations for the AlphaFold paper (PMID 34265844)
uv run scripts/europepmc_api.py get_citations MED 34265844 \
  --page_size 25 --output citations.json
```
**Arguments:**
- `source` (str, required) — source database: `MED` (PubMed), `PMC`, `PPR`
(preprints), `PAT` (patents)
- `article_id` (str, required) — article ID in the source database
- `--output` (str, required) — output JSON file path
- `--page` (int, default 1) — page number
- `--page_size` (int, default 25) — results per page
**Output:** JSON file with `hitCount` and `citations` array.
5. Get References (`get_references`)
Retrieve the reference list (bibliography) of a given paper.
```bash
# Get references from the AlphaFold paper
uv run scripts/europepmc_api.py get_references MED 34265844 \
  --page_size 100 --output references.json
```
**Arguments:**
- `source` (str, required) — source database: `MED`, `PMC`, `PPR`, `PAT`
- `article_id` (str, required) — article ID in the source database
- `--output` (str, required) — output JSON file path
- `--page` (int, default 1) — page number
- `--page_size` (int, default 25) — results per page
**Output:** JSON file with `hitCount` and `references` array.
Common Workflows
DOI to PDF
```bash
# Step 1: Search for the PMCID
uv run scripts/europepmc_api.py search "DOI:10.1038/s41586-021-03819-2" --output result.json
PMCID=$(jq -r '.results[0].pmcid // empty' result.json)

# Step 2: Download the PDF
uv run scripts/europepmc_api.py download_pdf "$PMCID" --output paper.pdf
```
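
Because `// empty` leaves `PMCID` blank when the first result has no PMCID, a guard before the download step can save a failed call. The sketch below assumes `jq` is available; the function name `first_pmcid` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PMCID of the first result,
# or fail with a message if the result has none.
first_pmcid() {
  local pmcid
  pmcid=$(jq -r '.results[0].pmcid // empty' "$1")
  if [ -z "$pmcid" ]; then
    echo "No PMCID in $1; the paper may not be available in PMC" >&2
    return 1
  fi
  printf '%s\n' "$pmcid"
}

# Usage (commented out because it needs the wrapper script and network access):
# PMCID=$(first_pmcid result.json) && \
#   uv run scripts/europepmc_api.py download_pdf "$PMCID" --output paper.pdf
```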
PMID to Full Text
```bash
# Step 1: Find the PMCID from a PMID
uv run scripts/europepmc_api.py search "EXT_ID:34265844 AND SRC:MED" --output result.json
PMCID=$(jq -r '.results[0].pmcid // empty' result.json)

# Step 2: Get the full text
uv run scripts/europepmc_api.py get_fulltext "$PMCID" --output fulltext.txt
```
Citation Graph Traversal
```bash
# Find what papers cite a landmark study, then check their references
uv run scripts/europepmc_api.py get_citations MED 34265844 --page_size 50 --output citing.json

# Take a citing paper's PMID from citing.json and explore its references
uv run scripts/europepmc_api.py get_references MED <CITING_PMID> --output refs.json
```
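
To fill in `<CITING_PMID>`, the citing PMIDs can be pulled out of `citing.json` with `jq`. This is a sketch only: the function name is hypothetical, and the `id` and `source` keys are assumptions about the shape of each entry in the `citations` array, which this document does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of citing articles whose source is MED
# (that is, PMIDs), one per line.
# The `id` and `source` keys are assumed to exist on each citations entry.
citing_pmids() {
  jq -r '.citations[] | select(.source == "MED") | .id' "$1"
}

# Usage (commented out because it needs the wrapper script and network access):
# citing_pmids citing.json | while read -r pmid; do
#   uv run scripts/europepmc_api.py get_references MED "$pmid" --output "refs_${pmid}.json"
# done
```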
Search with Pagination
```bash
# First page
uv run scripts/europepmc_api.py search "CRISPR" --max_results 100 --output page1.json

# Extract cursor for next page
CURSOR=$(jq -r '.nextCursorMark // empty' page1.json)

# Next page
uv run scripts/europepmc_api.py search "CRISPR" --max_results 100 --cursor "$CURSOR" --output page2.json
```
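
The two-page example generalizes to a loop that follows `nextCursorMark` until it comes back empty. The sketch below assumes `jq` is available; `fetch_pages` and the `SEARCH_CMD` variable are hypothetical names, the latter introduced only so that the wrapper call can be swapped out in a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fetch up to MAX_PAGES pages of a search into
# page1.json, page2.json, ... using the wrapper's cursor pagination.
# SEARCH_CMD is an assumption, not part of the skill itself.
SEARCH_CMD=${SEARCH_CMD:-"uv run scripts/europepmc_api.py"}

fetch_pages() {
  local query=$1 max_pages=$2 prefix=${3:-page}
  local cursor='*' next out page=1
  while [ "$page" -le "$max_pages" ]; do
    out="${prefix}${page}.json"
    # SEARCH_CMD is intentionally unquoted so it splits into command and arguments.
    $SEARCH_CMD search "$query" --max_results 100 --cursor "$cursor" --output "$out" || return 1
    next=$(jq -r '.nextCursorMark // empty' "$out")
    # Stop when there is no further page or the cursor no longer advances.
    if [ -z "$next" ] || [ "$next" = "$cursor" ]; then
      break
    fi
    cursor=$next
    page=$((page + 1))
  done
}
```

For instance, `fetch_pages "CRISPR" 5` writes at most five files. Capping the page count keeps a broad query from producing an unbounded number of requests and output files.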