literature-search-europepmc

Europe PMC Database

A skill for searching, downloading, and exploring open-access papers from Europe PMC — a comprehensive, free life-science literature database with over 43 million abstracts and 9 million full-text articles.

Prerequisites

  1. uv: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://europepmc.org/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
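
The notification step can be sketched as a small shell function. This is a minimal sketch, not part of the skill: the function name `notify_license_once`, the notice wording, and the timestamp format are assumptions chosen for illustration.

```bash
# Hypothetical helper: show the license notice once per skill directory.
notify_license_once() {
    skill_dir=${1:-.}
    marker="$skill_dir/LICENSE_NOTIFICATION.txt"

    # Nothing to do if the notice was already recorded.
    [ -e "$marker" ] && return 0

    # Assumed wording; adapt it to how you address the user.
    notice="NOTICE: Check the terms at https://europepmc.org/ and always check the license of each retrieved paper for restrictions."

    # (1) Show the notice to the user.
    printf '%s\n' "$notice"

    # (2) Record the notice text and a UTC timestamp.
    {
        printf '%s\n' "$notice"
        printf 'Notified at: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    } > "$marker"
}
```

Calling `notify_license_once .` from the skill directory prints the notice on the first run and stays silent afterwards.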

Core Rules

  • Open Access Only: This skill exclusively searches open-access content. The script automatically appends `OPEN_ACCESS:y` to every search query. Do NOT remove or override this filter.
  • NEVER run python3 or python3 -c directly: the system Python does not necessarily have all key dependencies. Do not attempt to pip install or create new venvs.
  • Use the Wrapper: ALWAYS use the provided script rather than calling the API directly. The script handles rate limiting (1 req/s) and errors.
  • Output Files: All subcommands require `--output` to write results to a file. Read the output file separately to avoid context overflow.
  • List Sources: If this skill is used, ensure this is mentioned in the output AND list the URLs of all papers that were used in producing the output.
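
For the List Sources rule, one way to turn source/ID pairs into URLs is sketched below. The helper names `article_url` and `list_sources` are hypothetical, and the `https://europepmc.org/article/<SOURCE>/<ID>` pattern is an assumption that should be confirmed against the live site before it is relied on.

```bash
# Hypothetical helper: build a Europe PMC landing-page URL from a source/ID pair.
# The URL pattern is an assumption, not something the skill defines.
article_url() {
    printf 'https://europepmc.org/article/%s/%s\n' "$1" "$2"
}

# Print one URL per "SOURCE ID" line read from standard input.
list_sources() {
    while read -r src id; do
        # Skip blank or incomplete lines.
        if [ -n "$src" ] && [ -n "$id" ]; then
            article_url "$src" "$id"
        fi
    done
}

article_url MED 34265844
```

Piping lines such as `MED 34265844` into `list_sources` produces a list that can be appended to the final output.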

Utility Scripts

All commands are subcommands of `scripts/europepmc_api.py`. Rate limiting and retries are handled automatically.

1. Search (`search`)

Search Europe PMC by query. Supports DOI lookup, keyword search, author search, PMID lookup, and the full Europe PMC search syntax.

```bash
# Look up a paper by DOI
uv run scripts/europepmc_api.py search "DOI:10.1038/s41586-021-03819-2" --output result.json

# Keyword search
uv run scripts/europepmc_api.py search "CRISPR cancer" --max_results 5 --output results.json

# Author search
uv run scripts/europepmc_api.py search "AUTH:Jumper J" --max_results 10 --output results.json

# PMID lookup
uv run scripts/europepmc_api.py search "EXT_ID:34265844 AND SRC:MED" --output result.json

# Sorted by citations
uv run scripts/europepmc_api.py search "machine learning" \
  --sort "CITED desc" --max_results 20 --output results.json
```

**Arguments:**

-   `query` (str, required) — search query using Europe PMC syntax
-   `--output` (str, required) — output JSON file path
-   `--max_results` (int, default 10) — maximum results per page (max 1000)
-   `--result_type` (str, default `core`) — `core` (full metadata) or `lite`
-   `--cursor` (str, default `*`) — cursor mark for pagination; pass the
    `nextCursorMark` value from a previous response to get the next page
-   `--sort` (str) — sort order, e.g. `CITED desc`, `P_PDATE_D desc`
    (publication date descending), `P_PDATE_D asc`

**Output:** JSON file with three fields:

-   `hitCount` (int) — total number of matching articles
-   `nextCursorMark` (str) — cursor for next page; empty string if no more pages
-   `results` (list) — array of article metadata objects

**Search Syntax Quick Reference:**

-   `DOI:10.xxxx/yyyy` — look up by DOI
-   `EXT_ID:12345678 AND SRC:MED` — look up by PMID
-   `AUTH:surname initials` — author search
-   `TITLE:keyword` — search in title only
-   `JOURNAL:name` — search by journal
-   `PUB_YEAR:2024` or `(FIRST_PDATE:[2023-01-01 TO 2023-12-31])` — date filter
-   `HAS_FT:y` — restrict to articles with full text in Europe PMC
-   Boolean operators: `AND`, `OR`, `NOT`

> **Note**: `OPEN_ACCESS:y` is automatically appended to all queries. You do not
> need to add it manually.
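
The fielded terms in the quick reference can be combined with `AND`. The sketch below assumes a hypothetical `build_query` helper (not part of the skill) that joins its arguments into one query string. The wrapper call is left commented out so the example runs offline.

```bash
# Hypothetical helper: join fielded terms with AND into one query string.
build_query() {
    query=""
    for term in "$@"; do
        if [ -z "$query" ]; then
            query=$term
        else
            query="$query AND $term"
        fi
    done
    printf '%s\n' "$query"
}

QUERY=$(build_query 'TITLE:CRISPR' 'PUB_YEAR:2024' 'HAS_FT:y')
echo "$QUERY"   # prints: TITLE:CRISPR AND PUB_YEAR:2024 AND HAS_FT:y

# Pass the result to the wrapper as a single quoted argument:
# uv run scripts/europepmc_api.py search "$QUERY" --max_results 20 --output crispr_2024.json
```

Quoting `"$QUERY"` matters because the query contains spaces. Unquoted, the shell would split it into several arguments.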

2. Download PDF (`download_pdf`)

Download an open-access PDF from Europe PMC by PMCID.

```bash
uv run scripts/europepmc_api.py download_pdf PMC8371605 --output alphafold.pdf
```

**Arguments:**

-   `pmcid` (str, required) — PubMed Central ID (e.g., `PMC8371605`)
-   `--output` (str, required) — file path to save the PDF

**Output:** Saves the PDF to the specified file. Exits with an error if the PMCID is not found or the response is not a valid PDF. Whenever you download a PDF, check that the downloaded file is not empty or corrupted.
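
A cheap way to check a downloaded file is sketched below. The helper name `is_valid_pdf` is hypothetical, and the check is only a sanity test: it confirms the file is non-empty and begins with the `%PDF-` signature, and does not prove the whole file is intact.

```bash
# Hypothetical helper: succeed only if the file is non-empty
# and starts with the PDF signature "%PDF-".
is_valid_pdf() {
    file=$1
    # -s is true only for an existing file with size greater than zero.
    [ -s "$file" ] || return 1
    [ "$(head -c 5 "$file")" = "%PDF-" ]
}

# Example usage after a download:
# if is_valid_pdf alphafold.pdf; then
#     echo "PDF looks OK"
# else
#     echo "PDF is empty or not a PDF" >&2
# fi
```

An HTML error page saved under a `.pdf` name fails this check, which is the most common failure when a download goes wrong.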

3. Get Full Text (`get_fulltext`)

Retrieve the full text of an open-access article and save it to a file. Returns plain text (XML tags stripped) by default, or raw XML with `--format xml`.

```bash
# Get plain text (default)
uv run scripts/europepmc_api.py get_fulltext PMC8371605 --output fulltext.txt

# Get raw XML
uv run scripts/europepmc_api.py get_fulltext PMC8371605 --format xml --output fulltext.xml
```

**Arguments:**

-   `pmcid` (str, required) — PubMed Central ID
-   `--output` (str, required) — output file path
-   `--format` (str, default `text`) — `text` (plain text) or `xml` (raw JATS
    XML)

**Output:** Full text written to the specified file. Exits with an error if the
article is not in the Europe PMC open-access subset.

> **Important**: Only articles in the PMC Open Access Subset have full text
> available. If retrieval fails, use `search` to check the `isOpenAccess` field
> and fall back to the abstract.
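
The fallback described in the note could be scripted roughly as follows. This is a sketch under stated assumptions: `epmc` and `fetch_text_or_metadata` are hypothetical names, the metadata file name is arbitrary, and `PMCID:` as a search field is assumed from the Europe PMC search syntax rather than taken from this skill.

```bash
# Thin alias for the wrapper script documented above.
epmc() {
    uv run scripts/europepmc_api.py "$@"
}

# Hypothetical helper: try full text first, then fall back to search metadata.
fetch_text_or_metadata() {
    pmcid=$1
    out=$2

    if epmc get_fulltext "$pmcid" --output "$out"; then
        echo "full text saved to $out"
        return 0
    fi

    # Derive a sibling file name, e.g. fulltext.txt -> fulltext.search.json.
    meta="${out%.*}.search.json"

    # Assumption: PMCID: is accepted as a search field.
    epmc search "PMCID:$pmcid" --max_results 1 --output "$meta" || return 1
    echo "full text unavailable; inspect isOpenAccess and the abstract in $meta"
}

# Example usage:
# fetch_text_or_metadata PMC8371605 fulltext.txt
```

The function only defines the flow and makes no network call until it is invoked.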

4. Get Citations (`get_citations`)

Retrieve articles that cite a given paper.

```bash
# Get citations for the AlphaFold paper (PMID 34265844)
uv run scripts/europepmc_api.py get_citations MED 34265844 \
  --page_size 25 --output citations.json
```

**Arguments:**

-   `source` (str, required) — source database: `MED` (PubMed), `PMC`, `PPR`
    (preprints), `PAT` (patents)
-   `article_id` (str, required) — article ID in the source database
-   `--output` (str, required) — output JSON file path
-   `--page` (int, default 1) — page number
-   `--page_size` (int, default 25) — results per page

**Output:** JSON file with `hitCount` and `citations` array.
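
Because `get_citations` is page-based, the number of pages to request follows from `hitCount` and `--page_size` by ceiling division. A minimal sketch, assuming a hypothetical `page_count` helper:

```bash
# Hypothetical helper: pages needed to cover hit_count results
# at page_size results per page (ceiling division).
page_count() {
    hits=$1
    size=$2
    echo $(( (hits + size - 1) / size ))
}

page_count 120 25   # prints 5

# Possible use with the wrapper (commented out so the sketch runs offline);
# HITS would come from the hitCount field of the first response:
# for page in $(seq 1 "$(page_count "$HITS" 25)"); do
#     uv run scripts/europepmc_api.py get_citations MED 34265844 \
#       --page "$page" --page_size 25 --output "citations_$page.json"
# done
```

With 120 hits and 25 per page, the first four pages are full and the fifth holds the remaining 20.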

5. Get References (`get_references`)

Retrieve the reference list (bibliography) of a given paper.

```bash
# Get references from the AlphaFold paper
uv run scripts/europepmc_api.py get_references MED 34265844 \
  --page_size 100 --output references.json
```

**Arguments:**

-   `source` (str, required) — source database: `MED`, `PMC`, `PPR`, `PAT`
-   `article_id` (str, required) — article ID in the source database
-   `--output` (str, required) — output JSON file path
-   `--page` (int, default 1) — page number
-   `--page_size` (int, default 25) — results per page

**Output:** JSON file with `hitCount` and `references` array.

Common Workflows

DOI to PDF

```bash
# Step 1: Search for the PMCID
uv run scripts/europepmc_api.py search "DOI:10.1038/s41586-021-03819-2" --output result.json
PMCID=$(jq -r '.results[0].pmcid // empty' result.json)

# Step 2: Download the PDF
uv run scripts/europepmc_api.py download_pdf "$PMCID" --output paper.pdf
```

PMID to Full Text

```bash
# Step 1: Find the PMCID from a PMID
uv run scripts/europepmc_api.py search "EXT_ID:34265844 AND SRC:MED" --output result.json
PMCID=$(jq -r '.results[0].pmcid // empty' result.json)

# Step 2: Get the full text
uv run scripts/europepmc_api.py get_fulltext "$PMCID" --output fulltext.txt
```

Citation Graph Traversal

```bash
# Find what papers cite a landmark study, then check their references
uv run scripts/europepmc_api.py get_citations MED 34265844 --page_size 50 --output citing.json

# Parse a citing paper's PMID and explore its references
uv run scripts/europepmc_api.py get_references MED <CITING_PMID> --output refs.json
```

Search with Pagination

```bash
# First page
uv run scripts/europepmc_api.py search "CRISPR" --max_results 100 --output page1.json

# Extract cursor for next page
CURSOR=$(jq -r '.nextCursorMark // empty' page1.json)

# Next page
uv run scripts/europepmc_api.py search "CRISPR" --max_results 100 --cursor "$CURSOR" --output page2.json
```
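
The two-page example generalizes to a loop that follows `nextCursorMark` until it comes back empty. The sketch below is one possible shape, not the skill's own method: `epmc`, `fetch_pages`, the `PREFIX_<n>.json` naming, and the default cap of five pages are all assumptions. It needs `jq`, as the workflows above do.

```bash
# Thin alias for the wrapper script documented above.
epmc() {
    uv run scripts/europepmc_api.py "$@"
}

# Hypothetical helper: fetch up to max_pages pages of a query
# into PREFIX_1.json, PREFIX_2.json, ... and print how many were fetched.
fetch_pages() {
    query=$1
    prefix=$2
    max_pages=${3:-5}
    cursor='*'      # documented default for the first page
    fetched=0

    while [ "$fetched" -lt "$max_pages" ]; do
        fetched=$((fetched + 1))
        out="${prefix}_${fetched}.json"
        epmc search "$query" --max_results 100 --cursor "$cursor" --output "$out" || return 1

        next=$(jq -r '.nextCursorMark // empty' "$out")
        # Stop when there is no further page or the cursor did not advance.
        if [ -z "$next" ] || [ "$next" = "$cursor" ]; then
            break
        fi
        cursor=$next
    done

    echo "$fetched"
}

# Example usage:
# fetch_pages "CRISPR" crispr_page 3
```

The page cap keeps a broad query from writing an unbounded number of files. Raise it deliberately when a full export is intended.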