pubmed-database

PubMed API

Prerequisites

  1. `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://pubmed.ncbi.nlm.nih.gov/disclaimer/ and https://www.ncbi.nlm.nih.gov/home/about/policies/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
  3. `.env` file: Make sure the `.env` file exists in your home directory. Create one if it does not exist.
  4. `NCBI_API_KEY` (optional): Raises the NCBI E-utilities rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/
  5. `USER_EMAIL` (optional but recommended): Identifies the caller to NCBI (recommended by their Terms of Use).
If the variables are missing from `.env`, do NOT ask the user to paste them into the chat (this would leak keys into the agent's context). Instead, give the user these commands, substituting `ENV_FILE` with the resolved literal path to the `.env` file:
bash
printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
bash
printf "Enter contact email: " && read email && echo "USER_EMAIL=$email" >> "ENV_FILE" && echo "Saved."
The scripts load credentials automatically via `dotenv`. NEVER read, print, or inspect the `.env` file or its variables (e.g. no `cat`, `grep`, `echo`, `printenv`, or `os.environ.get` on keys). Credentials must stay out of the agent's context.
This skill provides CLI access to the NCBI PubMed and PubMed Central APIs via `scripts/pubmed_api.py`, a single CLI with 10 functions covering search, fetch, linking, full text, spelling, discovery, citation matching, and caching.
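The user-notification step in prerequisite 2 could be sketched roughly as follows. This is only an illustration: the helper name `ensure_license_notification`, the `notify` callback, and the exact wording stored in `NOTICE` are assumptions and are not part of the skill.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumed wording; the skill only requires that both URLs and the
# per-paper license reminder are shown prominently.
NOTICE = (
    "Please check the terms at https://pubmed.ncbi.nlm.nih.gov/disclaimer/ and "
    "https://www.ncbi.nlm.nih.gov/home/about/policies/, and always check the "
    "license of the retrieved papers for any restrictions."
)


def ensure_license_notification(skill_dir, notify=print):
    """Notify once per skill directory; return True if a notice was shown.

    Hypothetical helper: (1) shows the notice, then (2) records the text
    and a UTC timestamp in LICENSE_NOTIFICATION.txt.
    """
    marker = Path(skill_dir) / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # the user was already notified earlier
    notify(NOTICE)
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{timestamp}\n{NOTICE}\n", encoding="utf-8")
    return True
```

Calling the helper a second time for the same directory does nothing, which matches the "if the file does not already exist" condition.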

Core Rules

  • API Use: Always use the provided wrapper `scripts/pubmed_api.py`, which manages rate limits automatically and prevents API abuse. Setting the `NCBI_API_KEY` environment variable raises the rate limit from 3 to 10 requests/second. Querying the API any other way (e.g. via curl, wget, or hand-written code) is strictly forbidden.
  • JSON Processing: Use `jq` to filter and transform JSON output (or Python equivalents if `jq` is not available) to prevent hallucinations and context overflow.
  • Temporary Files: To avoid polluting the working directory with JSON files, use a temporary directory inside the current directory. When running multiple agents or tasks in parallel, ensure each uses a unique subdirectory name (e.g., `tmp_$TASK_ID/`) to avoid file collisions.
  • Notification: If this skill is used, ensure this is mentioned in the output AND list the URLs of all papers that were used in producing the output.
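One possible way to follow the Temporary Files rule is sketched below. The helper name `make_task_tmpdir` and the random fallback ID are assumptions made for illustration; the skill itself only asks for a unique `tmp_<id>/` subdirectory per task.

```python
import uuid
from pathlib import Path


def make_task_tmpdir(task_id=None, base="."):
    """Create and return a unique tmp_<task_id>/ directory under `base`."""
    # Fall back to a short random ID when the caller has no task ID.
    task_id = task_id or uuid.uuid4().hex[:8]
    tmpdir = Path(base) / f"tmp_{task_id}"
    # exist_ok=False makes a name collision fail loudly instead of
    # silently sharing files between parallel tasks.
    tmpdir.mkdir(parents=True, exist_ok=False)
    return tmpdir
```

The returned path can then be used as the parent of every `<output_file>` passed to the CLI.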

Structure of the skill folder

  • `SKILL.md` - This file
  • `scripts/pubmed_api.py` - The skill CLI
  • `references/` - Directory with detailed function specifications
    • `advanced-linking.md`
    • `advanced-search.md`
    • `bulk-workflows.md`
    • `citation-matching.md`
    • `cross-database-linking.md`
    • `fetch-and-resolve.md`
    • `search-and-discovery.md`
    • `utilities.md`

CLI Usage

bash
uv run scripts/pubmed_api.py <output_file> <function_name> <required_args> [--flag value ...]
  • Positional Arguments: Arguments are positional; list arguments are passed as comma-separated strings without spaces (e.g. `"35113657,31234568"`).
  • Flag Options: Optional arguments can be passed as `--flag value` instead of positional args.
  • Output Handling: On success, JSON is written to `output_file`. On error, the process exits with a non-zero code and no output file is written.
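The argument conventions above can be illustrated with a small sketch that only assembles the command line and never runs it. The helper `build_command` is hypothetical. The `max_results` flag is taken from the example in this document; any other function's arguments must still come from its reference file.

```python
def build_command(output_file, function_name, required_args, **flags):
    """Build the argv list for one pubmed_api.py call (does not run it)."""

    def as_arg(value):
        # List arguments become comma-separated strings without spaces.
        if isinstance(value, (list, tuple)):
            return ",".join(str(v) for v in value)
        return str(value)

    argv = ["uv", "run", "scripts/pubmed_api.py", output_file, function_name]
    argv += [as_arg(arg) for arg in required_args]
    # Optional arguments are appended as "--flag value" pairs.
    for name, value in flags.items():
        argv += [f"--{name}", as_arg(value)]
    return argv
```

For example, `build_command("./abstracts.json", "fetch_article_abstracts", [["35113657", "31234568"]])` yields a final argument of `"35113657,31234568"`.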

Example Usage

bash
uv run scripts/pubmed_api.py ./search_results.json search_pubmed "BRCA1" --max_results 5
cat ./search_results.json | jq '.[]' -r
uv run scripts/pubmed_api.py ./abstracts.json fetch_article_abstracts "35113657"
cat ./abstracts.json | jq '.[0].title' -r

Essential Recipes

Join PMIDs for the next call (most common chaining pattern):
bash
cat ./search_results.json | jq -r 'join(",")'
Slim abstracts to essential fields and truncate long abstracts:
bash
cat ./abstracts.json | jq '[.[] | {pmid, title, snippet: (.abstract // "")[:500]}]'
Filter by keyword (null-safe):
bash
cat ./abstracts.json | jq '[.[] | select((.title // "") | contains("Review"))]'
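Where `jq` is not available, the three recipes above might be approximated in Python as sketched below. The function names are hypothetical. The data shapes are inferred from the `jq` filters rather than from the reference files: search results are assumed to be a JSON array of PMIDs, and abstracts a JSON array of objects with `pmid`, `title`, and `abstract` keys.

```python
def join_pmids(pmids):
    """Equivalent of: jq -r 'join(",")'."""
    return ",".join(str(p) for p in pmids)


def slim_abstracts(records, max_chars=500):
    """Keep pmid and title, plus a truncated, null-safe abstract snippet."""
    return [
        {
            "pmid": rec.get("pmid"),
            "title": rec.get("title"),
            "snippet": (rec.get("abstract") or "")[:max_chars],
        }
        for rec in records
    ]


def filter_by_title(records, keyword):
    """Case-sensitive substring match on the title; missing titles never match."""
    return [rec for rec in records if keyword in (rec.get("title") or "")]
```

In practice the inputs would come from `json.load` on the CLI's output file, for example `slim_abstracts(json.load(open("./abstracts.json")))`.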

Context Management & Accuracy

When processing larger result sets (>10 abstracts):
  1. Filter Early: Use `jq` to verify keywords in abstracts before reading the full JSON into context.
  2. Slimming: Extract only `title` and `abstract` fields unless explicitly instructed otherwise. Author lists and metadata contribute to noise.
  3. Bulk Operations (N > 10): Avoid fetching or processing IDs one-by-one. The API and History Server are designed for bulk retrieval. Fetch all data in a single turn and use shell pipelines to slim the results before reading into context. This prevents turn exhaustion and context overflow.
  4. Grounding: Never use internal knowledge to provide specific identifiers (PMIDs, CIDs, Gene IDs) if no results are found. Report the tool's output accurately to ensure results are grounded in the current database state.
  5. Search Termination: When asked to find papers that may not exist, limit exploration to 3–5 high-quality, varied search queries. If no results match after these attempts, conclude that no papers meet the criteria rather than continuing to iterate — unless explicitly instructed to be thorough.
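The Grounding and Search Termination rules can be read as a bounded loop, sketched below. Both `search_with_budget` and the `search_fn` callback are hypothetical. `search_fn` stands in for whatever code runs `search_pubmed` through the wrapper CLI and returns the resulting list of PMIDs.

```python
def search_with_budget(queries, search_fn, max_attempts=5):
    """Try queries in order; stop at the first hit or when the budget is spent."""
    attempts = []
    for query in list(queries)[:max_attempts]:
        attempts.append(query)
        results = search_fn(query)
        if results:
            return {"found": True, "query": query,
                    "results": results, "attempts": attempts}
    # Nothing matched: report that plainly and never substitute
    # identifiers from memory.
    return {"found": False, "query": None, "results": [], "attempts": attempts}
```

The `attempts` list records which queries were tried, so a negative conclusion can be reported together with its evidence.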

Functions

⚠️ MANDATORY: You MUST read the linked reference file for a function group before calling any function in that group. The tables below only describe what each function does — not how to call it. Argument names, argument order, flags, and output schemas are only documented in the reference files. Do NOT guess or infer arguments from function names. If you call a function without first reading its reference, you will produce incorrect invocations.

Search

  • `search_pubmed`: Find PMIDs matching a free-text or structured NCBI query.
  • `global_database_discovery`: Count how many records match a query across every NCBI database.

Fetch & Resolve

  • `fetch_article_abstracts`: Retrieve metadata and abstracts for a batch of PMIDs.
  • `get_full_text_pmc`: Retrieve open-access full text from PMC.
  • `fetch_database_summary`: Resolve opaque UIDs from any NCBI database into human-readable metadata.

Cross-Database Linking

  • `find_linked_biological_data`: Find records in other NCBI databases linked to a source record.
  • `discover_available_links`: List all available ELink linknames for a given record.

Bulk Workflows

When working with more than ~10 PMIDs, avoid processing IDs one-by-one. Upload them to the NCBI History Server via `cache_results_history` to get a session handle (`webenv` + `query_key`), then pass that handle to `fetch_article_abstracts` or `find_linked_biological_data` for a single bulk call. Chain with `jq` shell pipelines to slim results before reading into context. This prevents turn exhaustion and context overflow. See the reference for complete workflow recipes (search→fetch, cross-db exploration, citation resolution, and bulk retrieval with data slimming).
  • `cache_results_history`: Upload PMIDs to the NCBI History Server for bulk retrieval.

Utilities

  • `verify_medical_spelling`: Spell-check biomedical terms before searching.
  • `match_raw_citations`: Resolve incomplete bibliographic citations to PMIDs.