pubmed-database
PubMed API
Prerequisites
- uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://pubmed.ncbi.nlm.nih.gov/disclaimer/ and https://www.ncbi.nlm.nih.gov/home/about/policies/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
- .env file: Make sure the .env file exists in your home directory. Create one if it does not exist.
- NCBI_API_KEY (optional): Raises the NCBI E-utilities rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/
- USER_EMAIL (optional but recommended): Identifies the caller to NCBI (recommended by their Terms of Use).
If the variables are missing from .env, do NOT ask the user to paste them into
the chat (this would leak keys into the agent's context). Instead, give the user
these commands, substituting ENV_FILE with the resolved literal path to the
.env file:
bash
printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
bash
printf "Enter contact email: " && read email && echo "USER_EMAIL=$email" >> "ENV_FILE" && echo "Saved."

The scripts load credentials automatically via dotenv. NEVER read,
print, or inspect the .env file or its variables (e.g. no cat, grep, echo,
printenv, or os.environ.get on keys). Credentials must stay
out of the agent's context.

This skill provides CLI access to the NCBI PubMed and PubMed Central APIs via
scripts/pubmed_api.py, a single CLI with 10 functions covering search, fetch,
linking, full text, spelling, discovery, citation matching, and caching.
Core Rules
- API Use: Always use the provided scripts/pubmed_api.py wrapper, which manages rate limits automatically and prevents API abuse. Setting the NCBI_API_KEY environment variable raises the rate limit from 3 to 10 requests/second. Querying the API any other way (e.g. via curl, wget, or hand-written code) is strictly forbidden.
- JSON Processing: Use jq to filter and transform JSON output (or Python equivalents if jq is not available) to prevent hallucinations and context overflow.
- Temporary Files: To avoid polluting the working directory with JSON files, use a temporary directory inside the current directory. When running multiple agents or tasks in parallel, ensure each uses a unique subdirectory name (e.g., tmp_$TASK_ID/) to avoid file collisions.
- Notification: If this skill is used, ensure this is mentioned in the output AND list the URLs of all papers that were used in producing the output.
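The Temporary Files rule can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the helper name make_task_tmpdir is hypothetical, and the fallback to a random suffix when no task ID is available is an assumption added here. TASK_ID is an ordinary task identifier, not a credential, so reading it from the environment does not conflict with the credential rules.

```python
import os
import uuid
from pathlib import Path
from typing import Optional


def make_task_tmpdir(base: Path, task_id: Optional[str] = None) -> Path:
    """Create base/tmp_<task_id>/ and return its path (hypothetical helper)."""
    if task_id is None:
        # Assumption: fall back to a random suffix when TASK_ID is not set,
        # so parallel tasks still get distinct directories.
        task_id = os.environ.get("TASK_ID") or uuid.uuid4().hex[:8]
    tmpdir = base / f"tmp_{task_id}"
    # exist_ok=True lets the same task reuse its own directory across calls.
    tmpdir.mkdir(parents=True, exist_ok=True)
    return tmpdir
```

Calling make_task_tmpdir(Path.cwd()) keeps the JSON output files inside the current directory, as the rule requires, while giving each task its own subdirectory.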
Structure of the skill folder
- SKILL.md - This file
- scripts/pubmed_api.py - The skill CLI
- references/ - Directory with detailed function specifications
  - advanced-linking.md
  - advanced-search.md
  - bulk-workflows.md
  - citation-matching.md
  - cross-database-linking.md
  - fetch-and-resolve.md
  - search-and-discovery.md
  - utilities.md
CLI Usage
bash
uv run scripts/pubmed_api.py <output_file> <function_name> <required_args> [--flag value ...]
- Positional Arguments: Arguments are positional; list arguments are passed as comma-separated strings without spaces (e.g. "35113657,31234568").
- Flag Options: Optional arguments can be passed as --flag value instead of positional args.
- Output Handling: On success, JSON is written to output_file. On error, the process exits with a non-zero code and no output file is written.
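The calling convention above can be sketched in Python as follows. The helper names build_command, format_arg, and run_cli are hypothetical and not part of the skill; the sketch only assembles the documented command shape and invokes the provided wrapper through uv, so it does not query the API by any other route. Deleting a stale output file before the run is a design choice made here, so that a leftover file from an earlier call cannot be mistaken for success.

```python
import subprocess
from pathlib import Path


def format_arg(arg):
    """Render one argument; lists become comma-separated strings without spaces."""
    if isinstance(arg, (list, tuple)):
        return ",".join(str(item) for item in arg)
    return str(arg)


def build_command(output_file, function_name, required_args=(), flags=None):
    """Assemble: uv run scripts/pubmed_api.py <output_file> <function_name> <args> [--flag value ...]"""
    cmd = ["uv", "run", "scripts/pubmed_api.py", str(output_file), function_name]
    cmd += [format_arg(arg) for arg in required_args]
    for name, value in (flags or {}).items():
        cmd += [f"--{name}", format_arg(value)]
    return cmd


def run_cli(output_file, function_name, required_args=(), flags=None):
    """Run the wrapper; True only if it exited with 0 and wrote the output file."""
    out = Path(output_file)
    out.unlink(missing_ok=True)  # drop any stale result from an earlier run
    result = subprocess.run(build_command(out, function_name, required_args, flags))
    return result.returncode == 0 and out.exists()
```

For example, build_command("./search_results.json", "search_pubmed", ["BRCA1"], {"max_results": 5}) reproduces the documented search invocation as an argument list. Argument names and order for each function still have to come from the reference files.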
Example Usage
bash
uv run scripts/pubmed_api.py ./search_results.json search_pubmed "BRCA1" --max_results 5
cat ./search_results.json | jq '.[]' -r
uv run scripts/pubmed_api.py ./abstracts.json fetch_article_abstracts "35113657"
cat ./abstracts.json | jq '.[0].title' -r
Essential Recipes
Join PMIDs for the next call (most common chaining pattern):
bash
cat ./search_results.json | jq -r 'join(",")'
Slim abstracts to essential fields and truncate long abstracts:
bash
cat ./abstracts.json | jq '[.[] | {pmid, title, snippet: (.abstract // "")[:500]}]'
Filter by keyword (null-safe):
bash
cat ./abstracts.json | jq '[.[] | select((.title // "") | contains("Review"))]'
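Where jq is not available, the three recipes above could be approximated in Python roughly as follows. This is a sketch under assumptions taken from the jq filters themselves: the search output is a JSON array of PMIDs, and the abstracts output is a JSON array of objects with pmid, title, and abstract fields. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a JSON output file written by the CLI."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def join_pmids(pmids):
    """Equivalent of jq -r 'join(",")': comma-separated, no spaces."""
    return ",".join(str(pmid) for pmid in pmids)


def slim_abstracts(records, max_chars=500):
    """Keep pmid and title, plus a truncated snippet of the abstract."""
    return [
        {
            "pmid": record.get("pmid"),
            "title": record.get("title"),
            # "or" mirrors jq's // operator: a missing or null abstract becomes "".
            "snippet": (record.get("abstract") or "")[:max_chars],
        }
        for record in records
    ]


def filter_by_title(records, keyword):
    """Null-safe, case-sensitive substring match on the title."""
    return [r for r in records if keyword in (r.get("title") or "")]
```

A typical chain would be join_pmids(load_json("./search_results.json")) to build the argument for the next call, then slim_abstracts(load_json("./abstracts.json")) before anything is read into context.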
Context Management & Accuracy
When processing larger result sets (>10 abstracts):
- Filter Early: Use jq to verify keywords in abstracts before reading the full JSON into context.
- Slimming: Extract only title and abstract fields unless explicitly instructed otherwise. Author lists and metadata contribute to noise.
- Bulk Operations (N > 10): Avoid fetching or processing IDs one-by-one. The API and History Server are designed for bulk retrieval. Fetch all data in a single turn and use shell pipelines to slim the results before reading into context. This prevents turn exhaustion and context overflow.
- Grounding: Never use internal knowledge to provide specific identifiers (PMIDs, CIDs, Gene IDs) if no results are found. Report the tool's output accurately to ensure results are grounded in the current database state.
- Search Termination: When asked to find papers that may not exist, limit exploration to 3–5 high-quality, varied search queries. If no results match after these attempts, conclude that no papers meet the criteria rather than continuing to iterate, unless explicitly instructed to be thorough.
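The Search Termination rule amounts to a bounded loop over candidate queries. The sketch below is illustrative only: search_with_budget is a hypothetical name, and run_search stands for any caller-supplied function that runs one search_pubmed call through the skill CLI and returns the resulting list of PMIDs.

```python
MAX_QUERIES = 5  # upper end of the 3-5 range stated in the rule


def search_with_budget(queries, run_search, max_queries=MAX_QUERIES):
    """Try varied queries in order and stop at the first one with results.

    Returns (matching_query, pmids, tried). When nothing matches within the
    budget, returns (None, [], tried) so the caller can report that no papers
    meet the criteria instead of continuing to iterate.
    """
    tried = []
    for query in list(queries)[:max_queries]:
        tried.append(query)
        pmids = run_search(query)  # hypothetical wrapper around the CLI
        if pmids:
            return query, pmids, tried
    return None, [], tried
```

Returning the list of attempted queries alongside the result makes it easy to report exactly what was searched, which supports the Grounding rule when the outcome is empty.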
Functions
⚠️ MANDATORY: You MUST read the linked reference file for a function group before calling any function in that group. The lists below only describe what each function does, not how to call it. Argument names, argument order, flags, and output schemas are only documented in the reference files. Do NOT guess or infer arguments from function names. If you call a function without first reading its reference, you will produce incorrect invocations.
Search
- search_pubmed: Find PMIDs matching a free-text or structured NCBI query.
- global_database_discovery: Count how many records match a query across every NCBI database.
Fetch & Resolve
- fetch_article_abstracts: Retrieve metadata and abstracts for a batch of PMIDs.
- get_full_text_pmc: Retrieve open-access full text from PMC.
- fetch_database_summary: Resolve opaque UIDs from any NCBI database into human-readable metadata.
Cross-Database Linking
- find_linked_biological_data: Find records in other NCBI databases linked to a source record.
- discover_available_links: List all available ELink linknames for a given record.
Bulk Workflows
When working with more than ~10 PMIDs, avoid processing IDs one-by-one.
Upload them to the NCBI History Server via cache_results_history to get a
session handle (webenv + query_key), then pass that handle to
fetch_article_abstracts or find_linked_biological_data for a single bulk
call. Chain with jq shell pipelines to slim results before reading into
context. This prevents turn exhaustion and context overflow. See the reference
for complete workflow recipes (search→fetch, cross-db exploration, citation
resolution, and bulk retrieval with data slimming).
- cache_results_history: Upload PMIDs to the NCBI History Server for bulk retrieval.
Utilities
- verify_medical_spelling: Spell-check biomedical terms before searching.
- match_raw_citations: Resolve incomplete bibliographic citations to PMIDs.