literature-search-biorxiv
bioRxiv and medRxiv Literature Search
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://api.biorxiv.org/ and https://www.biorxiv.org/content/about-biorxiv and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
Search Strategy Guide (Read First)
This skill browses a date-based preprint archive. It is NOT a keyword search
engine. Choose your approach based on what you already know:
- A DOI (e.g., from a citation): Use `search_by_doi.py`. Fast and reliable.
- Approximate date + category: Use `search_by_dates.py` with a 1–4 week range and `--category`.
- Only a topic or keywords, no date: Do NOT use this skill for discovery. Use a keyword-capable literature skill first to find relevant DOIs, then return here to fetch metadata.

CRITICAL ANTI-PATTERN (do NOT do this): Do NOT attempt to search broad date ranges (months or years) with `--keywords` hoping to find a specific paper. The bioRxiv API does not support server-side keyword search. The script must download ALL metadata for the entire date range and filter locally in Python. Broad ranges will result in thousands of API calls, timeouts, and your request being blocked for API abuse. This is the #1 reason this skill fails.
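The decision rules above can be sketched as a small guard. This is an illustration only, not part of the skill: `choose_approach` and `MAX_RANGE_DAYS` are hypothetical names, and the 31-day ceiling is an assumption chosen so that a one-calendar-month range still passes.

```python
from datetime import date
from typing import Optional

# Assumption: "1-4 weeks" is rounded up to one calendar month (31 days).
MAX_RANGE_DAYS = 31


def choose_approach(doi: Optional[str] = None,
                    start: Optional[date] = None,
                    end: Optional[date] = None,
                    category: Optional[str] = None) -> str:
    """Return which script to run, or the reason not to use this skill."""
    if doi:
        # A known DOI is the fast, reliable path.
        return "search_by_doi.py"
    if start and end and category:
        span = (end - start).days + 1  # inclusive day count
        if span < 1:
            return "invalid: end date precedes start date"
        if span > MAX_RANGE_DAYS:
            return "too broad: narrow the range to 1-4 weeks"
        return "search_by_dates.py"
    # Topic or keywords only: discovery belongs to another skill.
    return "use a keyword-capable literature skill first"


print(choose_approach(doi="10.1101/2023.08.15.551388"))
print(choose_approach(start=date(2024, 1, 1), end=date(2024, 1, 14),
                      category="neuroscience"))
print(choose_approach(start=date(2023, 1, 1), end=date(2023, 12, 31),
                      category="neuroscience"))
print(choose_approach())
```

Rejecting an over-wide range before any request is made is the cheapest way to avoid the anti-pattern described above.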
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts automatically enforce the required rate limit.
- Local Filtering (CRITICAL WARNING): Unlike arXiv, the bioRxiv API does not support server-side keyword or author searches. Keyword and author filtering is performed locally by the scripts after downloading all metadata for a specified date range. You MUST use narrow date ranges (e.g., 1-4 weeks) AND the `--category` filter when searching with `--keywords` or `--author`.
- Abstracts Excluded By Default: To save context space in the resulting JSON, abstracts are stripped from the output by default. If you are searching by `--keywords` and want to read the abstracts of the resulting papers to understand their context, you MUST pass the `--include_abstracts` flag.
- Output Redirection: Search commands output JSON arrays to standard output. Always redirect output to a file (e.g., `> results.json`) and parse the file separately.
- List Sources: If this skill is used, ensure this is mentioned in the output AND list the URLs of all papers that were used in producing the output.
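The Output Redirection and List Sources rules can be combined in a short post-processing step. The sketch below assumes each record in the redirected JSON array carries a `doi` key. The output schema of the helper scripts is not documented here, so that key is an assumption. `source_urls` is a hypothetical helper, and the demo record is a placeholder, not real output.

```python
import json
import tempfile
from pathlib import Path


def source_urls(results_path: Path) -> list[str]:
    """Read a redirected results file and build one resolver URL per paper."""
    records = json.loads(results_path.read_text(encoding="utf-8"))
    # Assumption: every record is a dict with a "doi" key.
    return [f"https://doi.org/{rec['doi']}" for rec in records if rec.get("doi")]


# Demo with a placeholder record standing in for real script output.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    placeholder = [{"doi": "10.1101/2023.08.15.551388", "title": "placeholder"}]
    path.write_text(json.dumps(placeholder), encoding="utf-8")
    urls = source_urls(path)

for url in urls:
    print(url)
```

Parsing the file rather than the live stdout stream keeps the progress messages on stderr out of the JSON.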
Utility Scripts
All tools enforce a cross-process rate limit and retry with backoff on failure. To ensure you respect the terms of service, do NOT write custom `curl` queries.

Pagination: The bioRxiv API returns results in pages of up to 100 papers. The `search_by_dates.py` script automatically fetches all pages and reports pagination progress to stderr (e.g., `[Page 2] Fetched 200/543 papers...`). The JSON output to stdout contains the complete filtered result set across all pages, so no manual pagination is needed.
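The arithmetic behind the pagination cost is simple enough to sketch. The page size of 100 comes from the text above. `pages_needed` is a hypothetical helper, and the 50,000-paper figure is an invented illustration of a broad range, not a measured count.

```python
import math

PAGE_SIZE = 100  # pages hold up to 100 papers, per the description above


def pages_needed(total_papers: int, page_size: int = PAGE_SIZE) -> int:
    """Number of page requests required to download every record."""
    if total_papers < 0:
        raise ValueError("total_papers must be non-negative")
    return math.ceil(total_papers / page_size)


# The progress example above reports 543 papers in the range.
print(pages_needed(543))     # 6 requests

# Hypothetical broad range: request count grows linearly with the total.
print(pages_needed(50_000))  # 500 requests
```

Because every page is a separate rate-limited request, the number of papers in the date range, not the number of matches, determines how long a search takes.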
1. Search by Dates (`search_by_dates.py`)
Search for preprints within an explicit date range, optionally filtering by category, keywords, or author.

```bash
# Broad category search over a 2-week period
uv run scripts/search_by_dates.py --server biorxiv \
  --start_date 2024-01-01 --end_date 2024-01-14 \
  --category neuroscience > results.json

# Deep keyword filtering using OR logic and including abstracts
uv run scripts/search_by_dates.py --server medrxiv \
  --start_date 2023-11-01 --end_date 2023-11-30 \
  --category infectious_diseases \
  --keywords "covid" "sars-cov-2" --match_logic OR \
  --include_abstracts > covid_papers.json

# Finding papers by a specific author in a narrow window
uv run scripts/search_by_dates.py \
  --start_date 2024-05-01 --end_date 2024-05-14 \
  --author "Smith" > smith_papers.json
```

*Required Arguments:*
- `--start_date`: YYYY-MM-DD
- `--end_date`: YYYY-MM-DD
*Optional Arguments:*
- `--server`: `biorxiv` (default) or `medrxiv`
- `--category`: A valid subject category (see below). **Highly recommended** —
dramatically reduces the data the script must download and filter.
- `--keywords`: List of strings to search in the title/abstract.
- `--match_logic`: `AND` (default) or `OR` for keywords.
- `--author`: Author name (case-insensitive string match).
- `--include_abstracts`: Flag to include full abstracts in the JSON output.
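
How `--keywords` and `--match_logic` might combine during local filtering can be sketched as follows. This is not the script's actual implementation: `matches` is a hypothetical function, case-insensitive substring matching for keywords is an assumption (the text only states it for `--author`), and the sample titles are placeholders.

```python
def matches(paper: dict, keywords: list[str], match_logic: str = "AND") -> bool:
    """Check keywords against title and abstract with AND or OR semantics."""
    # Assumption: keywords are matched as case-insensitive substrings.
    text = f"{paper.get('title', '')} {paper.get('abstract', '')}".lower()
    hits = [kw.lower() in text for kw in keywords]
    if match_logic == "AND":
        return all(hits)
    if match_logic == "OR":
        return any(hits)
    raise ValueError(f"unsupported match_logic: {match_logic!r}")


# Placeholder records standing in for downloaded metadata.
papers = [
    {"title": "SARS-CoV-2 variant surveillance", "abstract": ""},
    {"title": "COVID vaccination and SARS-CoV-2 immunity", "abstract": ""},
    {"title": "Influenza seasonality", "abstract": ""},
]
keywords = ["covid", "sars-cov-2"]

or_hits = [p["title"] for p in papers if matches(p, keywords, "OR")]
and_hits = [p["title"] for p in papers if matches(p, keywords, "AND")]
print(len(or_hits), len(and_hits))  # 2 1
```

With the default `AND`, each added keyword narrows the result set; `OR` widens it, which is why the second example command pairs `OR` with a category filter.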
2. Fetch Metadata by DOI (`search_by_doi.py`)
Retrieve the detailed JSON metadata for a single paper if you already know its DOI. This is the most reliable entry point.

```bash
uv run scripts/search_by_doi.py --server biorxiv \
  --doi "10.1101/2023.08.15.551388" \
  --include_abstracts > paper_info.json
```

Downloading Full-Text PDFs
This skill does NOT support PDF downloads. To download the full-text PDF of a bioRxiv or medRxiv preprint, use the `literature-search-europepmc` skill. First, use the paper's DOI to look up its PMCID via EuropePMC, then use EuropePMC's PDF retrieval to download the document.
Valid Subject Categories
You can pass these to the `--category` flag in `search_by_dates.py`. The script will strictly validate them.

bioRxiv Categories:
`animal_behavior_and_cognition`, `biochemistry`, `bioengineering`, `bioinformatics`, `biophysics`, `cancer_biology`, `cell_biology`, `clinical_trials`, `developmental_biology`, `ecology`, `epidemiology`, `evolutionary_biology`, `genetics`, `genomics`, `immunology`, `microbiology`, `molecular_biology`, `neuroscience`, `paleontology`, `pathology`, `pharmacology_and_toxicology`, `physiology`, `plant_biology`, `scientific_communication_and_education`, `synthetic_biology`, `systems_biology`, `zoology`

medRxiv Categories:
`addiction_medicine`, `allergy_and_immunology`, `anesthesia`, `cardiovascular_medicine`, `dentistry_and_oral_medicine`, `dermatology`, `emergency_medicine`, `endocrinology`, `epidemiology`, `forensic_medicine`, `gastroenterology`, `genetic_and_genomic_medicine`, `health_informatics`, `health_economics_and_outcomes_research`, `health_policy`, `health_systems_and_quality_improvement`, `hematology`, `hiv_aids`, `infectious_diseases`, `intensive_care_and_critical_care_medicine`, `medical_education`, `medical_ethics`, `nephrology`, `neurology`, `nursing`, `nutrition`, `obstetrics_and_gynecology`, `occupational_and_environmental_health`, `oncology`, `ophthalmology`, `orthopedics`, `otolaryngology`, `pain_medicine`, `palliative_care`, `pathology`, `pediatrics`, `pharmacology_and_therapeutics`, `primary_care_research`, `psychiatry_and_clinical_psychology`, `public_and_global_health`, `radiology_and_imaging`, `rehabilitation_medicine_and_physical_therapy`, `respiratory_medicine`, `rheumatology`, `sexual_and_reproductive_health`, `sports_medicine`, `surgery`, `toxicology`, `transplantation`, `urology`
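
Strict validation of a category against the chosen server could look like the sketch below. It is an illustration under assumptions, not the script's code: `validate_category` is a hypothetical name, and `VALID_CATEGORIES` holds only a small subset of the lists above to keep the example short.

```python
# Abbreviated subset of the category lists above, for illustration only.
VALID_CATEGORIES = {
    "biorxiv": {"neuroscience", "genomics", "epidemiology", "pathology"},
    "medrxiv": {"infectious_diseases", "oncology", "epidemiology", "pathology"},
}


def validate_category(server: str, category: str) -> str:
    """Return the category unchanged, or raise if it is not valid for the server."""
    if server not in VALID_CATEGORIES:
        raise ValueError(f"unknown server: {server!r}")
    if category not in VALID_CATEGORIES[server]:
        raise ValueError(f"{category!r} is not a {server} category")
    return category


print(validate_category("biorxiv", "neuroscience"))
print(validate_category("medrxiv", "epidemiology"))

try:
    validate_category("medrxiv", "neuroscience")
except ValueError as err:
    print(f"rejected: {err}")
```

Keying the lookup by server matters because a few names, such as `epidemiology` and `pathology`, appear in both lists, while most are valid on only one server.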