biorxiv-database

bioRxiv Database

A Python toolkit for programmatic access to bioRxiv preprints. Supports comprehensive metadata retrieval with structured JSON output for integration into research workflows.

Use Cases

  • Query recent preprints by topic or research domain
  • Monitor publications from specific researchers
  • Perform systematic literature reviews
  • Analyze publication trends across time periods
  • Retrieve citation metadata and DOIs
  • Download preprint PDFs for text analysis
  • Filter results by subject category

Quick Start

```bash
# Install dependencies
pip install requests

# Search by keywords
python scripts/biorxiv_client.py --terms "protein folding" --recent 30 --out results.json

# Search by author
python scripts/biorxiv_client.py --author "Chen" --recent 180

# Get specific paper by DOI
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321"

# Download PDF
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321" --fetch-pdf paper.pdf
```

Command-Line Options

| Option | Description |
|---|---|
| `-t, --terms` | Search keywords (multiple allowed) |
| `-a, --author` | Author name to search |
| `--doi` | Specific DOI to retrieve |
| `--since` | Start date (YYYY-MM-DD) |
| `--until` | End date (YYYY-MM-DD) |
| `--recent` | Search last N days |
| `-s, --subject` | Subject category filter |
| `--fields` | Fields to search: title, abstract, authors |
| `-o, --out` | Output file (default: stdout) |
| `--max` | Maximum results to return |
| `--fetch-pdf` | Download PDF (requires `--doi`) |
| `-v, --verbose` | Enable debug output |
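
The `--since`, `--until`, and `--recent` options all describe a date window. The sketch below shows one way a relative window could be converted into explicit `YYYY-MM-DD` bounds. It assumes that `--recent N` means the N days ending today, which this document does not confirm. `recent_window` is a hypothetical helper for illustration and is not part of `biorxiv_client.py`.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def recent_window(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Return (since, until) as YYYY-MM-DD strings for the last `days` days.

    Hypothetical helper: assumes the window ends today and starts
    `days` days earlier. The real script may count days differently.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


# Build explicit bounds equivalent to a 30-day relative window.
since, until = recent_window(30)
print(f"--since {since} --until {until}")
```

Explicit bounds make a saved query reproducible later, whereas a relative window shifts with the date on which the command is run.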

Programmatic API

```python
from scripts.biorxiv_client import PreprintClient

client = PreprintClient(debug=True)

# Search by keywords
results = client.find_by_terms(
    terms=["enzyme engineering"],
    since="2024-01-01",
    until="2024-12-31",
    subject="biochemistry",
)

# Search by author
papers = client.find_by_author(name="Garcia", since="2023-01-01")

# Get paper by DOI
metadata = client.get_by_doi("10.1101/2024.05.22.594321")

# Download PDF
client.fetch_pdf(doi="10.1101/2024.05.22.594321", destination="paper.pdf")

# Normalize output
formatted = client.normalize(metadata, include_abstract=True)
```

Subject Categories

| Category | Category |
|---|---|
| animal-behavior-and-cognition | molecular-biology |
| biochemistry | neuroscience |
| bioengineering | paleontology |
| bioinformatics | pathology |
| biophysics | pharmacology-and-toxicology |
| cancer-biology | physiology |
| cell-biology | plant-biology |
| clinical-trials | scientific-communication-and-education |
| developmental-biology | synthetic-biology |
| ecology | systems-biology |
| epidemiology | zoology |
| evolutionary-biology | |
| genetics | |
| genomics | |
| immunology | |
| microbiology | |
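
A mistyped category is an easy way to end up with an empty result set, so it can help to check the value before sending a query. The sketch below validates a subject string against the categories listed above. `check_subject` is a hypothetical helper, and the normalization step (lowercasing, replacing spaces with hyphens) is an assumption; the toolkit itself may or may not accept such variants.

```python
import difflib

# Subject categories as listed in this document.
SUBJECT_CATEGORIES = frozenset({
    "animal-behavior-and-cognition", "biochemistry", "bioengineering",
    "bioinformatics", "biophysics", "cancer-biology", "cell-biology",
    "clinical-trials", "developmental-biology", "ecology", "epidemiology",
    "evolutionary-biology", "genetics", "genomics", "immunology",
    "microbiology", "molecular-biology", "neuroscience", "paleontology",
    "pathology", "pharmacology-and-toxicology", "physiology",
    "plant-biology", "scientific-communication-and-education",
    "synthetic-biology", "systems-biology", "zoology",
})


def check_subject(subject: str) -> str:
    """Return the hyphenated lowercase form of `subject`, or raise ValueError.

    Hypothetical helper: the normalization is an assumption for illustration.
    """
    slug = subject.strip().lower().replace(" ", "-")
    if slug not in SUBJECT_CATEGORIES:
        # Suggest near matches to make typos easier to spot.
        hints = difflib.get_close_matches(slug, sorted(SUBJECT_CATEGORIES), n=3)
        hint_text = f" Did you mean: {', '.join(hints)}?" if hints else ""
        raise ValueError(f"Unknown subject category {subject!r}.{hint_text}")
    return slug
```

The returned slug could then be passed as the `subject` argument or the `--subject` option.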

Response Structure

```json
{
  "query": {
    "terms": ["protein folding"],
    "since": "2024-03-01",
    "until": "2024-09-30",
    "subject": "biophysics"
  },
  "count": 87,
  "papers": [
    {
      "doi": "10.1101/2024.05.22.594321",
      "title": "Example Preprint Title",
      "authors": "Chen L, Patel R, Kim S",
      "corresponding_author": "Chen L",
      "institution": "Research Institute",
      "posted": "2024-05-22",
      "revision": "1",
      "category": "biophysics",
      "license": "cc_by",
      "paper_type": "new results",
      "abstract": "Abstract content here...",
      "pdf_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1.full.pdf",
      "web_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1",
      "journal_ref": ""
    }
  ]
}
```
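
Once a result file is saved, a few lines of Python are enough to inspect it. The sketch below reads a response shaped like the example above and reports the stated `count`, the number of entries actually present in `papers`, and a tally by `category`. The field names come from the example; `summarize` and `load_and_summarize` are hypothetical helpers. Whether `count` always equals the number of returned papers is not stated here, so the sketch reports both values rather than asserting that they match.

```python
import json
from collections import Counter
from typing import Any, Dict


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a response dict with the 'count' and 'papers' keys shown above."""
    papers = response.get("papers", [])
    categories = Counter(p.get("category", "unknown") for p in papers)
    return {
        "reported_count": response.get("count"),
        "papers_in_file": len(papers),
        "by_category": dict(categories),
        # Skip entries that lack a DOI rather than failing on them.
        "dois": [p["doi"] for p in papers if p.get("doi")],
    }


def load_and_summarize(path: str) -> Dict[str, Any]:
    """Load a saved JSON result file (for example results.json) and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize(json.load(handle))
```

For example, `load_and_summarize("results.json")` could be run on the file written by the `--out results.json` command in Quick Start.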

Best Practices

| Recommendation | Details |
|---|---|
| Date ranges | Narrow ranges improve response time. Split large queries into chunks. |
| Category filters | Use `--subject` to reduce bandwidth and improve precision. |
| Rate limiting | Built-in 0.5s delay between requests. Add more for bulk operations. |
| Result caching | Save JSON outputs to avoid redundant API calls. |
| Version awareness | Preprints may have multiple versions. PDF URLs encode version numbers. |
| Error checking | Verify `count` in outputs. Zero results may indicate date or connectivity issues. |
| Debug mode | Use `--verbose` for detailed request/response logging. |
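
The advice to split large queries into chunks can be sketched as a small generator that cuts a long date range into consecutive windows. `date_chunks` is a hypothetical helper, not part of the toolkit, and it assumes that `since` and `until` are both inclusive, which this document does not confirm.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_chunks(since: str, until: str, days: int = 30) -> Iterator[Tuple[str, str]]:
    """Yield consecutive (since, until) pairs, each spanning at most `days` days.

    Hypothetical helper: treats both bounds as inclusive, so adjacent
    chunks never share a day.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    while start <= end:
        # The last chunk is clipped so it never runs past the overall end date.
        chunk_end = min(start + timedelta(days=days - 1), end)
        yield start.isoformat(), chunk_end.isoformat()
        start = chunk_end + timedelta(days=1)


for chunk_since, chunk_until in date_chunks("2024-01-01", "2024-03-01"):
    print(chunk_since, chunk_until)
```

Each pair could be passed as `since` and `until` to a search call, with the JSON for each chunk saved to its own file so that an interrupted run does not repeat requests that already completed.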

Reference Files

| File | Contents |
|---|---|
| `api-reference.md` | Complete bioRxiv REST API documentation |
| `examples.md` | Extended code examples and workflow patterns |