biorxiv-database
bioRxiv Database
A Python toolkit for programmatic access to bioRxiv preprints. Supports comprehensive metadata retrieval with structured JSON output for integration into research workflows.
Use Cases
- Query recent preprints by topic or research domain
- Monitor publications from specific researchers
- Perform systematic literature reviews
- Analyze publication trends across time periods
- Retrieve citation metadata and DOIs
- Download preprint PDFs for text analysis
- Filter results by subject category
Quick Start
```bash
# Install dependencies
pip install requests

# Search by keywords
python scripts/biorxiv_client.py --terms "protein folding" --recent 30 --out results.json

# Search by author
python scripts/biorxiv_client.py --author "Chen" --recent 180

# Get specific paper by DOI
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321"

# Download PDF
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321" --fetch-pdf paper.pdf
```
Command-Line Options
| Option | Description |
|---|---|
| `--terms` | Search keywords (multiple allowed) |
| `--author` | Author name to search |
| `--doi` | Specific DOI to retrieve |
| | Start date (YYYY-MM-DD) |
| | End date (YYYY-MM-DD) |
| `--recent` | Search last N days |
| | Subject category filter |
| | Fields to search: title, abstract, authors |
| `--out` | Output file (default: stdout) |
| | Maximum results to return |
| `--fetch-pdf` | Download PDF (requires `--doi`) |
| | Enable debug output |
Programmatic API
```python
from scripts.biorxiv_client import PreprintClient

client = PreprintClient(debug=True)

# Search by keywords
results = client.find_by_terms(
    terms=["enzyme engineering"],
    since="2024-01-01",
    until="2024-12-31",
    subject="biochemistry",
)

# Search by author
papers = client.find_by_author(name="Garcia", since="2023-01-01")

# Get paper by DOI
metadata = client.get_by_doi("10.1101/2024.05.22.594321")

# Download PDF
client.fetch_pdf(doi="10.1101/2024.05.22.594321", destination="paper.pdf")

# Normalize output
formatted = client.normalize(metadata, include_abstract=True)
```
Subject Categories
| Category | Category |
|---|---|
| animal-behavior-and-cognition | molecular-biology |
| biochemistry | neuroscience |
| bioengineering | paleontology |
| bioinformatics | pathology |
| biophysics | pharmacology-and-toxicology |
| cancer-biology | physiology |
| cell-biology | plant-biology |
| clinical-trials | scientific-communication-and-education |
| developmental-biology | synthetic-biology |
| ecology | systems-biology |
| epidemiology | zoology |
| evolutionary-biology | |
| genetics | |
| genomics | |
| immunology | |
| microbiology | |
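
Because the subject filter takes one of the hyphenated slugs listed in the table, it can help to normalize free-form input before using it as a `subject` value. The following is a minimal sketch, not part of `biorxiv_client`: `to_subject_slug` and `SUBJECT_SLUGS` are hypothetical names introduced here, and the slug set is simply copied from the table.

```python
import re

# Slugs copied from the Subject Categories table (assumed to be complete).
SUBJECT_SLUGS = frozenset({
    "animal-behavior-and-cognition", "biochemistry", "bioengineering",
    "bioinformatics", "biophysics", "cancer-biology", "cell-biology",
    "clinical-trials", "developmental-biology", "ecology", "epidemiology",
    "evolutionary-biology", "genetics", "genomics", "immunology",
    "microbiology", "molecular-biology", "neuroscience", "paleontology",
    "pathology", "pharmacology-and-toxicology", "physiology",
    "plant-biology", "scientific-communication-and-education",
    "synthetic-biology", "systems-biology", "zoology",
})


def to_subject_slug(label: str) -> str:
    """Lowercase a label, join its words with hyphens, and validate it.

    Hypothetical helper: raises ValueError for labels that do not map
    to one of the slugs in SUBJECT_SLUGS.
    """
    slug = re.sub(r"[^a-z]+", "-", label.lower()).strip("-")
    if slug not in SUBJECT_SLUGS:
        raise ValueError(f"unknown subject category: {label!r}")
    return slug
```

Validating up front turns a typo such as "cell biolgy" into an immediate error rather than a query that may silently return nothing.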
Response Structure
```json
{
  "query": {
    "terms": ["protein folding"],
    "since": "2024-03-01",
    "until": "2024-09-30",
    "subject": "biophysics"
  },
  "count": 87,
  "papers": [
    {
      "doi": "10.1101/2024.05.22.594321",
      "title": "Example Preprint Title",
      "authors": "Chen L, Patel R, Kim S",
      "corresponding_author": "Chen L",
      "institution": "Research Institute",
      "posted": "2024-05-22",
      "revision": "1",
      "category": "biophysics",
      "license": "cc_by",
      "paper_type": "new results",
      "abstract": "Abstract content here...",
      "pdf_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1.full.pdf",
      "web_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1",
      "journal_ref": ""
    }
  ]
}
```
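
As an illustration of how a saved response might feed a trend analysis, the sketch below tallies papers per month from the `posted` field and reads the version number out of `pdf_link`. It assumes the JSON shape shown in this section and the `v<N>.full.pdf` link suffix seen in the sample; `papers_per_month` and `version_from_pdf_link` are hypothetical helpers, not functions of the toolkit.

```python
import json
import re
from collections import Counter

# Trimmed sample in the shape of the response above; real output has more fields.
SAMPLE = """
{
  "count": 1,
  "papers": [
    {
      "doi": "10.1101/2024.05.22.594321",
      "posted": "2024-05-22",
      "category": "biophysics",
      "pdf_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1.full.pdf"
    }
  ]
}
"""


def papers_per_month(papers):
    """Count papers by the YYYY-MM prefix of their `posted` date."""
    return Counter(p["posted"][:7] for p in papers if p.get("posted"))


def version_from_pdf_link(pdf_link):
    """Return the integer after the trailing 'v' in a PDF link, or None."""
    match = re.search(r"v(\d+)\.full\.pdf$", pdf_link)
    return int(match.group(1)) if match else None


data = json.loads(SAMPLE)
monthly = papers_per_month(data["papers"])
versions = {p["doi"]: version_from_pdf_link(p["pdf_link"]) for p in data["papers"]}
```

With a file written via `--out results.json`, `json.load(open("results.json"))` would take the place of `json.loads(SAMPLE)`.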
Best Practices
| Recommendation | Details |
|---|---|
| Date ranges | Narrow ranges improve response time. Split large queries into chunks. |
| Category filters | Use |
| Rate limiting | Built-in 0.5s delay between requests. Add more for bulk operations. |
| Result caching | Save JSON outputs to avoid redundant API calls. |
| Version awareness | Preprints may have multiple versions. PDF URLs encode version numbers. |
| Error checking | Verify |
| Debug mode | Use |
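
The date-range and rate-limiting recommendations can be combined by splitting a long window into short chunks and pausing between them. This is a rough sketch under stated assumptions: `split_date_range` and `fetch_in_chunks` are hypothetical helpers, the 30-day chunk size and 1-second pause are arbitrary defaults, and `fetch` is any callable that takes `(since, until)` strings and returns a list of paper records.

```python
import time
from datetime import date, timedelta


def split_date_range(since: str, until: str, days: int = 30):
    """Yield (start, end) ISO date strings covering [since, until] in chunks."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = date.fromisoformat(since)
    last = date.fromisoformat(until)
    while start <= last:
        # Each chunk spans at most `days` calendar days, inclusive of both ends.
        end = min(start + timedelta(days=days - 1), last)
        yield start.isoformat(), end.isoformat()
        start = end + timedelta(days=1)


def fetch_in_chunks(fetch, since, until, days=30, pause=1.0):
    """Call fetch(start, end) once per chunk, sleeping `pause` seconds between calls."""
    results = []
    for i, (start, end) in enumerate(split_date_range(since, until, days)):
        if i:
            time.sleep(pause)  # extra delay on top of any built-in throttling
        results.extend(fetch(start, end))
    return results
```

The chunks do not overlap, so a paper posted on a boundary date is requested only once. How `fetch` wraps the client is left open, since the exact return type of `find_by_terms` is not stated here.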
Reference Files
| File | Contents |
|---|---|
| api-reference.md | Complete bioRxiv REST API documentation |
| examples.md | Extended code examples and workflow patterns |