biorxiv-database


Search and retrieve preprints from bioRxiv. Use when asked to "search bioRxiv", "find preprints", "look up bioRxiv papers", or retrieve life sciences literature.


NPX Install

```bash
npx skill4agent add aminoanalytica/amina-skills biorxiv-database
```

bioRxiv Database

A Python toolkit for programmatic access to bioRxiv preprints. It retrieves preprint metadata and writes it as structured JSON for use in research workflows. The metadata covers DOI, title, authors, posting date, revision, subject category, license, abstract and links.

Use Cases

  • Query recent preprints by topic or research domain
  • Monitor publications from specific researchers
  • Perform systematic literature reviews
  • Analyze publication trends across time periods
  • Retrieve citation metadata and DOIs
  • Download preprint PDFs for text analysis
  • Filter results by subject category
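For the trend-analysis use case, you can tally a saved result file by month. The sketch below is a minimal example. It assumes the `papers` list and the `posted` field (YYYY-MM-DD) shown under Response Structure. `papers_per_month` is a hypothetical helper and is not part of the toolkit.

```python
from collections import Counter


def papers_per_month(result: dict) -> dict:
    """Count preprints per YYYY-MM using each paper's 'posted' date.

    Hypothetical helper: assumes the JSON layout written by biorxiv_client.py,
    i.e. a top-level 'papers' list whose items carry 'posted' as YYYY-MM-DD.
    """
    months = Counter(paper["posted"][:7] for paper in result.get("papers", []))
    return dict(sorted(months.items()))  # chronological order


if __name__ == "__main__":
    sample = {
        "papers": [
            {"posted": "2024-05-22"},
            {"posted": "2024-05-30"},
            {"posted": "2024-06-02"},
        ]
    }
    print(papers_per_month(sample))  # {'2024-05': 2, '2024-06': 1}
```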

Quick Start

```bash
# Install dependencies
pip install requests

# Search by keywords
python scripts/biorxiv_client.py --terms "protein folding" --recent 30 --out results.json

# Search by author
python scripts/biorxiv_client.py --author "Chen" --recent 180

# Get specific paper by DOI
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321"

# Download PDF
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321" --fetch-pdf paper.pdf
```
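You can load the file written by `--out` back into Python for further processing. The sketch below is minimal and makes two assumptions:

- The output follows the layout under Response Structure, with a `count` field plus a `papers` list.
- Zero results should be treated as an error, following the "Error checking" advice under Best Practices.

`parse_results` and `load_results` are hypothetical helpers.

```python
import json
from pathlib import Path


def parse_results(text: str) -> list:
    """Return the 'papers' list from biorxiv_client.py JSON output.

    Assumes the 'count'/'papers' layout from the Response Structure example.
    """
    data = json.loads(text)
    if data.get("count", 0) == 0:
        # Zero results often mean a bad date range or a connectivity problem.
        raise ValueError("no preprints returned; check the date range and connectivity")
    return data["papers"]


def load_results(path: str) -> list:
    """Read a file written with --out and parse it."""
    return parse_results(Path(path).read_text(encoding="utf-8"))
```

Usage: `for paper in load_results("results.json"): print(paper["doi"], paper["title"])`.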

Command-Line Options

| Option | Description |
|---|---|
| `-t`, `--terms` | Search keywords (multiple allowed) |
| `-a`, `--author` | Author name to search |
| `--doi` | Specific DOI to retrieve |
| `--since` | Start date (YYYY-MM-DD) |
| `--until` | End date (YYYY-MM-DD) |
| `--recent` | Search the last N days |
| `-s`, `--subject` | Subject category filter |
| `--fields` | Fields to search: `title`, `abstract`, `authors` |
| `-o`, `--out` | Output file (default: stdout) |
| `--max` | Maximum number of results to return |
| `--fetch-pdf` | Download the PDF (requires `--doi`) |
| `-v`, `--verbose` | Enable debug output |
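To reproduce a `--recent` search with explicit dates, you can convert N days into `--since`/`--until` values. This is a minimal sketch. It assumes the client treats "last N days" as the window from today minus N days up to today; check `biorxiv_client.py` before relying on that. `recent_window` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def recent_window(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Translate a --recent N value into explicit --since/--until dates.

    Assumption: the client's "last N days" window is [today - N days, today].
    """
    if days < 1:
        raise ValueError("--recent expects a positive number of days")
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


if __name__ == "__main__":
    since, until = recent_window(30, today=date(2024, 9, 30))
    print(f"--since {since} --until {until}")  # --since 2024-08-31 --until 2024-09-30
```

Pinning explicit dates makes a saved query repeatable. A `--recent` window, by contrast, shifts every day.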

Programmatic API

```python
from scripts.biorxiv_client import PreprintClient

client = PreprintClient(debug=True)

# Search by keywords
results = client.find_by_terms(
    terms=["enzyme engineering"],
    since="2024-01-01",
    until="2024-12-31",
    subject="biochemistry"
)

# Search by author
papers = client.find_by_author(name="Garcia", since="2023-01-01")

# Get paper by DOI
metadata = client.get_by_doi("10.1101/2024.05.22.594321")

# Download PDF
client.fetch_pdf(doi="10.1101/2024.05.22.594321", destination="paper.pdf")

# Normalize output
formatted = client.normalize(metadata, include_abstract=True)
```
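The public bioRxiv API lists preprints by date interval rather than by keyword. If the client works the same way, a term search like `find_by_terms` plausibly fetches a date range and then filters locally against the fields named by `--fields`. The sketch below shows one way such filtering could work. `matches_terms` and `filter_papers` are hypothetical. The case-insensitive, any-term rule is an assumption, not the toolkit's documented behavior.

```python
from typing import Iterable, List

DEFAULT_FIELDS = ("title", "abstract", "authors")


def matches_terms(paper: dict, terms: Iterable[str], fields: Iterable[str] = DEFAULT_FIELDS) -> bool:
    """Return True if any term appears (case-insensitively) in any chosen field.

    Hypothetical: mirrors the --fields option; the real client may differ.
    """
    needles = [term.lower() for term in terms]  # materialize so it can be reused per field
    for field in fields:
        text = str(paper.get(field, "")).lower()
        if any(needle in text for needle in needles):
            return True
    return False


def filter_papers(papers: List[dict], terms: List[str], fields: Iterable[str] = DEFAULT_FIELDS) -> List[dict]:
    """Keep only the papers that match at least one search term."""
    fields = tuple(fields)
    return [paper for paper in papers if matches_terms(paper, terms, fields)]


if __name__ == "__main__":
    papers = [
        {"title": "Directed evolution for enzyme engineering", "abstract": "", "authors": "Garcia M"},
        {"title": "Cortical circuits in mice", "abstract": "Two-photon imaging", "authors": "Chen L"},
    ]
    hits = filter_papers(papers, ["enzyme engineering"])
    print([paper["title"] for paper in hits])  # ['Directed evolution for enzyme engineering']
```

Restricting `fields` to `("authors",)` turns the same filter into a rough author search, for example `filter_papers(papers, ["Garcia"], fields=("authors",))`.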

Subject Categories

Accepted values for `--subject` (`subject=` in the Python API):

| Category | Category (cont.) |
|---|---|
| animal-behavior-and-cognition | molecular-biology |
| biochemistry | neuroscience |
| bioengineering | paleontology |
| bioinformatics | pathology |
| biophysics | pharmacology-and-toxicology |
| cancer-biology | physiology |
| cell-biology | plant-biology |
| clinical-trials | scientific-communication-and-education |
| developmental-biology | synthetic-biology |
| ecology | systems-biology |
| epidemiology | zoology |
| evolutionary-biology | |
| genetics | |
| genomics | |
| immunology | |
| microbiology | |
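Checking a category name before sending a query avoids a silently empty result. The sketch below is minimal. It assumes `--subject` expects the lowercase, hyphenated slugs in the table above, and it converts spaces and underscores to hyphens. `to_subject_slug` is a hypothetical helper.

```python
import re

# The 27 subject slugs listed in the table above.
SUBJECTS = frozenset({
    "animal-behavior-and-cognition", "biochemistry", "bioengineering",
    "bioinformatics", "biophysics", "cancer-biology", "cell-biology",
    "clinical-trials", "developmental-biology", "ecology", "epidemiology",
    "evolutionary-biology", "genetics", "genomics", "immunology",
    "microbiology", "molecular-biology", "neuroscience", "paleontology",
    "pathology", "pharmacology-and-toxicology", "physiology", "plant-biology",
    "scientific-communication-and-education", "synthetic-biology",
    "systems-biology", "zoology",
})


def to_subject_slug(name: str) -> str:
    """Normalize a category name such as 'Cell Biology' to 'cell-biology'.

    Hypothetical helper; raises ValueError for names not in SUBJECTS.
    """
    slug = re.sub(r"[\s_]+", "-", name.strip().lower())  # spaces/underscores -> hyphens
    if slug not in SUBJECTS:
        raise ValueError(f"unknown bioRxiv subject: {name!r}")
    return slug


if __name__ == "__main__":
    print(to_subject_slug("Pharmacology and Toxicology"))  # pharmacology-and-toxicology
```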

Response Structure

```json
{
  "query": {
    "terms": ["protein folding"],
    "since": "2024-03-01",
    "until": "2024-09-30",
    "subject": "biophysics"
  },
  "count": 87,
  "papers": [
    {
      "doi": "10.1101/2024.05.22.594321",
      "title": "Example Preprint Title",
      "authors": "Chen L, Patel R, Kim S",
      "corresponding_author": "Chen L",
      "institution": "Research Institute",
      "posted": "2024-05-22",
      "revision": "1",
      "category": "biophysics",
      "license": "cc_by",
      "paper_type": "new results",
      "abstract": "Abstract content here...",
      "pdf_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1.full.pdf",
      "web_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1",
      "journal_ref": ""
    }
  ]
}
```
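In the example above, the `revision` field and both links all identify the preprint version. A quick consistency check therefore catches stale or mismatched links. This is a minimal sketch. It assumes links end in `v<N>` or `v<N>.full.pdf`, as in the example. `link_version` and `check_versions` are hypothetical helpers.

```python
import re
from typing import Optional

# Trailing version marker in links such as
# .../2024.05.22.594321v1 or .../2024.05.22.594321v1.full.pdf
_VERSION_RE = re.compile(r"v(\d+)(?:\.full\.pdf)?$")


def link_version(url: str) -> Optional[int]:
    """Return the version number encoded at the end of a bioRxiv link, or None."""
    match = _VERSION_RE.search(url)
    return int(match.group(1)) if match else None


def check_versions(paper: dict) -> bool:
    """True if pdf_link and web_link both point at the paper's 'revision'."""
    expected = int(paper["revision"])
    return link_version(paper["pdf_link"]) == expected == link_version(paper["web_link"])
```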

Best Practices

| Recommendation | Details |
|---|---|
| Date ranges | Narrow ranges improve response time. Split large queries into chunks. |
| Category filters | Use `--subject` to reduce bandwidth and improve precision. |
| Rate limiting | A 0.5 s delay between requests is built in. Add more for bulk operations. |
| Result caching | Save JSON outputs to avoid redundant API calls. |
| Version awareness | Preprints may have multiple versions. PDF URLs encode the version number. |
| Error checking | Verify `count` in outputs. Zero results may indicate a date-range or connectivity problem. |
| Debug mode | Use `--verbose` for detailed request/response logging. |
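You can apply the date-range and rate-limiting advice together on a large query. Split the range into smaller windows and pause between requests. The sketch below is minimal and rests on these assumptions:

- `date_chunks` and `run_chunked` are hypothetical helpers.
- The 30-day window is an arbitrary default.
- `fetch` stands in for whatever performs the actual query, such as a small wrapper that passes `since` and `until` to `PreprintClient.find_by_terms`.

```python
import time
from datetime import date, timedelta
from typing import Callable, Iterator, List, Tuple


def date_chunks(since: str, until: str, days: int = 30) -> Iterator[Tuple[str, str]]:
    """Yield consecutive inclusive (since, until) windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start, end = date.fromisoformat(since), date.fromisoformat(until)
    while start <= end:
        stop = min(start + timedelta(days=days - 1), end)
        yield start.isoformat(), stop.isoformat()
        start = stop + timedelta(days=1)  # next window begins the following day


def run_chunked(fetch: Callable[[str, str], list], since: str, until: str,
                days: int = 30, pause: float = 1.0) -> List:
    """Call fetch(since, until) once per window and concatenate the results."""
    results: List = []
    for i, (lo, hi) in enumerate(date_chunks(since, until, days)):
        if i:
            time.sleep(pause)  # pause between windows, in addition to any delay fetch applies
        results.extend(fetch(lo, hi))
    return results
```

For example, `date_chunks("2024-01-01", "2024-03-15")` yields three windows: 2024-01-01 to 2024-01-30, 2024-01-31 to 2024-02-29, and 2024-03-01 to 2024-03-15.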

Reference Files

| File | Contents |
|---|---|
| `api-reference.md` | Complete bioRxiv REST API documentation |
| `examples.md` | Extended code examples and workflow patterns |