openalex-database

OpenAlex Database

Overview

OpenAlex is a comprehensive open catalog of 240M+ scholarly works, authors, institutions, topics, sources, publishers, and funders. This skill provides tools and workflows for querying the OpenAlex API to search literature, analyze research output, track citations, and conduct bibliometric studies.

Quick Start

Basic Setup

Always initialize the client with an email address to access the polite pool (10x rate limit boost):

python

from scripts.openalex_client import OpenAlexClient

client = OpenAlexClient(email="your-email@example.edu")
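At the HTTP level, OpenAlex identifies polite-pool traffic by a `mailto` query parameter. The sketch below shows one way a request URL could be assembled with it. `build_url` and `BASE_URL` are hypothetical names for illustration; how `OpenAlexClient` actually attaches the email is not shown in this document.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"  # public OpenAlex API root


def build_url(endpoint, params=None, email=None):
    """Return a full request URL, appending mailto when an email is given.

    Hypothetical helper, not part of the skill's scripts.
    """
    query = dict(params or {})
    if email:
        query["mailto"] = email  # opts the request into the polite pool
    if not query:
        return f"{BASE_URL}{endpoint}"
    return f"{BASE_URL}{endpoint}?{urlencode(query)}"


url = build_url(
    "/works",
    {"search": "machine learning"},
    email="your-email@example.edu",
)
print(url)
```

Keeping the email in one place (the client constructor, or a helper like this) avoids forgetting it on individual requests.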

Installation Requirements

Install the required package with uv:

bash

uv pip install requests

No API key required - OpenAlex is completely open.

Core Capabilities

1. Search for Papers

Use for: Finding papers by title, abstract, or topic

python

# Simple search
results = client.search_works(
    search="machine learning",
    per_page=100
)

# Search with filters
results = client.search_works(
    search="CRISPR gene editing",
    filter_params={
        "publication_year": ">2020",
        "is_oa": "true"
    },
    sort="cited_by_count:desc"
)
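The OpenAlex API expects filters as a single comma-separated `filter` string of `key:value` pairs. The following sketch shows how a `filter_params` dictionary like the one above could map onto that string. `build_filter` is a hypothetical helper; it is an assumption that the client performs an equivalent conversion internally.

```python
def build_filter(filter_params):
    """Join a dict of filters into OpenAlex's comma-separated filter syntax.

    Hypothetical helper for illustration only.
    """
    return ",".join(f"{key}:{value}" for key, value in filter_params.items())


# Query parameters roughly equivalent to the filtered search example
query = {
    "search": "CRISPR gene editing",
    "filter": build_filter({"publication_year": ">2020", "is_oa": "true"}),
    "sort": "cited_by_count:desc",
    "per-page": 100,
}
print(query["filter"])
```

Commas between pairs mean AND, which is why every condition in `filter_params` must match.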

2. Find Works by Author

Use for: Getting all publications by a specific researcher

Use the two-step pattern (entity name → ID → works):

python

from scripts.query_helpers import find_author_works

works = find_author_works(
    author_name="Jennifer Doudna",
    client=client,
    limit=100
)

Manual two-step approach:

python

# Step 1: Get author ID
author_response = client._make_request(
    '/authors',
    params={'search': 'Jennifer Doudna', 'per-page': 1}
)
author_id = author_response['results'][0]['id'].split('/')[-1]

# Step 2: Get works
works = client.search_works(
    filter_params={"authorships.author.id": author_id}
)
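Step 1 indexes `results[0]` directly, which raises an `IndexError` when the name search finds nothing. A minimal sketch of a more defensive ID extraction is shown below. `extract_short_id` is a hypothetical helper, and the sample response uses a placeholder ID rather than a real author record.

```python
def extract_short_id(response):
    """Return the short OpenAlex ID of the first search hit, or None.

    Hypothetical helper; expects the response shape used in Step 1.
    """
    results = response.get("results", [])
    if not results:
        return None  # no match: the caller should stop before Step 2
    # Full IDs look like "https://openalex.org/A..."; keep the last segment
    return results[0]["id"].rsplit("/", 1)[-1]


# Placeholder data standing in for an /authors search response
sample_response = {
    "results": [{"id": "https://openalex.org/A0000000000", "display_name": "Example Author"}]
}
author_id = extract_short_id(sample_response)
print(author_id)
```

Returning `None` makes the "no such author" case explicit instead of surfacing as an exception in the middle of a workflow.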

3. Find Works from Institution

Use for: Analyzing research output from universities or organizations

python

from scripts.query_helpers import find_institution_works

works = find_institution_works(
    institution_name="Stanford University",
    client=client,
    limit=200
)

4. Highly Cited Papers

Use for: Finding influential papers in a field

python

from scripts.query_helpers import find_highly_cited_recent_papers

papers = find_highly_cited_recent_papers(
    topic="quantum computing",
    years=">2020",
    client=client,
    limit=100
)

5. Open Access Papers

Use for: Finding freely available research

python

from scripts.query_helpers import get_open_access_papers

papers = get_open_access_papers(
    search_term="climate change",
    client=client,
    oa_status="any",  # or "gold", "green", "hybrid", "bronze"
    limit=200
)

6. Publication Trends Analysis

Use for: Tracking research output over time

python

from scripts.query_helpers import get_publication_trends

trends = get_publication_trends(
    search_term="artificial intelligence",
    filter_params={"is_oa": "true"},
    client=client
)

# Sort and display
for trend in sorted(trends, key=lambda x: x['key'])[-10:]:
    print(f"{trend['key']}: {trend['count']} publications")

7. Research Output Analysis

Use for: Comprehensive analysis of author or institution research

python

from scripts.query_helpers import analyze_research_output

analysis = analyze_research_output(
    entity_type='institution',  # or 'author'
    entity_name='MIT',
    client=client,
    years='>2020'
)

print(f"Total works: {analysis['total_works']}")
print(f"Open access: {analysis['open_access_percentage']}%")
print(f"Top topics: {analysis['top_topics'][:5]}")

8. Batch Lookups

Use for: Getting information for multiple DOIs, ORCIDs, or IDs efficiently

python

dois = [
    "https://doi.org/10.1038/s41586-021-03819-2",
    "https://doi.org/10.1126/science.abc1234",
    # ... up to 50 DOIs
]

works = client.batch_lookup(
    entity_type='works',
    ids=dois,
    id_field='doi'
)
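When a list holds more than 50 identifiers, it has to be split before lookup. The sketch below shows one way to chunk a list and build the pipe-separated OR filter that a batch request could send. `chunked` and `doi_filter` are hypothetical helpers, the DOIs are placeholders, and whether `batch_lookup()` already chunks internally is not stated in this document.

```python
def chunked(ids, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def doi_filter(dois):
    """Build an OR filter such as 'doi:a|b|c' for one batch."""
    return "doi:" + "|".join(dois)


# 120 placeholder DOIs split into batches of 50, 50, and 20
dois = [f"https://doi.org/10.0000/example.{i}" for i in range(120)]
batches = list(chunked(dois))

for batch in batches:
    print(len(batch), doi_filter(batch)[:60])
```

Each batch then costs one request instead of one request per DOI.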

9. Random Sampling

Use for: Getting representative samples for analysis

python

# Small sample
works = client.sample_works(
    sample_size=100,
    seed=42,  # For reproducibility
    filter_params={"publication_year": "2023"}
)

# Large sample (>10k) - automatically handles multiple requests
works = client.sample_works(
    sample_size=25000,
    seed=42,
    filter_params={"is_oa": "true"}
)

10. Citation Analysis

Use for: Finding papers that cite a specific work

python

import requests

# Get the work
work = client.get_entity('works', 'https://doi.org/10.1038/s41586-021-03819-2')

# Get citing papers using cited_by_api_url
citing_response = requests.get(
    work['cited_by_api_url'],
    params={'mailto': client.email, 'per-page': 200}
)
citing_works = citing_response.json()['results']
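The request above returns only the first page of citing works (at most 200). OpenAlex supports cursor paging: start with `cursor=*` and follow `meta.next_cursor` until it is null. The sketch below shows that loop with the HTTP call injected as a function, so it runs here against canned pages. `collect_all` and `fake_pages` are hypothetical; a real `fetch_page` might wrap `requests.get(work['cited_by_api_url'], params={'cursor': cursor, 'per-page': 200, 'mailto': client.email}).json()`.

```python
def collect_all(fetch_page, max_results=1000):
    """Follow OpenAlex-style cursors until exhausted or max_results is hit.

    fetch_page(cursor) must return a dict with 'results' and 'meta'.
    """
    collected = []
    cursor = "*"  # "*" requests the first cursor page
    while cursor and len(collected) < max_results:
        page = fetch_page(cursor)
        results = page.get("results", [])
        if not results:
            break  # defensive stop on an empty page
        collected.extend(results)
        cursor = page.get("meta", {}).get("next_cursor")
    return collected[:max_results]


# Canned pages standing in for real API responses
fake_pages = {
    "*": {"results": [{"id": "W1"}, {"id": "W2"}], "meta": {"next_cursor": "abc"}},
    "abc": {"results": [{"id": "W3"}], "meta": {"next_cursor": None}},
}
citing_works = collect_all(fake_pages.__getitem__)
print(len(citing_works))
```

Passing the fetch function in keeps the paging logic testable without network access.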

11. Topic and Subject Analysis

Use for: Understanding research focus areas

python

# Get top topics for an institution
topics = client.group_by(
    entity_type='works',
    group_field='topics.id',
    filter_params={
        "authorships.institutions.id": "I136199984",  # MIT
        "publication_year": ">2020"
    }
)

for topic in topics[:10]:
    print(f"{topic['key_display_name']}: {topic['count']} works")

12. Large-Scale Data Extraction

Use for: Downloading large datasets for analysis

python

import csv

# Paginate through all results
all_papers = client.paginate_all(
    endpoint='/works',
    params={
        'search': 'synthetic biology',
        'filter': 'publication_year:2020-2024'
    },
    max_results=10000
)

# Export to CSV
with open('papers.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Year', 'Citations', 'DOI', 'OA Status'])

    for paper in all_papers:
        writer.writerow([
            paper.get('title', 'N/A'),
            paper.get('publication_year', 'N/A'),
            paper.get('cited_by_count', 0),
            paper.get('doi', 'N/A'),
            paper.get('open_access', {}).get('oa_status', 'closed')
        ])
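One caveat with the export loop: `dict.get(key, default)` only falls back to the default when the key is absent, so a field that is present but null (for example a work without a title or DOI) is written as an empty cell. The sketch below shows a row builder that treats null and missing values alike. `paper_to_row` is a hypothetical helper and the sample records are made up for illustration.

```python
import csv
import io


def paper_to_row(paper):
    """Flatten one work record into a CSV row, treating null like missing."""
    open_access = paper.get("open_access") or {}
    return [
        paper.get("title") or "N/A",
        paper.get("publication_year") or "N/A",
        paper.get("cited_by_count") or 0,
        paper.get("doi") or "N/A",
        open_access.get("oa_status") or "closed",
    ]


# Made-up records: one complete, one with null fields
sample_papers = [
    {"title": "Example work", "publication_year": 2023, "cited_by_count": 5,
     "doi": "https://doi.org/10.0000/example", "open_access": {"oa_status": "gold"}},
    {"title": None, "doi": None, "open_access": None},
]

buffer = io.StringIO()  # in-memory file so the sketch leaves nothing on disk
writer = csv.writer(buffer)
writer.writerow(["Title", "Year", "Citations", "DOI", "OA Status"])
writer.writerows(paper_to_row(p) for p in sample_papers)
print(buffer.getvalue())
```

Swapping the `StringIO` buffer for `open('papers.csv', 'w', newline='', encoding='utf-8')` gives the same file-based export as above.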

Critical Best Practices

Always Use Email for Polite Pool

Provide an email address to get a 10x higher rate limit (1 req/sec → 10 req/sec):

python

client = OpenAlexClient(email="your-email@example.edu")

Use Two-Step Pattern for Entity Lookups

Never filter by entity names directly - always get ID first:

python

# ✅ Correct
# 1. Search for entity → get ID
# 2. Filter by ID

# ❌ Wrong
# filter=author_name:Einstein  # This doesn't work!

Use Maximum Page Size

Always use per-page=200 for efficient data retrieval:

python

results = client.search_works(search="topic", per_page=200)

Batch Multiple IDs

Use batch_lookup() for multiple IDs instead of individual requests:

python

# ✅ Correct - 1 request for 50 DOIs
works = client.batch_lookup('works', doi_list, 'doi')

# ❌ Wrong - 50 separate requests
for doi in doi_list:
    work = client.get_entity('works', doi)

Use Sample Parameter for Random Data

Use sample_works() with a seed for reproducible random sampling:

python

# ✅ Correct
works = client.sample_works(sample_size=100, seed=42)

# ❌ Wrong - random page numbers bias results
# Using random page numbers doesn't give a true random sample

Select Only Needed Fields

Reduce response size by selecting specific fields:

python

results = client.search_works(
    search="topic",
    select=['id', 'title', 'publication_year', 'cited_by_count']
)

Common Filter Patterns

Date Ranges

python

# Single year
filter_params={"publication_year": "2023"}

# After year
filter_params={"publication_year": ">2020"}

# Range
filter_params={"publication_year": "2020-2024"}

Multiple Filters (AND)

python

# All conditions must match
filter_params={
    "publication_year": ">2020",
    "is_oa": "true",
    "cited_by_count": ">100"
}

Multiple Values (OR)

python

# Any institution matches
filter_params={
    "authorships.institutions.id": "I136199984|I27837315"  # MIT or Harvard
}

Collaboration (AND within attribute)

python

# Papers with authors from BOTH institutions
filter_params={
    "authorships.institutions.id": "I136199984+I27837315"  # MIT AND Harvard
}

Negation

python

# Exclude type
filter_params={
    "type": "!paratext"
}
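The value operators in the patterns above (`|` for OR, `+` for AND within one attribute, `!` for negation) are easy to mistype when written by hand. As a small sketch, they could be wrapped in helpers like the ones below. `any_of`, `all_of`, and `exclude` are hypothetical names, not functions provided by the skill's scripts.

```python
def any_of(*values):
    """OR: match works having at least one of the values."""
    return "|".join(values)


def all_of(*values):
    """AND within one attribute: match works having every value."""
    return "+".join(values)


def exclude(value):
    """Negation: match works that do not have the value."""
    return f"!{value}"


# Institution IDs reused from the examples above
filter_params = {
    "authorships.institutions.id": all_of("I136199984", "I27837315"),
    "type": exclude("paratext"),
    "publication_year": ">2020",
}
print(filter_params)
```

The resulting dictionary has the same shape as the hand-written `filter_params` examples, so it can be passed wherever those are accepted.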

Entity Types

OpenAlex provides these entity types:

works - Scholarly documents (articles, books, datasets)
authors - Researchers with disambiguated identities
institutions - Universities and research organizations
sources - Journals, repositories, conferences
topics - Subject classifications
publishers - Publishing organizations
funders - Funding agencies

Access any entity type using consistent patterns:

python

client.search_works(...)
client.get_entity('authors', author_id)
client.group_by('works', 'topics.id', filter_params={...})

External IDs

Use external identifiers directly:

python

# DOI for works
work = client.get_entity('works', 'https://doi.org/10.7717/peerj.4375')

# ORCID for authors
author = client.get_entity('authors', 'https://orcid.org/0000-0003-1613-5981')

# ROR for institutions
institution = client.get_entity('institutions', 'https://ror.org/02y3ad647')

# ISSN for sources
source = client.get_entity('sources', 'issn:0028-0836')
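DOIs often arrive in mixed forms (bare, `doi:`-prefixed, or as full URLs), while the examples above use the full URL form. The sketch below normalizes the common variants before lookup. `normalize_doi` is a hypothetical helper; whether `get_entity()` accepts other forms directly is not stated in this document.

```python
DOI_URL_PREFIX = "https://doi.org/"


def normalize_doi(doi):
    """Return a DOI in full URL form, accepting bare or 'doi:' inputs."""
    doi = doi.strip()
    if doi.lower().startswith(DOI_URL_PREFIX):
        return doi  # already in URL form
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop the 'doi:' scheme prefix
    return DOI_URL_PREFIX + doi


for raw in ["10.7717/peerj.4375", "doi:10.7717/peerj.4375",
            "https://doi.org/10.7717/peerj.4375"]:
    print(normalize_doi(raw))
```

Normalizing once, at the point where identifiers enter a workflow, keeps batch lookups and deduplication consistent.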

Reference Documentation

Detailed API Reference

See references/api_guide.md for:

Complete filter syntax
All available endpoints
Response structures
Error handling
Performance optimization
Rate limiting details

Common Query Examples

See references/common_queries.md for:

Complete working examples
Real-world use cases
Complex query patterns
Data export workflows
Multi-step analysis procedures

Scripts

openalex_client.py

Main API client with:

Automatic rate limiting
Exponential backoff retry logic
Pagination support
Batch operations
Error handling

Use for direct API access with full control.

query_helpers.py

High-level helper functions for common operations:

`find_author_works()` - Get papers by author
`find_institution_works()` - Get papers from institution
`find_highly_cited_recent_papers()` - Get influential papers
`get_open_access_papers()` - Find OA publications
`get_publication_trends()` - Analyze trends over time
`analyze_research_output()` - Comprehensive analysis

Use for common research queries with simplified interfaces.

Troubleshooting

Rate Limiting

If you encounter 403 errors:

Ensure an email address is included in requests
Verify you are not exceeding 10 req/sec
Note that the client automatically implements exponential backoff
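For readers who want to see what exponential backoff looks like, here is a minimal sketch: the wait doubles after each rate-limited attempt. It is not the implementation in `openalex_client.py`, which this document does not show. `with_backoff` and `RETRYABLE` are hypothetical names, and treating 429 as retryable alongside the 403 mentioned above is an assumption based on the conventional "Too Many Requests" status.

```python
import time

RETRYABLE = {403, 429}  # 403 per the text above; 429 is an assumption


def with_backoff(call, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Run call() until it stops returning a retryable status.

    call() must return a (status_code, payload) tuple. The delay doubles
    after each failed attempt: base_delay, 2x, 4x, and so on.
    """
    for attempt in range(max_retries + 1):
        status, payload = call()
        if status not in RETRYABLE:
            return payload
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("still rate limited after retries")


# Simulated responses: two rate-limited attempts, then success
responses = iter([(429, None), (429, None), (200, {"results": []})])
delays = []  # records requested sleeps instead of actually waiting
payload = with_backoff(lambda: next(responses), sleep=delays.append)
print(delays, payload)
```

Injecting `sleep` keeps the example instant to run; in real use the default `time.sleep` applies the waits.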

Empty Results

If searches return no results:

Check filter syntax (see `references/api_guide.md`)
Use the two-step pattern for entity lookups (don't filter by names)
Verify entity IDs are in the correct format

Timeout Errors

For large queries:

Use pagination with `per-page=200`
Use `select=` to limit returned fields
Break into smaller queries if needed

Rate Limits

Default: 1 request/second, 100k requests/day
Polite pool (with email): 10 requests/second, 100k requests/day

Always use the polite pool for production workflows by providing an email address to the client.
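To make these limits concrete, the arithmetic below works out what they imply for a bulk extraction, using only the figures stated above plus the 200-record page size recommended earlier. The variable names are illustrative.

```python
DAILY_REQUEST_CAP = 100_000      # requests per day (both pools)
MAX_PER_PAGE = 200               # largest page size
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Upper bound on records retrievable per day at full page size
max_records_per_day = DAILY_REQUEST_CAP * MAX_PER_PAGE

# Time to spend the whole daily cap at 10 requests/second (polite pool)
hours_polite = DAILY_REQUEST_CAP / 10 / 3600

# At 1 request/second the day ends before the cap is reached
reachable_default = min(DAILY_REQUEST_CAP, SECONDS_PER_DAY * 1)

print(max_records_per_day, round(hours_polite, 2), reachable_default)
```

In other words, at the default rate the per-second limit is the binding constraint, while in the polite pool the daily cap is.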

Notes

No authentication required
All data is open and free
Rate limits apply globally, not per IP
Use LitLLM with OpenRouter if LLM-based analysis is needed (don't use Perplexity API directly)
Client handles pagination, retries, and rate limiting automatically
