openalex-paper-search


Academic Paper Search (OpenAlex)


Search 240M+ scholarly works using the OpenAlex API: completely free, with no API key and no SDK required. You only need `curl` or `bash` and a correctly constructed URL.

Quick Start


OpenAlex is a REST API. You query it by constructing URLs and fetching them with `curl`. All responses are JSON. For example, searching for papers about "transformer architecture" is a single GET request to the `/works` endpoint with `search=transformer architecture` (URL-encoded) in the query string.


**Important:** Always include `mailto=agent@kortix.ai` (or any valid email) in every request. Without it, you're limited to 1 request/second. With it, you get 10 requests/second (the "polite pool").
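The Quick Start search can also be made from Python with only the standard library. This is a minimal sketch, not an official client: `openalex_url` and `fetch_json` are hypothetical helper names. The `safe` characters passed to `urlencode` are a choice that keeps OpenAlex's filter syntax readable in the URL. The server should decode percent-encoded characters either way.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.openalex.org"
MAILTO = "agent@kortix.ai"  # polite-pool email used throughout this guide


def openalex_url(path, **params):
    """Build an OpenAlex API URL, always appending mailto (hypothetical helper)."""
    params.setdefault("mailto", MAILTO)
    # Keep : , | > < ! * @ unescaped for readability; spaces become "+"
    query = urllib.parse.urlencode(params, safe=":,|><!*@")
    return f"{BASE}/{path.lstrip('/')}?{query}"


def fetch_json(url):
    """GET a URL and decode the JSON response body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Example (network):
# data = fetch_json(openalex_url("works", search="transformer architecture"))
# for work in data["results"][:5]:
#     print(work["display_name"])
```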


Core Concepts


Entities


OpenAlex has these entity types (all queryable):

| Entity | Endpoint | Count | Description |
|---|---|---|---|
| Works | `/works` | 240M+ | Papers, articles, books, datasets, theses |
| Authors | `/authors` | 90M+ | People who create works |
| Sources | `/sources` | 250K+ | Journals, repositories, conferences |
| Institutions | `/institutions` | 110K+ | Universities, research orgs |
| Topics | `/topics` | 4K+ | Research topics (hierarchical) |

Work Object -- Key Fields


When you fetch a work, these are the most useful fields:

```
id                        OpenAlex ID (e.g., "https://openalex.org/W2741809807")
doi                       DOI URL
title / display_name      Paper title
publication_year          Year published
publication_date          Full date (YYYY-MM-DD)
cited_by_count            Number of incoming citations
fwci                      Field-Weighted Citation Impact (normalized)
type                      article, preprint, review, book, dataset, etc.
language                  ISO 639-1 code (e.g., "en")
is_retracted              Boolean
open_access.is_oa         Boolean: is it freely accessible?
open_access.oa_url        Direct URL to free version
authorships               List of authors with names, institutions, ORCIDs
abstract_inverted_index   Abstract as inverted index (needs reconstruction)
referenced_works          List of OpenAlex IDs this work cites (outgoing)
related_works             Algorithmically related works
cited_by_api_url          API URL to get works that cite this one (incoming)
topics                    Assigned research topics with scores
keywords                  Extracted keywords with scores
primary_location          Where the work is published (journal, repo)
best_oa_location          Best open access location with PDF link
```
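A work is plain JSON, so you don't need a client library to extract these fields. Below is a minimal sketch that assumes `work` is one parsed work object, such as one element of a search response's `results` list. `summarize_work` is a hypothetical helper name, and the one-line format is only an illustration.

```python
def summarize_work(work, max_authors=3):
    """Return a one-line summary of an OpenAlex work object (a parsed JSON dict)."""
    authorships = work.get("authorships") or []
    names = [a["author"]["display_name"] for a in authorships[:max_authors]]
    authors = ", ".join(names) or "unknown authors"
    if len(authorships) > max_authors:
        authors += " et al."
    line = (
        f"[{work.get('cited_by_count', 0)} cites] {work.get('display_name')} "
        f"({work.get('publication_year', '?')}) - {authors}"
    )
    if work.get("is_retracted"):
        line += " [RETRACTED]"
    oa = work.get("open_access") or {}
    if oa.get("is_oa") and oa.get("oa_url"):
        line += f" | OA: {oa['oa_url']}"  # direct URL to a free version
    return line
```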

Reconstructing Abstracts


OpenAlex stores abstracts as inverted indexes for legal reasons. To get plaintext, rebuild the word order:

```python
def reconstruct_abstract(work):
    """Rebuild plaintext from a work's abstract_inverted_index ({word: [positions]})."""
    inv_idx = work.get("abstract_inverted_index")
    if not inv_idx:  # null or missing: this work has no abstract
        return None
    # One slot per word position; write each word into every slot it occupies
    words = [""] * (max(max(positions) for positions in inv_idx.values()) + 1)
    for word, positions in inv_idx.items():
        for pos in positions:
            words[pos] = word
    return " ".join(words)


# Read the abstract_inverted_index from a work object (a parsed JSON dict):
# abstract = reconstruct_abstract(work)
```

Or in bash with `python3 -c`:

```bash
# Pipe a work JSON into this to extract the abstract
echo "$WORK_JSON" | python3 -c "
import json, sys
w = json.load(sys.stdin)
idx = w.get('abstract_inverted_index')
if idx:
    words = [''] * (max(max(p) for p in idx.values()) + 1)
    for word, positions in idx.items():
        for pos in positions:
            words[pos] = word
    print(' '.join(words))
"
```

Searching for Papers


Basic Keyword Search


The `search` parameter searches across titles, abstracts, and fulltext, with stemming and stop-word removal. Use it on its own for a simple search, or add `per_page` to control how many results each page returns.
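Here is a minimal sketch of both variants as URLs. `search_url` is a hypothetical helper, and "transformer architecture" is just an example query.

```python
import urllib.parse


def search_url(query, per_page=None, mailto="agent@kortix.ai"):
    """Build a /works keyword-search URL (hypothetical helper)."""
    params = {"search": query}
    if per_page is not None:
        params["per_page"] = per_page  # results per page
    params["mailto"] = mailto  # polite pool
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params, safe="@")


print(search_url("transformer architecture"))               # simple search
print(search_url("transformer architecture", per_page=10))  # with per_page limit
```

Paste the printed URL into `curl -s "..."` to run it from the shell.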

Boolean Search


Use uppercase `AND`, `OR`, and `NOT` inside the `search` value to build complex boolean queries, with parentheses for grouping. For an exact phrase match, wrap the phrase in double quotes, URL-encoded as `%22`.
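Hand-encoding quotes and parentheses is error-prone, so it is safer to let the standard library do it. Below is a minimal sketch. The query string is a hypothetical example, and `boolean_search_url` is a hypothetical helper name.

```python
import urllib.parse


def boolean_search_url(query, mailto="agent@kortix.ai"):
    """URL-encode a boolean search string for /works (hypothetical helper).

    quote_plus turns spaces into '+', double quotes into %22, and
    parentheses into %28/%29, so operators and phrases survive the trip.
    """
    params = {"search": query, "mailto": mailto}
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params, safe="@")


# Hypothetical query: an exact phrase OR an acronym, excluding surveys
query = '("large language model" OR LLM) AND evaluation NOT survey'
print(boolean_search_url(query))
```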

Search Specific Fields


You can also restrict a search to a single field: the title only, the abstract only, the title and abstract combined, or the fulltext (which is available only for a subset of works).
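One way to express field-restricted searches is through OpenAlex's `.search` filters. The filter names below (`title.search`, `abstract.search`, `title_and_abstract.search`, `fulltext.search`) reflect the OpenAlex documentation as we understand it. Treat them as assumptions and check the current API reference before relying on them. `field_search_url` is a hypothetical helper.

```python
import urllib.parse

# Assumed field-search filter names (verify against the OpenAlex docs)
FIELD_FILTERS = {
    "title": "title.search",
    "abstract": "abstract.search",
    "title_and_abstract": "title_and_abstract.search",
    "fulltext": "fulltext.search",  # only a subset of works has fulltext indexed
}


def field_search_url(field, text, mailto="agent@kortix.ai"):
    """Build a /works URL that searches a single field (hypothetical helper).

    Commas separate filters, so `text` should not contain commas.
    """
    params = {"filter": f"{FIELD_FILTERS[field]}:{text}", "mailto": mailto}
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params, safe=":@")


print(field_search_url("title", "graph neural networks"))
```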

Filtering


Filters are the most powerful feature. Pass them in the `filter` parameter. Separate different filters with commas (AND), and separate alternative values within a single filter with pipes (OR).

Most Useful Filters


```bash
# By publication year
?filter=publication_year:2024
?filter=publication_year:2020-2024
?filter=publication_year:>2022

# By citation count
?filter=cited_by_count:>100          # highly cited
?filter=cited_by_count:>1000         # landmark papers

# By open access
?filter=is_oa:true                   # only open access
?filter=oa_status:gold               # gold OA only

# By type
?filter=type:article                 # journal articles
?filter=type:preprint                # preprints
?filter=type:review                  # review articles

# By language
?filter=language:en                  # English only

# Not retracted
?filter=is_retracted:false

# Has abstract
?filter=has_abstract:true

# Has downloadable PDF
?filter=has_content.pdf:true

# By author (OpenAlex ID)
?filter=author.id:A5023888391

# By institution (OpenAlex ID)
?filter=institutions.id:I27837315    # e.g., University of Michigan

# By DOI (one or more, pipe-separated, up to 50)
?filter=doi:DOI1|DOI2|DOI3

# By indexed source
?filter=indexed_in:arxiv             # arXiv papers
?filter=indexed_in:pubmed            # PubMed papers
?filter=indexed_in:crossref          # Crossref papers
```

Combining Filters


```bash
# AND: comma-separated
?filter=publication_year:>2022,cited_by_count:>50,is_oa:true,type:article

# OR: pipe-separated within a filter
?filter=publication_year:2023|2024

# NOT: prefix with !
?filter=type:!preprint

# Combined example: highly-cited OA articles from 2023-2024, not preprints
?filter=publication_year:2023|2024,cited_by_count:>100,is_oa:true,type:article
```
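When you build filters in code, a small helper keeps the comma/pipe/`!` rules straight. This is a minimal sketch, not part of any OpenAlex library. `build_filter`, `works_url`, and the `__`-for-`.` keyword convention are hypothetical choices made so that names like `institutions.id` can be passed as Python keyword arguments.

```python
import urllib.parse


def build_filter(**filters):
    """Join filters into OpenAlex's filter syntax (hypothetical helper).

    - separate filters are ANDed with commas
    - a list/tuple value becomes an OR list joined with pipes
    - booleans become "true"/"false"; prefix a value with "!" for NOT
    - '__' in a keyword stands for '.', e.g. institutions__id -> institutions.id
    """
    parts = []
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{name.replace('__', '.')}:{value}")
    return ",".join(parts)


def works_url(filter_str, sort=None, per_page=25, mailto="agent@kortix.ai"):
    """Build a /works URL from a filter string, with an optional sort."""
    params = {"filter": filter_str}
    if sort:
        params["sort"] = sort
    params.update(per_page=per_page, mailto=mailto)
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params, safe=":,|><!@")


# Highly cited OA journal articles from 2023 or 2024, most cited first
f = build_filter(publication_year=[2023, 2024], cited_by_count=">100", is_oa=True, type="article")
print(works_url(f, sort="cited_by_count:desc"))
```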

Sorting

```bash
# Most cited first
?sort=cited_by_count:desc

# Most recent first
?sort=publication_date:desc

# Most relevant first (only when using search)
?sort=relevance_score:desc

# Multiple sort keys
?sort=publication_year:desc,cited_by_count:desc
```

Pagination


There are two modes: basic paging (for browsing) and cursor paging (for collecting all results).

```bash
# Basic paging (limited to 10,000 results)
?page=1&per_page=25
?page=2&per_page=25

# Cursor paging (unlimited, for collecting everything)
?per_page=100&cursor=*                 # first page
?per_page=100&cursor=IlsxNjk0ODc...    # next page (cursor from previous response meta)
```

The cursor for the next page is in `response.meta.next_cursor`. When it's `null`, you've reached the end.
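The cursor loop can be sketched as a generator that starts at `cursor=*` and follows `meta.next_cursor` until it is `null`. This is a minimal illustration under a few assumptions. `iter_all_works` and `fetch_json` are hypothetical names. The `fetch` function is a parameter so that the loop can be tested offline or wrapped with rate limiting. `max_pages` is an optional safety cap.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """GET a URL and decode its JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_all_works(params, fetch=fetch_json, max_pages=None, mailto="agent@kortix.ai"):
    """Yield every work matching `params` using cursor paging.

    Starts with cursor=* and follows meta.next_cursor until it is null.
    """
    cursor, pages = "*", 0
    while cursor is not None and (max_pages is None or pages < max_pages):
        query = dict(params, per_page=100, cursor=cursor, mailto=mailto)
        url = "https://api.openalex.org/works?" + urllib.parse.urlencode(query, safe=":,|><!*@")
        data = fetch(url)
        results = data.get("results") or []
        yield from results
        if not results:  # defensive: never loop on an empty page
            break
        cursor = (data.get("meta") or {}).get("next_cursor")
        pages += 1


# Example (network): collect every 2024 work on a topic
# for work in iter_all_works({"search": "protein folding", "filter": "publication_year:2024"}):
#     print(work["id"])
```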

Select Fields


Reduce response size by selecting only the fields you need:

```bash
# Only get IDs, titles, citation counts, DOIs, and publication years
?select=id,display_name,cited_by_count,doi,publication_year

# Minimal metadata for scanning
?select=id,display_name,publication_year,cited_by_count,open_access
```

Citation Graph Traversal


Find what a paper cites (outgoing references)


Get the works cited BY a specific paper. These are the OpenAlex IDs listed in its `referenced_works` field.

Find what cites a paper (incoming citations)


Get the works that CITE a specific paper with `filter=cites:WORK_ID`. This is the query that the paper's `cited_by_api_url` points to.

Find related works


Get algorithmically related works (based on shared concepts). These are listed in the work's `related_works` field.
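The three lookups above can be expressed as filter queries on `/works`. Only `cites:` appears elsewhere in this guide. The `cited_by:` and `related_to:` filter names are assumptions based on the OpenAlex documentation, so check them before you rely on them. Alternatively, you can read the `referenced_works` and `related_works` fields of the work itself. `citation_urls` is a hypothetical helper.

```python
import urllib.parse

API = "https://api.openalex.org/works?"


def citation_urls(work_id, mailto="agent@kortix.ai"):
    """URLs for the three directions of the citation graph (hypothetical helper).

    cites:      works that cite work_id (incoming)
    cited_by:   works that work_id cites (outgoing, i.e. its referenced_works)
    related_to: algorithmically related works
    """
    def url(filter_name):
        query = {"filter": f"{filter_name}:{work_id}", "mailto": mailto}
        return API + urllib.parse.urlencode(query, safe=":@")

    return {
        "incoming": url("cites"),
        "outgoing": url("cited_by"),
        "related": url("related_to"),
    }


for direction, link in citation_urls("W2741809807").items():
    print(direction, link)
```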

Citation chain: follow the references


  1. Get a seminal paper by DOI
  2. Find its `referenced_works` (what it cites)
  3. Find who cites it (`filter=cites:WORK_ID`)
  4. For the most cited citers, repeat

This is how you build a literature graph around a topic.
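The four steps above can be sketched as a breadth-first walk. This is a minimal illustration under stated assumptions, not a prescribed algorithm. `build_citation_graph`, `get_work`, and `get_citers` are hypothetical names. The caller supplies the two fetch functions, for example thin wrappers around `/works/WORK_ID` and `/works?filter=cites:WORK_ID`. Keeping I/O out of the walk makes it easy to test and to rate-limit.

```python
from collections import deque


def build_citation_graph(seed_id, get_work, get_citers, depth=2, top_n=5):
    """Breadth-first walk of the citation graph around a seed work.

    get_work(id)   -> work dict (uses its "referenced_works")
    get_citers(id) -> list of citing work dicts (each with "id", "cited_by_count")
    Returns a set of (citing_id, cited_id) edges.
    """
    edges, seen = set(), {seed_id}
    queue = deque([(seed_id, 0)])
    while queue:
        work_id, level = queue.popleft()
        # Step 2: outgoing references (what this work cites)
        for ref in get_work(work_id).get("referenced_works", []):
            edges.add((work_id, ref))
        if level >= depth:
            continue
        # Step 3: incoming citations; step 4 expands only the most cited citers
        citers = sorted(get_citers(work_id), key=lambda w: w.get("cited_by_count", 0), reverse=True)
        for citer in citers[:top_n]:
            edges.add((citer["id"], work_id))
            if citer["id"] not in seen:
                seen.add(citer["id"])
                queue.append((citer["id"], level + 1))
    return edges
```

`depth` bounds how many times step 4 repeats, and `top_n` limits expansion to the most cited citers. Together they keep the request count, and the pressure on the rate limit, manageable.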

Author Lookup


Search for an author by name on the `/authors` endpoint, get an author's works by their OpenAlex author ID (with the `author.id` filter), or get an author by ORCID.
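Here is a sketch of the three author lookups as URL builders. The function names are hypothetical. The `/authors/orcid:<id>` path form is an assumption based on OpenAlex's external-ID lookups. The ORCID in the usage line is a placeholder.

```python
import urllib.parse

BASE = "https://api.openalex.org"
MAILTO = "agent@kortix.ai"


def author_search_url(name):
    """Search for an author by name."""
    return f"{BASE}/authors?" + urllib.parse.urlencode({"search": name, "mailto": MAILTO}, safe="@")


def author_works_url(author_id, sort="cited_by_count:desc"):
    """Get an author's works by OpenAlex author ID (e.g. A5023888391)."""
    params = {"filter": f"author.id:{author_id}", "sort": sort, "mailto": MAILTO}
    return f"{BASE}/works?" + urllib.parse.urlencode(params, safe=":@")


def author_by_orcid_url(orcid):
    """Get one author by ORCID; assumes the /authors/orcid:<id> path form."""
    return f"{BASE}/authors/orcid:{orcid}?mailto={MAILTO}"


print(author_works_url("A5023888391"))                             # most influential first
print(author_works_url("A5023888391", sort="publication_date:desc"))  # most recent first
print(author_by_orcid_url("0000-0000-0000-0000"))                  # placeholder ORCID
```

The two `sort` values map directly onto the "most influential" and "recent work" steps of the Author Analysis workflow.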

Lookup by External ID


Look up a work by DOI, by PubMed ID, or by arXiv ID (via the paper's DOI). A batch lookup accepts up to 50 IDs at once, separated by pipes.
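Below is a sketch of these lookups as URL builders. The `doi:` and `pmid:` path prefixes are assumptions based on OpenAlex's documented external-ID lookups. The `10.48550/arXiv.<id>` pattern is the DOI scheme arXiv assigns through DataCite. All function names are hypothetical.

```python
import urllib.parse

BASE = "https://api.openalex.org/works"
MAILTO = "agent@kortix.ai"


def work_by_doi_url(doi):
    """Single work by DOI (assumed /works/doi:<doi> form)."""
    return f"{BASE}/doi:{doi}?mailto={MAILTO}"


def work_by_pmid_url(pmid):
    """Single work by PubMed ID (assumed /works/pmid:<id> form)."""
    return f"{BASE}/pmid:{pmid}?mailto={MAILTO}"


def arxiv_doi(arxiv_id):
    """arXiv papers carry DataCite DOIs of the form 10.48550/arXiv.<id>."""
    return f"10.48550/arXiv.{arxiv_id}"


def batch_doi_url(dois):
    """Batch lookup: up to 50 DOIs in one filter, pipe-separated (OR)."""
    if not 1 <= len(dois) <= 50:
        raise ValueError("pass between 1 and 50 DOIs per request")
    # per_page = batch size, so every match fits on one page
    params = {"filter": "doi:" + "|".join(dois), "per_page": len(dois), "mailto": MAILTO}
    return f"{BASE}?" + urllib.parse.urlencode(params, safe=":|/@")


print(work_by_doi_url(arxiv_doi("1706.03762")))
```

Setting `per_page` to the batch size matters because the default page size (25) is smaller than the 50-ID maximum.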

Open Access & PDF Access


To find OA papers with direct PDF links, filter on `is_oa:true`, then read the link fields of each result.


The `best_oa_location.pdf_url` field gives a direct PDF link when available. The `open_access.oa_url` gives the best available OA landing page or PDF.
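Here is a small sketch of that fallback order. It assumes `work` is a parsed work object, and `best_pdf_link` is a hypothetical helper name. The `or {}` guards handle `best_oa_location` being `null` on closed-access works.

```python
def best_pdf_link(work):
    """Prefer a direct PDF, then fall back to the best OA landing page or PDF.

    Returns None for closed-access works.
    """
    best = work.get("best_oa_location") or {}
    if best.get("pdf_url"):
        return best["pdf_url"]
    return (work.get("open_access") or {}).get("oa_url")
```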


Practical Workflows


Literature Survey on a Topic


  1. Find the most-cited papers on a topic
  2. For the top papers, explore their citation graphs
  3. Find recent papers building on this work

Find Landmark/Seminal Papers


Combine a search term with a high citation threshold, such as `cited_by_count:>1000`, and sort by `cited_by_count:desc`.

Find Recent Preprints


To get the latest preprints on a topic, search the topic, filter with `type:preprint`, and sort by `publication_date:desc`.

Find Review Articles


To get review and survey papers on a topic, search the topic and filter with `type:review`.
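The landmark, preprint, and review searches above differ only in their filter and sort, so they can be captured as presets. This is a sketch with a few assumptions. `topic_query_url` and the preset table are hypothetical. The `>1000` threshold mirrors the "landmark papers" example in the filter list. Sorting reviews by citations is a choice, not a requirement. "protein folding" is an example topic.

```python
import urllib.parse

PRESETS = {
    "landmark": {"filter": "cited_by_count:>1000", "sort": "cited_by_count:desc"},
    "preprints": {"filter": "type:preprint", "sort": "publication_date:desc"},
    "reviews": {"filter": "type:review", "sort": "cited_by_count:desc"},
}


def topic_query_url(topic, kind, per_page=25, mailto="agent@kortix.ai"):
    """Build a /works URL for one of the workflow presets (hypothetical helper)."""
    params = {"search": topic, **PRESETS[kind], "per_page": per_page, "mailto": mailto}
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params, safe=":>@")


for kind in PRESETS:
    print(kind, topic_query_url("protein folding", kind))
```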

Author Analysis


  1. Find the author
  2. Get their most influential papers
  3. Get their recent work

Saving Results to Disk


When doing deep research, save paper data to disk for later processing. You can save raw search results as JSON by redirecting `curl` output to a file, or you can extract and save a clean summary:

```bash
# Extract and save a clean summary
mkdir -p research/papers
curl -s "https://api.openalex.org/works?search=topic&per_page=50&select=id,display_name,publication_year,cited_by_count,doi,authorships&mailto=agent@kortix.ai" | python3 -c '
import json, sys

data = json.load(sys.stdin)
for w in data.get("results", []):
    authorships = w.get("authorships") or []
    authors = ", ".join(a["author"]["display_name"] or "?" for a in authorships[:3])
    if len(authorships) > 3:
        authors += " et al."
    cites = w.get("cited_by_count", 0)
    title = w.get("display_name")
    year = w.get("publication_year") or "?"
    print(f"[{cites} cites] {title} ({year}) - {authors}")
    doi = w.get("doi")
    if doi:
        print(f"  DOI: {doi}")
    print()
' > research/papers/topic-summary.txt
```

For deep research, save individual paper metadata to your `sources-index.md` and raw data to `sources/`:

```bash
# Save a paper's full metadata
mkdir -p research/sources
curl -s "https://api.openalex.org/works/W2741809807?mailto=agent@kortix.ai" > research/sources/001-paper-title.json
```

Rate Limits


| Pool | Rate | How to get it |
|---|---|---|
| Common | 1 req/sec | No email provided |
| Polite | 10 req/sec | Add `mailto=your@email.com` to requests |
| Premium | Higher | Paid API key via the `api_key` param |

Always use the polite pool: add `&mailto=agent@kortix.ai` to every request.

Tips


  • Use `select` aggressively to reduce response size and speed up requests
  • Use `per_page=100` (max) when collecting lots of results, to minimize the request count
  • Use cursor paging (`cursor=*`) when you need more than 10,000 results
  • Batch DOI lookups with OR syntax: `filter=doi:DOI1|DOI2|DOI3` (up to 50)
  • Reconstruct abstracts from the inverted index; don't skip this, abstracts are gold
  • Follow citation chains to find seminal works and recent developments
  • Filter by `has_abstract:true` when you need abstracts (not all works have them)
  • Filter by `indexed_in:arxiv` or `indexed_in:pubmed` to target specific repositories
  • Sort by `cited_by_count:desc` to find the most influential papers first
  • Combine search + filters for precise results: search gives relevance, filters give precision