openalex-paper-search
Academic Paper Search (OpenAlex)
Search 240M+ scholarly works using the OpenAlex API -- completely free, no API key required, no SDK needed. Just `curl` or `bash` with URL construction.

Full docs: https://docs.openalex.org

Quick Start
OpenAlex is a REST API. You query it by constructing URLs and fetching them with `curl`. All responses are JSON.

```bash
# Search for papers about "transformer architecture"
curl -s "https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai" | python3 -m json.tool
```

**Important:** Always include `mailto=agent@kortix.ai` (or any valid email) in every request. Without it, you're limited to 1 request/second. With it, you get 10 requests/second (the "polite pool").
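
The same URL can also be assembled programmatically. The sketch below is only an illustration: `build_url` is a hypothetical helper (not part of OpenAlex) that appends `mailto` to every query so that requests stay in the polite pool.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"
MAILTO = "agent@kortix.ai"  # any valid email works


def build_url(entity: str, **params: object) -> str:
    """Return an OpenAlex URL for `entity` with `mailto` always included."""
    query = {**params, "mailto": MAILTO}
    # Keep filter/cursor punctuation readable instead of percent-encoding it.
    return f"{BASE_URL}/{entity}?{urlencode(query, safe=':,|>!*@')}"


print(build_url("works", search="transformer architecture", per_page=5))
```

This reproduces the Quick Start URL above; spaces in the search terms become `+`.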

Core Concepts
Entities
OpenAlex has these entity types (all queryable):

| Entity | Endpoint | Count | Description |
|---|---|---|---|
| Works | `/works` | 240M+ | Papers, articles, books, datasets, theses |
| Authors | `/authors` | 90M+ | People who create works |
| Sources | `/sources` | 250K+ | Journals, repositories, conferences |
| Institutions | `/institutions` | 110K+ | Universities, research orgs |
| Topics | `/topics` | 4K+ | Research topics (hierarchical) |

Work Object -- Key Fields
When you fetch a work, these are the most useful fields:

| Field | Description |
|---|---|
| `id` | OpenAlex ID (e.g., "https://openalex.org/W2741809807") |
| `doi` | DOI URL |
| `title` / `display_name` | Paper title |
| `publication_year` | Year published |
| `publication_date` | Full date (YYYY-MM-DD) |
| `cited_by_count` | Number of incoming citations |
| `fwci` | Field-Weighted Citation Impact (normalized) |
| `type` | article, preprint, review, book, dataset, etc. |
| `language` | ISO 639-1 code (e.g., "en") |
| `is_retracted` | Boolean |
| `open_access.is_oa` | Boolean -- is it freely accessible? |
| `open_access.oa_url` | Direct URL to free version |
| `authorships` | List of authors with names, institutions, ORCIDs |
| `abstract_inverted_index` | Abstract as inverted index (needs reconstruction) |
| `referenced_works` | List of OpenAlex IDs this work cites (outgoing) |
| `related_works` | Algorithmically related works |
| `cited_by_api_url` | API URL to get works that cite this one (incoming) |
| `topics` | Assigned research topics with scores |
| `keywords` | Extracted keywords with scores |
| `primary_location` | Where the work is published (journal, repo) |
| `best_oa_location` | Best open access location with PDF link |

Reconstructing Abstracts
OpenAlex stores abstracts as inverted indexes for legal reasons: each word maps to the list of positions where it occurs. To get plaintext, reconstruct:

```python
# `work` is a Work object already parsed from JSON (for example with json.load)
inv_idx = work["abstract_inverted_index"]
if inv_idx:
    # The highest position tells us how many words the abstract has
    words = [""] * (max(max(positions) for positions in inv_idx.values()) + 1)
    for word, positions in inv_idx.items():
        for pos in positions:
            words[pos] = word
    abstract = " ".join(words)
```

Or in bash with `python3 -c`:

```bash
# Pipe a work JSON into this to extract the abstract
echo "$WORK_JSON" | python3 -c "
import json,sys
w=json.load(sys.stdin)
idx=w.get('abstract_inverted_index',{})
if idx:
    words=['']*( max(max(p) for p in idx.values())+1 )
    for word,positions in idx.items():
        for pos in positions: words[pos]=word
    print(' '.join(words))
"
```

Searching for Papers
Basic Keyword Search
Searches across titles, abstracts, and fulltext. Uses stemming and stop-word removal. A simple search passes the terms in `search=`; add `per_page=` to limit the number of results, as in `?search=transformer+architecture&per_page=5`.

Boolean Search
Use uppercase `AND`, `OR`, `NOT` with parentheses and quoted phrases. This covers complex boolean queries as well as exact phrase matches (use double quotes, URL-encoded as `%22`).
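
Encoding a boolean query by hand is error-prone, so here is a minimal sketch of doing it with the standard library. The query string itself is a made-up example, and `encode_search` is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def encode_search(query: str) -> str:
    """URL-encode a boolean search string for the `search=` parameter."""
    # quote_plus turns spaces into '+', double quotes into %22,
    # and parentheses into %28 / %29.
    return quote_plus(query)


# Hypothetical query: an exact phrase, a grouped OR, and an exclusion.
query = '"deep learning" AND (vision OR language) NOT survey'
url = (
    "https://api.openalex.org/works?search="
    + encode_search(query)
    + "&mailto=agent@kortix.ai"
)
print(url)
```

The operators stay uppercase in the encoded string; only the quotes, parentheses, and spaces change.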
Search Specific Fields
Restrict a search to one part of the record:

- Title only
- Abstract only
- Title and abstract combined
- Fulltext search (subset of works)
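
A sketch of how the field-specific variants might be built. The filter names in `FIELD_FILTERS` are assumptions (they do not appear in this document), so check them against the OpenAlex docs before relying on them; `field_search_url` is a hypothetical helper.

```python
from urllib.parse import quote_plus

# Assumed filter names, one per search scope listed above.
FIELD_FILTERS = {
    "title": "title.search",
    "abstract": "abstract.search",
    "title_and_abstract": "title_and_abstract.search",
    "fulltext": "fulltext.search",
}


def field_search_url(field: str, terms: str) -> str:
    """Build a works URL that searches only the given field."""
    filter_name = FIELD_FILTERS[field]
    return (
        "https://api.openalex.org/works"
        f"?filter={filter_name}:{quote_plus(terms)}"
        "&mailto=agent@kortix.ai"
    )


print(field_search_url("title", "attention mechanism"))
```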
Filtering
Filters are the most powerful feature. Combine them with commas (AND) or pipes (OR).
Most Useful Filters
```bash
# By publication year
?filter=publication_year:2024
?filter=publication_year:2020-2024
?filter=publication_year:>2022

# By citation count
?filter=cited_by_count:>100     # highly cited
?filter=cited_by_count:>1000    # landmark papers

# By open access
?filter=is_oa:true              # only open access
?filter=oa_status:gold          # gold OA only

# By type
?filter=type:article            # journal articles
?filter=type:preprint           # preprints
?filter=type:review             # review articles

# By language
?filter=language:en             # English only

# Not retracted
?filter=is_retracted:false

# Has abstract
?filter=has_abstract:true

# Has downloadable PDF
?filter=has_content.pdf:true

# By author (OpenAlex ID)
?filter=author.id:A5023888391

# By institution (OpenAlex ID)
?filter=institutions.id:I27837315    # e.g., University of Michigan

# By DOI
?filter=doi:https://doi.org/10.1038/s41586-021-03819-2

# By indexed source
?filter=indexed_in:arxiv        # arXiv papers
?filter=indexed_in:pubmed       # PubMed papers
?filter=indexed_in:crossref     # Crossref papers
```

Combining Filters
```bash
# AND: comma-separated
?filter=publication_year:>2022,cited_by_count:>50,is_oa:true,type:article

# OR: pipe-separated within a filter
?filter=publication_year:2023|2024

# NOT: prefix with !
?filter=type:!preprint
```

These forms combine freely, for example to select highly-cited OA articles from 2023-2024 that are not preprints.
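
One way to assemble such a combined filter is sketched below. `build_filter` is a hypothetical helper, and the `>100` threshold is an illustrative choice for "highly cited", not a value taken from the original example.

```python
def build_filter(conditions: dict) -> str:
    """Join filter conditions: commas for AND, pipes for OR within one key."""
    parts = []
    for key, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)  # OR within one filter
        parts.append(f"{key}:{value}")
    return ",".join(parts)  # AND across filters


# Highly-cited OA works from 2023-2024, excluding preprints.
combined = build_filter({
    "publication_year": [2023, 2024],
    "cited_by_count": ">100",   # assumed threshold
    "is_oa": "true",
    "type": "!preprint",        # NOT: prefix the value with !
})
print("?filter=" + combined)
```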

Sorting

```bash
# Most cited first
?sort=cited_by_count:desc

# Most recent first
?sort=publication_date:desc

# Most relevant first (only when using search)
?sort=relevance_score:desc

# Multiple sort keys
?sort=publication_year:desc,cited_by_count:desc
```

Pagination

Two modes: basic paging (for browsing) and cursor paging (for collecting all results).

```bash
# Basic paging (limited to 10,000 results)
?page=1&per_page=25
?page=2&per_page=25

# Cursor paging (unlimited, for collecting everything)
?per_page=100&cursor=*                 # first page
?per_page=100&cursor=IlsxNjk0ODc...    # next page (cursor from previous response meta)
```

The cursor for the next page is in `response.meta.next_cursor`. When it's `null`, you've reached the end.
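
The cursor loop can be sketched as a small generator. This is a minimal illustration under stated assumptions: `iter_results` and `fetch_json` are hypothetical names, `base_url` is assumed to already contain a query string (including `mailto`), and the cursor is passed through as returned by the API.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and parse the JSON body (one possible `fetch`)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_results(
    base_url: str, fetch: Callable[[str], dict] = fetch_json
) -> Iterator[dict]:
    """Yield every result by following meta.next_cursor until it is null."""
    cursor = "*"  # first page
    while cursor:
        page = fetch(f"{base_url}&per_page=100&cursor={cursor}")
        yield from page.get("results", [])
        # A null (None) next_cursor ends the loop.
        cursor = page.get("meta", {}).get("next_cursor")
```

Passing `fetch` as a parameter keeps the loop testable without network access; in real use, something like `iter_results("https://api.openalex.org/works?search=topic&mailto=agent@kortix.ai")` would walk every page.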

Select Fields
Reduce response size by selecting only the fields you need:

```bash
# Only get IDs, titles, citation counts, DOIs, and publication years
?select=id,display_name,cited_by_count,doi,publication_year

# Minimal metadata for scanning
?select=id,display_name,publication_year,cited_by_count,open_access
```

Citation Graph Traversal
Find what a paper cites (outgoing references)

Get works cited BY a specific paper: its `referenced_works` field lists the OpenAlex IDs it cites.

Find what cites a paper (incoming citations)

Get works that CITE a specific paper with `filter=cites:WORK_ID`, or follow the work's `cited_by_api_url`.

Find related works

Get related works (algorithmic, based on shared concepts) from the work's `related_works` field.

Citation chain: follow the references

- Get a seminal paper by DOI
- Find its `referenced_works` (what it cites)
- Find who cites it (`filter=cites:WORK_ID`)
- For the most cited citers, repeat

This is how you build a literature graph around a topic.
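
One step of the citation chain can be sketched as follows. `short_id` and `citation_links` are hypothetical helpers, and the `W1`/`W2` reference IDs in the sample work are placeholders, not real records.

```python
def short_id(openalex_id: str) -> str:
    """Turn 'https://openalex.org/W2741809807' into 'W2741809807'."""
    return openalex_id.rsplit("/", 1)[-1]


def citation_links(work: dict, mailto: str = "agent@kortix.ai") -> dict:
    """Return the outgoing reference IDs and the incoming-citations URL."""
    work_id = short_id(work["id"])
    return {
        # Outgoing: works this paper cites (already listed on the work).
        "references": [short_id(ref) for ref in work.get("referenced_works", [])],
        # Incoming: works citing this paper, most cited first.
        "cited_by_url": (
            "https://api.openalex.org/works"
            f"?filter=cites:{work_id}&sort=cited_by_count:desc&mailto={mailto}"
        ),
    }


# Placeholder work object for illustration only.
work = {
    "id": "https://openalex.org/W2741809807",
    "referenced_works": ["https://openalex.org/W1", "https://openalex.org/W2"],
}
links = citation_links(work)
print(links["references"])
print(links["cited_by_url"])
```

Repeating this for the top results of `cited_by_url` extends the graph one level at a time.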
Author Lookup
- Search for an author
- Get an author's works (by OpenAlex author ID): `?filter=author.id:A5023888391`
- Get an author by ORCID

Lookup by External ID

- By DOI: `?filter=doi:https://doi.org/10.1038/s41586-021-03819-2`
- By PubMed ID
- By arXiv ID (via DOI)
- Batch lookup: up to 50 IDs at once, pipe-separated (`filter=doi:DOI1|DOI2|DOI3`)
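
For lists longer than 50 DOIs, the batch lookup needs chunking. The sketch below assumes a hypothetical helper `doi_batch_urls`; it also sets `per_page` to the batch size so that a full batch fits on one page of results.

```python
from typing import Iterator, List


def doi_batch_urls(dois: List[str], batch_size: int = 50) -> Iterator[str]:
    """Yield one works URL per batch of up to `batch_size` DOIs."""
    for start in range(0, len(dois), batch_size):
        batch = dois[start:start + batch_size]
        yield (
            "https://api.openalex.org/works"
            f"?filter=doi:{'|'.join(batch)}"  # OR syntax: pipe-separated
            f"&per_page={batch_size}&mailto=agent@kortix.ai"
        )
```

With 120 DOIs this yields three URLs carrying 50, 50, and 20 DOIs.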

Open Access & PDF Access

To find OA papers with direct PDF links, filter on `is_oa:true` (optionally with `has_content.pdf:true`). The `best_oa_location.pdf_url` field gives a direct PDF link when available. The `open_access.oa_url` gives the best available OA landing page or PDF.

Practical Workflows
Literature Survey on a Topic

1. Find the most-cited papers on a topic
2. For the top papers, explore their citation graphs
3. Find recent papers building on this work

Find Landmark/Seminal Papers

Highly cited + search term: combine `search=` with a `cited_by_count` filter and sort by `cited_by_count:desc`.

Find Recent Preprints

Latest preprints on a topic: combine `search=` with `filter=type:preprint` and `sort=publication_date:desc`.

Find Review Articles

Review/survey papers on a topic: combine `search=` with `filter=type:review`.

Author Analysis

1. Find the author
2. Get their most influential papers
3. Get their recent work
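
The topic workflows above differ only in their filter and sort values, so they can share one URL builder. This is a sketch, not the original commands: the function names are hypothetical, and the `>1000` threshold reuses the "landmark papers" filter from the Filtering section.

```python
from urllib.parse import quote_plus

BASE = "https://api.openalex.org/works"
MAILTO = "agent@kortix.ai"


def works_url(search: str, filters: str, sort: str, per_page: int = 25) -> str:
    """Combine search, filter, and sort into one works URL."""
    return (
        f"{BASE}?search={quote_plus(search)}&filter={filters}"
        f"&sort={sort}&per_page={per_page}&mailto={MAILTO}"
    )


def landmark_papers(topic: str) -> str:
    """Highly cited works on a topic, most cited first."""
    return works_url(topic, "cited_by_count:>1000", "cited_by_count:desc")


def recent_preprints(topic: str) -> str:
    """Preprints on a topic, newest first."""
    return works_url(topic, "type:preprint", "publication_date:desc")


def review_articles(topic: str) -> str:
    """Review articles on a topic, most cited first."""
    return works_url(topic, "type:review", "cited_by_count:desc")


print(recent_preprints("graph neural networks"))
```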

Saving Results to Disk

When doing deep research, save paper data to disk for later processing:

```bash
# Create the output directories first (the redirects fail if they are missing)
mkdir -p research/papers research/sources

# Save search results as JSON
curl -s "https://api.openalex.org/works?search=topic&per_page=50&mailto=agent@kortix.ai" > research/papers/topic-search.json

# Extract and save a clean summary
# The Python code uses only single quotes so it cannot end the bash double-quoted string early
curl -s "https://api.openalex.org/works?search=topic&per_page=50&select=id,display_name,publication_year,cited_by_count,doi,authorships&mailto=agent@kortix.ai" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for w in data.get('results', []):
    authorships = w.get('authorships', [])
    authors = ', '.join(a['author']['display_name'] for a in authorships[:3])
    if len(authorships) > 3: authors += ' et al.'
    cites = w.get('cited_by_count', 0)
    title = w['display_name']
    year = w.get('publication_year', '?')
    print(f'[{cites} cites] {title} ({year}) - {authors}')
    doi = w.get('doi')
    if doi: print(f'  DOI: {doi}')
    print()
" > research/papers/topic-summary.txt
```

For deep research, save individual paper metadata to your `sources-index.md` and raw data to `sources/`:

```bash
# Save a paper's full metadata
curl -s "https://api.openalex.org/works/W2741809807?mailto=agent@kortix.ai" > research/sources/001-paper-title.json
```

Rate Limits

| Pool | Rate | How to get it |
|---|---|---|
| Common | 1 req/sec | No email provided |
| Polite | 10 req/sec | Add `mailto=agent@kortix.ai` (or any valid email) |
| Premium | Higher | Paid API key |

Always use the polite pool. Add `&mailto=agent@kortix.ai` to every request.
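
When issuing many requests in a loop, a client-side throttle helps stay under the polite-pool rate. The class below is a minimal sketch, not something OpenAlex provides: `Throttle` is a hypothetical name, and the clock and sleep functions are injectable so the behavior can be checked without real delays.

```python
import time


class Throttle:
    """Space out calls so they stay under a requests-per-second limit."""

    def __init__(self, per_second: float = 10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling `throttle.wait()` before each request keeps consecutive requests at least 0.1 seconds apart with the default of 10 per second.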

Tips
- Use `select` aggressively to reduce response size and speed up requests
- Use `per_page=100` (max) when collecting lots of results to minimize request count
- Use cursor paging (`cursor=*`) when you need more than 10,000 results
- Batch DOI lookups with OR syntax: `filter=doi:DOI1|DOI2|DOI3` (up to 50)
- Reconstruct abstracts using the inverted index -- don't skip this, abstracts are gold
- Follow citation chains to find seminal works and recent developments
- Filter by `has_abstract:true` when you need abstracts (not all works have them)
- Filter by `indexed_in:arxiv` or `indexed_in:pubmed` to target specific repositories
- Sort by `cited_by_count:desc` to find the most influential papers first
- Combine search + filters for precise results: search gives relevance, filters give precision