arxiv
arXiv Research
Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.
Quick Reference
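| Action | Command |
|---|---|
| Search papers | `curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"` |
| Get specific paper | `curl "https://export.arxiv.org/api/query?id_list=2402.03300"` |
| Read abstract (web) | `web_extract(urls=["https://arxiv.org/abs/2402.03300"])` |
| Read full paper (PDF) | `web_extract(urls=["https://arxiv.org/pdf/2402.03300"])` |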
Searching Papers
The API returns Atom XML. Parse with `grep`/`sed` or pipe through `python3` for clean output.
Basic search
```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"
```
Clean output (parse XML to readable format)
```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
# Atom elements live in this namespace
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    # The id element holds a URL; keep only the part after /abs/
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f' Authors: {authors}')
    print(f' Published: {published} | Categories: {cats}')
    print(f' Abstract: {summary}...')
    print(f' PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"
```
Search Query Syntax
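| Prefix | Searches | Example |
|---|---|---|
| `all:` | All fields | `all:transformer+attention` |
| `ti:` | Title | `ti:"chain+of+thought"` |
| `au:` | Author | `au:hinton` |
| `abs:` | Abstract | `abs:transformer` |
| `cat:` | Category | `cat:cs.LG` |
| `co:` | Comment | `co:accepted` |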
Boolean operators
```
# AND (default when using +)
search_query=all:transformer+attention

# OR
search_query=all:GPT+OR+all:BERT

# AND NOT
search_query=all:language+model+ANDNOT+all:vision

# Exact phrase
search_query=ti:"chain+of+thought"

# Combined
search_query=au:hinton+AND+cat:cs.LG
```
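The prefixes and operators above compose mechanically, so a small helper can assemble the `search_query` value. The sketch below is illustrative only: `build_search_url` and its parameters are hypothetical names, not part of any arXiv client, and it assumes every clause is joined with one explicit operator and that multi-word terms should be quoted as phrases.

```python
from urllib.parse import quote_plus

API_URL = "https://export.arxiv.org/api/query"  # endpoint used throughout this document


def build_search_url(clauses, operator="AND", max_results=5):
    """Join (prefix, term) pairs with a boolean operator into a query URL.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if operator not in {"AND", "OR", "ANDNOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    parts = []
    for prefix, term in clauses:
        # Wrap multi-word terms in quotes so they are matched as a phrase.
        if " " in term:
            term = f'"{term}"'
        parts.append(f"{prefix}:{term}")
    query = f" {operator} ".join(parts)
    # quote_plus encodes spaces as '+' and quotes as %22; ':' stays readable.
    encoded = quote_plus(query, safe=":")
    return f"{API_URL}?search_query={encoded}&max_results={max_results}"


print(build_search_url([("au", "hinton"), ("cat", "cs.LG")]))
print(build_search_url([("ti", "chain of thought")]))
```

The resulting string can be passed straight to `curl -s`.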
Sort and Pagination
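| Parameter | Options |
|---|---|
| `sortBy` | `relevance`, `lastUpdatedDate`, `submittedDate` |
| `sortOrder` | `ascending`, `descending` |
| `start` | Result offset (0-based) |
| `max_results` | Number of results (default 10, max 30000) |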
Example: to list the latest 10 papers in cs.AI, combine `search_query=cat:cs.AI` with `sortBy=submittedDate`, `sortOrder=descending`, and `max_results=10`.
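Paging through a larger result set means advancing `start` in steps of `max_results`. A minimal sketch, assuming a hypothetical `page_urls` helper (it is not part of the arXiv API or the helper script) and newest-first ordering by submission date:

```python
from urllib.parse import urlencode

API_URL = "https://export.arxiv.org/api/query"


def page_urls(search_query, total, page_size=10):
    """Yield one query URL per page, newest submissions first.

    Hypothetical helper for illustration; `total` is how many results you want.
    """
    for start in range(0, total, page_size):
        params = {
            "search_query": search_query,
            "sortBy": "submittedDate",
            "sortOrder": "descending",
            "start": start,  # 0-based offset of the first result on this page
            "max_results": min(page_size, total - start),
        }
        # safe=':' keeps the field prefix readable (cat:cs.AI, not cat%3Acs.AI).
        yield f"{API_URL}?{urlencode(params, safe=':')}"


for url in page_urls("cat:cs.AI", total=25):
    print(url)
```

When fetching these URLs in a loop, pause between requests to respect the arXiv rate limit.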
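Fetching Specific Papers
- By arXiv ID: pass the ID in `id_list`, for example `https://export.arxiv.org/api/query?id_list=1706.03762`
- Multiple papers: pass a comma-separated list, for example `id_list=2402.03300,2401.12345`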
BibTeX Generation
After fetching metadata for a paper, generate a BibTeX entry:
{% raw %}
```bash
curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None: sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
# raw_id keeps the version suffix returned by the API
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
# Citation key: first author's last name, year, and the ID without dots
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f' title = {{{title}}},')
print(f' author = {{{authors}}},')
print(f' year = {{{year}}},')
print(f' eprint = {{{raw_id}}},')
print(f' archivePrefix = {{arXiv}},')
print(f' primaryClass = {{{primary}}},')
print(f' url = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"
```
{% endraw %}
Reading Paper Content
After finding a paper, read it:
```
# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])

# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])
```
For local PDF processing, see the `ocr-and-documents` skill.
Common Categories
| Category | Field |
|---|---|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.LG` | Machine Learning |
| `cs.CR` | Cryptography and Security |
| `stat.ML` | Machine Learning (Statistics) |
| `math.OC` | Optimization and Control |
| `physics.comp-ph` | Computational Physics |
Full list: https://arxiv.org/category_taxonomy
Helper Script
The `scripts/search_arxiv.py` script handles XML parsing and provides clean output:
```bash
python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345
```
No dependencies: the script uses only the Python stdlib.
Semantic Scholar (Citations, Related Papers, Author Profiles)
arXiv doesn't provide citation data or recommendations. Use the Semantic Scholar API for that — free, no key needed for basic use (1 req/sec), returns JSON.
Get paper details + citations
Look up a paper by arXiv ID (for example `arXiv:2402.03300`), or by Semantic Scholar paper ID or DOI, at `https://api.semanticscholar.org/graph/v1/paper/{id}`, and list the fields you want in the `fields=` parameter.
Get citations OF a paper (who cited it)
```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
```
Get references FROM a paper (what it cites)
```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
```
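Both endpoints wrap each hit in an envelope, so the paper fields sit one level down. The sketch below assumes the response shape `{"data": [{"citingPaper": {...}}]}` for `/citations` and a `citedPaper` key for `/references`; verify this against a live response before relying on it. The sample payload is made up for illustration and `top_cited` is a hypothetical helper.

```python
import json


def top_cited(response_text, key, limit=3):
    """Return (title, citationCount) pairs sorted by citation count.

    `key` is 'citingPaper' for /citations or 'citedPaper' for /references
    (assumed envelope names; check the live response).
    """
    payload = json.loads(response_text)
    papers = [item[key] for item in payload.get("data", []) if item.get(key)]
    # citationCount can be null in the JSON, so treat missing values as 0.
    papers.sort(key=lambda p: p.get("citationCount") or 0, reverse=True)
    return [(p.get("title"), p.get("citationCount") or 0) for p in papers[:limit]]


# Made-up sample payload, not real API output.
sample = json.dumps({
    "data": [
        {"citingPaper": {"title": "Paper A", "year": 2024, "citationCount": 3}},
        {"citingPaper": {"title": "Paper B", "year": 2025, "citationCount": 12}},
        {"citingPaper": {"title": "Paper C", "year": 2025, "citationCount": None}},
    ]
})
print(top_cited(sample, "citingPaper"))
```

Sorting by citation count is one quick way to surface the most influential neighbours of a paper before reading further.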
Search papers (alternative to arXiv search, returns JSON)
```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool
```
Get paper recommendations
```bash
curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool
```
Author profile
```bash
curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool
```
Useful Semantic Scholar fields
`title`, `authors`, `year`, `abstract`, `citationCount`, `referenceCount`, `influentialCitationCount`, `isOpenAccess`, `openAccessPdf`, `fieldsOfStudy`, `publicationVenue`, `externalIds`
Complete Research Workflow
- Discover: `python scripts/search_arxiv.py "your topic" --sort date --max 10`
- Assess impact: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"`
- Read abstract: `web_extract(urls=["https://arxiv.org/abs/ID"])`
- Read full paper: `web_extract(urls=["https://arxiv.org/pdf/ID"])`
- Find related work: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"`
- Get recommendations: POST to the Semantic Scholar recommendations endpoint
- Track authors: `curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"`
Rate Limits
| API | Rate | Auth |
|---|---|---|
| arXiv | ~1 req / 3 seconds | None needed |
| Semantic Scholar | 1 req / second | None (100/sec with API key) |
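A script that loops over many queries needs to space its requests to stay within these limits. One way to do that is a small throttle object. The sketch below is only an assumption about how you might structure it: the `Throttle` class is hypothetical, and the clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


arxiv_throttle = Throttle(3.0)    # roughly 1 request every 3 seconds
scholar_throttle = Throttle(1.0)  # 1 request per second without an API key
```

Call `arxiv_throttle.wait()` immediately before each arXiv request, and `scholar_throttle.wait()` before each Semantic Scholar request.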
Notes
- arXiv returns Atom XML; use the helper script or parsing snippet for clean output
- Semantic Scholar returns JSON; pipe through `python3 -m json.tool` for readability
- arXiv IDs: old format (`hep-th/0601001`) vs new (`2402.03300`)
- PDF: `https://arxiv.org/pdf/{id}`; Abstract: `https://arxiv.org/abs/{id}`
- HTML (when available): `https://arxiv.org/html/{id}`
- For local PDF processing, see the `ocr-and-documents` skill
ID Versioning
- `arxiv.org/abs/1706.03762` always resolves to the latest version
- `arxiv.org/abs/1706.03762v1` points to a specific immutable version
- When generating citations, preserve the version suffix you actually read to prevent citation drift (a later version may substantially change content)
- The API `<id>` field returns the versioned URL (e.g., `http://arxiv.org/abs/1706.03762v7`)
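Separating the version suffix from the base ID makes it easy to record exactly which version was read. A minimal sketch, assuming a hypothetical `split_version` helper and that the suffix always has the form `v` followed by digits:

```python
import re

# Lazily capture the base ID, then an optional trailing version such as 'v7'.
_VERSION_RE = re.compile(r"^(?P<base>.+?)(?:v(?P<version>\d+))?$")


def split_version(id_or_url):
    """Split an arXiv ID or /abs/ URL into (base_id, version or None)."""
    # Keep only the part after '/abs/' when given a full URL.
    raw = id_or_url.strip().split("/abs/")[-1]
    match = _VERSION_RE.match(raw)
    version = match.group("version")
    return match.group("base"), int(version) if version else None


print(split_version("http://arxiv.org/abs/1706.03762v7"))  # ('1706.03762', 7)
print(split_version("1706.03762"))                          # ('1706.03762', None)
print(split_version("hep-th/0601001v2"))                    # ('hep-th/0601001', 2)
```

A result with version `None` means the ID was unversioned, so the page it points to may change over time.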
Withdrawn Papers
Papers can be withdrawn after submission. When this happens:
- The `<summary>` field contains a withdrawal notice (look for "withdrawn" or "retracted")
- Metadata fields may be incomplete
- Always check the summary before treating a result as a valid paper
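The check described above can be sketched as a filter over parsed Atom entries. This is an illustrative heuristic, not an official arXiv flag: `looks_withdrawn` is a hypothetical helper, the keyword match can produce false positives (for instance an abstract that discusses retracted papers), and the sample feed is made up.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.w3.org/2005/Atom"}
KEYWORDS = ("withdrawn", "retracted")


def looks_withdrawn(entry):
    """Heuristic: True if the entry's summary mentions a withdrawal keyword."""
    # A missing summary is treated as empty text rather than raising.
    summary = entry.findtext("a:summary", default="", namespaces=NS)
    return any(word in summary.lower() for word in KEYWORDS)


# Made-up sample feed for illustration, not real API output.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Paper A</title><summary>We study attention.</summary></entry>
  <entry><title>Paper B</title><summary>This paper has been withdrawn by the author.</summary></entry>
  <entry><title>Paper C</title></entry>
</feed>"""

root = ET.fromstring(SAMPLE)
for entry in root.findall("a:entry", NS):
    title = entry.findtext("a:title", default="(no title)", namespaces=NS)
    status = "withdrawn?" if looks_withdrawn(entry) else "ok"
    print(f"{title}: {status}")
```

Using `findtext` with a default also guards against the incomplete metadata mentioned above, since a missing element no longer raises an `AttributeError`.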