arxiv-mcp
Mode: Cognitive/Prompt-Driven. No standalone utility script; use via agent context.
arXiv Search Skill
<identity>
arXiv Search Skill - Search and retrieve academic papers from arXiv.org using existing tools (WebFetch, Exa). No MCP server installation required.
</identity>
✅ No Installation Required
This skill uses existing tools to access arXiv:
- WebFetch - Direct access to arXiv API
- Exa - Semantic search with arXiv filtering
Works immediately - no MCP server, no restart needed.
<capabilities>
- Search academic papers by keywords, authors, categories, or date ranges
- Retrieve detailed paper metadata (title, authors, abstract, categories, PDF link)
- Get specific papers by arXiv ID
- Find related papers based on categories and keywords
- Filter by arXiv categories (cs.AI, cs.LG, cs.CV, math.*, physics.*, etc.)
- No API key required - uses public arXiv API
</capabilities>
Result Limits (Memory Safeguard)
arxiv-mcp returns academic papers. To prevent memory exhaustion:
- max_results: 20 (HARD LIMIT)
- Each paper metadata ~300 bytes
- 20 papers × 300 bytes = ~6 KB metadata
- Papers can be 100+ KB each if fetched - DON'T fetch full papers
Why the limit?
- Previous limit: 100 results → 30 KB+ metadata → context explosion
- New limit: 20 results → 6 KB metadata → memory safe
- 20 papers is usually enough to find your target
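The arithmetic behind the limit can be sketched as a small helper. This is only an illustration: the 300-byte figure is the rough estimate quoted above, and the names `estimateMetadataBytes` and `withinLimit` are hypothetical, not part of the skill.

```javascript
const BYTES_PER_PAPER = 300; // rough per-paper metadata size quoted above
const MAX_RESULTS = 20; // hard limit quoted above

// Estimated metadata size, in bytes, for a given number of results.
function estimateMetadataBytes(paperCount) {
  return paperCount * BYTES_PER_PAPER;
}

// True when a requested result count stays within the hard limit.
function withinLimit(paperCount) {
  return paperCount <= MAX_RESULTS;
}

console.log(estimateMetadataBytes(20), withinLimit(20)); // 6000 true
console.log(estimateMetadataBytes(100), withinLimit(100)); // 30000 false
```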
Method 1: WebFetch with arXiv API (Recommended for specific queries)
The arXiv API is publicly accessible at `http://export.arxiv.org/api/query`.
Recommended Pattern
```javascript
// ✓ GOOD: Limit results to 20
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});

// ✓ GOOD: Use specific filters to reduce result set
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention+2025&max_results=20&sortBy=submittedDate',
  prompt: 'Extract recent papers on transformer attention',
});

// ✗ BAD: Old behavior - unlimited or >20 results
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:neural+networks',
  // Too broad - will get 100s of results
});

// ✗ BAD: Exceeds memory limit
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:deep+learning&max_results=100',
  // Over limit - memory risk
});
```
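One way to keep every request inside the limit is to build the URL through a helper that rejects out-of-range values before WebFetch is ever called. The sketch below is an assumption about how that could look; `buildSearchUrl` is a hypothetical name, and it expects the query to be in arXiv form already (terms joined with `+`).

```javascript
const ARXIV_API = 'http://export.arxiv.org/api/query';

// Build a search URL, refusing anything outside the 1..20 result range.
function buildSearchUrl(searchQuery, { maxResults = 20, sortBy = 'relevance' } = {}) {
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 20) {
    throw new RangeError('maxResults must be an integer between 1 and 20');
  }
  // searchQuery is passed through unchanged, e.g. 'all:transformer+attention'
  return `${ARXIV_API}?search_query=${searchQuery}&max_results=${maxResults}&sortBy=${sortBy}`;
}

console.log(buildSearchUrl('all:transformer+attention'));
// http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance
```

Throwing instead of silently clamping makes an over-limit request visible at the call site; clamping would be an equally reasonable choice.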
Search by Keywords
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});
```
Search by Author
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:LeCun&max_results=10&sortBy=submittedDate',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
```
Search by Category
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=15&sortBy=submittedDate',
  prompt: 'Extract paper titles, authors, abstracts, categories, and arXiv IDs',
});
```
Get Specific Paper by ID
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041',
  prompt:
    'Extract full details: title, all authors, abstract, categories, published date, PDF link',
});
```
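When the IDs come from user input or earlier search results, it may help to validate them before building the `id_list` URL. The following is a minimal sketch: `buildIdListUrl` is a hypothetical helper, and the pattern only covers new-style IDs such as 2301.07041 (optionally with a version suffix like `v2`), not older identifiers of the form `hep-th/9901001`.

```javascript
// New-style arXiv IDs: four digits, a dot, four or five digits, optional version.
const ARXIV_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;

// Build an id_list URL, rejecting anything that does not look like an arXiv ID.
function buildIdListUrl(ids) {
  const bad = ids.filter((id) => !ARXIV_ID.test(id));
  if (bad.length > 0) {
    throw new Error(`Not arXiv IDs: ${bad.join(', ')}`);
  }
  return `http://export.arxiv.org/api/query?id_list=${ids.join(',')}`;
}

console.log(buildIdListUrl(['2301.07041', '2302.13971']));
// http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971
```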
API Query Parameters
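| Parameter | Description | Example |
|---|---|---|
| `search_query` | Search terms with field prefixes | `search_query=all:transformer+attention` |
| `id_list` | Comma-separated arXiv IDs | `id_list=2301.07041,2302.13971` |
| `max_results` | Number of results (default 10; this skill caps it at 20) | `max_results=20` |
| `start` | Offset for pagination | `start=0` |
| `sortBy` | Sort order: `relevance`, `lastUpdatedDate`, `submittedDate` | `sortBy=submittedDate` |
| `sortOrder` | Sort direction: `ascending`, `descending` | `sortOrder=descending` |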
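Pagination follows from combining the offset with the result count: page `p` of size `n` starts at offset `p × n`. The sketch below assumes the offset parameter is named `start`, as in the public arXiv API; `buildPageUrl` is a hypothetical helper.

```javascript
// Build the URL for one page of results; page is zero-based.
function buildPageUrl(searchQuery, page, pageSize = 20) {
  const start = page * pageSize; // offset of the first result on this page
  return (
    'http://export.arxiv.org/api/query' +
    `?search_query=${searchQuery}&start=${start}&max_results=${pageSize}`
  );
}

console.log(buildPageUrl('cat:cs.LG', 2));
// http://export.arxiv.org/api/query?search_query=cat:cs.LG&start=40&max_results=20
```

Fetching one page at a time keeps each response within the 20-result limit even when more matches exist.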
Field Prefixes for search_query
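| Prefix | Field | Example |
|---|---|---|
| `all:` | All fields | `all:transformer+attention` |
| `ti:` | Title | `ti:transformer` |
| `au:` | Author | `au:LeCun` |
| `abs:` | Abstract | `abs:attention` |
| `cat:` | Category | `cat:cs.LG` |
| `co:` | Comment | `co:workshop` |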
Boolean Operators
Combine terms with `AND`, `OR`, `ANDNOT`:
```
search_query=ti:transformer+AND+abs:attention
search_query=au:LeCun+OR+au:Bengio
search_query=cat:cs.LG+ANDNOT+ti:survey
```
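The same three queries can be composed programmatically. This is a small sketch with hypothetical helper names (`and`, `or`, `andNot`); each helper only joins terms that already carry a field prefix.

```javascript
// Join already-prefixed terms (e.g. 'ti:transformer') with a boolean operator.
const and = (...terms) => terms.join('+AND+');
const or = (...terms) => terms.join('+OR+');
const andNot = (left, right) => `${left}+ANDNOT+${right}`;

console.log(`search_query=${and('ti:transformer', 'abs:attention')}`);
// search_query=ti:transformer+AND+abs:attention
console.log(`search_query=${or('au:LeCun', 'au:Bengio')}`);
// search_query=au:LeCun+OR+au:Bengio
console.log(`search_query=${andNot('cat:cs.LG', 'ti:survey')}`);
// search_query=cat:cs.LG+ANDNOT+ti:survey
```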
When NOT to Use arxiv-mcp
- General web research → Use WebSearch/WebFetch instead
- Implementation examples → Use `pnpm search:code` or ripgrep skill on codebase (Grep/Glob as fallback)
- Product research → Use WebSearch with news filter
- Community discussions → Use WebSearch for forums/Stack Overflow
arxiv-mcp is best for:
- Finding academic papers on specific topics
- Understanding theoretical foundations
- Citing research in documentation
- Quick literature review (20 papers max)
Method 2: Exa Search (Better for semantic/natural language queries)
Use Exa for more natural language queries with arXiv filtering:
Semantic Search
```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer architecture attention mechanism deep learning',
  numResults: 10,
});
```
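The Exa arguments can be assembled by a small helper so the `site:arxiv.org` filter is never forgotten. This sketch assumes the tool takes only the `query` and `numResults` fields shown above, and that the 20-result cap should apply to Exa as well; `buildExaArxivArgs` is a hypothetical name.

```javascript
// Build the argument object for an arXiv-scoped Exa search.
function buildExaArxivArgs(topic, numResults = 10) {
  return {
    query: `site:arxiv.org ${topic.trim()}`,
    numResults: Math.min(numResults, 20), // assumed cap, mirroring max_results
  };
}

console.log(buildExaArxivArgs('transformer attention mechanism'));
// { query: 'site:arxiv.org transformer attention mechanism', numResults: 10 }
```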
Recent Papers in a Field
```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org large language model scaling laws 2024',
  numResults: 15,
});
```
Author-Focused Search
```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org author:"Yann LeCun" deep learning',
  numResults: 10,
});
```
Common arXiv Categories
| Category | Field |
|---|---|
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CL | Computation and Language (NLP) |
| cs.CV | Computer Vision |
| cs.SE | Software Engineering |
| cs.CR | Cryptography and Security |
| stat.ML | Machine Learning (Statistics) |
| math.* | Mathematics (all subcategories) |
| physics.* | Physics (all subcategories) |
| q-bio.* | Quantitative Biology |
| econ.* | Economics |
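Because a paper is often listed under several of these categories, a query may need to match any one of them. The sketch below builds such a clause; it assumes the API accepts URL-encoded parentheses (`%28`, `%29`) for grouping, which should be verified against the arXiv API manual. `anyCategory` is a hypothetical helper.

```javascript
// Build a clause matching any of the given categories.
// Parentheses are emitted URL-encoded as %28 and %29 (assumption, see above).
function anyCategory(categories) {
  const joined = categories.map((c) => `cat:${c}`).join('+OR+');
  return categories.length > 1 ? `%28${joined}%29` : joined;
}

const query = `${anyCategory(['cs.AI', 'cs.LG', 'cs.CL'])}+AND+ti:attention`;
console.log(query);
// %28cat:cs.AI+OR+cat:cs.LG+OR+cat:cs.CL%29+AND+ti:attention
```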
Workflow: Complete Research Process
Step 1: Initial Search
```javascript
// Start with broad Exa search for semantic matching
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer attention mechanism neural networks',
  numResults: 10,
});
```
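Moving from discovery to retrieval by ID requires pulling arXiv IDs out of the result URLs. A minimal sketch follows, assuming the search results expose plain `arxiv.org/abs/...` or `arxiv.org/pdf/...` URLs; the result shape and the name `extractArxivIds` are assumptions, not part of the Exa tool.

```javascript
// Collect unique new-style arXiv IDs from a list of URLs, dropping version suffixes.
function extractArxivIds(urls) {
  const ids = new Set();
  for (const url of urls) {
    const match = /arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})/.exec(url);
    if (match) ids.add(match[1]);
  }
  return [...ids];
}

const urls = [
  'https://arxiv.org/abs/2301.07041v2',
  'https://arxiv.org/pdf/2302.13971',
  'https://arxiv.org/abs/2301.07041',
  'https://example.com/blog',
];
console.log(extractArxivIds(urls).join(','));
// 2301.07041,2302.13971
```

The joined string can be used directly as the value of `id_list`.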
Step 2: Get Specific Papers
```javascript
// Get details for interesting papers by ID
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971',
  prompt: 'Extract full metadata for each paper: title, authors, abstract, categories, PDF URL',
});
```
Step 3: Find Related Work
```javascript
// Search by category of interesting paper
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+ti:attention&max_results=10&sortBy=submittedDate',
  prompt: 'Find related papers, extract titles and abstracts',
});
```
Step 4: Get Recent Papers
```javascript
// Latest papers in the field
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers',
});
```
</execution_process>
<best_practices>
- Use Exa for discovery: Natural language queries find semantically related papers
- Use WebFetch for precision: Specific IDs, categories, or API queries
- Combine approaches: Exa to discover, WebFetch to deep-dive
- Use specific queries: "transformer attention mechanism" > "machine learning"
- Check multiple categories: Papers often span cs.AI + cs.LG + cs.CL
- Sort by date for recent work: `sortBy=submittedDate&sortOrder=descending`
</best_practices>
</instructions>
<examples>
<usage_example>
**Example 1: Search for transformer papers**:
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
```
**Example 2: Find papers by researcher**:
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15',
  prompt: 'List all papers by this author with titles and dates',
});
```
**Example 3: Get recent ML papers**:
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts',
});
```
**Example 4: Semantic search with Exa**:
```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org multimodal large language models vision 2024',
  numResults: 10,
});
```
**Example 5: Get specific paper details**:
```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=1706.03762',
  prompt: "Extract complete details for the 'Attention Is All You Need' paper",
});
```
</usage_example>
</examples>
Agent Integration
This skill is automatically assigned to:
- researcher - Academic research, literature review
- scientific-research-expert - Deep scientific analysis
- developer - Finding technical papers for implementation
Iron Laws
- ALWAYS enforce `max_results=20`: never allow unlimited or >20 result queries; context explosion from 100+ papers is a known failure mode that stalls agent pipelines.
- NEVER fetch full paper PDFs during literature review: extract metadata and abstracts only; full papers are 100KB+ each and will exhaust the context budget in minutes.
- ALWAYS use Exa for semantic discovery, WebFetch for precision retrieval: Exa finds semantically related papers; WebFetch gets specific IDs or category feeds; use both in sequence, not interchangeably.
- NEVER use broad queries without field prefixes: `search_query=neural+networks` returns thousands of results; always scope with `ti:`, `au:`, `abs:`, or `cat:` prefixes to target the query.
- ALWAYS cite arXiv IDs (e.g., 2301.07041) when referencing papers: titles alone are ambiguous and can change; IDs are stable, machine-readable, and enable instant retrieval.
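As an illustration of the citation rule, a reference can carry both the title and the stable ID. The format below is only one possible convention, and `citeArxiv` is a hypothetical helper; the abstract-page URL pattern `https://arxiv.org/abs/<id>` is the standard arXiv one.

```javascript
// Format a citation that always includes the arXiv ID and its abstract URL.
function citeArxiv(id, title) {
  return `${title} (arXiv:${id}), https://arxiv.org/abs/${id}`;
}

console.log(citeArxiv('1706.03762', 'Attention Is All You Need'));
// Attention Is All You Need (arXiv:1706.03762), https://arxiv.org/abs/1706.03762
```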
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Using `max_results=100` | Context explosion; 100 papers × 300 bytes = 30KB+ metadata | Always set `max_results=20` |
| Fetching full paper PDFs | Single paper can be 100KB+; kills context budget | Extract abstract + metadata only via API |
| Broad query without field prefix | Returns irrelevant results across all fields | Use `ti:`, `au:`, `abs:`, or `cat:` prefixes |
| Using only WebFetch for discovery | Misses semantically related papers not matching exact terms | Use Exa for semantic discovery first |
| Citing paper titles instead of arXiv IDs | Titles can be ambiguous or duplicated | Always include the arXiv ID (e.g., 1706.03762) |
Memory Protocol (MANDATORY)
Before starting:
```bash
cat .claude/context/memory/learnings.md
```
After completing:
- New pattern -> `.claude/context/memory/learnings.md`
- Issue found -> `.claude/context/memory/issues.md`
- Decision made -> `.claude/context/memory/decisions.md`
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
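For agents that can run Node.js, the write-back step could be sketched as below. This is an assumption about one way to do it, not a prescribed mechanism: `recordMemory` and the one-bullet-per-note format are hypothetical, while the three file names are the ones listed above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Map each kind of note to its memory file, as listed above.
const MEMORY_FILES = {
  pattern: 'learnings.md',
  issue: 'issues.md',
  decision: 'decisions.md',
};

// Append a one-line note to the matching memory file and return its path.
function recordMemory(kind, note, memoryDir = '.claude/context/memory') {
  if (!Object.prototype.hasOwnProperty.call(MEMORY_FILES, kind)) {
    throw new Error(`Unknown memory kind: ${kind}`);
  }
  fs.mkdirSync(memoryDir, { recursive: true }); // create the directory if missing
  const target = path.join(memoryDir, MEMORY_FILES[kind]);
  fs.appendFileSync(target, `- ${note}\n`);
  return target;
}

// Usage (writes to disk):
// recordMemory('issue', 'Query without max_results returned too much context');
```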