arxiv-mcp

Mode: Cognitive/Prompt-Driven — No standalone utility script; use via agent context.

arXiv Search Skill

<identity> arXiv Search Skill - Search and retrieve academic papers from arXiv.org using existing tools (WebFetch, Exa). No MCP server installation required. </identity>

✅ No Installation Required

This skill uses existing tools to access arXiv:
  • WebFetch - Direct access to arXiv API
  • Exa - Semantic search with arXiv filtering
Works immediately - no MCP server, no restart needed.
<capabilities>
- Search academic papers by keywords, authors, categories, or date ranges
- Retrieve detailed paper metadata (title, authors, abstract, categories, PDF link)
- Get specific papers by arXiv ID
- Find related papers based on categories and keywords
- Filter by arXiv categories (cs.AI, cs.LG, cs.CV, math.*, physics.*, etc.)
- No API key required: uses the public arXiv API
</capabilities>

Result Limits (Memory Safeguard)

arxiv-mcp returns academic papers. To prevent memory exhaustion:
  • max_results: 20 (HARD LIMIT)
  • Each paper metadata ~300 bytes
  • 20 papers × 300 bytes = ~6 KB metadata
  • Papers can be 100+ KB each if fetched - DON'T fetch full papers
Why the limit?
  • Previous limit: 100 results → 30 KB+ metadata → context explosion
  • New limit: 20 results → 6 KB metadata → memory safe
  • 20 papers is usually enough to find your target
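The arithmetic behind the cap can be made explicit. The sketch below is a hypothetical helper (neither `clampMaxResults` nor `estimateMetadataBytes` exists in any tool named here) that clamps a requested result count to the limit of 20 and estimates metadata size from the ~300 bytes per paper figure quoted above.

```javascript
// Hypothetical constants mirroring the figures quoted in this section.
const MAX_RESULTS_CAP = 20;
const APPROX_BYTES_PER_PAPER = 300;

// Clamp a requested result count into the range 1..20.
// Non-numeric input falls back to the cap itself.
function clampMaxResults(requested) {
  const n = Number.isFinite(requested) ? Math.floor(requested) : MAX_RESULTS_CAP;
  return Math.min(MAX_RESULTS_CAP, Math.max(1, n));
}

// Rough metadata size for a given number of papers.
function estimateMetadataBytes(paperCount) {
  return paperCount * APPROX_BYTES_PER_PAPER;
}

console.log(clampMaxResults(100)); // 20
console.log(estimateMetadataBytes(clampMaxResults(100))); // 6000
console.log(estimateMetadataBytes(100)); // 30000
```

The per-paper figure is the document's own estimate; real entries vary with abstract length.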
<instructions> <execution_process>
<instructions> <execution_process>

Method 1: WebFetch with arXiv API (Recommended for specific queries)

The arXiv API is publicly accessible at http://export.arxiv.org/api/query.

Recommended Pattern

javascript
// ✓ GOOD: Limit results to 20
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});

// ✓ GOOD: Use specific filters to reduce result set (here, a submittedDate range covering 2025)
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+AND+all:attention+AND+submittedDate:[202501010000+TO+202512312359]&max_results=20&sortBy=submittedDate',
  prompt: 'Extract recent papers on transformer attention',
});

// ✗ BAD: Old behavior - no explicit max_results and no field scoping
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:neural+networks',
  // Too broad - matches thousands of papers, and the page size is left to the API default
});

// ✗ BAD: Exceeds memory limit
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:deep+learning&max_results=100',
  // Over limit - memory risk
});
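When the aim is to restrict results to a period, a bare keyword such as `2025` only matches text. The arXiv API also accepts a `submittedDate:[YYYYMMDDHHMM+TO+YYYYMMDDHHMM]` range (times in GMT) inside `search_query`. Below is a minimal sketch of a helper that builds that clause; `submittedBetween` and `stamp` are hypothetical names, not part of WebFetch or the API.

```javascript
// Zero-pad a number to two digits.
function pad(n) {
  return String(n).padStart(2, '0');
}

// Format a Date as YYYYMMDDHHMM in UTC.
function stamp(date) {
  return (
    String(date.getUTCFullYear()) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes())
  );
}

// Build a submittedDate range clause for use inside search_query.
function submittedBetween(from, to) {
  if (from > to) throw new Error('from must not be after to');
  return `submittedDate:[${stamp(from)}+TO+${stamp(to)}]`;
}

const clause = submittedBetween(
  new Date(Date.UTC(2025, 0, 1, 0, 0)),
  new Date(Date.UTC(2025, 11, 31, 23, 59)),
);
console.log(clause); // submittedDate:[202501010000+TO+202512312359]
```

The clause can be joined to other terms with `+AND+`, for example `all:transformer+AND+` followed by the clause.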

Search by Keywords

javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});

Search by Author

javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:LeCun&max_results=10&sortBy=submittedDate',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});

Search by Category

javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=15&sortBy=submittedDate',
  prompt: 'Extract paper titles, authors, abstracts, categories, and arXiv IDs',
});

Get Specific Paper by ID

javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041',
  prompt:
    'Extract full details: title, all authors, abstract, categories, published date, PDF link',
});
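Before building an `id_list` URL it can help to check that each value looks like an arXiv identifier. The sketch below is hypothetical (`isArxivId` and `buildIdListUrl` are illustrative names), and the two regular expressions are a simplification of the new-style (`YYMM.NNNNN`, optional `vN`) and old-style (`archive/YYMMNNN`) formats.

```javascript
const ARXIV_API = 'http://export.arxiv.org/api/query';

// Simplified patterns: new-style IDs such as 2301.07041 or 1706.03762v5,
// and old-style IDs such as hep-th/9901001 or math.GT/0309136.
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;
const OLD_STYLE_ID = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

function isArxivId(id) {
  return NEW_STYLE_ID.test(id) || OLD_STYLE_ID.test(id);
}

// Build an id_list query URL, rejecting anything that is not ID-shaped.
function buildIdListUrl(ids) {
  const invalid = ids.filter((id) => !isArxivId(id));
  if (invalid.length > 0) {
    throw new Error(`Not an arXiv ID: ${invalid.join(', ')}`);
  }
  return `${ARXIV_API}?id_list=${ids.join(',')}`;
}

console.log(buildIdListUrl(['2301.07041', '2302.13971']));
// http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971
```

The resulting string can be passed as the `url` field of a WebFetch call like the one above.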

API Query Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| `search_query` | Search terms with field prefixes | `all:transformer`, `au:LeCun`, `ti:attention` |
| `id_list` | Comma-separated arXiv IDs | `2301.07041,2302.13971` |
| `max_results` | Number of results (API default 10; this skill caps it at 20) | `max_results=20` |
| `start` | Offset for pagination | `start=10` |
| `sortBy` | Sort order: `relevance`, `lastUpdatedDate`, `submittedDate` | `sortBy=submittedDate` |
| `sortOrder` | `ascending` or `descending` | `sortOrder=descending` |
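The parameters above can be assembled with the standard `URL` and `URLSearchParams` APIs. This is a sketch under assumptions: `buildSearchUrl` is a hypothetical helper, the default values are illustrative, and the cap of 20 is the skill's own limit rather than an API constraint. Note that `URLSearchParams` writes spaces as `+` and percent-encodes `:` as `%3A`, which decodes back to the same query on the server side.

```javascript
const ARXIV_API = 'http://export.arxiv.org/api/query';
const SORT_FIELDS = ['relevance', 'lastUpdatedDate', 'submittedDate'];
const SORT_ORDERS = ['ascending', 'descending'];
const MAX_RESULTS_CAP = 20; // skill-level limit, not an arXiv limit

// searchQuery uses plain spaces between terms, e.g. 'ti:transformer AND abs:attention';
// a literal '+' in the input would be encoded as %2B, so do not pre-encode.
function buildSearchUrl({
  searchQuery,
  start = 0,
  maxResults = 10,
  sortBy = 'relevance',
  sortOrder = 'descending',
}) {
  if (!SORT_FIELDS.includes(sortBy)) throw new Error(`Unknown sortBy: ${sortBy}`);
  if (!SORT_ORDERS.includes(sortOrder)) throw new Error(`Unknown sortOrder: ${sortOrder}`);
  const url = new URL(ARXIV_API);
  url.searchParams.set('search_query', searchQuery);
  url.searchParams.set('start', String(start));
  url.searchParams.set('max_results', String(Math.min(maxResults, MAX_RESULTS_CAP)));
  url.searchParams.set('sortBy', sortBy);
  url.searchParams.set('sortOrder', sortOrder);
  return url.toString();
}

const exampleUrl = buildSearchUrl({
  searchQuery: 'ti:transformer AND abs:attention',
  maxResults: 50, // silently reduced to 20
  sortBy: 'submittedDate',
});
console.log(exampleUrl);
```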

Field Prefixes for search_query

| Prefix | Field | Example |
| --- | --- | --- |
| `all:` | All fields | `all:machine+learning` |
| `ti:` | Title | `ti:transformer` |
| `au:` | Author | `au:Vaswani` |
| `abs:` | Abstract | `abs:attention+mechanism` |
| `cat:` | Category | `cat:cs.LG` |
| `co:` | Comment | `co:accepted` |
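A small helper can keep prefixes consistent with the table above. The sketch below is illustrative only: `fieldTerm` is a hypothetical name, and it simply joins the words of a phrase with `+` in the same style as the examples.

```javascript
// Field prefixes listed in the table above (without the trailing colon).
const FIELD_PREFIXES = ['all', 'ti', 'au', 'abs', 'cat', 'co'];

// Turn a prefix and a free-text phrase into a search_query term.
function fieldTerm(prefix, text) {
  if (!FIELD_PREFIXES.includes(prefix)) {
    throw new Error(`Unknown field prefix: ${prefix}`);
  }
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) throw new Error('text must not be empty');
  return `${prefix}:${words.join('+')}`;
}

console.log(fieldTerm('abs', 'attention mechanism')); // abs:attention+mechanism
console.log(fieldTerm('cat', 'cs.LG')); // cat:cs.LG
```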

Boolean Operators

Combine terms with `AND`, `OR`, `ANDNOT`:
search_query=ti:transformer+AND+abs:attention
search_query=au:LeCun+OR+au:Bengio
search_query=cat:cs.LG+ANDNOT+ti:survey
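The same combinations can be produced programmatically. This is a minimal sketch with a hypothetical `combine` function; it handles a flat chain of terms with one operator and does not attempt parenthesised grouping.

```javascript
const BOOLEAN_OPERATORS = ['AND', 'OR', 'ANDNOT'];

// Join two or more already-prefixed terms with a single Boolean operator.
function combine(operator, ...terms) {
  if (!BOOLEAN_OPERATORS.includes(operator)) {
    throw new Error(`Unknown operator: ${operator}`);
  }
  if (terms.length < 2) {
    throw new Error('combine needs at least two terms');
  }
  return terms.join(`+${operator}+`);
}

console.log(combine('AND', 'ti:transformer', 'abs:attention'));
// ti:transformer+AND+abs:attention
console.log(combine('ANDNOT', 'cat:cs.LG', 'ti:survey'));
// cat:cs.LG+ANDNOT+ti:survey
```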

When NOT to Use arxiv-mcp

  • General web research → Use WebSearch/WebFetch instead
  • Implementation examples → Use `pnpm search:code` or ripgrep skill on codebase (Grep/Glob as fallback)
  • Product research → Use WebSearch with news filter
  • Community discussions → Use WebSearch for forums/Stack Overflow
arxiv-mcp is best for:
  • Finding academic papers on specific topics
  • Understanding theoretical foundations
  • Citing research in documentation
  • Quick literature review (20 papers max)

Method 2: Exa Search (Better for semantic/natural language queries)

Use Exa for more natural language queries with arXiv filtering:

Semantic Search

javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer architecture attention mechanism deep learning',
  numResults: 10,
});

Recent Papers in a Field

javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org large language model scaling laws 2024',
  numResults: 15,
});

Author-Focused Search

javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org author:"Yann LeCun" deep learning',
  numResults: 10,
});
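All three Exa calls share the same shape: a `site:arxiv.org` prefix plus a bounded `numResults`. A hypothetical helper (`exaArxivQuery` is not part of Exa) could build the argument object; applying the skill's cap of 20 to `numResults` is an assumption made here for consistency, not something Exa requires.

```javascript
const MAX_RESULTS_CAP = 20; // assumption: reuse the skill's result cap for Exa

// Build the argument object for mcp__Exa__web_search_exa.
function exaArxivQuery(topic, numResults = 10) {
  const cleaned = topic.trim().replace(/\s+/g, ' ');
  if (cleaned === '') throw new Error('topic must not be empty');
  const query = cleaned.startsWith('site:arxiv.org')
    ? cleaned
    : `site:arxiv.org ${cleaned}`;
  const bounded = Math.min(Math.max(1, Math.floor(numResults)), MAX_RESULTS_CAP);
  return { query, numResults: bounded };
}

console.log(exaArxivQuery('large language model scaling laws 2024', 15));
// { query: 'site:arxiv.org large language model scaling laws 2024', numResults: 15 }
```

The returned object has the same two fields as the calls shown above.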

Common arXiv Categories

| Category | Field |
| --- | --- |
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CL | Computation and Language (NLP) |
| cs.CV | Computer Vision |
| cs.SE | Software Engineering |
| cs.CR | Cryptography and Security |
| stat.ML | Machine Learning (Statistics) |
| math.* | Mathematics (all subcategories) |
| physics.* | Physics (all subcategories) |
| q-bio.* | Quantitative Biology |
| econ.* | Economics |
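The `archive.*` notation in the table can also be applied locally to metadata that has already been extracted. The sketch below assumes a hypothetical paper shape, `{ id, categories }`, and hypothetical function names; it filters results on the client side and makes no claim about wildcard support in the API itself.

```javascript
// True when a category such as 'math.AG' matches a pattern such as 'math.*' or 'cs.LG'.
function matchesCategory(category, pattern) {
  if (pattern.endsWith('.*')) {
    const archive = pattern.slice(0, -2);
    return category === archive || category.startsWith(`${archive}.`);
  }
  return category === pattern;
}

// Keep papers that carry at least one category matching any pattern.
function filterByCategories(papers, patterns) {
  return papers.filter((paper) =>
    paper.categories.some((category) =>
      patterns.some((pattern) => matchesCategory(category, pattern)),
    ),
  );
}

// Illustrative records: real IDs from this document, made-up category lists.
const sample = [
  { id: '1706.03762', categories: ['cs.CL', 'cs.LG'] },
  { id: '2301.07041', categories: ['math.AG'] },
];
console.log(filterByCategories(sample, ['math.*']).map((p) => p.id)); // [ '2301.07041' ]
```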

Workflow: Complete Research Process

Step 1: Initial Search

javascript
// Start with broad Exa search for semantic matching
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer attention mechanism neural networks',
  numResults: 10,
});

Step 2: Get Specific Papers

javascript
// Get details for interesting papers by ID
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971',
  prompt: 'Extract full metadata for each paper: title, authors, abstract, categories, PDF URL',
});
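Step 1 yields result URLs, while Step 2 needs bare IDs for `id_list`. One way to bridge the two is sketched below, assuming results link to `arxiv.org/abs/...` or `arxiv.org/pdf/...` pages; `extractArxivId` and `idsFromUrls` are hypothetical helpers.

```javascript
// Pull the arXiv ID out of an abs/ or pdf/ URL, dropping any .pdf suffix and version.
function extractArxivId(url) {
  const match = /arxiv\.org\/(?:abs|pdf)\/([^?#\s]+)/.exec(url);
  if (!match) return null;
  return match[1].replace(/\.pdf$/, '').replace(/v\d+$/, '');
}

// Collect unique IDs from a list of result URLs, preserving first-seen order.
function idsFromUrls(urls) {
  const ids = urls.map(extractArxivId).filter((id) => id !== null);
  return [...new Set(ids)];
}

console.log(
  idsFromUrls([
    'https://arxiv.org/abs/1706.03762v5',
    'https://arxiv.org/pdf/1706.03762.pdf',
    'https://arxiv.org/abs/2301.07041',
    'https://example.com/blog/attention',
  ]).join(','),
); // 1706.03762,2301.07041
```

The joined string can be used directly as the `id_list` value in the WebFetch call above.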

Step 3: Find Related Work

javascript
// Search by category of interesting paper
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+ti:attention&max_results=10&sortBy=submittedDate',
  prompt: 'Find related papers, extract titles and abstracts',
});

Step 4: Get Recent Papers

javascript
// Latest papers in the field
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers',
});
</execution_process>
<best_practices>
  1. Use Exa for discovery: Natural language queries find semantically related papers
  2. Use WebFetch for precision: Specific IDs, categories, or API queries
  3. Combine approaches: Exa to discover, WebFetch to deep-dive
  4. Use specific queries: "transformer attention mechanism" > "machine learning"
  5. Check multiple categories: Papers often span cs.AI + cs.LG + cs.CL
  6. Sort by date for recent work: `sortBy=submittedDate&sortOrder=descending`
</best_practices> </instructions>
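Combining Exa and WebFetch, or querying several categories, tends to return the same paper more than once. A minimal sketch of merging result sets by arXiv ID follows; the `{ id, title }` record shape and the `mergeById` name are assumptions for illustration.

```javascript
// Strip a trailing version suffix so 1706.03762v5 and 1706.03762 compare equal.
function baseId(id) {
  return id.replace(/v\d+$/, '');
}

// Merge any number of result arrays, keeping the first record seen for each paper.
function mergeById(...resultSets) {
  const seen = new Map();
  for (const papers of resultSets) {
    for (const paper of papers) {
      const key = baseId(paper.id);
      if (!seen.has(key)) seen.set(key, paper);
    }
  }
  return [...seen.values()];
}

const fromExa = [{ id: '1706.03762v5', title: 'Attention Is All You Need' }];
const fromApi = [
  { id: '1706.03762', title: 'Attention Is All You Need' },
  { id: '2301.07041', title: '(title as returned by the API)' },
];
console.log(mergeById(fromExa, fromApi).length); // 2
```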
<examples> <usage_example> **Example 1: Search for transformer papers**:
javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
Example 2: Find papers by researcher:
javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15',
  prompt: 'List all papers by this author with titles and dates',
});
Example 3: Get recent ML papers:
javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts',
});
Example 4: Semantic search with Exa:
javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org multimodal large language models vision 2024',
  numResults: 10,
});
Example 5: Get specific paper details:
javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=1706.03762',
  prompt: "Extract complete details for the 'Attention Is All You Need' paper",
});
</usage_example> </examples>
</execution_process>
<best_practices>
</best_practices> </instructions>

Agent Integration

This skill is automatically assigned to:
  • researcher - Academic research, literature review
  • scientific-research-expert - Deep scientific analysis
  • developer - Finding technical papers for implementation

Iron Laws

  1. ALWAYS cap max_results at 20: never allow unlimited or >20 result queries; context explosion from 100+ papers is a known failure mode that stalls agent pipelines.
  2. NEVER fetch full paper PDFs during literature review: extract metadata and abstracts only; full papers are 100KB+ each and will exhaust the context budget in minutes.
  3. ALWAYS use Exa for semantic discovery, WebFetch for precision retrieval: Exa finds semantically related papers; WebFetch gets specific IDs or category feeds; use both in sequence, not interchangeably.
  4. NEVER use broad queries without field prefixes: `search_query=neural+networks` matches thousands of papers; always scope with `ti:`, `au:`, `cat:`, or `abs:` prefixes to target the query.
  5. ALWAYS cite arXiv IDs (e.g., 2301.07041) when referencing papers: titles alone are ambiguous and can change; IDs are stable, machine-readable, and enable instant retrieval.
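Laws 1 and 4 are mechanical enough to check before a URL is fetched. The sketch below is a hypothetical pre-flight check (`checkQueryUrl` is an illustrative name) that takes a strict reading of the laws, so queries scoped only with `all:` are flagged too. Exempting `id_list` lookups is an assumption made here, since they return exactly the requested papers.

```javascript
const MAX_RESULTS_CAP = 20;
const SCOPED_PREFIXES = ['ti:', 'au:', 'cat:', 'abs:'];

// Return a list of rule violations for an arXiv API URL (empty list means OK).
function checkQueryUrl(rawUrl) {
  const params = new URL(rawUrl).searchParams;
  const problems = [];

  // Assumption: direct ID lookups are exempt from both checks.
  if (params.has('id_list')) return problems;

  const maxResults = params.get('max_results');
  if (maxResults === null) {
    problems.push('max_results is not set');
  } else if (Number(maxResults) > MAX_RESULTS_CAP) {
    problems.push(`max_results exceeds ${MAX_RESULTS_CAP}`);
  }

  const query = params.get('search_query') ?? '';
  if (!SCOPED_PREFIXES.some((prefix) => query.includes(prefix))) {
    problems.push('search_query has no ti:, au:, cat:, or abs: prefix');
  }
  return problems;
}

console.log(
  checkQueryUrl('http://export.arxiv.org/api/query?search_query=all:deep+learning&max_results=100'),
);
// [ 'max_results exceeds 20', 'search_query has no ti:, au:, cat:, or abs: prefix' ]
```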

Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
| --- | --- | --- |
| Using `max_results=100` or no limit | Context explosion; 100 papers × 300 bytes = 30KB+ metadata | Always set `max_results=20` (hard limit) |
| Fetching full paper PDFs | Single paper can be 100KB+; kills context budget | Extract abstract + metadata only via API |
| Broad query without field prefix | Returns irrelevant results across all fields | Use `ti:`, `au:`, `cat:`, or `abs:` prefix |
| Using only WebFetch for discovery | Misses semantically related papers not matching exact terms | Use Exa for semantic discovery first |
| Citing paper titles instead of arXiv IDs | Titles can be ambiguous or duplicated | Always include the arXiv ID (e.g., 1706.03762) |
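For the last row, a citation that always carries the ID can be produced with a tiny formatter. The function name and output layout below are illustrative choices, not a prescribed citation style; the `https://arxiv.org/abs/<id>` form is the standard abstract-page URL.

```javascript
// Format a citation that pairs the title with its stable arXiv ID and abstract URL.
function formatCitation({ title, id }) {
  if (!id) throw new Error('an arXiv ID is required');
  return `${title} (arXiv:${id}, https://arxiv.org/abs/${id})`;
}

console.log(formatCitation({ title: 'Attention Is All You Need', id: '1706.03762' }));
// Attention Is All You Need (arXiv:1706.03762, https://arxiv.org/abs/1706.03762)
```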

Memory Protocol (MANDATORY)

Before starting:
bash
cat .claude/context/memory/learnings.md
After completing:
  • New pattern -> `.claude/context/memory/learnings.md`
  • Issue found -> `.claude/context/memory/issues.md`
  • Decision made -> `.claude/context/memory/decisions.md`
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
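The routing above (pattern, issue, decision) maps onto three files. As a sketch, a hypothetical `memoryEntry` helper could pick the target path and format a dated line; the entry format is an assumption, since the protocol only names the files.

```javascript
const MEMORY_DIR = '.claude/context/memory';
const MEMORY_FILES = {
  pattern: 'learnings.md',
  issue: 'issues.md',
  decision: 'decisions.md',
};

// Return the file path and a dated markdown line for a memory entry.
function memoryEntry(kind, text, date = new Date()) {
  if (!Object.prototype.hasOwnProperty.call(MEMORY_FILES, kind)) {
    throw new Error(`Unknown entry kind: ${kind}`);
  }
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return {
    path: `${MEMORY_DIR}/${MEMORY_FILES[kind]}`,
    line: `- ${day}: ${text}\n`,
  };
}

const entry = memoryEntry(
  'issue',
  'max_results above 20 caused context overflow',
  new Date(Date.UTC(2025, 0, 15)),
);
console.log(entry.path); // .claude/context/memory/issues.md
```

The returned `line` could then be appended to `path`, for example with Node's `fs.appendFileSync`.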