arxiv

arXiv Paper Search & Download

Search topic or arXiv paper ID: $ARGUMENTS

Constants

- PAPER_DIR - Local directory to save downloaded PDFs. Default: `papers/` in the current project directory.
- MAX_RESULTS = 10 - Default number of search results.
- FETCH_SCRIPT - `tools/arxiv_fetch.py` relative to the ARIS install, or the same path relative to the current project. Fall back to inline Python if not found.

Overrides (append to arguments):
- `/arxiv "attention mechanism" - max: 20`: return up to 20 results
- `/arxiv "2301.07041" - download`: download a specific paper by ID
- `/arxiv "query" - dir: literature/`: save PDFs to a custom directory
- `/arxiv "query" - download: all`: download all result PDFs

Workflow

Step 1: Parse Arguments

Parse `$ARGUMENTS` for directives:

- Query or ID: main search term or a bare arXiv ID such as `2301.07041` or `cs/0601001`
- `- max: N`: override MAX_RESULTS (e.g., `- max: 20`)
- `- dir: PATH`: override PAPER_DIR (e.g., `- dir: literature/`)
- `- download`: download the first result's PDF after listing
- `- download: all`: download PDFs for all results

If the argument matches an arXiv ID pattern (`YYMM.NNNNN` or `category/NNNNNNN`), skip the search and go directly to Step 3.
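
The directive parsing in this step can be sketched in Python as follows. This is a minimal illustration rather than the skill's actual parser: `parse_arguments`, `ARXIV_ID_RE`, and the keys of the returned dictionary are hypothetical names, and the sketch assumes each directive is separated from the query by ` - ` as in the override examples. The ID pattern also accepts four-digit new-style sequence numbers (arXiv used `YYMM.NNNN` before 2015) and an optional version suffix, which goes slightly beyond the two patterns named in the text.

```python
import re

# Defaults taken from the Constants section.
MAX_RESULTS = 10
PAPER_DIR = "papers/"

# New-style IDs (2301.07041) and old-style IDs (cs/0601001), optional version suffix.
ARXIV_ID_RE = re.compile(r"^(\d{4}\.\d{4,5}|[a-z-]+(\.[A-Z]{2})?/\d{7})(v\d+)?$")


def parse_arguments(raw: str) -> dict:
    """Split a raw argument string into the query and its directives."""
    opts = {"max_results": MAX_RESULTS, "paper_dir": PAPER_DIR, "download": None}

    # Everything before the first " - " is the query; the rest are directives.
    head, *directives = re.split(r"\s+-\s+", raw.strip())
    for directive in directives:
        key, _, value = directive.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "max" and value.isdigit():
            opts["max_results"] = int(value)
        elif key == "dir" and value:
            opts["paper_dir"] = value
        elif key == "download":
            # A bare "download" means the first result; "download: all" means every result.
            opts["download"] = "all" if value == "all" else "first"

    opts["query"] = head.strip().strip('"')
    # When this is True, the search can be skipped in favour of Step 3.
    opts["is_id"] = bool(ARXIV_ID_RE.match(opts["query"]))
    return opts
```

One limitation of this sketch: a search query that itself contains ` - ` would be cut at that point, so a real parser would likely match only the known directive keywords.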

Step 2: Search arXiv

Locate the fetch script:

bash

SCRIPT=$(python3 -c "
import pathlib
candidates = [
    pathlib.Path('tools/arxiv_fetch.py'),
    pathlib.Path.home() / '.claude' / 'skills' / 'arxiv' / 'arxiv_fetch.py',
]
for p in candidates:
    if p.exists():
        print(p)
        break
" 2>/dev/null)

If SCRIPT is found, run:

bash

python3 "$SCRIPT" search "QUERY" --max MAX_RESULTS

If SCRIPT is not found, fall back to inline Python:

bash

python3 - <<'PYEOF'
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/Atom"
query = urllib.parse.quote("QUERY")
url = (f"http://export.arxiv.org/api/query"
       f"?search_query={query}&start=0&max_results=MAX_RESULTS"
       f"&sortBy=relevance&sortOrder=descending")
with urllib.request.urlopen(url, timeout=30) as r:
    root = ET.fromstring(r.read())
papers = []
for entry in root.findall(f"{{{NS}}}entry"):
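    # Strip only the trailing version suffix; split("v") would also cut category names such as solv-int
    aid = entry.findtext(f"{{{NS}}}id", "").split("/abs/")[-1].rsplit("v", 1)[0]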
    title = (entry.findtext(f"{{{NS}}}title", "") or "").strip().replace("\n", " ")
    abstract = (entry.findtext(f"{{{NS}}}summary", "") or "").strip().replace("\n", " ")
    authors = [a.findtext(f"{{{NS}}}name", "") for a in entry.findall(f"{{{NS}}}author")]
    published = entry.findtext(f"{{{NS}}}published", "")[:10]
    cats = [c.get("term", "") for c in entry.findall(f"{{{NS}}}category")]
    papers.append({
        "id": aid,
        "title": title,
        "authors": authors,
        "abstract": abstract,
        "published": published,
        "categories": cats,
        "pdf_url": f"https://arxiv.org/pdf/{aid}.pdf",
        "abs_url": f"https://arxiv.org/abs/{aid}",
    })
print(json.dumps(papers, ensure_ascii=False, indent=2))
PYEOF

Present results as a table:

text

| # | arXiv ID   | Title               | Authors        | Date       | Category |
|---|------------|---------------------|----------------|------------|----------|
| 1 | 1706.03762 | Attention Is All... | Vaswani et al. | 2017-06-12 | cs.CL    |
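
As a rough sketch of how the JSON records from the search could be turned into such a table, the helper below builds one Markdown row per paper. `format_results_table` is a hypothetical name, the 40-character title cut-off is an arbitrary choice, and treating the first listed category as the primary one is an assumption about the feed's ordering.

```python
def format_results_table(papers: list[dict]) -> str:
    """Render search results as a Markdown table, one row per paper."""
    rows = [
        "| # | arXiv ID | Title | Authors | Date | Category |",
        "|---|----------|-------|---------|------|----------|",
    ]
    for i, p in enumerate(papers, start=1):
        authors = p.get("authors") or []
        if len(authors) > 1:
            # Shorten to the first author's last name, e.g. "Vaswani et al."
            last_name = (authors[0].split() or [""])[-1]
            author_cell = f"{last_name} et al."
        else:
            author_cell = authors[0] if authors else ""

        # Escape pipes so a title cannot break the table, then truncate long titles.
        title = p.get("title", "").replace("|", "\\|")
        if len(title) > 40:
            title = title[:37] + "..."

        # Assumption: the first category in the list is the primary one.
        category = (p.get("categories") or [""])[0]
        rows.append(
            f"| {i} | {p.get('id', '')} | {title} | {author_cell} "
            f"| {p.get('published', '')} | {category} |"
        )
    return "\n".join(rows)
```

The function takes the list produced by the search step (fields `id`, `title`, `authors`, `published`, `categories`) and returns a string ready to print.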

Step 3: Fetch Details for a Specific ID

When a single paper ID is requested (either directly or from Step 2):

bash

python3 "$SCRIPT" search "id:ARXIV_ID" --max 1

or fallback:

python3 -c "
import urllib.request, xml.etree.ElementTree as ET
NS = 'http://www.w3.org/2005/Atom'
url = 'http://export.arxiv.org/api/query?id_list=ARXIV_ID'
with urllib.request.urlopen(url, timeout=30) as r:
    root = ET.fromstring(r.read())
# print full details ...
"


Display: title, all authors, categories, full abstract, published date, PDF URL, abstract URL.
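
One way the display fields could be pulled out of the API response is sketched below. It assumes the response is an Atom feed containing a single `entry`, parsed with the same namespace as the Step 2 fallback; `entry_details` and its dictionary keys are illustrative names, not part of the fetch script.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/Atom"


def entry_details(atom_xml: str) -> dict:
    """Extract the Step 3 display fields from a single-entry Atom response."""
    root = ET.fromstring(atom_xml)
    entry = root.find(f"{{{NS}}}entry")
    if entry is None:
        raise ValueError("no entry in response; check the arXiv ID")

    def text(tag: str) -> str:
        # Collapse the line breaks and repeated spaces found in long fields.
        return " ".join((entry.findtext(f"{{{NS}}}{tag}") or "").split())

    # <id> holds the abstract URL; keep the part after /abs/ and drop the version suffix.
    aid = re.sub(r"v\d+$", "", text("id").split("/abs/")[-1])
    return {
        "id": aid,
        "title": text("title"),
        "authors": [a.findtext(f"{{{NS}}}name", "") for a in entry.findall(f"{{{NS}}}author")],
        "categories": [c.get("term", "") for c in entry.findall(f"{{{NS}}}category")],
        "abstract": text("summary"),
        "published": text("published")[:10],
        "pdf_url": f"https://arxiv.org/pdf/{aid}.pdf",
        "abs_url": f"https://arxiv.org/abs/{aid}",
    }
```

Raising on a missing `entry` gives a clear failure when an ID does not resolve, instead of printing a row of empty fields.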



Step 4: Download PDFs

When download is requested, for each paper ID to download:

bash

Using fetch script:

python3 "$SCRIPT" download ARXIV_ID --dir PAPER_DIR

Fallback:

mkdir -p PAPER_DIR && python3 -c "
import pathlib
import sys
import urllib.request

out = pathlib.Path('PAPER_DIR/ARXIV_ID.pdf')
if out.exists():
    print(f'Already exists: {out}')
    sys.exit(0)
# Old-style IDs such as cs/0601001 contain a slash, so create the subdirectory as well
out.parent.mkdir(parents=True, exist_ok=True)
req = urllib.request.Request(
    'https://arxiv.org/pdf/ARXIV_ID.pdf',
    headers={'User-Agent': 'arxiv-skill/1.0'},
)
with urllib.request.urlopen(req, timeout=60) as r:
    out.write_bytes(r.read())
print(f'Downloaded: {out} ({out.stat().st_size // 1024} KB)')
"


After each download:

- Confirm file size > 10 KB (reject smaller files - likely an error HTML page)
- Add a 1-second delay between consecutive downloads to avoid rate limiting
- Report: `Downloaded: papers/2301.07041.pdf (842 KB)`
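
The post-download checks above might look like the following in Python. This is a sketch under stated assumptions: `verify_pdf` and `download_all` are hypothetical helpers, `download_one` stands in for whichever download method is used and is assumed to return the saved file's path, and deleting an undersized file follows the "warn and delete" rule from Key Rules.

```python
import pathlib
import time

MIN_PDF_BYTES = 10 * 1024  # the 10 KB threshold from the checklist


def verify_pdf(path: pathlib.Path, min_bytes: int = MIN_PDF_BYTES) -> tuple[bool, str]:
    """Return (ok, message); delete the file when it is too small to be a real PDF."""
    size = path.stat().st_size
    if size <= min_bytes:
        path.unlink()
        return False, f"Warning: {path} is only {size} bytes, likely an error page; deleted"
    return True, f"Downloaded: {path} ({size // 1024} KB)"


def download_all(ids, download_one, delay: float = 1.0, sleep=time.sleep) -> list[str]:
    """Download each ID in turn, pausing between consecutive downloads."""
    messages = []
    for n, arxiv_id in enumerate(ids):
        if n:  # wait before every download except the first
            sleep(delay)
        path = download_one(arxiv_id)  # hypothetical callable returning a pathlib.Path
        messages.append(verify_pdf(path)[1])
    return messages
```

Passing `sleep` as a parameter keeps the delay easy to replace in tests without waiting in real time.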

Step 5: Summarize

For each paper (downloaded or fetched by API):

markdown

[Title]

arXiv: [ID] - [abs_url]
Authors: [full author list]
Date: [published]
Categories: [cs.LG, cs.AI, ...]
Abstract: [full abstract]
Key contributions (extracted from abstract):
- [contribution 1]
- [contribution 2]
- [contribution 3]
Local PDF: papers/[ID].pdf (if downloaded)

Step 6: Final Output

Summarize what was done:

- `Found N papers for "query"`
- `Downloaded: papers/2301.07041.pdf (842 KB)` (for each download)
- Any warnings (rate limit hit, file too small, already exists)

Suggest follow-up skills:

text

/research-lit "topic"     - multi-source review: Zotero + Obsidian + local PDFs + web
/novelty-check "idea"     - verify your idea is novel against these papers

Key Rules

- Always show the arXiv ID prominently: users need it for citations and reproducibility
- Verify downloaded PDFs: file must be > 10 KB; warn and delete if smaller
- Rate limit: wait 1 second between consecutive PDF downloads; retry once after 5 seconds on HTTP 429
- Never overwrite an existing PDF at the same path: skip it and report "already exists"
- Handle both arXiv ID formats: new (`2301.07041`) and old (`cs/0601001`)
- PAPER_DIR is created automatically if it does not exist
- If the arXiv API is unreachable, report the error clearly and suggest using `/research-lit` with `- sources: web` as a fallback
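
The retry rule for HTTP 429 could be implemented along these lines. This is a hedged sketch, not the fetch script's code: `fetch_with_retry` is a hypothetical helper, and its `opener` and `sleep` parameters exist only so the behaviour can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url: str, opener=urllib.request.urlopen, sleep=time.sleep,
                     retry_wait: float = 5.0) -> bytes:
    """Fetch url, retrying exactly once after retry_wait seconds on HTTP 429."""
    for attempt in (1, 2):
        try:
            with opener(url, timeout=60) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Any other status, or a second 429, is passed on to the caller.
            if err.code != 429 or attempt == 2:
                raise
            sleep(retry_wait)
    raise AssertionError("unreachable")
```

A second 429 is re-raised rather than retried again, so the caller can report the rate-limit warning in the final output.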
