arxiv-reader

Read and analyze arXiv papers directly from the workspace. Converts LaTeX source into clean text suitable for LLM analysis.

Description

Fetches arXiv papers, flattens LaTeX includes, and returns clean text for LLM analysis. Works standalone (downloads directly from arXiv) or delegates to the container arXiv server when available. Results are cached locally for instant repeat access.
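To make "flattens LaTeX includes" concrete, here is a minimal sketch of one way it could be done. It is not the tool's actual implementation: the function name flatten_latex and its parameters are hypothetical, and it only handles the braced forms \input{...} and \include{...}, resolved relative to the main file's directory. Includes that sit inside LaTeX comments are not treated specially.

```python
import re
from pathlib import Path
from typing import FrozenSet, Optional

# Matches \input{file} and \include{file}; the file name is captured.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def flatten_latex(path: Path, root: Optional[Path] = None,
                  _seen: FrozenSet[Path] = frozenset()) -> str:
    """Return the text of `path` with its include targets inlined recursively."""
    path = path.resolve()
    root = path.parent if root is None else root
    if path in _seen:  # guard against circular includes
        return ""
    text = path.read_text(encoding="utf-8", errors="replace")

    def inline(match: "re.Match[str]") -> str:
        target = root / match.group(1).strip()
        if target.suffix != ".tex":  # LaTeX lets authors omit the extension
            target = target.with_name(target.name + ".tex")
        if not target.is_file():
            return match.group(0)  # leave unresolved includes untouched
        return flatten_latex(target, root, _seen | {path})

    return INCLUDE_RE.sub(inline, text)
```

Calling flatten_latex(Path("main.tex")) on an unpacked source directory would return a single string with every resolvable include replaced by the included file's text.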

Usage Examples

  • "Read the paper 2301.00001 from arXiv"
  • "What sections does paper 2405.12345 have?"
  • "Get the abstract of 2312.09876"
  • "Fetch paper 2301.00001 without the appendix"

Process

  1. Quick look: use arxiv_abstract to get a paper's abstract before committing to a full read
  2. Survey structure: use arxiv_sections to understand the paper's outline
  3. Deep read: use arxiv_fetch to get the full flattened LaTeX for analysis
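The three steps above can be chained programmatically. The sketch below assumes a hypothetical call_tool(name, arguments) callable that invokes a tool and returns its result as a dict. Only the tool names, the parameters, and the result fields come from this document. The keyword check is an illustrative relevance filter, not part of the tool.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def read_paper(call_tool: ToolCaller, arxiv_id: str, keyword: str) -> Dict[str, Any]:
    """Abstract first, then the outline, then the full text if the paper looks relevant."""
    # Step 1: quick look at the abstract.
    abstract = call_tool("arxiv_abstract", {"arxiv_id": arxiv_id})["abstract"]
    if keyword.lower() not in abstract.lower():
        return {"arxiv_id": arxiv_id, "relevant": False}

    # Step 2: survey the structure.
    sections = call_tool("arxiv_sections", {"arxiv_id": arxiv_id})["sections"]

    # Step 3: deep read, skipping the appendix to keep the text short.
    paper = call_tool("arxiv_fetch", {"arxiv_id": arxiv_id, "remove_appendix": True})
    return {
        "arxiv_id": arxiv_id,
        "relevant": True,
        "sections": sections,
        "content": paper["content"],
    }
```

Stopping after the abstract avoids the slower first-time fetch for papers that turn out to be off topic.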

Tools

arxiv_fetch

Fetch the full flattened LaTeX source of an arXiv paper.
Parameters:
  • arxiv_id (string, required): arXiv paper ID (e.g. 2301.00001 or 2301.00001v2)
  • remove_comments (boolean, optional): Strip LaTeX comments (default: true)
  • remove_appendix (boolean, optional): Remove appendix sections (default: false)
  • figure_paths (boolean, optional): Replace figures with file paths only (default: false)
Returns:
{ content: string, arxiv_id: string, cached: boolean }
Example:
{ "arxiv_id": "2301.00001", "remove_appendix": true }
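A caller may want to validate the ID and build the argument object before invoking arxiv_fetch. The helper below is a hypothetical sketch: build_fetch_args and ARXIV_ID_RE are illustrative names, and the pattern covers only new-style identifiers such as 2301.00001 or 2301.00001v2. Whether the tool also accepts old-style identifiers is not stated here. Options left at their documented defaults are omitted from the request.

```python
import re
from typing import Any, Dict

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
ARXIV_ID_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def build_fetch_args(arxiv_id: str, remove_comments: bool = True,
                     remove_appendix: bool = False,
                     figure_paths: bool = False) -> Dict[str, Any]:
    """Build an arxiv_fetch argument dict, keeping only non-default options."""
    if not ARXIV_ID_RE.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    args: Dict[str, Any] = {"arxiv_id": arxiv_id}
    if not remove_comments:   # documented default is true
        args["remove_comments"] = False
    if remove_appendix:       # documented default is false
        args["remove_appendix"] = True
    if figure_paths:          # documented default is false
        args["figure_paths"] = True
    return args
```

With these defaults, build_fetch_args("2301.00001", remove_appendix=True) produces the same object as the example request above.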

arxiv_sections

List all sections and subsections of an arXiv paper.
Parameters:
  • arxiv_id (string, required): arXiv paper ID
Returns:
{ arxiv_id: string, sections: string[] }
Example:
{ "arxiv_id": "2301.00001" }
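As an illustration of how a section list can be derived from flattened LaTeX, the sketch below scans for \section and \subsection commands with a regular expression. This is an assumption about one workable approach, not a description of what arxiv_sections does internally. The name list_sections is hypothetical, and titles containing nested braces are not handled.

```python
import re
from typing import List

# Matches \section{...} and \subsection{...}, including the starred variants.
SECTION_RE = re.compile(r"\\(?:section|subsection)\*?\{([^{}]*)\}")


def list_sections(latex: str) -> List[str]:
    """Return section and subsection titles in document order."""
    return [match.group(1).strip() for match in SECTION_RE.finditer(latex)]
```

The result is a flat list of strings, matching the shape of the sections field in the return value above.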

arxiv_abstract

Extract just the abstract from an arXiv paper.
Parameters:
  • arxiv_id (string, required): arXiv paper ID
Returns:
{ arxiv_id: string, abstract: string }
Example:
{ "arxiv_id": "2301.00001" }
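In standalone mode the abstract comes from the arXiv Atom API, where each paper is an Atom entry and the abstract is carried in its summary element. The sketch below shows how such a feed could be parsed with the Python standard library. The function name abstract_from_atom is hypothetical, and the code works on a feed string that has already been downloaded, so it makes no network request itself.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard Atom namespace used by the feed.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def abstract_from_atom(feed_xml: str) -> Optional[str]:
    """Return the first entry's summary with whitespace collapsed, or None."""
    root = ET.fromstring(feed_xml)
    summary = root.find("atom:entry/atom:summary", ATOM_NS)
    if summary is None or summary.text is None:
        return None
    # Summaries are hard-wrapped; join them into a single line.
    return " ".join(summary.text.split())
```

A feed for a single paper can be requested from the public query endpoint, for example http://export.arxiv.org/api/query?id_list=2301.00001, and passed to this function as text.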

Notes

  • All results are cached locally, so repeat requests are instant
  • Works standalone (no Docker required) or with the container arXiv server
  • Paper IDs support version suffixes (e.g. 2301.00001v2)
  • Very large papers may take 10-30 seconds on first fetch
  • arxiv_abstract uses the arXiv Atom API for fast metadata retrieval in standalone mode
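The caching behaviour can be pictured with a small file-backed sketch. It assumes one JSON file per paper ID inside a cache directory. The tool's actual cache layout is not documented here, so cached_fetch, the file naming, and the fetch callable are all hypothetical. The returned dict mirrors the arxiv_fetch result, including the cached flag.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_fetch(cache_dir: Path, arxiv_id: str,
                 fetch: Callable[[str], str]) -> Dict[str, Any]:
    """Return paper content from the cache, calling `fetch` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{arxiv_id}.json"  # versioned IDs get their own file
    if entry.is_file():
        content = json.loads(entry.read_text(encoding="utf-8"))["content"]
        return {"content": content, "arxiv_id": arxiv_id, "cached": True}
    content = fetch(arxiv_id)  # slow path: first request for this ID
    entry.write_text(json.dumps({"content": content}), encoding="utf-8")
    return {"content": content, "arxiv_id": arxiv_id, "cached": False}
```

Because the ID is the cache key in this sketch, 2301.00001 and 2301.00001v2 are stored as separate entries.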