paper-reader

Before Starting: Greet the user first 🐕

Academic Paper Reading Assistant (Paper Reader)

Focused on the CV/DL field, with support for Zotero integration and Obsidian note saving.

Step 0: Read Shared Configuration

First read `../_shared/user-config.json`. If `../_shared/user-config.local.json` exists, use it to override the default values.

Explicitly derive the following variables and use them consistently in all subsequent steps:

- `VAULT_PATH`
- `NOTES_PATH`
- `CONCEPTS_PATH`
- `ZOTERO_DB`
- `ZOTERO_STORAGE`
- `AUTO_REFRESH_INDEXES`
- `GIT_COMMIT_ENABLED`
- `GIT_PUSH_ENABLED`

Where:

- `NOTES_PATH = {VAULT_PATH}/{paper_notes_folder}`
- `CONCEPTS_PATH = {NOTES_PATH}/{concepts_folder}`
- `GIT_PUSH_ENABLED` can be true only if `GIT_COMMIT_ENABLED=true`
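The configuration step can be sketched in Python as follows. This is only an illustration under stated assumptions: the config files are taken to be flat JSON objects, the local file overrides defaults key by key, and the key names `vault_path`, `git_commit_enabled`, and `git_push_enabled` are hypothetical (only `paper_notes_folder` and `concepts_folder` appear in the path formulas).

```python
import json
from pathlib import Path


def load_config(shared_dir):
    """Merge user-config.json with the optional user-config.local.json."""
    shared = Path(shared_dir)
    config = json.loads((shared / "user-config.json").read_text(encoding="utf-8"))
    local = shared / "user-config.local.json"
    if local.exists():
        # Assumption: a shallow, key-by-key override of the defaults.
        config.update(json.loads(local.read_text(encoding="utf-8")))
    return config


def derive_variables(config):
    """Build the derived variables; the config key names are hypothetical."""
    vault = Path(config["vault_path"])
    notes = vault / config["paper_notes_folder"]
    commit = bool(config.get("git_commit_enabled", False))
    return {
        "VAULT_PATH": vault,
        "NOTES_PATH": notes,
        "CONCEPTS_PATH": notes / config["concepts_folder"],
        "GIT_COMMIT_ENABLED": commit,
        # Pushing is only meaningful when committing is enabled.
        "GIT_PUSH_ENABLED": commit and bool(config.get("git_push_enabled", False)),
    }
```

Deriving `GIT_PUSH_ENABLED` from both flags in one place means later steps never need to re-check the commit flag before pushing.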

1. Receive Paper

| Input Method | Example | Processing Method |
| --- | --- | --- |
| PDF path | `/path/to/paper.pdf` | Read directly |
| arXiv link | `https://arxiv.org/abs/xxxx` | WebFetch |
| Zotero collection | "papers in the VLA collection" | Query the database → list → user selects |
| Zotero search | "π0.5 in Zotero" | Search by title → locate the PDF |
| No PDF | Zotero entry has no attachment | Fetch from the web (see below) |

Fetch Process When No PDF Is Available

- Run `python3 assets/zotero_helper.py info {item_id}` to get the paper's information.
- Fetch in priority order: arXiv HTML > arXiv PDF > DOI > WebSearch by title.
- Identify the arXiv ID from the URL, the Zotero `extra` field, or a title search.
- Prefer calling WebFetch directly on `https://arxiv.org/html/{arxiv_id}`; no download is needed.
- Skip the item if neither a PDF nor an online source is available, or if the content is not a paper.

For detailed Zotero operations, see `references/zotero-guide.md`.
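The ID lookup and the source priority can be sketched as below. This is an assumption-laden illustration, not the behavior of `zotero_helper.py`: the function names are hypothetical, and the regex only recognizes new-style arXiv identifiers (`YYMM.NNNNN`), not the older `archive/NNNNNNN` form.

```python
import re

# New-style arXiv IDs: four digits, a dot, four or five digits, optional version.
ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?!\d)(v\d+)?")


def find_arxiv_id(*candidates):
    """Return the first arXiv ID found in the given strings, or None."""
    for text in candidates:
        match = ARXIV_ID.search(text or "")
        if match:
            return match.group(1)  # version suffix dropped
    return None


def candidate_sources(arxiv_id=None, doi=None, title=None):
    """List (kind, target) pairs in the priority order described above."""
    sources = []
    if arxiv_id:
        sources.append(("arxiv_html", f"https://arxiv.org/html/{arxiv_id}"))
        sources.append(("arxiv_pdf", f"https://arxiv.org/pdf/{arxiv_id}"))
    if doi:
        sources.append(("doi", f"https://doi.org/{doi}"))
    if title:
        sources.append(("websearch", title))  # search query, not a URL
    return sources
```

Passing the URL, the Zotero `extra` field, and the title to `find_arxiv_id` in that order mirrors the lookup order given in the list; an empty result from `candidate_sources` corresponds to the skip condition.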

2. Reading Modes

| Mode | Trigger Phrases | Output |
| --- | --- | --- |
| Quick summary | "快速看一下" ("take a quick look"), "quick" | 3-5 sentences on the core contributions |
| Full analysis | "详细分析" ("detailed analysis"), default | Structured notes (using the template) |
| Critical analysis | "批判性分析" ("critical analysis"), "critique" | Assessment of methodological strengths and weaknesses |
| Knowledge extraction | "提取公式" ("extract formulas"), "技术细节" ("technical details") | Formulas + algorithm pseudocode |
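One way to read the mode table is as a simple lookup with a default. The sketch below assumes case-insensitive substring matching and uses hypothetical mode identifiers; the trigger strings are the original Chinese phrases plus the English keywords from the table.

```python
# (mode identifier, trigger phrases); identifiers are illustrative only.
MODE_TRIGGERS = [
    ("quick_summary", ["快速看一下", "quick"]),
    ("critical_analysis", ["批判性分析", "critique"]),
    ("knowledge_extraction", ["提取公式", "技术细节"]),
    ("full_analysis", ["详细分析"]),
]


def select_mode(request):
    """Pick a reading mode from the user's request; full analysis is the default."""
    lowered = request.lower()
    for mode, triggers in MODE_TRIGGERS:
        if any(trigger in lowered for trigger in triggers):
            return mode
    return "full_analysis"
```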

3. Note Generation

Template: strictly follow `assets/paper-note-template.md`; do not simplify it on your own.

Core Quality Rules

- No omissions: every Figure, formula, and Table in the paper must appear in the notes.
- Inline concept links: the first occurrence of each technical term in the body must be linked with `[[Concept]]`, not only in a list at the end.
- No ASCII flowcharts: describe architectures with structured Markdown lists plus `$math symbols$`.
- Formula completeness: every formula must have a name (`[[Concept|Name]]`), the LaTeX formula, its meaning, and an explanation of its symbols.
- Prefer external image links: arXiv HTML / project homepage / GitHub; download locally only when no external source is found.

Detailed quality standards for formulas, images, and tables are in `references/quality-standards.md`.

Image Fetch Process (Multi-Source Fallback)

Goal: ensure the notes include every Figure from the paper. Count the paper's Figures first, then fetch them one by one.

1. WebSearch `"{paper title} arxiv"` to get the arXiv ID.
2. Source A, arXiv HTML (preferred):
   - WebFetch `https://arxiv.org/html/{arxiv_id}` and extract the caption and `img src` URL of every `<figure>` element.
   - Count the Figures in the paper and confirm that none are missing from the extracted set.
3. Source B, project homepage (when the HTML page returns 404 or the images are incomplete):
   - Find the project homepage URL in the abstract or HTML (common patterns: `project page`, `github.io`, `our website`).
   - WebFetch the project homepage and extract the images shown there (usually teaser and demo images).
4. Source C, PDF extraction (when both sources above fail):
   - Use `pdfimages -png` to extract images from the PDF, and keep only valid images larger than 10KB.
5. Embed each image in the note as an external link: `![Figure X](url)`.
6. Verify: external links load, and local files are larger than 10KB.
7. Deduplicate URLs: before writing, check the URL for a repeated arxiv_id path segment (e.g., `2603.05312v1/2603.05312v1/`) and remove the repeated segment if present. See `references/image-troubleshooting.md` for details.

ar5iv image numbers do not necessarily match Figure numbers; see `references/image-troubleshooting.md` for troubleshooting.
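The Source A extraction and the URL deduplication check could look roughly like this. It is a sketch using only the Python standard library; the class and function names are hypothetical, and it assumes captions live in `<figcaption>` elements, which may not hold for every arXiv HTML page.

```python
import re
from html.parser import HTMLParser


class FigureCollector(HTMLParser):
    """Collect the first img src and the caption text of each top-level <figure>."""

    def __init__(self):
        super().__init__()
        self.figures = []
        self._depth = 0          # nesting depth, so subfigures stay in their parent
        self._current = None
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._depth += 1
            if self._depth == 1:
                self._current = {"src": None, "caption": ""}
        elif self._current is not None:
            if tag == "img" and self._current["src"] is None:
                self._current["src"] = dict(attrs).get("src")
            elif tag == "figcaption":
                self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace in the collected caption text.
                caption = " ".join(self._current["caption"].split())
                self._current["caption"] = caption
                self.figures.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._in_caption and self._current is not None:
            self._current["caption"] += data


def dedupe_arxiv_segment(url, arxiv_id):
    """Collapse a repeated arxiv_id path segment such as 'ID/ID/' into 'ID/'."""
    seg = re.escape(arxiv_id)
    return re.sub(rf"/{seg}(?:/{seg})+(?=/|$)", lambda m: "/" + arxiv_id, url)
```

Comparing `len(collector.figures)` against the Figure count from the paper gives the completeness check in step 2; a shortfall is the signal to fall back to Source B or C.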

Image Reliability Check (Runs Automatically After Generation)

After saving a note, run the image reachability check script, which downloads unreachable externally linked images to local storage:

`python3 ../daily-papers/download_note_images.py "{full note path}"`

- Reachable external links are left unchanged; unreachable images are downloaded to `assets/` and their links are replaced with Obsidian wikilinks.
- If any image is localized, the frontmatter field `image_source` is automatically updated to `mixed`.

Formula Format

Every formula must include: a name (`[[Concept|Name]]`), a LaTeX `$$` block, its meaning, and a symbol list. The `$$` block must have a blank line before and after it; otherwise Obsidian will not render it. Split very long formulas with `aligned`.
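The blank-line rule is easy to check mechanically. The following is a minimal sketch of such a check, assuming every `$$` delimiter sits on its own line; the function name is hypothetical and it is not part of the skill's scripts.

```python
def find_unpadded_math_blocks(markdown):
    """Return 1-based line numbers of `$$` delimiters missing an adjacent blank line."""
    lines = markdown.splitlines()
    problems = []
    inside = False
    for i, line in enumerate(lines):
        if line.strip() != "$$":
            continue
        if not inside:
            # Opening delimiter: the previous line must be blank (or absent).
            if i > 0 and lines[i - 1].strip():
                problems.append(i + 1)
        else:
            # Closing delimiter: the next line must be blank (or absent).
            if i + 1 < len(lines) and lines[i + 1].strip():
                problems.append(i + 1)
        inside = not inside
    return problems
```

An empty result means every display-math block in the note is padded correctly.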

4. Obsidian Saving

File Naming

Use only the method or model name: `{method_name}.md` (e.g., `Pi05.md`, with no year prefix). To determine the method name, use the text before the colon in the title or the "We propose XXX" phrase in the Abstract, and convert Greek letters to ASCII. If unsure, save to `_待整理/` (the "to be organized" folder).
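The title-based part of the naming rule might be sketched as follows. The Greek-letter table is deliberately partial and the function name is hypothetical; the "We propose XXX" fallback and the unsorted-folder fallback are left to the caller.

```python
import re

# Partial, illustrative mapping; extend as needed.
GREEK_TO_ASCII = {
    "π": "Pi", "α": "Alpha", "β": "Beta", "γ": "Gamma",
    "λ": "Lambda", "μ": "Mu", "σ": "Sigma",
}


def method_name_from_title(title):
    """Return a filename-safe method name from the text before the colon, or None."""
    if ":" not in title:
        return None  # caller falls back to the Abstract or the unsorted folder
    head = title.split(":", 1)[0]
    for greek, ascii_name in GREEK_TO_ASCII.items():
        head = head.replace(greek, ascii_name)
    # Keep only ASCII letters, digits, and hyphens.
    name = re.sub(r"[^A-Za-z0-9-]", "", head)
    return name or None
```

With a title that starts with "π0.5:", this yields `Pi05`, matching the `Pi05.md` example above.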

Save Path

Follow the Zotero collection hierarchy: `{NOTES_PATH}/{zotero_collection_path}/{method_name}.md`

YAML Frontmatter

---
title: "Paper Title"
method_name: "MethodName"
authors: [Author1, Author2]
year: 2025
venue: arXiv
tags: [tag1, tag2]  # lowercase hyphenated, 3-8 tags
zotero_collection: 3-Robotics/1-VLX/VLA
image_source: online
created: YYYY-MM-DD
---

Choosing tags: use the Related Work subheadings and the Abstract keywords. The first tag should be the paper's core topic.
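The tag constraints in the frontmatter comment (lowercase, hyphenated, 3 to 8 tags) can be validated with a small helper. This is a sketch with a hypothetical function name; it also allows digits in tags, which is an assumption.

```python
import re

# Lowercase alphanumeric words joined by single hyphens.
TAG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_tags(tags):
    """Return a list of human-readable problems; an empty list means the tags pass."""
    errors = []
    if not 3 <= len(tags) <= 8:
        errors.append(f"expected 3-8 tags, got {len(tags)}")
    for tag in tags:
        if not TAG_PATTERN.match(tag):
            errors.append(f"tag is not lowercase-hyphenated: {tag!r}")
    return errors
```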

Automatic Steps After Saving

Refresh the index pages only if `AUTO_REFRESH_INDEXES=true`:

`python3 ../_shared/generate_concept_mocs.py`
`python3 ../_shared/generate_paper_mocs.py`

Run git operations only if `GIT_COMMIT_ENABLED=true`:

- First confirm that `VAULT_PATH/.git` exists.
- After `git add {new file} {paper_notes_folder}/`, confirm that there are actually staged changes.
- Only when both conditions hold, run: `cd {VAULT_PATH} && git add {new file} {paper_notes_folder}/ && git commit -m "add paper note: {method_name}"`
- Push only if `GIT_PUSH_ENABLED=true` and the repository has a remote configured.
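The guarded git sequence can be sketched in Python as below. The function name and return strings are hypothetical, and the caller is assumed to have already checked `GIT_COMMIT_ENABLED`. The sketch relies on `git diff --cached --quiet` exiting with a non-zero status when staged changes exist, and on `git remote` printing one configured remote per line.

```python
import subprocess
from pathlib import Path


def commit_note(vault, paths, method_name, push_enabled=False, run=subprocess.run):
    """Stage, commit, and optionally push; `run` is injectable for testing."""
    vault = Path(vault)
    if not (vault / ".git").exists():
        return "skipped: not a git repository"
    run(["git", "add", "--", *paths], cwd=vault, check=True)
    # Exit status 0 means the index matches HEAD, i.e. nothing is staged.
    if run(["git", "diff", "--cached", "--quiet"], cwd=vault).returncode == 0:
        return "skipped: nothing staged"
    run(["git", "commit", "-m", f"add paper note: {method_name}"],
        cwd=vault, check=True)
    if push_enabled:
        result = run(["git", "remote"], cwd=vault, capture_output=True, text=True)
        if result.stdout.split():
            # Assumes the current branch already has an upstream configured.
            run(["git", "push"], cwd=vault, check=True)
            return "committed and pushed"
    return "committed"
```

Passing the argument list directly instead of a shell string avoids quoting problems when a note path contains spaces.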

5. Concept Library Maintenance (Required for Every Paper)

Concept library location: `{CONCEPTS_PATH}`

Process

1. Scan the paper note for all `[[Concept]]` links.
2. Check whether a concept note exists for each link (using `ls` + `find`).
3. Create every missing concept note (this step cannot be skipped) and file it in the matching subdirectory.

Classification rules and templates are in `references/concept-categories.md`.
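The scan-and-check steps can be approximated in Python instead of `ls` + `find`. This sketch assumes each concept note is a Markdown file named after the concept somewhere under `CONCEPTS_PATH`; the function names are hypothetical, and image embeds of the form `![[...]]` are deliberately ignored.

```python
import re
from pathlib import Path

# Matches [[Name]], [[Name|Alias]], and [[Name#Heading]], but not ![[embed]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def linked_concepts(note_text):
    """Return the distinct link targets in order of first appearance."""
    seen = []
    for match in WIKILINK.finditer(note_text):
        name = match.group(1).strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def missing_concepts(note_text, concepts_path):
    """Return linked concepts that have no .md file under concepts_path."""
    existing = {p.stem for p in Path(concepts_path).rglob("*.md")}
    return [c for c in linked_concepts(note_text) if c not in existing]
```

The list returned by `missing_concepts` is exactly the set of notes that step 3 must create, and its length is the number of new concepts to report.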

Self-Check

- Does every `[[Concept]]` link in the note have a corresponding concept note?
- Does each concept note list this paper as a "representative work"?

6. Post-Completion Self-Check (Combined Checklist)

- Are all Figures in the notes (count matches the paper)?
- Are all formulas in the notes (variables consistent, no conflicts)?
- Are all Tables preserved in full (every row and column)?
- Are technical terms in the body linked inline with `[[Concept]]`?
- Has the concept library been updated (missing concepts created)?
- Are the images usable (external links load / local files larger than 10KB)?

7. Interactive Features

After completing the analysis, ask the user whether they want a deeper explanation, a comparison with other papers, or the note saved to Obsidian. After saving, automatically create any missing concept notes and report how many concepts were added.

8. Batch Processing

Supports batch processing of a Zotero collection (subcollections are included recursively by default). Process: recursively collect papers → deduplicate → skip papers that already have notes → process one by one → summarize.
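The deduplicate-and-skip part of the batch flow can be sketched as follows. The paper records are assumed to be dicts with hypothetical keys `item_id` and `method_name`, and an existing note is detected by file name alone, which is a simplification.

```python
from pathlib import Path


def plan_batch(papers, notes_path):
    """Split papers into (todo, skipped), dropping duplicate item IDs."""
    existing = {p.stem for p in Path(notes_path).rglob("*.md")}
    seen, todo, skipped = set(), [], []
    for paper in papers:
        if paper["item_id"] in seen:
            continue  # the same item reached via several subcollections
        seen.add(paper["item_id"])
        if paper["method_name"] in existing:
            skipped.append(paper)  # a note already exists
        else:
            todo.append(paper)
    return todo, skipped
```

The lengths of `todo` and `skipped` give the numbers needed for the final summary.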

Reference Files (Consult as Needed)

- `references/zotero-guide.md`: Zotero queries, collections, PDF path lookup, smart collection selection
- `references/image-troubleshooting.md`: mapping ar5iv image numbers to Figures, PDF extraction fallback
- `references/concept-categories.md`: rules for the 16 subdirectories used in automatic concept classification, plus templates
- `references/quality-standards.md`: detailed quality standards for formulas, images, and tables, plus a self-check list