paper-reader
Before Starting: Greet the user first 🐕
Academic Paper Reading Assistant (Paper Reader)
Focused on the CV/DL field, with Zotero integration and note saving to Obsidian.
Step 0: Read Shared Configuration
First read `../_shared/user-config.json`; then, if `../_shared/user-config.local.json` exists, use it to override the default values.
Explicitly derive these variables and use them consistently in all subsequent steps:
- `VAULT_PATH`
- `NOTES_PATH`
- `CONCEPTS_PATH`
- `ZOTERO_DB`
- `ZOTERO_STORAGE`
- `AUTO_REFRESH_INDEXES`
- `GIT_COMMIT_ENABLED`
- `GIT_PUSH_ENABLED`

Where:
- `NOTES_PATH = {VAULT_PATH}/{paper_notes_folder}`
- `CONCEPTS_PATH = {NOTES_PATH}/{concepts_folder}`
- `GIT_PUSH_ENABLED` can only be true if `GIT_COMMIT_ENABLED=true`

All later steps refer to these variables rather than re-reading the config files.
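The override order and the derived paths can be sketched as follows. This is a minimal illustration, not the skill's actual loader: the JSON key names `vault_path`, `git_commit_enabled`, and `git_push_enabled` are assumptions (only `paper_notes_folder` and `concepts_folder` appear in this document), the merge is assumed to be shallow, and only a subset of the variables is derived.

```python
import json
from pathlib import Path


def load_config(shared_dir):
    """Merge user-config.json with the optional user-config.local.json."""
    shared = Path(shared_dir)
    config = json.loads((shared / "user-config.json").read_text(encoding="utf-8"))
    local = shared / "user-config.local.json"
    if local.exists():
        # Assumed shallow override: keys in the local file replace the defaults.
        config.update(json.loads(local.read_text(encoding="utf-8")))

    vault = Path(config["vault_path"])  # assumed key name
    notes = vault / config["paper_notes_folder"]
    commit = bool(config.get("git_commit_enabled", False))  # assumed key name
    return {
        "VAULT_PATH": vault,
        "NOTES_PATH": notes,
        "CONCEPTS_PATH": notes / config["concepts_folder"],
        "GIT_COMMIT_ENABLED": commit,
        # Push can only be true when commit is also true.
        "GIT_PUSH_ENABLED": commit and bool(config.get("git_push_enabled", False)),
    }
```

Deriving `GIT_PUSH_ENABLED` once, at load time, means later steps never need to re-check the commit flag before pushing.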
1. Receive Paper
| Input Method | Example | Processing Method |
|---|---|---|
| PDF path | | Read directly |
| arXiv link | | WebFetch |
| Zotero collection | "papers in the VLA collection" | Query database → list → user selects |
| Zotero search | "π0.5 in Zotero" | Search by title → locate PDF |
| No PDF | Zotero entry has no attachment | Fetch from the web (see below) |
Fetch Process When No PDF Is Available
- Run `python3 assets/zotero_helper.py info {item_id}` to get the paper information
- Fetch in priority order: arXiv HTML > arXiv PDF > DOI > WebSearch by title
- Identify the arXiv ID from the URL, the Zotero extra field, or a title search
- Prefer running WebFetch directly on `https://arxiv.org/html/{arxiv_id}`; no download is needed
- Skip conditions: neither a PDF nor an online source is available / the item is not a paper

For detailed Zotero operations, see `references/zotero-guide.md`.
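As a rough sketch of the arXiv ID step above, the function below scans candidate strings (for example the item URL and the Zotero extra field) for a new-style arXiv identifier and builds the first two sources in the priority order. The function names are hypothetical, and the regular expression is a simplification that can also match look-alike numbers inside a DOI, so a real implementation should check the surrounding context.

```python
import re

# New-style arXiv IDs: YYMM.NNNNN with an optional version suffix (e.g. v1).
ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(v\d+)?(?!\d)")


def extract_arxiv_id(*fields):
    """Return the first arXiv ID found in the given strings, or None."""
    for text in fields:
        match = ARXIV_ID.search(text or "")
        if match:
            return match.group(0)
    return None


def candidate_sources(arxiv_id):
    """Return arXiv URLs in priority order: HTML first, then PDF."""
    return [
        f"https://arxiv.org/html/{arxiv_id}",
        f"https://arxiv.org/pdf/{arxiv_id}",
    ]
```

When no ID is found, the flow falls through to the DOI and title-search steps.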
2. Reading Modes
| Mode | Trigger Phrases | Output |
|---|---|---|
| Quick Summary | "快速看一下" ("take a quick look"), "quick" | 3-5 sentences on the core contributions |
| Full Analysis | "详细分析" ("detailed analysis"), default | Structured notes (using the template) |
| Critical Analysis | "批判性分析" ("critical analysis"), "critique" | Assessment of methodological strengths and weaknesses |
| Knowledge Extraction | "提取公式" ("extract formulas"), "技术细节" ("technical details") | Formulas + algorithm pseudocode |
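One way to read the table is as a substring lookup with Full Analysis as the fallback. The sketch below is only an illustration of that reading; the mode identifiers and the matching order are assumptions, and the skill itself may resolve modes by interpreting the request rather than by literal matching.

```python
# Trigger phrases from the table; the mode identifiers are hypothetical.
TRIGGERS = [
    ("quick_summary", ["快速看一下", "quick"]),
    ("critical_analysis", ["批判性分析", "critique"]),
    ("knowledge_extraction", ["提取公式", "技术细节"]),
    ("full_analysis", ["详细分析"]),
]


def detect_mode(request):
    """Return the first mode whose trigger phrase occurs in the request."""
    text = request.lower()
    for mode, phrases in TRIGGERS:
        if any(phrase in text for phrase in phrases):
            return mode
    # Full Analysis is the default when nothing matches.
    return "full_analysis"
```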
3. Note Generation
Template: Strictly follow `assets/paper-note-template.md`; do not simplify it on your own.
Core Quality Rules
- No omissions: every Figure, every formula, and every Table in the paper must appear in the notes
- Inline concept links: a technical term must be linked with `[[Concept]]` where it first appears in the body text, not only at the end of the note
- No ASCII flowcharts: describe architectures with structured Markdown lists + `$mathematical symbols$`
- Formula completeness: each formula must have a name (`[[Concept|Name]]`), the LaTeX formula, its meaning, and an explanation of its symbols
- External image links first: arXiv HTML / project homepage / GitHub; download locally only if none of these has the image

For detailed quality specifications for formulas, images, and tables, see `references/quality-standards.md`.
Image Fetch Process (Multi-Source Fallback)
Goal: Ensure the notes include every Figure from the paper. First count the total number of Figures in the paper, then fetch them one by one.
- WebSearch `"{paper title} arxiv"` to get the arXiv ID
- Source A, arXiv HTML (preferred):
  - WebFetch `https://arxiv.org/html/{arxiv_id}` to extract the captions and img src URLs of all `<figure>` elements
  - Count the total number of Figures in the paper and confirm that the extracted set is complete
- Source B, project homepage (when the HTML page returns 404 or images are missing):
  - Find the project homepage URL in the abstract/HTML (common patterns: `project page`, `github.io`, `our website`)
  - WebFetch the project homepage and extract the showcased images (usually teaser / demo images)
- Source C, PDF extraction (when the first two sources fail):
  - Use `pdfimages -png` to extract images from the PDF, keeping only valid images larger than 10KB
- Embed images in the notes as external links using `![Figure N](URL)`
- Verification: external links load / local files are >10KB
- URL deduplication: before writing, check whether the URL contains a repeated arxiv_id path segment (e.g., `2603.05312v1/2603.05312v1/`); if so, remove the repeated segment. See `references/image-troubleshooting.md` for details.

ar5iv image numbers do not necessarily correspond to Figure numbers; see `references/image-troubleshooting.md` for troubleshooting.
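The Source A extraction and the URL deduplication check can be sketched with the standard library alone. This is an assumption-laden illustration, not the skill's own code: `extract_figures` and `dedupe_arxiv_segment` are hypothetical names, the parser treats a nested `<figure>` (sub-figure) as part of its outermost figure, and the returned `src` values are left relative, so they still need to be resolved against the page URL.

```python
import re
from html.parser import HTMLParser


class FigureCollector(HTMLParser):
    """Collect caption text and img src values for each top-level <figure>."""

    def __init__(self):
        super().__init__()
        self.figures = []
        self._current = None
        self._depth = 0
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._depth += 1
            if self._depth == 1:
                self._current = {"caption": "", "srcs": []}
        elif tag == "img" and self._current is not None:
            src = dict(attrs).get("src")
            if src:
                self._current["srcs"].append(src)
        elif tag == "figcaption" and self._current is not None:
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace in the accumulated caption text.
                self._current["caption"] = " ".join(self._current["caption"].split())
                self.figures.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._in_caption and self._current is not None:
            self._current["caption"] += data


def extract_figures(html):
    collector = FigureCollector()
    collector.feed(html)
    return collector.figures


def dedupe_arxiv_segment(url):
    """Collapse a repeated arxiv_id path segment such as ID/ID/ into ID/."""
    return re.sub(r"(\d{4}\.\d{4,5}(?:v\d+)?/)\1+", r"\1", url)
```

Comparing `len(extract_figures(html))` with the Figure count from the paper gives the completeness check; a shortfall is the signal to fall back to Source B or Source C.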
Image Reliability Guarantee (Runs Automatically After Generation)
After the note is saved, run the image reachability check script, which automatically downloads unreachable externally linked images to local storage:

```bash
python3 ../daily-papers/download_note_images.py "{full note path}"
```

- Reachable external links are left unchanged; unreachable ones are automatically downloaded to `assets/` and replaced with Obsidian wikilinks
- If any image was localized, `image_source` in the frontmatter is automatically updated to `mixed`
Formula Format
Each formula must include: a name (`[[Concept|Name]]`), a `$$` LaTeX block (with blank lines before and after), its meaning, and a symbol list.
The `$$` block must have a blank line before and after it; otherwise Obsidian will not render it. Split very long formulas using `aligned`.
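To make the four required parts and the blank-line rule concrete, here is a small sketch that renders one formula entry as Markdown. The helper name and the "Meaning" / "Symbols" labels are assumptions, since the actual wording comes from `assets/paper-note-template.md`, which is not reproduced here. Scaled dot-product attention serves as the example formula.

```python
def render_formula(name, concept, latex, meaning, symbols):
    """Render a formula entry with blank lines around the $$ block."""
    lines = [
        f"[[{concept}|{name}]]",
        "",  # blank line before the $$ block
        "$$",
        latex,
        "$$",
        "",  # blank line after the $$ block
        f"Meaning: {meaning}",
        "",
        "Symbols:",
    ]
    lines += [f"- ${symbol}$: {description}" for symbol, description in symbols.items()]
    return "\n".join(lines) + "\n"


entry = render_formula(
    name="Scaled Dot-Product Attention",
    concept="Attention",
    latex=r"\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V",
    meaning="Weights each value by the scaled similarity between queries and keys.",
    symbols={"Q": "query matrix", "K": "key matrix", "V": "value matrix", "d_k": "key dimension"},
)
```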
4. Obsidian Saving
File Naming
Use only the method/model name: `{method_name}.md` (e.g., `Pi05.md`, with no year prefix).
Determine the method name from: the text before the colon in the title / "We propose XXX" in the Abstract / Greek letters converted to ASCII.
If unsure, save to `_待整理/` (the "to be organized" folder).
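The title-based rule can be sketched as below. This covers only the "text before the colon" heuristic plus Greek-to-ASCII conversion; the Greek table is a small illustrative subset, the function name is hypothetical, and the title in the usage note is made up for illustration. A `None` result corresponds to the "unsure" case.

```python
import re

# Illustrative subset of a Greek-to-ASCII mapping (an assumption, not exhaustive).
GREEK = {"α": "Alpha", "β": "Beta", "γ": "Gamma", "λ": "Lambda", "π": "Pi", "σ": "Sigma"}


def method_name_from_title(title):
    """Derive a file-safe method name from the text before the title's colon."""
    if ":" not in title:
        return None  # no colon: fall back to the Abstract or treat as unsure
    head = title.split(":", 1)[0].strip()
    for greek, ascii_name in GREEK.items():
        head = head.replace(greek, ascii_name)
    # Drop everything except ASCII letters, digits, underscores, and hyphens.
    name = re.sub(r"[^A-Za-z0-9_-]", "", head)
    return name or None
```

For a hypothetical title such as "π0.5: An Example Title", this yields `Pi05`, matching the `Pi05.md` naming example above.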
Save Path
Follow the Zotero collection hierarchy: `{NOTES_PATH}/{zotero_collection_path}/{method_name}.md`
YAML Frontmatter
```yaml
---
title: "Paper Title"
method_name: "MethodName"
authors: [Author1, Author2]
year: 2025
venue: arXiv
tags: [tag1, tag2] # lowercase hyphenated, 3-8 tags
zotero_collection: 3-Robotics/1-VLX/VLA
image_source: online
created: YYYY-MM-DD
---
```

Choosing tags: look at the Related Work subheadings and the Abstract keywords. The first tag is the paper's most central topic.
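The tag constraints in the frontmatter comment (lowercase, hyphenated, 3-8 tags) can be checked mechanically. The sketch below is one possible normalizer under those constraints; the function name is hypothetical, and it preserves input order so that the first tag remains the core topic.

```python
import re


def normalize_tags(raw_tags):
    """Lowercase and hyphenate tags, drop duplicates, and enforce the 3-8 limit."""
    tags = []
    for raw in raw_tags:
        # Replace every run of non-alphanumeric characters with one hyphen.
        tag = re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
        if tag and tag not in tags:
            tags.append(tag)
    if not 3 <= len(tags) <= 8:
        raise ValueError(f"expected 3-8 tags, got {len(tags)}")
    return tags
```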
Automatic Execution After Saving
- Refresh the index pages only if `AUTO_REFRESH_INDEXES=true`:
  ```bash
  python3 ../_shared/generate_concept_mocs.py
  python3 ../_shared/generate_paper_mocs.py
  ```
- Perform git operations only if `GIT_COMMIT_ENABLED=true`:
  - First confirm that `VAULT_PATH/.git` exists
  - After `git add {new file} {paper_notes_folder}/`, there must actually be staged changes
  - Only when these conditions hold, run:
    ```bash
    cd {VAULT_PATH} && git add {new file} {paper_notes_folder}/ && git commit -m "add paper note: {method_name}"
    ```
  - Push only if `GIT_PUSH_ENABLED=true` and the repository has a remote configured
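The guard conditions above can be expressed as a short wrapper around the git CLI. This is a sketch under stated assumptions rather than the skill's implementation: `commit_note` and its `run` parameter are hypothetical (the injectable runner exists only to make the function testable), and the staged-changes check relies on `git diff --cached --quiet` exiting with status 1 when something is staged.

```python
import subprocess
from pathlib import Path


def commit_note(vault_path, paths, method_name, commit_enabled, push_enabled,
                run=subprocess.run):
    """Stage, commit, and optionally push; return the list of steps performed."""
    vault = Path(vault_path)
    done = []
    # Guard 1: commits must be enabled and the vault must be a git repository.
    if not commit_enabled or not (vault / ".git").exists():
        return done
    run(["git", "add", *paths], cwd=vault, check=True)
    done.append("add")
    # Guard 2: exit status 1 means there are staged changes, 0 means none.
    staged = run(["git", "diff", "--cached", "--quiet"], cwd=vault).returncode != 0
    if not staged:
        return done
    run(["git", "commit", "-m", f"add paper note: {method_name}"], cwd=vault, check=True)
    done.append("commit")
    # Guard 3: push only when enabled and at least one remote is configured.
    remotes = run(["git", "remote"], cwd=vault, capture_output=True, text=True).stdout.strip()
    if push_enabled and remotes:
        run(["git", "push"], cwd=vault, check=True)
        done.append("push")
    return done
```

Checking for staged changes before committing avoids a failing `git commit` when the note was already committed in an earlier run.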
5. Concept Library Maintenance (Required for Every Paper)
Concept library location: `{CONCEPTS_PATH}`
Process
- Scan all `[[Concept]]` links in the paper note
- Check whether the concept note for each link exists (`ls` + `find`)
- Create any concept that does not exist (this step must not be skipped) and file it automatically into the matching subdirectory

For classification rules and templates, see `references/concept-categories.md`.
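The scan-and-check steps can be sketched in a few lines. This is an illustrative Python alternative to the `ls` + `find` check, not the skill's own code: `missing_concepts` is a hypothetical name, concept notes are assumed to be `.md` files named after the link target, and image embeds written as `![[...]]` are deliberately ignored.

```python
import re
from pathlib import Path

# Capture the link target before any "|" alias or "#" heading; skip ![[embeds]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def missing_concepts(note_text, concepts_path):
    """Return linked concept names that have no note under concepts_path."""
    linked = {target.strip() for target in WIKILINK.findall(note_text)}
    existing = {path.stem for path in Path(concepts_path).rglob("*.md")}
    return sorted(linked - existing)
```

Each name returned still has to be created from the concept template and filed under the right subdirectory.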
Self-Check
- Does every `[[Concept]]` link in the note have a corresponding concept note?
- Does each concept note list this paper as a "representative work"?
6. Post-Completion Self-Check (Combined Checklist)
- Are all Figures in the notes (count matches the paper)?
- Are all formulas in the notes (variables consistent, no conflicts)?
- Are all Tables preserved in full (every row and column)?
- Do technical terms in the body text have inline `[[Concept]]` links?
- Has the concept library been updated (missing concepts created)?
- Are the images usable (external links load / local files >10KB)?
7. Interactive Features
After completing the analysis, ask the user: explain anything in more depth? Compare with other papers? Save to Obsidian?
After saving, automatically create any missing concept notes and report how many new concepts were added.
8. Batch Processing
Supports batch processing of a Zotero collection (sub-collections are included recursively by default). Process: recursively retrieve papers → deduplicate → skip papers that already have notes → process one by one → summarize.
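The planning half of this pipeline (retrieve, deduplicate, skip existing) might look like the sketch below. The data shape is entirely hypothetical: a collection is assumed to be a dict with `papers` and `children` keys, and each paper a dict with `item_id` and `method_name`; the real data comes from the Zotero database via the helper script.

```python
def collect_papers(collection):
    """Recursively yield papers from a collection and its sub-collections."""
    yield from collection.get("papers", [])
    for child in collection.get("children", []):
        yield from collect_papers(child)


def plan_batch(collection, existing_notes):
    """Split papers into those to process and those skipped as already noted."""
    seen, todo, skipped = set(), [], []
    for paper in collect_papers(collection):
        if paper["item_id"] in seen:
            continue  # the same item can sit in several sub-collections
        seen.add(paper["item_id"])
        if paper["method_name"] in existing_notes:
            skipped.append(paper["method_name"])
        else:
            todo.append(paper["method_name"])
    return {"todo": todo, "skipped": skipped}
```

The returned dict doubles as the raw material for the final summary: what was processed and what was skipped.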
Reference Files (Consult As Needed)
- `references/zotero-guide.md`: Zotero queries, collections, PDF path lookup, smart collection selection
- `references/image-troubleshooting.md`: ar5iv image number correspondence, PDF extraction fallback
- `references/concept-categories.md`: rules for the 16 subdirectories used in automatic concept classification + templates
- `references/quality-standards.md`: detailed quality specifications for formulas/images/tables + self-check list