nature-reader

Full-Paper Markdown Reader

Use this skill to turn a research paper into a complete Markdown reading artifact.
The default output should read like a paper companion, not a summary dump:
  • keep the full prose, paragraph structure, and section flow
  • show original text and Chinese translation together
  • keep figures and tables close to the discussion that introduces them
  • preserve stable page and block anchors for traceability
  • write a complete paper.md by default
This skill is for papers, preprints, and conference proceedings across disciplines. It is not limited to Nature-family journals.

When to use

Use this skill when the user wants any of the following:
  • translate an entire paper into a complete Markdown document
  • make a paper easier to read without losing the original wording
  • generate a full-paper reading file with original/translation alignment
  • keep figures or tables visually close to the claims they support
  • preserve exact source locations for every substantive block
  • build a source-grounded markdown artifact rather than a slide deck or short summary
If the user only wants a summary, use a summarization skill instead. If the user only wants citation search, use a citation skill instead.

Core principle

Translate for meaning, not for style. Preserve the paper's structure, evidence, hedging, terminology, equations, units, and citation markers. Keep the output in prose paragraphs unless the source itself is tabular or list-like. Do not collapse the paper into keyword bullets or slide-style notes.
The reading file should help a reader move between:
  • original text
  • translated text
  • source location
  • figure or table evidence

Workflow

1. Identify the source and paper type

Determine whether the source is:
  • selectable-text PDF
  • scanned PDF
  • publisher HTML
  • DOI or arXiv link
  • pasted text or notes
Then identify the paper type at a high level:
  • discovery or mechanism paper
  • methods or algorithm paper
  • resource or dataset paper
  • conference paper
  • review or perspective
This helps decide how tightly to couple text, figures, and captions.
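As a rough illustration of the first step, the sketch below guesses the input kind from the string the user supplies and flags PDFs whose pages yield almost no extractable text as likely scans. The function names, regular expressions, and the 200-character threshold are illustrative assumptions, not part of the skill; per-page character counts would come from whatever extractor the pdf skill recommends. Paper-type identification is left to reading the paper itself.

```python
import re

# Rough input classifier for step 1 (hypothetical helpers, not a library API).
# Both patterns must match the whole string, so pasted text that merely mentions an ID stays "pasted_text".
ARXIV_RE = re.compile(
    r"^(?:https?://arxiv\.org/(?:abs|pdf)/|arxiv:)(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?$", re.IGNORECASE
)
DOI_RE = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)

def classify_source(source: str) -> str:
    s = source.strip()
    if ARXIV_RE.match(s):
        return "arxiv"
    if DOI_RE.match(s):
        return "doi"
    if s.lower().endswith(".pdf"):
        return "pdf"  # scanned vs. selectable text is decided separately
    if s.startswith(("http://", "https://")):
        return "publisher_html"
    return "pasted_text"

def looks_scanned(chars_per_page: list, min_chars: int = 200) -> bool:
    """Heuristic: if most pages yield almost no extractable text, assume a scan and plan for OCR."""
    if not chars_per_page:
        return True
    sparse = sum(1 for n in chars_per_page if n < min_chars)
    return sparse / len(chars_per_page) > 0.5
```

A DOI or arXiv link would then be resolved to a PDF or publisher HTML page and classified again.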

2. Build a full-document source map before translating

If the user provides a full paper, process the entire document. Do not stop at the abstract, introduction, or a few representative pages unless the user explicitly asks for a preview.
Create stable IDs for source blocks:
  • S001, S002, ... for body text
  • C001, C002, ... for captions
  • F001, F002, ... for figures
  • T001, T002, ... for tables
For each block, capture:
  • page number
  • block type
  • original text
  • translation
  • nearby figure or table references
  • confidence level when extraction is uncertain
Keep the source map stable so later questions can point back to the same IDs. For long papers, add a page index so the reader can jump across the whole document without losing location.
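A minimal sketch of such a source map in Python, assuming one record per block and a JSON layout with a blocks list plus a page index. The field names, the SourceBlock and SourceMap classes, and the "low" confidence label are hypothetical; only the S/C/F/T prefixes come from the scheme above. IDs are assigned once, in reading order, and never renumbered, which is what keeps later references stable.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# ID prefixes from the scheme above: S = body text, C = caption, F = figure, T = table.
PREFIX = {"text": "S", "caption": "C", "figure": "F", "table": "T"}

@dataclass
class SourceBlock:
    id: str
    page: int
    block_type: str
    original: str
    translation: str = ""
    refs: list = field(default_factory=list)  # nearby figure/table IDs
    confidence: Optional[str] = None          # e.g. "low" when extraction is uncertain

class SourceMap:
    def __init__(self) -> None:
        self.blocks = []
        self._counters = {prefix: 0 for prefix in PREFIX.values()}

    def add(self, page: int, block_type: str, original: str, **fields) -> SourceBlock:
        """Append a block in reading order and give it the next ID for its type."""
        prefix = PREFIX[block_type]
        self._counters[prefix] += 1
        block = SourceBlock(f"{prefix}{self._counters[prefix]:03d}", page, block_type, original, **fields)
        self.blocks.append(block)
        return block

    def page_index(self) -> dict:
        """Page number -> block IDs on that page, for jumping around long papers."""
        index = {}
        for block in self.blocks:
            index.setdefault(block.page, []).append(block.id)
        return index

    def to_json(self) -> str:
        payload = {"blocks": [asdict(b) for b in self.blocks], "page_index": self.page_index()}
        return json.dumps(payload, ensure_ascii=False, indent=2)
```

Note that JSON object keys are strings, so page numbers in the serialized page index come back as "1", "2", and so on.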

3. Translate conservatively

Translate each block with these rules:
  • preserve technical terms unless a standard Chinese equivalent is clearly better
  • keep gene names, protein names, formulas, model names, and symbols intact
  • keep citations, superscripts, subscripts, and numeric values unchanged
  • do not collapse methods details into vague prose
  • keep paragraph order and section order unless the user asks for restructuring
  • mark uncertain text instead of guessing when OCR or layout extraction is weak
  • keep the source's paragraph form; do not convert dense prose into bullet-point keywords
If a sentence contains multiple claims, keep the translation readable but do not split away the original evidence chain.
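Some of these rules can be checked mechanically. The sketch below, a hypothetical helper rather than part of the skill, compares numbers and bracketed numeric citations between an original block and its translation and reports anything that went missing. It does not cover superscript citation styles, thousands separators, or spelled-out numbers, so a clean report is necessary but not sufficient.

```python
import re
from collections import Counter

# Numbers such as 8, 12.5 or 0.05 (thousands separators and spelled-out numbers are not handled).
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")
# Bracketed numeric citations such as [4], [3,4] or [4-7] (superscript citations are not handled).
CITATION_RE = re.compile(r"\[\d+(?:\s*[,\u2013-]\s*\d+)*\]")

def invariant_diff(original: str, translation: str) -> dict:
    """Return protected tokens that occur fewer times in the translation than in the original."""
    report = {}
    for name, pattern in (("numbers", NUMBER_RE), ("citations", CITATION_RE)):
        # Counter subtraction keeps only positive counts, i.e. tokens lost in translation.
        missing = Counter(pattern.findall(original)) - Counter(pattern.findall(translation))
        if missing:
            report[name] = missing
    return report
```

An empty dict means no protected number or citation was dropped; a non-empty report points at the block to re-check rather than proving the translation wrong.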

4. Place figures and tables near the relevant discussion

Do not try to recreate the PDF pixel-for-pixel. Preserve semantic proximity instead.
Default placement rule:
  • show a figure near its first substantive mention in the body text
  • keep the caption attached to the figure
  • if the caption contains critical details, keep caption and figure together
  • if a table is central to the claim, keep it near the paragraph that interprets it
If the paper has a complex multi-column layout, prefer a clean reading layout over exact visual mimicry.
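One way to approximate the default placement rule is to scan body paragraphs in order, record the first one that mentions each figure or table, and emit the figure block right after it. This sketch treats the first mention as a stand-in for the first substantive mention, which still needs judgment; the regular expression, the "Fig 2"/"Table 1" keys, and the fallback of appending unmentioned figures at the end are assumptions.

```python
import re

# Mentions like "Fig. 2", "Figure 2b", "Figs 2" or "Table 1". Panel letters are ignored, only the
# first number of "Figs 2 and 3" is caught, and "Extended Data Fig. 1" is not told apart from "Fig. 1".
MENTION_RE = re.compile(r"\b(Fig(?:ure)?s?\.?|Table)\s*(\d+)", re.IGNORECASE)

def first_mentions(paragraphs: list) -> dict:
    """Map keys like 'Fig 2' or 'Table 1' to the index of the first paragraph mentioning them."""
    first = {}
    for i, text in enumerate(paragraphs):
        for kind, num in MENTION_RE.findall(text):
            key = ("Table " if kind.lower() == "table" else "Fig ") + num
            first.setdefault(key, i)
    return first

def interleave(paragraphs: list, figures: dict) -> list:
    """Place each rendered figure/table block right after its first mention; unmentioned ones go last."""
    first = first_mentions(paragraphs)
    out = []
    for i, text in enumerate(paragraphs):
        out.append(text)
        out.extend(block for key, block in figures.items() if first.get(key) == i)
    out.extend(block for key, block in figures.items() if key not in first)
    return out
```

Each value in figures is expected to be a rendered block with its caption already attached, so caption and image always move together.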

4b. Crop figures and tables tightly

When extracting a figure or table image:
  • crop only the figure or table content area, not the whole page
  • use the smallest rectangle that fully contains the visual object
  • exclude page headers, footers, surrounding prose, and unrelated margins
  • keep the caption separate unless the caption is part of the requested visual crop
  • if the crop box is uncertain, mark it as approximate instead of enlarging it
Precision matters more than convenience here. A slightly smaller but correct crop is better than a wider crop that includes unrelated page content.
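The crop rule reduces to a bounding-box union: take the boxes of the graphic's parts (images, vector drawings, table cells) as reported by the PDF extractor, keep the smallest rectangle that contains them all, and clip it to the page body so running headers and footers cannot leak in. The coordinate convention (x0, y0, x1, y1 with y growing downward), the header and footer band heights, and the function name are assumptions for illustration. No padding is added, in line with preferring a tight crop.

```python
from typing import Iterable, Tuple

# (x0, y0, x1, y1) in page coordinates, with y growing downward.
Box = Tuple[float, float, float, float]

def tight_crop(parts: Iterable[Box], page: Box, header_h: float = 0.0, footer_h: float = 0.0) -> Box:
    """Smallest rectangle containing every part of the graphic, clipped to the page body."""
    boxes = list(parts)
    if not boxes:
        # Nothing detected: report it rather than falling back to a whole-page crop.
        raise ValueError("no visual parts found; mark this crop as approximate")
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    # Clip to the page minus the header/footer bands; no padding is added.
    px0, py0, px1, py1 = page
    return (max(x0, px0), max(y0, py0 + header_h), min(x1, px1), min(y1, py1 - footer_h))
```

Caption text boxes are deliberately left out of parts, so the caption stays separate unless the caller chooses to include it.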

5. Generate the Markdown file

Default output is a single full-paper paper.md file.
The Markdown should usually include:
  • metadata header
  • page-level sections for long papers
  • body prose in paragraph form
  • figure and table blocks placed near the relevant discussion
  • clickable source anchors on every substantive block
  • short uncertainty notes only when extraction is weak
Do not add an interactive Q&A panel or follow-up widget in the Markdown deliverable. If the user later asks a question, answer it in chat using the source map rather than embedding a conversational panel in the artifact.
If a browser preview is explicitly requested, a companion reader.html can be generated as a secondary artifact, but the Markdown file remains the primary output.
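A minimal sketch of how source-map blocks might be rendered into paper.md, assuming raw HTML anchors of the form <a id="S001"></a> (whether these survive depends on the Markdown renderer) and one page-level section per source page. The function names, block keys, and the wording of the uncertainty note are hypothetical.

```python
def render_block(block: dict) -> str:
    """Render one source-map block: anchor, pointer line, original, translation."""
    bid = block["id"]
    lines = [
        f'<a id="{bid}"></a>',
        f"[{bid}](#{bid}) · p.{block['page']}",
        "",
        block["original"],
        "",
        block["translation"],
    ]
    if block.get("confidence") == "low":
        # Short uncertainty note only when extraction was weak.
        lines += ["", "> Extraction uncertain: compare with the source page."]
    return "\n".join(lines)

def render_paper(meta: dict, blocks: list) -> str:
    """Metadata header, then one '## Page N' section per source page, in block order."""
    out = [f"# {meta['title']}", "", f"Source: {meta['source']}", ""]
    current_page = None
    for block in blocks:
        if block["page"] != current_page:
            current_page = block["page"]
            out += [f"## Page {current_page}", ""]
        out += [render_block(block), ""]
    return "\n".join(out)
```

Original and translation stay as full paragraphs, one after the other, so the file reads as prose rather than a table of fragments.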

6. Answer follow-up questions with source grounding

When the user asks a question after the file is created:
  • identify the most relevant source blocks first
  • answer from the paper, not from memory
  • cite the exact block IDs and page numbers
  • if the answer depends on a figure or table, cite that too
  • if the paper does not support the claim, say so plainly
Every substantive answer should include a source pointer such as:
  • p.4 S012-S013
  • Fig. 2 caption
  • Table 1
If the answer is a synthesis across several blocks, list all supporting locations.
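Pointers like p.4 S012-S013 are easy to produce mechanically once an answer is traced to block IDs. The hypothetical helpers below group the supporting IDs by page and collapse consecutive IDs with the same prefix into ranges; they assume one-letter prefixes and zero-padded numbers as in the source-map scheme. Captions and tables can still be named in prose (Fig. 2 caption, Table 1) alongside these pointers.

```python
from itertools import groupby

def _parse(block_id: str) -> tuple:
    # "S012" -> ("S", 12); assumes a one-letter prefix followed by digits.
    return block_id[0], int(block_id[1:])

def format_pointer(page: int, ids: list) -> str:
    """Format one page's supporting blocks, e.g. 'p.4 S012-S013'."""
    parts = []
    for prefix, group in groupby(sorted(set(map(_parse, ids))), key=lambda t: t[0]):
        nums = [n for _, n in group]
        start = prev = nums[0]
        for n in nums[1:] + [None]:
            if n is not None and n == prev + 1:
                prev = n  # still inside a consecutive run
                continue
            parts.append(f"{prefix}{start:03d}" if start == prev else f"{prefix}{start:03d}-{prefix}{prev:03d}")
            if n is not None:
                start = prev = n
    return f"p.{page} " + ", ".join(parts)

def format_pointers(hits: list) -> str:
    """hits: (page, block_id) pairs from every supporting block, possibly across pages."""
    by_page = {}
    for page, block_id in hits:
        by_page.setdefault(page, []).append(block_id)
    return "; ".join(format_pointer(p, ids) for p, ids in sorted(by_page.items()))
```

For a synthesis answer, passing every supporting (page, ID) pair yields one pointer string that lists all locations.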

Output contract

Prefer these outputs:
  • paper.md for the full-paper Markdown artifact
  • source_map.json for stable source anchors
  • translation_notes.md for terminology, uncertainty, and layout notes
  • assets/ for extracted figures or cropped snippets when needed
  • reader.html only when the user explicitly wants a browser preview
Do not hide missing information. If the source is incomplete, label the output as draft mode.
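The contract can be sketched as a single write step. In this illustration, which assumes the source map is a dict with a blocks list whose entries carry a page number, pages with no extracted blocks trigger a visible draft banner at the top of paper.md instead of being silently skipped, and reader.html is written only when preview HTML is passed in. The function names and banner wording are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

def missing_pages(expected_pages: int, source_map: dict) -> list:
    """Pages 1..expected_pages that have no block in the source map."""
    seen = {block["page"] for block in source_map["blocks"]}
    return [p for p in range(1, expected_pages + 1) if p not in seen]

def write_outputs(out_dir: str, paper_md: str, source_map: dict, notes_md: str,
                  expected_pages: int, reader_html: Optional[str] = None) -> list:
    out = Path(out_dir)
    (out / "assets").mkdir(parents=True, exist_ok=True)  # extracted figures and cropped snippets
    gaps = missing_pages(expected_pages, source_map)
    if gaps:
        # Draft mode: state what is missing at the top instead of hiding it.
        pages = ", ".join(str(p) for p in gaps)
        paper_md = f"> DRAFT: no extracted content for page(s) {pages}.\n\n" + paper_md
    (out / "paper.md").write_text(paper_md, encoding="utf-8")
    (out / "source_map.json").write_text(json.dumps(source_map, ensure_ascii=False, indent=2), encoding="utf-8")
    (out / "translation_notes.md").write_text(notes_md, encoding="utf-8")
    if reader_html is not None:  # only when a browser preview was explicitly requested
        (out / "reader.html").write_text(reader_html, encoding="utf-8")
    return gaps
```

The returned list of missing pages can also be repeated in translation_notes.md so the gap is visible in both places.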

Tooling guidance

If the input is a PDF, load the pdf skill first for extraction and OCR guidance. If the user asks for a richer browser view, use web-artifacts-builder or frontend-design only as a preview layer on top of the Markdown workflow. If the user wants citation-level grounding to original text, keep the source map explicit and do not lose the page or block IDs. If the user asks for a model backend, treat the provider as configurable and keep the prompt format provider-neutral.

Model backends

Use the official API of whichever provider the user has access to. Prefer OpenAI-compatible chat or responses interfaces where they exist, because a shared interface keeps the paper reader portable across vendors.
  • DeepSeek: official OpenAI-compatible API at https://api.deepseek.com
  • GLM / Zhipu: official OpenAI-compatible API at https://open.bigmodel.cn/api/paas/v4
  • Qwen / DashScope: official OpenAI-compatible API at https://dashscope.aliyuncs.com/compatible-mode/v1
  • Kimi / Moonshot: official OpenAI-compatible API at https://api.moonshot.cn/v1
Keep model names provider-specific, but keep the app contract the same: base_url, api_key, model, and chat-completions-style messages.
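Because every provider above exposes the same chat-completions shape, one request builder can serve all of them. This standard-library sketch builds, but does not send, a POST to {base_url}/chat/completions with a bearer token; the PROVIDERS dict and BackendConfig class are illustrative names that mirror the base_url, api_key, model contract, and model names are deliberately left to the caller since they differ per provider.

```python
import json
import urllib.request
from dataclasses import dataclass

# Base URLs as listed above. Model names differ per provider and are supplied by the caller.
PROVIDERS = {
    "deepseek": "https://api.deepseek.com",
    "zhipu": "https://open.bigmodel.cn/api/paas/v4",
    "dashscope": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "moonshot": "https://api.moonshot.cn/v1",
}

@dataclass
class BackendConfig:
    base_url: str
    api_key: str
    model: str

def build_chat_request(cfg: BackendConfig, messages: list) -> urllib.request.Request:
    """Build (without sending) an OpenAI-style POST to {base_url}/chat/completions."""
    body = json.dumps({"model": cfg.model, "messages": messages}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        cfg.base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {cfg.api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, urllib.request.urlopen(req) returns the response; in the OpenAI-compatible format the reply text sits at choices[0].message.content of the JSON body. Switching vendors then only changes the three config values, not the prompt format.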

Quality bar

Good output feels like a paper reader, not a machine translation dump.
It should let a reader:
  • read the paper in two languages
  • see where a claim came from
  • inspect the nearby figure or table
  • move through a complete Markdown file without losing source traceability
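The last criterion can be spot-checked automatically. Assuming paper.md marks each block with an anchor of the form <a id="S001"></a> and source_map.json lists every block ID (both assumptions about the file layout, not requirements of the skill), this hypothetical lint reports blocks that never made it into the Markdown and anchors that point to nothing.

```python
import re

# Anchors assumed to look like <a id="S001"></a>: an S/C/F/T prefix plus at least three digits.
ANCHOR_RE = re.compile(r'<a id="([SCFT]\d{3,})"></a>')

def traceability_gaps(paper_md: str, source_map: dict) -> dict:
    """Compare anchors in paper.md with block IDs in source_map.json."""
    anchored = set(ANCHOR_RE.findall(paper_md))
    mapped = {block["id"] for block in source_map["blocks"]}
    return {
        "missing_in_paper": sorted(mapped - anchored),  # blocks the reader cannot reach
        "unknown_anchors": sorted(anchored - mapped),   # anchors with no source record
    }
```

Two empty lists suggest the file is complete and fully traceable at the block level; they say nothing about translation quality.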