paper-to-skill

Paper-to-Skill Pipeline

Transform research papers into production-grade skill packages. The pipeline extracts the actionable methodology from a paper, structures it as a skill specification, and feeds it through co-evolutionary refinement to produce a validated package.

This closes the loop between research and practice: a paper published today can become an executable skill tomorrow, without manual authoring.

Reference Files

File	Contents	Load When
`references/extraction-patterns.md`	Patterns for extracting methodology from papers	Always

Prerequisites

The `to-markdown` skill (for PDF/document conversion)
The `research-critique` skill (for paper analysis)
The `test-engineer` agent (for co-evolutionary skill generation)

Workflow

Phase 1: Paper Intake

Accept the paper in any supported format:

Input Format	Action
arXiv ID (e.g., 2604.01687)	Fetch via `https://arxiv.org/abs/<id>`, convert the PDF
arXiv URL	Extract the ID, fetch and convert
PDF file path	Convert using the `to-markdown` skill
URL to paper	Fetch via `WebFetch`, convert if PDF
Pasted text	Use directly

For PDF conversion, invoke the `to-markdown` skill:

Convert this PDF to clean markdown, preserving section structure, tables, equations, and algorithm pseudocode. Drop references section but keep inline citations.
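
The intake table above amounts to a small dispatch on the shape of the input. A minimal sketch of that classification follows. The function name `classify_input`, the returned labels, and the restriction to new-style arXiv identifiers are assumptions made for illustration; they are not part of the pipeline's definition.

```python
import re
from urllib.parse import urlparse

# New-style arXiv identifiers only (e.g. 2604.01687 or 2604.01687v2).
# Old-style identifiers are deliberately out of scope for this sketch.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
ARXIV_PATH = re.compile(r"^/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?$")


def classify_input(source: str) -> tuple[str, str]:
    """Return (kind, value), where kind names one row of the intake table."""
    text = source.strip()
    if ARXIV_ID.match(text):
        return ("arxiv_id", text)
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc.lower() in ("arxiv.org", "www.arxiv.org"):
            match = ARXIV_PATH.match(parsed.path)
            if match:
                # For arXiv URLs the value is the extracted identifier.
                return ("arxiv_url", match.group(1))
        return ("url", text)
    if "\n" not in text and text.lower().endswith(".pdf"):
        return ("pdf_path", text)
    # Anything else is treated as pasted paper text and used directly.
    return ("pasted_text", text)
```

Each label would then select the action in the right-hand column of the table, with `pasted_text` skipping conversion entirely.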

Phase 2: Critical Analysis

Invoke the `research-critique` skill on the converted paper:

Analyze this paper focusing on:

Core contribution: what is the novel methodology?

Algorithm description: extract the step-by-step procedure

Input/output specification: what goes in, what comes out?

Key parameters and their valid ranges

Claimed results and the evidence supporting them

Failure modes and limitations acknowledged by the authors

Prerequisites and dependencies (tools, data, compute)

The critique output becomes the foundation for the skill specification.
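
One way to carry the critique into Phase 3 is a small record with one field per focus item listed above. The sketch below is illustrative only: the `CritiqueResult` class and its field names are hypothetical and do not describe the actual output format of the `research-critique` skill.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CritiqueResult:
    """Hypothetical container mirroring the seven focus items of the critique prompt."""

    core_contribution: str = ""
    algorithm_steps: list[str] = field(default_factory=list)
    input_output: str = ""
    parameters: dict[str, str] = field(default_factory=dict)  # name -> valid range
    claimed_results: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    prerequisites: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the names of focus items the critique left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An empty `algorithm_steps` entry in `missing()` would correspond to the "Paper has no clear algorithm" case in the error-handling table.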

Phase 3: Skill Specification Extraction

From the critique output, build a structured skill specification:

yaml

specification:
  name: <kebab-case derived from paper's methodology name>
  domain: <paper's application domain>
  source_paper:
    title: <paper title>
    arxiv_id: <if available>
    url: <paper URL>
    authors: <first author et al.>
    date: <publication date>
  
  capabilities:
    - <capability 1 derived from the methodology>
    - <capability 2>
    - <capability 3>
  
  input_format: <what the skill accepts>
  output_format: <what the skill produces>
  
  algorithm_steps:
    - step: 1
      description: <from paper's algorithm>
      parameters: [<key params with ranges>]
    - step: 2
      description: <next step>
  
  failure_modes:
    - <from paper's limitations section>
  
  example_tasks:
    - <task 1 the methodology would solve>
    - <task 2>
    - <task 3>

Extraction rules:

Prefer the paper's own algorithm pseudocode over prose descriptions
Include parameter ranges from the paper's experiments (e.g., "learning rate: 0.001-0.01")
Map the paper's terminology to armory conventions (e.g., "module" → "skill", "pipeline" → "workflow")
If the paper describes multiple variants, extract the best-performing one

See `references/extraction-patterns.md` for patterns specific to common paper types.
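
Before handing the specification on, it can help to check that the template was actually filled in. The sketch below assumes the YAML has already been loaded into a plain `dict` holding the contents of the `specification` key. The function name `validate_specification`, the choice to treat every top-level key as required, and the exact kebab-case pattern are assumptions of this example rather than rules stated by the pipeline.

```python
import re

# Top-level keys of the specification template shown above.
REQUIRED_KEYS = (
    "name", "domain", "source_paper", "capabilities",
    "input_format", "output_format", "algorithm_steps",
    "failure_modes", "example_tasks",
)
# Lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_specification(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks complete."""
    problems = [f"missing or empty: {key}" for key in REQUIRED_KEYS if not spec.get(key)]
    name = spec.get("name", "")
    if name and not KEBAB_CASE.match(name):
        problems.append(f"name is not kebab-case: {name!r}")
    steps = spec.get("algorithm_steps") or []
    numbers = [step.get("step") for step in steps]
    if numbers != list(range(1, len(steps) + 1)):
        problems.append("algorithm_steps are not numbered 1..n in order")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every gap in a single pass.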

Phase 4: Skill Generation

Hand off the specification to the `test-engineer` agent for co-evolutionary generation:

Evolve a skill for: [specification.domain]

Capabilities: [specification.capabilities]
Algorithm: [specification.algorithm_steps]
Input: [specification.input_format]
Output: [specification.output_format]
Failure modes: [specification.failure_modes]
Example tasks: [specification.example_tasks]

Source: [specification.source_paper.title] ([specification.source_paper.url])

The test-engineer runs its full co-evolutionary loop (generate → verify → oracle → refine) using the specification as the task description.
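
Filling the hand-off template from a loaded specification is mechanical, as the following sketch suggests. It assumes the specification is a `dict` shaped like the Phase 3 template. The function name `render_handoff_prompt` and the choice to join list items with semicolons are illustrative assumptions; how the `test-engineer` agent actually receives the prompt is not specified here.

```python
def _join(items: list) -> str:
    """Flatten a list into one line; the separator is an arbitrary choice."""
    return "; ".join(str(item) for item in items)


def render_handoff_prompt(spec: dict) -> str:
    """Fill the hand-off template with values from the specification."""
    steps = [f"{s['step']}. {s['description']}" for s in spec["algorithm_steps"]]
    paper = spec["source_paper"]
    return "\n".join([
        f"Evolve a skill for: {spec['domain']}",
        "",
        f"Capabilities: {_join(spec['capabilities'])}",
        f"Algorithm: {_join(steps)}",
        f"Input: {spec['input_format']}",
        f"Output: {spec['output_format']}",
        f"Failure modes: {_join(spec['failure_modes'])}",
        f"Example tasks: {_join(spec['example_tasks'])}",
        "",
        f"Source: {paper['title']} ({paper['url']})",
    ])
```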

Phase 5: Attribution and Finalization

Ensure the generated skill properly attributes the source paper:

Frontmatter: Add `source: <paper_url>` to the metadata

Body: Include an attribution section at the end of SKILL.md:

markdown

## Attribution

This skill implements the methodology from:
> <paper title>
> <authors>
> <venue/arxiv, date>
> <URL>

References: If the paper has supplementary materials (code, datasets), create a source materials reference file in the generated skill's `references/` directory linking to them
Verify the skill name does not conflict with existing packages in `manifest.yaml`
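
The attribution block and the name check can both be derived from the specification. The sketch below is a rough illustration under stated assumptions: `attribution_section` and `name_conflicts` are hypothetical helpers, the optional `venue` key is not part of the Phase 3 template, and the set of existing package names is assumed to have been read from `manifest.yaml` already, since that file's layout is not described here.

```python
def attribution_section(paper: dict) -> str:
    """Render the Attribution section from the spec's source_paper mapping."""
    # Prefer an explicit venue (hypothetical key); otherwise fall back to the arXiv ID.
    where = paper.get("venue") or (
        f"arXiv:{paper['arxiv_id']}" if paper.get("arxiv_id") else None
    )
    venue_line = ", ".join(part for part in (where, paper.get("date")) if part)
    lines = [
        "## Attribution",
        "",
        "This skill implements the methodology from:",
        f"> {paper['title']}",
        f"> {paper['authors']}",
        f"> {venue_line}",
        f"> {paper['url']}",
    ]
    return "\n".join(lines) + "\n"


def name_conflicts(name: str, existing_names: set[str]) -> bool:
    """Case-insensitive check against names already present in the manifest."""
    return name.lower() in {existing.lower() for existing in existing_names}
```

Comparing names case-insensitively is a conservative choice for this sketch; a manifest that treats names as case-sensitive would need a plain membership test instead.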

Output

The complete skill package is written to `skills/<name>/`:

`SKILL.md` with attribution and the paper-derived workflow
`evals/cases.yaml` with assertions generated by the co-evolutionary loop
`references/` with extraction patterns and source materials
`evals/evolution-log.yaml` from the test-engineer's refinement process
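
A quick completeness check over the package directory might look like the sketch below. The helper name `missing_outputs` is hypothetical, and the check only confirms that the four listed paths exist; it does not inspect their contents.

```python
from pathlib import Path

# Paths expected inside skills/<name>/, as listed above.
EXPECTED = ("SKILL.md", "evals/cases.yaml", "evals/evolution-log.yaml", "references")


def missing_outputs(package_dir: str) -> list[str]:
    """Return the expected paths that are absent from the package directory."""
    root = Path(package_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty result means every expected file and directory is present.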

Error Handling

Error	Resolution
Paper has no clear algorithm	Extract the methodology from the experiments section
Paper is purely theoretical	Report: no actionable methodology; suggest literature-review instead
PDF conversion fails	Try an alternative: fetch the HTML version or ask the user to paste the text
Paper methodology requires data/compute	Note in skill's prerequisites; skill may be a workflow template only
test-engineer budget exhausted	Return best-scoring iteration with manual review warning

Limitations

Cannot extract visual methodologies (circuit diagrams, neural architecture figures); works on textual algorithm descriptions only
Papers with multiple interdependent contributions may produce overly complex skills; consider splitting them into multiple skills
Non-English papers require translation before processing
The quality of the generated skill depends on how clearly the paper describes its methodology
