paper-to-skill

Paper-to-Skill Pipeline

Transform research papers into production-grade skill packages. The pipeline extracts the actionable methodology from a paper, structures it as a skill specification, and feeds it through co-evolutionary refinement to produce a validated package.
This closes the loop between research and practice: a paper published today can become an executable skill tomorrow, without manual authoring.

Reference Files

| File | Contents | Load When |
| --- | --- | --- |
| references/extraction-patterns.md | Patterns for extracting methodology from papers | Always |

Prerequisites

  • The to-markdown skill (for PDF/document conversion)
  • The research-critique skill (for paper analysis)
  • The test-engineer agent (for co-evolutionary skill generation)

Workflow

Phase 1: Paper Intake

Accept the paper in any supported format:
| Input Format | Action |
| --- | --- |
| arXiv ID (e.g., 2604.01687) | Fetch via https://arxiv.org/abs/<id>, convert PDF |
| arXiv URL | Extract ID, fetch and convert |
| PDF file path | Convert using to-markdown skill |
| URL to paper | Fetch via WebFetch, convert if PDF |
| Pasted text | Use directly |

For PDF conversion, invoke the to-markdown skill:
Convert this PDF to clean markdown, preserving section structure, tables, equations, and algorithm pseudocode. Drop references section but keep inline citations.
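The intake table above amounts to a small dispatch on the shape of the input. A minimal Python sketch of that dispatch follows; the function name `classify_input` and the returned labels are illustrative assumptions, not part of the pipeline, and only new-style arXiv identifiers (YYMM.NNNNN) are recognized.

```python
import re
from pathlib import Path

# New-style arXiv identifiers: YYMM.NNNNN with an optional version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
ARXIV_URL = re.compile(r"^https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)")


def classify_input(source: str) -> tuple[str, str]:
    """Return (kind, value) for a paper source, mirroring the intake table.

    The kind labels are hypothetical names chosen for this sketch.
    """
    text = source.strip()
    if ARXIV_ID.match(text):
        return ("arxiv_id", text)
    url_match = ARXIV_URL.match(text)
    if url_match:
        # Keep only the identifier; fetching and conversion happen later.
        return ("arxiv_url", url_match.group(1))
    if re.match(r"^https?://", text):
        return ("url", text)
    if "\n" not in text and Path(text).suffix.lower() == ".pdf":
        return ("pdf_path", text)
    # Anything else is treated as the paper text itself.
    return ("pasted_text", text)
```

Checking the arXiv patterns before the generic URL test keeps arXiv links from being treated as arbitrary web pages.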

Phase 2: Critical Analysis

Invoke the research-critique skill on the converted paper:
Analyze this paper focusing on:
  1. Core contribution: what is the novel methodology?
  2. Algorithm description: extract the step-by-step procedure
  3. Input/output specification: what goes in, what comes out?
  4. Key parameters and their valid ranges
  5. Claimed results and the evidence supporting them
  6. Failure modes and limitations acknowledged by the authors
  7. Prerequisites and dependencies (tools, data, compute)
The critique output becomes the foundation for the skill specification.

Phase 3: Skill Specification Extraction

From the critique output, build a structured skill specification:
yaml
specification:
  name: <kebab-case derived from paper's methodology name>
  domain: <paper's application domain>
  source_paper:
    title: <paper title>
    arxiv_id: <if available>
    url: <paper URL>
    authors: <first author et al.>
    date: <publication date>
  
  capabilities:
    - <capability 1 derived from the methodology>
    - <capability 2>
    - <capability 3>
  
  input_format: <what the skill accepts>
  output_format: <what the skill produces>
  
  algorithm_steps:
    - step: 1
      description: <from paper's algorithm>
      parameters: [<key params with ranges>]
    - step: 2
      description: <next step>
  
  failure_modes:
    - <from paper's limitations section>
  
  example_tasks:
    - <task 1 the methodology would solve>
    - <task 2>
    - <task 3>
Extraction rules:
  • Prefer the paper's own algorithm pseudocode over prose descriptions
  • Include parameter ranges from the paper's experiments (e.g., "learning rate: 0.001-0.01")
  • Map the paper's terminology to armory conventions (e.g., "module" → "skill", "pipeline" → "workflow")
  • If the paper describes multiple variants, extract the best-performing one
See references/extraction-patterns.md for patterns specific to common paper types.
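One way to keep an extracted specification honest is to load it into typed records and run a few structural checks before hand-off. The sketch below is an assumption about how that might look, not the pipeline's actual validator: the class names, the `validate` function, and the specific checks are hypothetical, and the `source_paper` block is omitted for brevity.

```python
import re
from dataclasses import dataclass, field

# Lowercase words separated by single hyphens, e.g. "sparse-attention-pruning".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


@dataclass
class AlgorithmStep:
    step: int
    description: str
    parameters: list[str] = field(default_factory=list)


@dataclass
class Specification:
    name: str
    domain: str
    capabilities: list[str]
    input_format: str
    output_format: str
    algorithm_steps: list[AlgorithmStep]
    failure_modes: list[str] = field(default_factory=list)
    example_tasks: list[str] = field(default_factory=list)


def validate(spec: Specification) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if not KEBAB_CASE.match(spec.name):
        problems.append(f"name is not kebab-case: {spec.name!r}")
    if not spec.capabilities:
        problems.append("no capabilities listed")
    if not spec.algorithm_steps:
        problems.append("no algorithm steps extracted")
    expected = list(range(1, len(spec.algorithm_steps) + 1))
    if [s.step for s in spec.algorithm_steps] != expected:
        problems.append("algorithm steps are not numbered 1..n in order")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every gap in a single pass.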

Phase 4: Skill Generation

Hand off the specification to the test-engineer agent for co-evolutionary generation:
Evolve a skill for: [specification.domain]
Capabilities: [specification.capabilities]
Algorithm: [specification.algorithm_steps]
Input: [specification.input_format]
Output: [specification.output_format]
Failure modes: [specification.failure_modes]
Example tasks: [specification.example_tasks]
Source: [specification.source_paper.title] ([specification.source_paper.url])
The test-engineer runs its full co-evolutionary loop (generate → verify → oracle → refine) using the specification as the task description.
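Filling the hand-off template is plain string substitution over the specification fields. The following sketch assumes the specification has already been parsed into a Python dict with the keys shown in Phase 3; `render_handoff` and the way list fields are joined are illustrative choices, and the example values are placeholders rather than data from any real paper.

```python
def render_handoff(spec: dict) -> str:
    """Fill the hand-off template from a parsed specification mapping."""
    # Flatten the ordered steps into a single readable line.
    steps = "; ".join(
        f"{s['step']}. {s['description']}" for s in spec["algorithm_steps"]
    )
    paper = spec["source_paper"]
    lines = [
        f"Evolve a skill for: {spec['domain']}",
        f"Capabilities: {', '.join(spec['capabilities'])}",
        f"Algorithm: {steps}",
        f"Input: {spec['input_format']}",
        f"Output: {spec['output_format']}",
        f"Failure modes: {', '.join(spec['failure_modes'])}",
        f"Example tasks: {', '.join(spec['example_tasks'])}",
        f"Source: {paper['title']} ({paper['url']})",
    ]
    return "\n".join(lines)


# Placeholder specification, for illustration only.
example_spec = {
    "domain": "example-domain",
    "capabilities": ["capability one", "capability two"],
    "algorithm_steps": [
        {"step": 1, "description": "first step"},
        {"step": 2, "description": "second step"},
    ],
    "input_format": "markdown text",
    "output_format": "structured report",
    "failure_modes": ["noisy input"],
    "example_tasks": ["task one"],
    "source_paper": {"title": "Example Title", "url": "https://example.com/paper"},
}
prompt = render_handoff(example_spec)
```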

Phase 5: Attribution and Finalization

Ensure the generated skill properly attributes the source paper:
  1. Frontmatter: Add source: <paper_url> to the metadata
  2. Body: Include an attribution section at the end of SKILL.md:
    markdown
    ## Attribution
    
    This skill implements the methodology from:
    > <paper title>
    > <authors>
    > <venue/arxiv, date>
    > <URL>
  3. References: If the paper has supplementary materials (code, datasets), create a source materials reference file in the generated skill's references/ directory linking to them
  4. Verify the skill name does not conflict with existing packages in manifest.yaml
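Steps 2 and 4 above are mechanical enough to sketch in Python. Both helper names are hypothetical, and because the layout of manifest.yaml is not described here, the conflict check takes an already-extracted collection of package names instead of parsing the file.

```python
def attribution_section(title: str, authors: str, venue_date: str, url: str) -> str:
    """Build the attribution block appended to the end of SKILL.md."""
    return "\n".join([
        "## Attribution",
        "",
        "This skill implements the methodology from:",
        f"> {title}",
        f"> {authors}",
        f"> {venue_date}",
        f"> {url}",
    ])


def name_conflicts(name: str, existing_names) -> bool:
    """Return True if the skill name is already taken.

    existing_names is assumed to be read from manifest.yaml by the caller.
    """
    return name in set(existing_names)
```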

Output

The complete skill package is written to skills/<name>/:
  • SKILL.md with attribution and paper-derived workflow
  • evals/cases.yaml with assertions generated by the co-evolutionary loop
  • references/ with extraction patterns and source materials
  • evals/evolution-log.yaml from the test-engineer's refinement process
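A quick existence check over the artifacts listed above can catch an incomplete package before it is published. This is a minimal sketch using only `pathlib`; the function name `missing_artifacts` is an assumption, and the check covers presence only, not the contents of the files.

```python
from pathlib import Path

# Paths relative to skills/<name>/, taken from the output list above.
EXPECTED_ARTIFACTS = (
    "SKILL.md",
    "evals/cases.yaml",
    "evals/evolution-log.yaml",
    "references",
)


def missing_artifacts(package_dir) -> list[str]:
    """Return the expected artifacts that are absent from the package directory."""
    root = Path(package_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).exists()]
```

An empty result means every listed file and directory is present.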

Error Handling

| Error | Resolution |
| --- | --- |
| Paper has no clear algorithm | Extract the methodology from the experiments section |
| Paper is purely theoretical | Report: no actionable methodology; suggest literature-review instead |
| PDF conversion fails | Try an alternative: fetch the HTML version or ask the user to paste the text |
| Paper methodology requires data/compute | Note in the skill's prerequisites; the skill may be a workflow template only |
| test-engineer budget exhausted | Return the best-scoring iteration with a manual review warning |

Limitations

  • Cannot extract visual methodologies (circuit diagrams, neural architecture figures) — works on textual algorithm descriptions only
  • Papers with multiple interdependent contributions may produce overly complex skills — consider splitting into multiple skills
  • Non-English papers require translation before processing
  • The generated skill's quality depends on the paper's clarity of methodology description