paper-to-skill
Paper-to-Skill Pipeline
Transform research papers into production-grade skill packages. The pipeline extracts
the actionable methodology from a paper, structures it as a skill specification, and
feeds it through co-evolutionary refinement to produce a validated package.
This closes the loop between research and practice: a paper published today can become
an executable skill tomorrow, without manual authoring.
Reference Files
| File | Contents | Load When |
|---|---|---|
| `references/extraction-patterns.md` | Patterns for extracting methodology from papers | Always |
Prerequisites
- The `to-markdown` skill (for PDF/document conversion)
- The `research-critique` skill (for paper analysis)
- The `test-engineer` agent (for co-evolutionary skill generation)
Workflow
Phase 1: Paper Intake
Accept the paper in any supported format:
| Input Format | Action |
|---|---|
| arXiv ID (e.g., 2604.01687) | Fetch via |
| arXiv URL | Extract ID, fetch and convert |
| PDF file path | Convert using `to-markdown` |
| URL to paper | Fetch via |
| Pasted text | Use directly |
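The routing in the table above can be sketched as a small classifier. This is a minimal sketch, not part of the pipeline: `classify_input` and its return labels are hypothetical, it only recognizes new-style arXiv identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional `vN` version suffix), and the PDF-path test is a simple file-extension heuristic.

```python
import re

# New-style arXiv identifiers, e.g. "2604.01687" or "2604.01687v2".
# Old-style identifiers such as "hep-th/9901001" are not handled in this sketch.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(?:v\d+)?$")
# arXiv abstract or PDF URLs; group 1 captures the identifier.
ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?/?$"
)
ANY_URL = re.compile(r"^https?://\S+$")


def classify_input(raw: str) -> tuple[str, str]:
    """Return (kind, value), where kind is one of
    'arxiv_id', 'pdf_path', 'url' or 'pasted_text'."""
    text = raw.strip()
    if ARXIV_ID.match(text):
        return "arxiv_id", text
    match = ARXIV_URL.match(text)
    if match:
        # arXiv URL: extract the ID, then fetch it like a bare ID.
        return "arxiv_id", match.group(1)
    if ANY_URL.match(text):
        return "url", text
    # Heuristic: a single line ending in ".pdf" is treated as a local file path.
    if "\n" not in text and text.lower().endswith(".pdf"):
        return "pdf_path", text
    return "pasted_text", text
```

An `arxiv_id` result would then go to the fetch step and a `pdf_path` result to `to-markdown`.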
For PDF conversion, invoke the `to-markdown` skill:

Convert this PDF to clean markdown, preserving section structure, tables, equations, and algorithm pseudocode. Drop the references section but keep inline citations.
Phase 2: Critical Analysis
Invoke the `research-critique` skill on the converted paper:

Analyze this paper, focusing on:
- Core contribution: what is the novel methodology?
- Algorithm description: extract the step-by-step procedure
- Input/output specification: what goes in, what comes out?
- Key parameters and their valid ranges
- Claimed results and the evidence supporting them
- Failure modes and limitations acknowledged by the authors
- Prerequisites and dependencies (tools, data, compute)
The critique output becomes the foundation for the skill specification.
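One way to make the critique usable as a foundation is to hold the seven focus areas in a typed record and list whichever ones the critique left empty before extraction starts. The `Critique` class and its field names are hypothetical; the pipeline does not define a critique schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Critique:
    """One field per focus area requested from research-critique (hypothetical schema)."""
    core_contribution: str = ""
    algorithm_steps: list[str] = field(default_factory=list)
    io_spec: str = ""
    parameters: dict[str, str] = field(default_factory=dict)  # name -> valid range
    claimed_results: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    prerequisites: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Names of focus areas the critique left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An empty `algorithm_steps` here corresponds to the "Paper has no clear algorithm" row under Error Handling.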
Phase 3: Skill Specification Extraction
From the critique output, build a structured skill specification:
```yaml
specification:
  name: <kebab-case derived from paper's methodology name>
  domain: <paper's application domain>
  source_paper:
    title: <paper title>
    arxiv_id: <if available>
    url: <paper URL>
    authors: <first author et al.>
    date: <publication date>
  capabilities:
    - <capability 1 derived from the methodology>
    - <capability 2>
    - <capability 3>
  input_format: <what the skill accepts>
  output_format: <what the skill produces>
  algorithm_steps:
    - step: 1
      description: <from paper's algorithm>
      parameters: [<key params with ranges>]
    - step: 2
      description: <next step>
  failure_modes:
    - <from paper's limitations section>
  example_tasks:
    - <task 1 the methodology would solve>
    - <task 2>
    - <task 3>
```

Extraction rules:
- Prefer the paper's own algorithm pseudocode over prose descriptions
- Include parameter ranges from the paper's experiments (e.g., "learning rate: 0.001-0.01")
- Map the paper's terminology to armory conventions (e.g., "module" → "skill", "pipeline" → "workflow")
- If the paper describes multiple variants, extract the best-performing one
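The second, third, and fourth extraction rules are mechanical enough to sketch. This is a minimal sketch under stated assumptions: ranges are written as `name: low-high` (or `name: low to high`) with plain or scientific-notation numbers, the term map holds only the two examples given above, and variant scores are higher-is-better. `parse_range`, `map_terms`, and `best_variant` are hypothetical helpers.

```python
import re

# A number such as 0.001, 5, or 1e-4 (optional sign and exponent).
_NUM = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
# "name: low-high" or "name: low to high", e.g. "learning rate: 0.001-0.01".
_RANGE = re.compile(
    rf"^\s*(?P<name>[^:]+?)\s*:\s*(?P<low>{_NUM})\s*(?:-|to)\s*(?P<high>{_NUM})\s*$"
)

# Only the two mappings named in the extraction rules; a real table would be longer.
TERM_MAP = {"module": "skill", "pipeline": "workflow"}


def parse_range(text: str) -> tuple[str, float, float]:
    """Parse a parameter range into (name, low, high)."""
    m = _RANGE.match(text)
    if m is None:
        raise ValueError(f"not a 'name: low-high' range: {text!r}")
    low, high = float(m["low"]), float(m["high"])
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound in {text!r}")
    return m["name"], low, high


def map_terms(text: str, term_map: dict[str, str] = TERM_MAP) -> str:
    """Replace whole-word paper terms with armory terms, ignoring case.
    Replacements are lowercase, and plurals such as "modules" are left alone."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, term_map)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: term_map[m.group(1).lower()], text)


def best_variant(scores: dict[str, float]) -> str:
    """Pick the variant with the highest reported score."""
    return max(scores, key=scores.__getitem__)
```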
See `references/extraction-patterns.md` for patterns specific to common paper types.
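A light structural check on the filled-in specification template, assuming the YAML has already been loaded into a Python dict (loading is not shown). `kebab_case`, `missing_fields`, and `steps_are_sequential` are hypothetical helpers; the required-field list simply mirrors the template, with `arxiv_id` left optional because the template marks it "if available".

```python
import re

# Top-level fields of the specification template; all are expected to be non-empty.
REQUIRED = ("name", "domain", "source_paper", "capabilities", "input_format",
            "output_format", "algorithm_steps", "failure_modes", "example_tasks")
# arxiv_id is omitted because the template only asks for it "if available".
REQUIRED_SOURCE = ("title", "url", "authors", "date")


def kebab_case(method_name: str) -> str:
    """Derive a kebab-case skill name, e.g. 'Tree-of-Thought Prompting' -> 'tree-of-thought-prompting'."""
    return "-".join(word.lower() for word in re.findall(r"[A-Za-z0-9]+", method_name))


def missing_fields(doc: dict) -> list[str]:
    """List required fields that are absent or empty."""
    spec = doc.get("specification", {})
    missing = [key for key in REQUIRED if not spec.get(key)]
    source = spec.get("source_paper") or {}
    missing += [f"source_paper.{key}" for key in REQUIRED_SOURCE if not source.get(key)]
    return missing


def steps_are_sequential(doc: dict) -> bool:
    """Check that algorithm_steps are numbered 1, 2, 3, ... in order."""
    steps = doc.get("specification", {}).get("algorithm_steps", [])
    return [step.get("step") for step in steps] == list(range(1, len(steps) + 1))
```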
Phase 4: Skill Generation
Hand off the specification to the `test-engineer` agent for co-evolutionary generation:

Evolve a skill for: [specification.domain]
Capabilities: [specification.capabilities]
Algorithm: [specification.algorithm_steps]
Input: [specification.input_format]
Output: [specification.output_format]
Failure modes: [specification.failure_modes]
Example tasks: [specification.example_tasks]
Source: [specification.source_paper.title] ([specification.source_paper.url])
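Filling the hand-off template from a loaded specification dict could look like the following minimal sketch. The function name `render_prompt` and the separators (semicolons between list items, `N. description` for steps) are assumptions, not a format the test-engineer agent requires.

```python
def render_prompt(doc: dict) -> str:
    """Fill the test-engineer hand-off template from a specification dict."""
    spec = doc["specification"]
    src = spec["source_paper"]

    def join(items) -> str:
        # Lists are flattened to one line, separated by semicolons.
        return "; ".join(str(item) for item in items)

    steps = join(f"{s['step']}. {s['description']}" for s in spec["algorithm_steps"])
    return "\n".join([
        f"Evolve a skill for: {spec['domain']}",
        f"Capabilities: {join(spec['capabilities'])}",
        f"Algorithm: {steps}",
        f"Input: {spec['input_format']}",
        f"Output: {spec['output_format']}",
        f"Failure modes: {join(spec['failure_modes'])}",
        f"Example tasks: {join(spec['example_tasks'])}",
        f"Source: {src['title']} ({src['url']})",
    ])
```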
The test-engineer runs its full co-evolutionary loop (generate → verify → oracle → refine)
using the specification as the task description.
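The shape of a generate → verify → oracle → refine loop with a fixed budget can be sketched generically. This is a minimal sketch with hypothetical callables, not the test-engineer agent's actual implementation; the budget-exhausted branch mirrors the Error Handling row that returns the best-scoring iteration with a manual review warning.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

Candidate = TypeVar("Candidate")


@dataclass
class LoopResult(Generic[Candidate]):
    best: Candidate
    score: float
    iterations: int
    needs_manual_review: bool  # True when the budget ran out before the target was met


def evolve(
    task: str,
    generate: Callable[[str], Candidate],
    verify: Callable[[Candidate], list[str]],  # failure messages; empty means pass
    oracle: Callable[[Candidate], float],  # quality score, higher is better
    refine: Callable[[Candidate, list[str]], Candidate],
    budget: int = 5,
    target: float = 1.0,
) -> LoopResult[Candidate]:
    """Iterate until a candidate passes verification and meets the target score."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    candidate = generate(task)
    best, best_score = candidate, float("-inf")
    for iteration in range(1, budget + 1):
        failures = verify(candidate)
        score = oracle(candidate)
        if not failures and score >= target:
            return LoopResult(candidate, score, iteration, False)
        if score > best_score:
            best, best_score = candidate, score
        candidate = refine(candidate, failures)
    # Budget exhausted: return the best-scoring iteration, flagged for manual review.
    return LoopResult(best, best_score, budget, True)
```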
Phase 5: Attribution and Finalization
Ensure the generated skill properly attributes the source paper:
- Frontmatter: Add `source: <paper_url>` to the metadata.
- Body: Include an attribution section at the end of SKILL.md:
  ```markdown
  ## Attribution

  This skill implements the methodology from:

  > <paper title>
  > <authors>
  > <venue/arxiv, date>
  > <URL>
  ```
- References: If the paper has supplementary materials (code, datasets), create a source materials reference file in the generated skill's `references/` directory linking to them.
- Verify that the skill name does not conflict with existing packages in `manifest.yaml`.
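The two edits to SKILL.md are plain string operations. A minimal sketch, assuming SKILL.md opens with `---`-delimited YAML frontmatter and that existing package names have already been read from `manifest.yaml` into a set (the manifest's layout is not specified here, so parsing it is left out). All three function names are hypothetical.

```python
def add_source_to_frontmatter(skill_md: str, paper_url: str) -> str:
    """Insert 'source: <paper_url>' as the last frontmatter line."""
    if not skill_md.startswith("---\n"):
        raise ValueError("SKILL.md does not start with YAML frontmatter")
    # Searching from index 3 also handles an empty frontmatter block ("---\n---\n").
    closing = skill_md.index("\n---", 3)
    return skill_md[:closing] + f"\nsource: {paper_url}" + skill_md[closing:]


def append_attribution(skill_md: str, title: str, authors: str, venue_date: str, url: str) -> str:
    """Append the Attribution section at the end of SKILL.md."""
    section = "\n".join([
        "## Attribution",
        "",
        "This skill implements the methodology from:",
        "",
        f"> {title}",
        f"> {authors}",
        f"> {venue_date}",
        f"> {url}",
    ])
    return skill_md.rstrip("\n") + "\n\n" + section + "\n"


def name_conflicts(name: str, existing_names: set[str]) -> bool:
    """True if the name is already taken (compared case-insensitively)."""
    return name.lower() in {existing.lower() for existing in existing_names}
```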
Output
The complete skill package is written to `skills/<name>/`:
- `SKILL.md` with attribution and paper-derived workflow
- `evals/cases.yaml` with assertions generated by the co-evolutionary loop
- `references/` with extraction patterns and source materials
- `evals/evolution-log.yaml` from the test-engineer's refinement process
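A completeness check over the expected package layout could look like this minimal sketch. `missing_outputs` is hypothetical; it treats every path listed above as required and also flags a SKILL.md that lacks its Attribution section.

```python
from pathlib import Path

# Relative paths listed as the contents of skills/<name>/.
EXPECTED = ("SKILL.md", "evals/cases.yaml", "references", "evals/evolution-log.yaml")


def missing_outputs(skills_root: Path, name: str) -> list[str]:
    """Return expected package paths that do not exist, plus a note if
    SKILL.md lacks its Attribution section."""
    package = skills_root / name
    missing = [rel for rel in EXPECTED if not (package / rel).exists()]
    skill_md = package / "SKILL.md"
    if skill_md.is_file() and "## Attribution" not in skill_md.read_text(encoding="utf-8"):
        missing.append("SKILL.md (no '## Attribution' section)")
    return missing
```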
Error Handling
| Error | Resolution |
|---|---|
| Paper has no clear algorithm | Extract the methodology from the experiments section |
| Paper is purely theoretical | Report: no actionable methodology; suggest literature-review instead |
| PDF conversion fails | Try alternative: fetch HTML version or request user paste text |
| Paper methodology requires data/compute | Note in skill's prerequisites; skill may be a workflow template only |
| test-engineer budget exhausted | Return best-scoring iteration with manual review warning |
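The "PDF conversion fails" row describes a fallback chain. A minimal sketch with hypothetical names: each strategy is a labeled callable (for instance one wrapping `to-markdown`, one fetching an HTML version, one asking the user to paste text), tried in order until one returns non-empty text.

```python
from typing import Callable, Optional

# A converter takes the paper source (path or URL) and returns markdown text or None.
Converter = Callable[[str], Optional[str]]


def convert_with_fallbacks(source: str, strategies: list[tuple[str, Converter]]) -> tuple[str, str]:
    """Return (label, text) from the first strategy that yields non-empty text."""
    errors = []
    for label, convert in strategies:
        try:
            text = convert(source)
        except Exception as exc:  # one failed strategy should not abort the chain
            errors.append(f"{label}: {exc}")
            continue
        if text and text.strip():
            return label, text
        errors.append(f"{label}: empty output")
    raise RuntimeError("all conversion strategies failed: " + "; ".join(errors))
```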
Limitations
- Cannot extract visual methodologies (circuit diagrams, neural architecture figures) — works on textual algorithm descriptions only
- Papers with multiple interdependent contributions may produce overly complex skills — consider splitting into multiple skills
- Non-English papers require translation before processing
- The generated skill's quality depends on the paper's clarity of methodology description