skill-distiller

Skill Distiller

Transform skills authored for high-capability models (Opus) into deterministic workflows that execute reliably on lower-cost models (Sonnet, Haiku). The core insight from EvoSkills is that skills encode reusable task structure, not model-specific artifacts. A skill evolved on Opus transfers to other models with gains of 35 to 45 percentage points, but only when the instructions are deterministic enough that lower-capability models can follow them without improvising.

Reference Files

| File | Contents | Load When |
| --- | --- | --- |
| `references/distillation-patterns.md` | Pattern catalog for converting reasoning to rules | Always |

Prerequisites

The source skill must exist and pass `package-evaluator` at >= 70%
Access to both the source model (Opus) and target model (Haiku/Sonnet) for validation
The `surrogate-verifier` skill for cross-model assertion checking

Workflow

Phase 1: Complexity Analysis

Score each section of the source SKILL.md for reasoning difficulty:

| Complexity Signal | Score | Distillation Action |
| --- | --- | --- |
| Decision tree with 3+ branches | HIGH | Convert to explicit if/then lookup table |
| "Use judgment" or "consider context" | HIGH | Replace with concrete heuristic rules |
| Multi-step inference chain | HIGH | Break into numbered atomic steps |
| Reference to domain expertise | MED | Add explicit reference file with knowledge |
| Clear enumerated steps | LOW | Keep as-is |
| Concrete examples with expected output | LOW | Keep as-is |

Produce a complexity map: section name -> complexity score -> planned action.
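
As a rough illustration, the complexity map can be built by matching each section against an ordered list of signal rules. This is only a sketch: the regular expressions, the `SIGNALS` list, and the `complexity_map` function are hypothetical names chosen for the example, and only two of the signals from the table are approximated.

```python
import re

# Hypothetical signal rules: (pattern, score, action). The patterns are
# illustrative approximations of two signals from the table, not part of the skill.
SIGNALS = [
    (re.compile(r"use judgment|consider context", re.I), "HIGH",
     "Replace with concrete heuristic rules"),
    (re.compile(r"domain expertise", re.I), "MED",
     "Add explicit reference file with knowledge"),
]
DEFAULT = ("LOW", "Keep as-is")


def complexity_map(sections):
    """Map each section name to a (score, planned action) pair."""
    result = {}
    for name, text in sections.items():
        result[name] = DEFAULT
        for pattern, score, action in SIGNALS:
            if pattern.search(text):
                result[name] = (score, action)
                break  # rules are ordered from highest to lowest score
    return result


sections = {
    "Review": "Use judgment to decide which findings matter.",
    "Output": "1. Read the file. 2. Write the summary.",
}
cmap = complexity_map(sections)
for name, (score, action) in cmap.items():
    print(f"{name} -> {score} -> {action}")
```

A real scorer would also need to detect decision trees and inference chains, which simple pattern matching is unlikely to catch reliably.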

Phase 2: Trace Collection

Execute the source skill with Opus on 5 representative tasks:

Select tasks from `evals/cases.yaml` (positive cases) or generate new ones
For each task, capture the full execution trace:
- Tool calls made (which tools, in what order)
- Intermediate reasoning visible in output
- Final output structure and content
- Time taken and token usage
Store traces as structured data for pattern extraction
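
One possible shape for the stored traces is sketched below. The `Trace` and `ToolCall` classes and their field names are assumptions made for illustration; the skill itself does not prescribe a schema, only that the four kinds of information listed above are captured.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ToolCall:
    tool: str   # which tool was invoked
    order: int  # position in the call sequence


@dataclass
class Trace:
    task_id: str
    tool_calls: list = field(default_factory=list)  # ToolCall items, in order
    reasoning: list = field(default_factory=list)   # intermediate reasoning visible in output
    final_output: str = ""
    seconds: float = 0.0                            # time taken
    tokens: int = 0                                 # token usage


def to_json(traces):
    """Serialize traces so later phases can load them as plain dicts."""
    return json.dumps([asdict(t) for t in traces], indent=2)


trace = Trace(
    task_id="case-1",  # hypothetical task identifier
    tool_calls=[ToolCall("Read", 1), ToolCall("Grep", 2)],
    reasoning=["Input looks like SQL"],
    final_output="| A | B | C |",
    seconds=12.5,
    tokens=1800,
)
serialized = to_json([trace])
```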

Phase 3: Pattern Extraction

From the collected traces, extract deterministic patterns:

Decision paths — For each HIGH-complexity section, find the actual decisions Opus made across the 5 tasks. If Opus chose the same path in 4/5 cases, that path becomes the default rule
Lookup tables — Where Opus applied domain knowledge, build explicit lookup tables (e.g., "if input contains SQL, use these patterns; if input contains Python, use those")
Concrete examples — Extract representative input/output pairs from traces to serve as few-shot examples in the distilled skill
Tool sequences — Identify the common tool invocation pattern and make it explicit ("Step 1: Read the file. Step 2: Grep for pattern X. Step 3: Write output.")
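
The 4/5 rule for decision paths can be sketched as a simple majority check over the decisions observed in the traces. The function name `default_rule` and the string labels for paths are hypothetical; integer arithmetic is used so the 4-in-5 comparison does not depend on floating-point rounding.

```python
from collections import Counter
from typing import Optional


def default_rule(decisions, num=4, den=5) -> Optional[str]:
    """Return the dominant path if it was chosen in at least num/den of the cases."""
    if not decisions:
        return None
    path, count = Counter(decisions).most_common(1)[0]
    # count / len(decisions) >= num / den, written without division
    return path if count * den >= num * len(decisions) else None


consistent = default_rule(["table", "table", "table", "table", "prose"])
split = default_rule(["table", "table", "table", "prose", "prose"])
```

When no path reaches the threshold (the `split` case), the section probably needs an explicit condition table rather than a single default.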

Phase 4: Distilled Rewrite

Rewrite the SKILL.md applying all distillation actions from Phase 1:

| Source Pattern | Distilled Replacement |
| --- | --- |
| "Analyze the code and determine..." | "Check for these 5 specific patterns: [list]" |
| "Use appropriate formatting" | "Output as a markdown table with columns: [A, B, C]" |
| "Consider the context to decide..." | "If [condition A]: do X. If [condition B]: do Y. Default: Z" |
| "Apply best practices for..." | Reference file with explicit best practices enumerated |
| Multi-paragraph reasoning instruction | Numbered step list with single-sentence steps |

Rules for the rewrite:

Every instruction must be actionable by a model with no domain expertise
No step should require inference — each step's input and output must be explicit
Replace all "consider", "analyze", "determine" verbs with "check", "count", "list", "output"
Add concrete examples for any step that could be ambiguous
Keep the SKILL.md under 500 lines (distillation should reduce, not expand)
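
Two of these rules (the banned verbs and the 500-line limit) are mechanical enough to check automatically. The following is a minimal sketch; `lint_skill` is a hypothetical helper, and it only catches the base form of each verb, not inflections such as "analyzing".

```python
import re

BANNED_VERBS = ("consider", "analyze", "determine")
MAX_LINES = 500


def lint_skill(text):
    """Return a list of rule violations found in a distilled SKILL.md."""
    problems = []
    lines = text.splitlines()
    if len(lines) >= MAX_LINES:
        problems.append(f"{len(lines)} lines; must stay under {MAX_LINES}")
    for number, line in enumerate(lines, start=1):
        for verb in BANNED_VERBS:
            # Whole-word match only, so "deterministic" is not flagged
            if re.search(rf"\b{verb}\b", line, re.I):
                problems.append(
                    f"line {number}: replace '{verb}' with check/count/list/output"
                )
    return problems


problems = lint_skill("Check the header.\nConsider the context.")
```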

Phase 5: Target Model Validation

Run the distilled skill on the target model (Haiku or Sonnet):

Execute the same 5 tasks from Phase 2 with the distilled skill loaded
Use the `surrogate-verifier` to generate assertions for each task output
Compare pass rates:

| Metric | Source (Opus + original) | Target (Haiku + distilled) | Delta |
| --- | --- | --- | --- |
| Assertions passed | N/M | N/M | ± |
| Weighted score | X.XX | X.XX | ± |
| Output completeness | % | % | ± |
| Format compliance | % | % | ± |

If target model score < 80% of source model score, iterate:
- Identify which assertions the target model fails
- Add more explicit instructions for those specific failure points
- Re-run validation (max 3 iterations)
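
The iteration rule can be sketched as a bounded loop. Here `run_target` and `refine` are hypothetical callbacks standing in for "re-run validation" and "add more explicit instructions", and "max 3 iterations" is read as at most three refinement passes after the first run, which is an assumption about the intended meaning.

```python
MAX_ITERATIONS = 3
MIN_RATIO = 0.80  # target must reach 80% of the source score


def needs_iteration(source_score, target_score):
    return target_score < MIN_RATIO * source_score


def validate(source_score, run_target, refine):
    """Return (final target score, number of refinement iterations used)."""
    score = run_target()
    iterations = 0
    while needs_iteration(source_score, score) and iterations < MAX_ITERATIONS:
        refine()              # tighten instructions at the failing points
        score = run_target()  # re-run validation on the target model
        iterations += 1
    return score, iterations


# Simulated target scores for three successive validation runs
scores = iter([0.70, 0.78, 0.85])
result = validate(1.00, lambda: next(scores), lambda: None)
```

With a source score of 1.00, the first two simulated runs fall below the 0.80 threshold, so the loop stops after two refinements.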

Phase 6: Cross-Model Report

Produce the final comparison:

Skill Distillation Report: <skill-name>

Complexity Reduction

Sections distilled: N/M (HIGH → LOW)
Instruction word count: original X → distilled Y (Z% reduction)
Decision points replaced with lookup tables: N

Cross-Model Performance

| Model | Assertions Passed | Weighted Score | Format Compliance |
| --- | --- | --- | --- |
| Opus | 7/7 | 1.00 | 100% |
| Sonnet | 6/7 | 0.92 | 100% |
| Haiku | 5/7 | 0.85 | 85% |

Changes Made

[Section] "Analyze complexity" → explicit 5-item checklist
[Section] "Apply formatting" → fixed markdown table template ...

Recommendation

[SHIP | ITERATE | MANUAL_REVIEW_NEEDED]

Error Handling

| Error | Resolution |
| --- | --- |
| Source skill scores below 70% | Refuse distillation; recommend evolution via test-engineer |
| No execution traces available | Generate synthetic tasks and collect traces before proceeding |
| Target model fails all assertions | Skill may be too complex for target model; report with detail |
| Distilled skill longer than source | Review distillation; patterns may need consolidation |
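
The error table can be read as a set of independent checks. The sketch below is one way to encode it; `check_errors` and its parameters are hypothetical, and measuring "longer" in lines is an assumption since the table does not say which unit applies.

```python
def check_errors(source_score, trace_count, assertions_passed,
                 source_lines, distilled_lines):
    """Return the resolutions that apply, in the order of the error table."""
    resolutions = []
    if source_score < 0.70:
        resolutions.append("Refuse distillation; recommend evolution via test-engineer")
    if trace_count == 0:
        resolutions.append("Generate synthetic tasks and collect traces before proceeding")
    if assertions_passed == 0:
        resolutions.append("Skill may be too complex for target model; report with detail")
    if distilled_lines > source_lines:  # assumption: length is compared in lines
        resolutions.append("Review distillation; patterns may need consolidation")
    return resolutions


clean = check_errors(0.90, 5, 6, source_lines=300, distilled_lines=200)
blocked = check_errors(0.60, 5, 6, source_lines=300, distilled_lines=200)
```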

Limitations

Cannot distill skills that rely on multi-turn reasoning or on open-ended adaptive reasoning at many decision points
Visual/interactive skills (HTML generation, browser automation) may not distill well
Distillation optimizes for determinism, not creativity — skills requiring open-ended generation (writing, brainstorming) are poor candidates
Trace collection requires actual model execution, incurring API costs
