skill-optimizer

When to use

Use this skill when you need to:

Increase how often models actually apply a skill
Diagnose why some criteria fail across all models
Prevent a skill from making outputs worse
Refactor skill text for stronger retrieval under context pressure
Build repeatable benchmark loops and release gates

Optimization loop (default workflow)

Measure baseline and skill-on behavior (per model, per scenario, per criterion)
Find the failure pattern:
- universal failure (0% pass rate with the skill, on every model)
- model-specific weakness
- regression (negative delta: skill-on scores below baseline)
Edit for salience:
- add explicit triggers
- add concrete integrated examples
- tighten checklists and decision rules
Re-run evals and compare deltas
Ship with guardrails (documented gate + run history + follow-up issues)
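
The "find failure pattern" step can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual tooling: the `Result` record, the `classify` function, and the choice to treat a 0.0 skill-on pass rate on only some models as a "model-specific weakness" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One eval cell (hypothetical schema): model x scenario x criterion."""
    model: str
    scenario: str
    criterion: str
    baseline: float  # pass rate without the skill, 0.0-1.0
    skill_on: float  # pass rate with the skill, 0.0-1.0

    @property
    def delta(self) -> float:
        # Positive means the skill helped; negative means it hurt.
        return self.skill_on - self.baseline


def classify(results: list[Result]) -> dict[str, list[tuple[str, ...]]]:
    """Sort results into the three failure patterns from the loop."""
    by_criterion: dict[tuple[str, str], list[Result]] = {}
    for r in results:
        by_criterion.setdefault((r.scenario, r.criterion), []).append(r)

    universal: list[tuple[str, ...]] = []
    model_specific: list[tuple[str, ...]] = []
    regressions: list[tuple[str, ...]] = []
    for key, rows in by_criterion.items():
        failing = [r for r in rows if r.skill_on == 0.0]
        if len(failing) == len(rows):
            # Every model scores 0% with the skill enabled.
            universal.append(key)
        else:
            # Assumption: only some models score 0% on this criterion.
            model_specific.extend((r.model, *key) for r in failing)
        # A regression is any cell where the skill lowers the score.
        regressions.extend((r.model, *key) for r in rows if r.delta < 0)

    return {
        "universal": universal,
        "model_specific": model_specific,
        "regressions": regressions,
    }
```

Running `classify` once on the baseline run and again after each edit gives a like-for-like view of which patterns the edit removed and which it introduced.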

How to use

Read individual rule files for detailed procedures and templates:

rules/benchmark-loop.md - End-to-end benchmark loop and scoring
rules/activation-design.md - Improve retrieval and instruction uptake
rules/context-budget.md - Reduce token cost without losing behavior
rules/regression-triage.md - Diagnose and fix skill-on regressions
rules/release-gates.md - Go/no-go criteria before shipping skill updates

Practical heuristics

Prefer few high-signal rules over many soft recommendations
Put fragile, high-value behaviors in top-level checklists
Include at least one integrated example per common scenario
Add explicit wording for what must not be omitted
Track gains and losses by comparing with-skill and without-skill runs
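
One way to keep the with-skill versus without-skill comparison readable is a per-model tally of gains, losses, and ties. The sketch below is illustrative only: the `tally` function, its input shape (a list of skill-on minus baseline deltas per model), and the `tolerance` parameter are assumptions, not part of the skill.

```python
from collections import Counter


def tally(deltas: dict[str, list[float]], tolerance: float = 0.0) -> dict[str, Counter]:
    """Count gains, losses, and ties per model.

    `deltas` maps a model name to its skill-on minus baseline scores.
    Deltas within +/- `tolerance` count as ties (hypothetical noise band).
    """
    summary: dict[str, Counter] = {}
    for model, values in deltas.items():
        counts = Counter(gain=0, loss=0, tie=0)
        for d in values:
            if d > tolerance:
                counts["gain"] += 1
            elif d < -tolerance:
                counts["loss"] += 1
            else:
                counts["tie"] += 1
        summary[model] = counts
    return summary
```

A nonzero `tolerance` may be worth setting when eval scores are noisy, so that small fluctuations are not recorded as gains or losses.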
