skill-optimizer
When to use
Use this skill when you need to:
- Improve how reliably models actually apply a skill
- Diagnose why some criteria fail across all models
- Prevent a skill from making outputs worse
- Refactor skill text for stronger retrieval under context pressure
- Build repeatable benchmark loops and release gates
Optimization loop (default workflow)
- Measure baseline and skill-on behavior (per model, per scenario, per criterion)
- Find the failure pattern:
  - universal failure (0% pass rate even with the skill)
  - model-specific weakness
  - regression (negative delta versus baseline)
- Edit for salience:
  - add explicit triggers
  - add concrete, integrated examples
  - tighten checklists and decision rules
- Re-run evals and compare deltas
- Ship with guardrails (documented gate + run history + follow-up issues)
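The measure and find-failure-pattern steps can be sketched in Python. This is a minimal sketch that assumes results arrive as pass rates in [0, 1] keyed by (model, scenario, criterion). The input shape, the function names, and the `weak_threshold` cutoff for calling a model "weak" are all hypothetical illustrations. None of them come from the skill's rule files.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input shape: pass rate in [0.0, 1.0] keyed by (model, scenario, criterion).
Results = dict[tuple[str, str, str], float]


def per_model_rates(results: Results) -> dict[str, dict[str, float]]:
    """Average each model's pass rate across scenarios: criterion -> model -> rate."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for (model, _scenario, criterion), rate in results.items():
        buckets[(criterion, model)].append(rate)
    rates: dict[str, dict[str, float]] = defaultdict(dict)
    for (criterion, model), values in buckets.items():
        rates[criterion][model] = mean(values)
    return dict(rates)


def classify_failures(baseline: Results, with_skill: Results,
                      weak_threshold: float = 0.5) -> dict[str, list[str]]:
    """Tag each criterion with the failure patterns seen in the skill-on run."""
    base = per_model_rates(baseline)
    skill = per_model_rates(with_skill)
    report: dict[str, list[str]] = {}
    for criterion, skill_rates in skill.items():
        patterns: list[str] = []
        weak = sorted(m for m, rate in skill_rates.items() if rate < weak_threshold)
        if all(rate == 0.0 for rate in skill_rates.values()):
            # Every model scores 0% even with the skill loaded.
            patterns.append("universal failure")
        elif 0 < len(weak) < len(skill_rates):
            # Some models fall below the cutoff while others clear it.
            patterns.append(f"model-specific weakness ({', '.join(weak)})")
        base_rates = base.get(criterion, {})
        if any(rate < base_rates[m] for m, rate in skill_rates.items() if m in base_rates):
            # At least one model does worse with the skill than without (negative delta).
            patterns.append("regression")
        report[criterion] = patterns
    return report


baseline = {
    ("model-a", "s1", "cites-sources"): 0.0,
    ("model-b", "s1", "cites-sources"): 0.0,
    ("model-a", "s1", "handles-errors"): 0.8,
    ("model-b", "s1", "handles-errors"): 0.6,
}
with_skill = {
    ("model-a", "s1", "cites-sources"): 0.0,
    ("model-b", "s1", "cites-sources"): 0.0,
    ("model-a", "s1", "handles-errors"): 0.9,
    ("model-b", "s1", "handles-errors"): 0.4,
}
print(classify_failures(baseline, with_skill))
# {'cites-sources': ['universal failure'], 'handles-errors': ['model-specific weakness (model-b)', 'regression']}
```

Averaging across scenarios first keeps the classification per criterion. If you want to see which scenario drives a failure, drop the averaging and key the report by (scenario, criterion) instead.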
How to use
Read individual rule files for detailed procedures and templates:
- rules/benchmark-loop.md - End-to-end benchmark loop and scoring
- rules/activation-design.md - Improve retrieval and instruction uptake
- rules/context-budget.md - Reduce token cost without losing behavior
- rules/regression-triage.md - Diagnose and fix skill-on regressions
- rules/release-gates.md - Go/no-go criteria before shipping skill updates
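The contents of rules/release-gates.md are not reproduced here. The following is therefore only a hypothetical go/no-go check in the spirit of that file. It assumes per-criterion deltas (skill-on pass rate minus baseline). The function name `release_gate` and the `max_regression` and `min_mean_gain` thresholds are illustrative assumptions, not documented values.

```python
from statistics import mean


def release_gate(deltas: dict[str, float],
                 max_regression: float = 0.05,
                 min_mean_gain: float = 0.0) -> tuple[bool, list[str]]:
    """Decide go/no-go from per-criterion deltas (skill-on pass rate minus baseline).

    Both thresholds are hypothetical defaults, not values from the rule files.
    """
    if not deltas:
        # No benchmark data means no evidence the update is safe to ship.
        return False, ["no benchmark results"]
    reasons: list[str] = []
    for criterion, delta in sorted(deltas.items()):
        if delta < -max_regression:
            reasons.append(f"{criterion} regressed by {-delta:.2f}")
    avg = mean(deltas.values())
    if avg < min_mean_gain:
        reasons.append(f"mean delta {avg:.2f} is below {min_mean_gain:.2f}")
    # Ship only when no blocking reason was found; keep the reasons for the run history.
    return not reasons, reasons


go, reasons = release_gate({"cites-sources": 0.10, "handles-errors": -0.20, "tone": 0.0})
print("GO" if go else "NO-GO", reasons)
```

Returning the reasons alongside the verdict makes it easy to record each gate decision in the run history.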
Practical heuristics
- Prefer few high-signal rules over many soft recommendations
- Put fragile, high-value behaviors in top-level checklists
- Include at least one integrated example per common scenario
- Add explicit wording for what must not be omitted
- Track gains and losses by comparing with-skill and without-skill runs
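The gain/loss comparison can be kept as a simple per-model tally. This is a minimal sketch that assumes pass rates are keyed model -> criterion -> rate for each run. `Tally`, `tally_by_model`, and the `noise` band (which treats small moves as run-to-run variance) are all hypothetical names and values.

```python
from typing import NamedTuple


class Tally(NamedTuple):
    gains: int
    losses: int
    unchanged: int


def tally_by_model(without_skill: dict[str, dict[str, float]],
                   with_skill: dict[str, dict[str, float]],
                   noise: float = 0.02) -> dict[str, Tally]:
    """Per model, count criteria whose pass rate rose, fell, or moved less than `noise`."""
    report: dict[str, Tally] = {}
    for model in sorted(without_skill.keys() & with_skill.keys()):
        base, skill = without_skill[model], with_skill[model]
        gains = losses = unchanged = 0
        # Only criteria measured in both runs are comparable.
        for criterion in base.keys() & skill.keys():
            delta = skill[criterion] - base[criterion]
            if delta > noise:
                gains += 1
            elif delta < -noise:
                losses += 1
            else:
                unchanged += 1
        report[model] = Tally(gains, losses, unchanged)
    return report


skill_off = {"model-a": {"cites-sources": 0.2, "tone": 0.9},
             "model-b": {"cites-sources": 0.5, "tone": 0.8}}
skill_on = {"model-a": {"cites-sources": 0.7, "tone": 0.9},
            "model-b": {"cites-sources": 0.5, "tone": 0.6}}
print(tally_by_model(skill_off, skill_on))
# {'model-a': Tally(gains=1, losses=0, unchanged=1), 'model-b': Tally(gains=0, losses=1, unchanged=1)}
```

The noise band keeps a one-point wobble from counting as a win or a loss. A reasonable starting value is whatever spread you see when re-running the same configuration.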