skill-optimizer

When to use

Use this skill when you need to:
  • Increase how reliably models actually apply a skill
  • Diagnose why some criteria fail across all models
  • Prevent a skill from making outputs worse
  • Refactor skill text for stronger retrieval under context pressure
  • Build repeatable benchmark loops and release gates

Optimization loop (default workflow)

  1. Measure baseline (skill-off) and skill-on behavior per model, per scenario, and per criterion
  2. Identify the failure pattern:
    • universal failure (0% pass rate with the skill on every model)
    • model-specific weakness
    • regression (negative delta: skill-on scores fall below baseline)
  3. Edit for salience:
    • add explicit triggers
    • add concrete integrated examples
    • tighten checklists and decision rules
  4. Re-run evals and compare deltas
  5. Ship with guardrails (documented gate + run history + follow-up issues)
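A minimal Python sketch of steps 1 and 2, assuming eval results have already been collected as per-model, per-scenario, per-criterion pass rates for a baseline run and a skill-on run. The `Result` record, the `classify` function, and the 0.5 weakness threshold are illustrative assumptions, not part of this skill's rule files.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One eval cell (hypothetical schema); pass rates are in [0.0, 1.0]."""
    model: str
    scenario: str
    criterion: str
    baseline: float  # pass rate without the skill
    skill_on: float  # pass rate with the skill


def classify(results, weak_threshold=0.5):
    """Sort criteria into the three failure patterns from step 2."""
    by_criterion = defaultdict(list)
    for r in results:
        by_criterion[r.criterion].append(r)

    patterns = {"universal_failure": [], "model_specific": [], "regression": []}
    for criterion, rows in sorted(by_criterion.items()):
        models = {r.model for r in rows}

        # Universal failure: 0% with the skill for every model and scenario.
        if all(r.skill_on == 0.0 for r in rows):
            patterns["universal_failure"].append(criterion)

        # Model-specific weakness: some, but not all, models score below the
        # threshold in at least one scenario.
        weak = sorted({r.model for r in rows if r.skill_on < weak_threshold})
        if weak and len(weak) < len(models):
            patterns["model_specific"].append((criterion, weak))

        # Regression: the skill lowered the pass rate (negative delta).
        regressed = sorted((r.model, r.scenario) for r in rows if r.skill_on < r.baseline)
        if regressed:
            patterns["regression"].append((criterion, regressed))
    return patterns
```

A criterion can land in more than one bucket: a universal failure whose baseline was above 0% is also a regression, which usually means the skill text is actively steering models away from that behavior.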

How to use

Read individual rule files for detailed procedures and templates:
  • rules/benchmark-loop.md - End-to-end benchmark loop and scoring
  • rules/activation-design.md - Improve retrieval and instruction uptake
  • rules/context-budget.md - Reduce token cost without losing the intended behavior
  • rules/regression-triage.md - Diagnose and fix skill-on regressions
  • rules/release-gates.md - Go/no-go criteria before shipping skill updates
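The actual go/no-go criteria live in rules/release-gates.md and are not reproduced here. As a rough illustration only, a gate can be written as a function that returns a decision together with the reasons behind it; the `GateConfig` thresholds and the `release_gate` function below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical thresholds; tune them to your own release policy."""
    min_mean_delta: float = 0.05      # required average gain over baseline
    max_regression: float = 0.10      # largest tolerated drop on any criterion
    max_universal_failures: int = 0   # criteria allowed to stay at 0% with the skill


def release_gate(deltas, universal_failures, config=GateConfig()):
    """Return (go, reasons).

    deltas maps criterion -> (skill-on pass rate - baseline pass rate).
    universal_failures lists criteria still at 0% with the skill.
    """
    if not deltas:
        return False, ["no eval results"]

    reasons = []
    mean_delta = sum(deltas.values()) / len(deltas)
    if mean_delta < config.min_mean_delta:
        reasons.append(f"mean delta {mean_delta:+.2f} is below {config.min_mean_delta:+.2f}")
    for criterion, delta in sorted(deltas.items()):
        if delta < -config.max_regression:
            reasons.append(f"{criterion} regressed by {-delta:.2f}")
    if len(universal_failures) > config.max_universal_failures:
        reasons.append("universal failures: " + ", ".join(sorted(universal_failures)))
    # Go only when no check produced a reason to block the release.
    return not reasons, reasons
```

Returning the reasons alongside the decision makes a no-go actionable and gives the run history something concrete to record. `GateConfig` is frozen, so sharing one default instance across calls is safe.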

Practical heuristics

  • Prefer a few high-signal rules over many soft recommendations
  • Put fragile, high-value behaviors in top-level checklists
  • Include at least one integrated example per common scenario
  • Add explicit wording for what must not be omitted
  • Track gains and losses by comparing with-skill and without-skill runs
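One way to track gains and losses, sketched in Python under the assumption that each run reduces to a mapping from criterion to pass rate over the same eval set. `compare_runs` is a hypothetical helper; criteria that appear in only one run are reported separately rather than silently dropped, since an unpaired criterion cannot yield a delta.

```python
def compare_runs(without_skill, with_skill):
    """Compare a without-skill run against a with-skill run.

    Both arguments map criterion -> pass rate in [0.0, 1.0] from the same
    eval set. Returns (deltas, summary).
    """
    shared = sorted(without_skill.keys() & with_skill.keys())
    # Round to suppress float noise (e.g. 0.8 - 0.7 gives 0.10000000000000009).
    deltas = {c: round(with_skill[c] - without_skill[c], 4) for c in shared}
    summary = {
        "gains": [c for c in shared if deltas[c] > 0],
        "losses": [c for c in shared if deltas[c] < 0],
        "unchanged": [c for c in shared if deltas[c] == 0],
        # Criteria present in only one run cannot be paired, so flag them.
        "unpaired": sorted(without_skill.keys() ^ with_skill.keys()),
    }
    return deltas, summary
```

Keeping the per-criterion deltas, not just an overall average, is what lets a regression on one criterion show up even when the mean improves.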