ING Skill Generator — Complete Knowledge Base

Generate production-ready GitHub Copilot skills from ING documentation repositories. This skill transforms documentation-as-code into self-contained expert knowledge bases that senior engineers can use in their Spring Boot / Java 21 projects.
This skill includes:
  • Skill generation from local cloned repos
  • Evaluation framework comparing with-skill and baseline runs
  • Grading agents for automated assertion checking
  • Benchmark aggregation and interactive review viewer
  • Description optimization for better triggering
At a high level, the process goes like this:
  1. Identify the target repository — by default, the repository that contains this skill (three levels up from `.agents/skills/<name>/SKILL.md`); use a different path only if the user specifies one
  2. Analyze the repo structure, identify tool name, latest version, documentation files
  3. Extract and synthesize content following ING skill template (8 sections)
  4. Generate the SKILL.md with proper frontmatter and verbatim code examples
  5. Run test cases (with-skill vs baseline) to verify quality
  6. Review results, iterate based on feedback
  7. Optimize description for better triggering
Your job is to figure out where the user is in this process and help them progress. Maybe they have a freshly cloned repo and want a skill generated. Or maybe they already have a draft and want to improve it. Be flexible — if they say "just generate the skill, I don't need evals", do that instead.


Communicating with the User


The ING skill generator may be used by people across a range of familiarity with coding jargon. While most users are likely senior engineers, pay attention to context cues.
Default assumptions:
  • "evaluation" and "benchmark" are OK
  • "JSON" and "assertion" — look for cues the user knows these before using without explanation
  • ING-specific terms (Baker, Merak, Kingsroad) — explain briefly if unclear
If in doubt, briefly clarify a term with a short definition.


1. Process Overview


  1. Analyze the repository — identify the tool name, latest version, documentation structure
  2. Extract content — gather all relevant docs, configs, code examples, warnings
  3. Synthesize knowledge base — merge, dedupe, organize into standard sections
  4. Output the skill file — produce a valid SKILL.md with proper frontmatter


Creating a Skill from Repository


This is the core workflow for generating ING skills from documentation repositories.

Step 1: Capture Intent


Start by understanding what the user wants. Key questions:
  1. What repository? Infer this automatically: the repository to analyze is the one that contains this skill. Since skills live at `.agents/skills/<skill-name>/SKILL.md`, the repository root is three levels up from the SKILL.md file. Confirm with the user only if this can't be determined or if they explicitly name a different path.
  2. What tool/framework? Confirm the tool name if not obvious from the repo
  3. Run test cases? Suggest yes for complex repos, optional for simple ones
If the conversation already contains this info (e.g., "generate a skill from /tmp/baker"), extract it and confirm.
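The "three levels up" rule can be sketched with `pathlib`; the helper name below is illustrative, not part of the skill's tooling:

```python
from pathlib import Path

def infer_repo_root(skill_md: str) -> Path:
    """Return the repository root for a skill at .agents/skills/<name>/SKILL.md.

    parents[0] is the skill folder, parents[1] is skills/, parents[2] is
    .agents/, so parents[3] is the repository root.
    """
    return Path(skill_md).parents[3]
```

For example, `infer_repo_root("/repo/.agents/skills/ing-skill-generator/SKILL.md")` yields `/repo`.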

Step 2: Analyze the Repository


Before writing anything, understand the documentation structure:
```bash
# Find all documentation files
find <repo-path> \( -name "*.md" -o -name "*.adoc" -o -name "*.rst" \) | head -50

# Check for docs directory
ls -la <repo-path>/docs/ 2>/dev/null || ls -la <repo-path>/

# Look for version info
cat <repo-path>/pom.xml 2>/dev/null | grep -A1 "<version>" | head -5
cat <repo-path>/package.json 2>/dev/null | grep "version"
cat <repo-path>/CHANGELOG.md 2>/dev/null | head -20
```

Identify:
- **Tool name** — from repo name, README title, or project config
- **Current version** — from pom.xml, package.json, build.gradle, or badges
- **Documentation structure** — where the main docs live, how they're organized
- **Code examples** — where sample code is located

Step 3: Map Documentation to Sections


Create a mental map of which source files feed into which output sections:
| Source Files | Output Section |
|---|---|
| README.md, docs/overview.md, docs/intro.md | 1. Overview |
| docs/concepts.md, docs/architecture.md | 2. Core Concepts |
| docs/configuration.md, application.properties | 3. Configuration Reference |
| examples/, docs/tutorials/, docs/guides/ | 4. Code Examples |
| docs/integration.md, docs/other-tools.md | 5. Integration |
| docs/troubleshooting.md, docs/faq.md, comments in code | 6. Pitfalls & Anti-patterns |
| docs/faq.md (or generate from common questions) | 7. FAQ |
| Terminology in any doc | 8. Glossary |
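As a sketch, the mapping can be kept as a lookup table that a generator walks; the structure below mirrors the table, with the matching logic simplified for illustration:

```python
# Which source files feed each of the 8 output sections (simplified).
SECTION_SOURCES = {
    "1. Overview": ["README.md", "docs/overview.md", "docs/intro.md"],
    "2. Core Concepts": ["docs/concepts.md", "docs/architecture.md"],
    "3. Configuration Reference": ["docs/configuration.md", "application.properties"],
    "4. Code Examples": ["examples/", "docs/tutorials/", "docs/guides/"],
    "5. Integration": ["docs/integration.md", "docs/other-tools.md"],
    "6. Pitfalls & Anti-patterns": ["docs/troubleshooting.md", "docs/faq.md"],
    "7. FAQ": ["docs/faq.md"],
    "8. Glossary": [],  # terminology can come from any doc
}

def sections_for(path: str) -> list[str]:
    """Return the output sections a given source file feeds into."""
    return [s for s, files in SECTION_SOURCES.items()
            if any(path == f or path.startswith(f) for f in files)]
```

Note that one file can feed several sections (docs/faq.md feeds both Pitfalls and FAQ).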

Step 4: Extract and Synthesize


Read each relevant file and extract content:
  1. Copy code verbatim — never summarize or paraphrase code blocks
  2. Merge duplicates — if the same concept appears in multiple places, combine into one section
  3. Capture tribal knowledge — look for comments like "WARNING", "NOTE", "IMPORTANT", gotchas in examples
  4. Mark gaps — if a section is sparse, include it anyway with ⚠️ marker
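The tribal-knowledge step can be approximated mechanically; a minimal sketch that flags marker lines for manual review:

```python
# Markers that often flag tribal knowledge in docs and code comments.
MARKERS = ("WARNING", "NOTE", "IMPORTANT", "⚠️")

def find_tribal_knowledge(text: str) -> list[str]:
    """Return lines carrying warnings or implicit gotchas."""
    return [line.strip() for line in text.splitlines()
            if any(m in line for m in MARKERS)]

doc = """Timeouts default to 3000.
NOTE: the value is in milliseconds, not seconds.
Plain explanation line.
WARNING: recipes are immutable once baked."""
```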

Step 5: Generate the SKILL.md


Follow the exact template structure in Section 6 (Output Template). Key requirements:
  • YAML frontmatter with `name` (kebab-case) and `description` (comprehensive, trigger-friendly)
  • All 8 sections present, even if sparse
  • Configuration tables with 4 columns: Property, Type, Default, Description
  • No hyperlinks — all content inline

Skill Writing Guide


Anatomy of an ING Skill


```
skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter (name, description required)
│   └── Markdown instructions (8 sections)
└── Bundled Resources (optional)
    ├── scripts/    - Executable code for tasks
    ├── references/ - Additional docs loaded as needed
    └── assets/     - Templates, examples
```

Progressive Disclosure


Skills use a three-level loading system:
  1. Metadata (name + description) — Always in context (~100 words)
  2. SKILL.md body — In context when skill triggers (<500 lines ideal)
  3. Bundled resources — As needed (read explicitly when required)
Key patterns:
  • Keep SKILL.md under 500 lines; if approaching the limit, move detail to `references/`
  • Reference files clearly, with guidance on when to read them
  • For large reference files (>300 lines), include a table of contents

Writing Patterns


Prefer imperative form in instructions, and define output formats explicitly: show the exact structure you expect rather than describing it loosely.

Configuration Reference


ALWAYS use this exact table format:

| Property | Type | Default | Description |
|----------|------|---------|-------------|
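The 4-column rule is easy to check mechanically; a minimal sketch:

```python
def table_has_four_columns(header_row: str) -> bool:
    """Check a markdown table header for exactly Property, Type, Default, Description."""
    cells = [c.strip() for c in header_row.strip().strip("|").split("|")]
    return cells == ["Property", "Type", "Default", "Description"]
```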

Code Examples


Example 1: Basic Recipe

```java
// Verbatim code from source
Recipe recipe = new Recipe("OrderProcess")
    .withInteraction(validateOrder)
    .withSensoryEvent(orderPlaced);
```

Writing Style


Explain why things are important rather than heavy-handed MUSTs. Use theory of mind and make the skill general, not narrow to specific examples. Write a draft, then review with fresh eyes and improve.

Step 6: Test Cases


After generating the skill, create 2-3 realistic test prompts. Save to `evals/evals.json`:

```json
{
  "skill_name": "baker-framework",
  "evals": [
    {
      "id": 1,
      "name": "basic-recipe-creation",
      "prompt": "Generate a skill from the Baker docs at /tmp/baker",
      "expected_output": "SKILL.md with 8 sections, proper frontmatter",
      "files": [],
      "expectations": []
    }
  ]
}
```

See `references/schemas.md` for the full schema.
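A light validation pass over `evals.json` catches missing keys early; a sketch assuming the key set shown above (`references/schemas.md` remains the authoritative schema):

```python
import json

REQUIRED_EVAL_KEYS = {"id", "name", "prompt", "expected_output", "files", "expectations"}

def validate_evals(raw: str) -> list[str]:
    """Return a list of problems found in an evals.json document (empty = OK)."""
    data = json.loads(raw)
    problems = []
    if "skill_name" not in data:
        problems.append("missing skill_name")
    for i, ev in enumerate(data.get("evals", [])):
        missing = REQUIRED_EVAL_KEYS - ev.keys()
        if missing:
            problems.append(f"eval {i}: missing {sorted(missing)}")
    return problems

sample = ('{"skill_name": "baker-framework", "evals": [{"id": 1, '
          '"name": "basic-recipe-creation", "prompt": "p", '
          '"expected_output": "e", "files": [], "expectations": []}]}')
```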

2. Naming Rules


Derive the canonical name directly from the repository:

| Source | Priority |
|---|---|
| Repository name | Highest (e.g., `ing-bank/baker` → `baker`) |
| Top-level README title | If repo name is generic |
| Project folder name | Fallback |

Critical rules:
  • Use exactly what the project is called — no inventing, generalizing, or renaming
  • Convert to kebab-case for the skill `name` field (e.g., `Baker Framework` → `baker-framework`)
  • If the repo covers multiple tools, derive each tool's name from its module/subfolder/section title
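The kebab-case conversion can be sketched in a few lines (the camelCase boundary handling is an assumption about edge cases, not a documented rule):

```python
import re

def to_kebab(name: str) -> str:
    """Convert a project name to the kebab-case `name` field."""
    # Insert a hyphen at lowercase-to-uppercase boundaries (camelCase).
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Lowercase, then collapse any non-alphanumeric run into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
```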

3. Versioning Strategy


When documentation contains multiple versions:
  1. Identify the latest version using:
    • Explicit version numbers (e.g., `v4.1.0` > `v3.2.0`)
    • Release dates (e.g., 2024 > 2023)
    • Folder/file naming (e.g., `docs-v2/` > `docs-v1/`)
    • CHANGELOG.md or release notes
  2. Use the latest version as the source of truth for all:
    • Configuration properties
    • API signatures
    • Code examples
    • Behavioral descriptions
  3. Document version changes when relevant:
    📌 Changed in 4.0 — previous behavior was: synchronous execution only
  4. Discard deprecated content unless it explains a still-relevant migration path
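Comparing versions as numeric tuples is usually enough for the cases above; a naive sketch that ignores pre-release tags and dates:

```python
import re

def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'v4.1.0' or '3.2' into a comparable tuple of integers."""
    return tuple(int(n) for n in re.findall(r"\d+", v))

def latest(versions: list[str]) -> str:
    """Pick the highest version string."""
    return max(versions, key=parse_version)
```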

4. Content Extraction


4.1 What to Include


| Content Type | Handling |
|---|---|
| Code snippets | Copy verbatim — never summarize |
| Configuration blocks | Copy verbatim with all properties |
| API signatures | Copy verbatim with types and parameters |
| Architecture diagrams (textual) | Include as ASCII or describe structure |
| Warnings / gotchas | Always include, even if brief |
| Anti-patterns | Always include with explanations |
| Tribal knowledge | Capture implicit knowledge from comments, examples |

4.2 What to Exclude


  • Hyperlinks (all content must be inline)
  • File path references to the source repo
  • Installation instructions for the docs themselves
  • CI/CD pipeline configs for the docs repo
  • Contributor guidelines (unless relevant to framework usage)

4.3 Merging and Deduplication


When a concept appears in multiple files:
  1. Identify all occurrences
  2. Merge into one coherent section
  3. Preserve all unique details from each source
  4. Remove redundant explanations


5. Handling Sparse Documentation


When documentation is incomplete or ambiguous:
  1. Add a clear marker:
    ⚠️ Documentation incomplete — verify with team
  2. Include whatever partial information exists
  3. Note specific gaps:
    ⚠️ Default value not documented — verify in source code


6. Output Template


CRITICAL: All 8 sections MUST be present in every generated skill. If documentation is sparse for a section, include it anyway with a ⚠️ marker noting what's missing.
The generated skill must follow this exact structure:
```markdown
---
name: [tool-name-kebab-case]
description: >
  Expert skill for [Tool Name] — an ING-internal framework for [one-line purpose].
  Use this skill when working in any ING Spring Boot / Java 21 project that integrates
  with [Tool Name]. Covers configuration, recipes, integration patterns, pitfalls,
  and verbatim code examples.
---
```

[Tool Name] — Complete Knowledge Base


Table of Contents


  1. Overview
  2. Core Concepts
  3. Configuration Reference
  4. Code Examples
  5. Integration with Other ING Tools
  6. Pitfalls & Anti-patterns
  7. FAQ
  8. Glossary


1. Overview


[What the tool does and why it exists inside ING. MUST include current version number.]

Current Version: X.Y.Z


[Version-specific notes if any]

2. Core Concepts


[Mental models, architecture decisions, key abstractions. Use tables for comparisons.]

3. Configuration Reference


MANDATORY: Configuration tables MUST have exactly 4 columns: Property, Type, Default, Description

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| example.property | String | `null` | Purpose of property |
| example.timeout | int | `3000` | Timeout in milliseconds ⚠️ NOT seconds |

If the default is unknown, use `⚠️ verify` as the value.

4. Code Examples


[Verbatim snippets from source docs, organized by use case. NEVER summarize or paraphrase code.]

```java
// Example: Basic interaction definition
@RequiresIngredient("orderId")
@FiresEvent(OrderValidated.class)
public interface ValidateOrder {
    OrderValidated apply(String orderId);
}
```

5. Integration with Other ING Tools


[How this tool connects to Baker / Merak SDK / Kingsroad / other ING systems]
⚠️ If no integration docs exist, write: "No documented integrations. Check with the team for internal usage patterns."

6. Pitfalls & Anti-patterns


[Exact warnings from docs + implicit gotchas discovered in examples]
Don't: [Anti-pattern description] ✅ Do: [Correct approach]
⚠️ If no pitfalls documented, write: "No pitfalls documented. Exercise standard caution with [relevant concerns]."

7. FAQ


Q: [Common question from docs or implied by content] A: [Answer]
⚠️ If no FAQ exists, generate 2-3 questions based on likely user needs.

8. Glossary


| Term | Definition |
|---|---|
| [ING-specific term] | [Precise definition] |

⚠️ If no glossary exists, extract key terms from the documentation and define them.

---

7. Frontmatter Requirements


7.1 Name Field


  • Kebab-case, lowercase
  • Derived from repo/project name
  • Examples: `baker-framework`, `merak-sdk`, `kingsroad-cli`

7.2 Description Field


The description is the primary trigger mechanism. Make it comprehensive:
  1. Start with what the skill is for
  2. Include the framework's purpose
  3. List specific contexts when to use
  4. Mention related keywords that should trigger
Good example:

```yaml
description: >
  Expert skill for Baker — an ING-internal framework for orchestrating microservice-based
  process flows using a declarative recipe DSL. Use this skill when working in any ING
  Spring Boot / Java 21 project that integrates with Baker. Covers configuration, recipes,
  interactions, event handling, error strategies, testing, and verbatim code examples.
```

8. Target Audience


The generated skill targets:
  • Senior engineers at ING working in Spring Boot / Java 21 projects on Kubernetes
  • They know general software engineering
  • They do not know ING-internal framework internals
  • They need practical, actionable guidance
Write accordingly:
  • Explain ING-specific concepts
  • Don't explain basic Java/Spring concepts
  • Include complete, working examples
  • Highlight common mistakes


9. Quality Checklist


Before finalizing a generated skill, verify:
Structural Requirements (MANDATORY):
  • YAML frontmatter present with `---` delimiters
  • `name` field is kebab-case, derived from repo/project
  • `description` field includes purpose and trigger keywords
  • All 8 sections present (Overview through Glossary)
  • Table of Contents matches section headings
Content Requirements:
  • Latest version identified and stated in Overview
  • All code snippets copied verbatim (no summarization)
  • Configuration table has 4 columns: Property, Type, Default, Description
  • No hyperlinks or external URLs anywhere
  • Sparse sections marked with ⚠️ (not omitted)
  • Version changes marked with 📌
  • Content is self-contained and usable in isolation
Common Mistakes to Avoid:
  • ❌ Omitting sections because docs are sparse (always include with ⚠️)
  • ❌ Missing "Default" column in config tables
  • ❌ Summarizing code instead of copying verbatim
  • ❌ Using non-kebab-case names (e.g., "Baker_Framework" instead of "baker-framework")
  • ❌ Including hyperlinks (convert to inline content)
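Most of the structural checks can be automated; a sketch of a pre-flight validator (matching sections by substring is a simplification):

```python
import re

REQUIRED_SECTIONS = ["Overview", "Core Concepts", "Configuration Reference",
                     "Code Examples", "Integration", "Pitfalls", "FAQ", "Glossary"]

def check_skill(text: str) -> list[str]:
    """Return structural-checklist failures for a SKILL.md (empty list = pass)."""
    problems = []
    if not text.startswith("---") or text.count("---") < 2:
        problems.append("missing YAML frontmatter delimiters")
    m = re.search(r"^name:\s*(\S+)", text, re.MULTILINE)
    if not m or not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", m.group(1)):
        problems.append("name missing or not kebab-case")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    if re.search(r"https?://", text):
        problems.append("contains hyperlinks")
    return problems

sample = "---\nname: baker-framework\ndescription: x\n---\n" + "\n".join(REQUIRED_SECTIONS)
```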

10. Example Workflow


When asked to generate a skill from a repo:
  1. Read the repo structure
     ```bash
     find <repo-path> -name "*.md" -o -name "*.adoc" | head -50
     ls <repo-path>/docs/ 2>/dev/null || ls <repo-path>/
     ```
  2. Identify the tool name and version
    • Check README.md, pom.xml, build.gradle, package.json
    • Look for version badges, changelog, releases
  3. Map documentation to output sections
    • Overview/Introduction → Section 1
    • Concepts/Architecture → Section 2
    • Configuration/Properties → Section 3
    • Examples/Tutorials → Section 4
    • Integration guides → Section 5
    • Troubleshooting/Warnings → Section 6
    • FAQ (if exists) → Section 7
    • Glossary/Terms → Section 8
  4. Extract and synthesize
    • Read each relevant file
    • Copy code blocks verbatim
    • Merge duplicate explanations
    • Note gaps with ⚠️ markers
  5. Generate the SKILL.md
    • Use the exact template structure
    • Validate frontmatter YAML
    • Ensure no broken references
  6. Save to the appropriate location
    • Default: `.agents/skills/[tool-name]/SKILL.md`

11. Running and Evaluating Test Cases


After generating a skill, run test cases to verify quality. Put results in `<skill-name>-workspace/` as a sibling to the skill directory.

Step 1: Spawn all runs (with-skill AND baseline) in parallel


For each test case, spawn two subagents in the same turn — one with the skill, one without:

With-skill run:

```
Execute this task:
- Skill path: <path-to-skill>/SKILL.md
- Task: <eval prompt - e.g., "Generate a skill from the Baker docs at /tmp/baker">
- Input files: <path to cloned repo>
- Save outputs to: <workspace>/iteration-<N>/eval-<name>/with_skill/run-1/outputs/
- Outputs to save: The generated SKILL.md file

IMPORTANT: First read the skill, then follow its instructions.
```

Baseline run (no skill):

```
Execute this task (no skill guidance - baseline):
- Task: <same eval prompt>
- Input files: <same repo path>
- Save outputs to: <workspace>/iteration-<N>/eval-<name>/without_skill/run-1/outputs/
- Outputs to save: The generated SKILL.md file
```

Write an `eval_metadata.json` for each test case:

```json
{
  "eval_id": 1,
  "eval_name": "baker-repo-full",
  "prompt": "Generate a skill from the Baker docs at /tmp/baker",
  "assertions": [
    "Output is a valid SKILL.md file with YAML frontmatter",
    "Contains all 8 required sections",
    "Code examples are verbatim from source"
  ]
}
```

Step 2: Draft assertions while runs are in progress


Good assertions for ING skill generation:
Structural:
  • "Output has YAML frontmatter with --- delimiters"
  • "Frontmatter contains 'name' field in kebab-case"
  • "Contains all 8 sections: Overview through Glossary"
  • "Configuration table has 4 columns: Property, Type, Default, Description"
Content:
  • "Version number X.Y.Z is mentioned in Overview"
  • "Code examples are verbatim (not summarized)"
  • "No hyperlinks or external URLs"
  • "Sparse sections marked with ⚠️"
Version handling:
  • "Uses only latest version content"
  • "Deprecated content excluded"
  • "Version changes marked with 📌"

Step 3: Capture timing data as runs complete


When each subagent completes, save timing to `timing.json`:

```json
{
  "total_tokens": 84852,
  "duration_ms": 23332,
  "total_duration_seconds": 23.3
}
```

Step 4: Grade, aggregate, and launch viewer


  1. Grade each run — spawn a grader subagent with the absolute path to `<skill-dir>/agents/grader.md` and have it evaluate the assertions. Each run gets its own flat `grading.json` saved as a sibling to `outputs/`:

```json
{
  "expectations": [
    {"text": "Has YAML frontmatter", "passed": true, "evidence": "File starts with ---"},
    {"text": "All 8 sections present", "passed": true, "evidence": "Found sections 1-8"}
  ],
  "summary": {
    "passed": 2,
    "failed": 0,
    "total": 2,
    "pass_rate": 1.0
  },
  "claims": [],
  "user_notes_summary": {"uncertainties": [], "needs_review": [], "workarounds": []},
  "eval_feedback": {"suggestions": [], "overall": "No suggestions, evals look solid"}
}
```

Save to `<workspace>/iteration-<N>/eval-<name>/with_skill/run-1/grading.json` (and the same pattern for `without_skill/run-1/grading.json`).

  2. Aggregate into a benchmark:

```bash
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
```

  3. Launch the viewer:

```bash
python eval-viewer/generate_review.py <workspace>/iteration-N \
  --skill-name "ing-skill-generator" \
  --benchmark <workspace>/iteration-N/benchmark.json
```

For iteration 2+, add `--previous-workspace <workspace>/iteration-<N-1>`.
For headless environments, use `--static <output.html>` instead.

Step 5: Read feedback and improve


When the user reviews results, read `feedback.json`:

```json
{
  "reviews": [
    {"run_id": "eval-1-with_skill", "feedback": "missing version number in overview"},
    {"run_id": "eval-2-with_skill", "feedback": ""}
  ]
}
```

Empty feedback = user is satisfied. Focus improvements on cases with complaints.
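Triaging the feedback file reduces to filtering for non-empty entries; a sketch:

```python
import json

def runs_needing_work(feedback_json: str) -> list[str]:
    """Return run ids whose feedback is non-empty (empty feedback = satisfied)."""
    data = json.loads(feedback_json)
    return [r["run_id"] for r in data["reviews"] if r["feedback"].strip()]

sample = ('{"reviews": ['
          '{"run_id": "eval-1-with_skill", "feedback": "missing version number in overview"}, '
          '{"run_id": "eval-2-with_skill", "feedback": ""}]}')
```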

12. Improving the Skill


After running test cases and collecting feedback:

How to improve ING skill generation


  1. Check structural compliance — If outputs are missing sections or using wrong formats, strengthen the template instructions with explicit requirements.
  2. Check content extraction — If code examples are summarized instead of verbatim, add more emphasis on copying exactly. If warnings/pitfalls are missed, add instructions to scan for keywords like "WARNING", "NOTE", "⚠️".
  3. Check version handling — If old content leaks in, add clearer instructions to identify and exclude deprecated versions.
  4. Look at transcripts — Read how the subagent processed the docs. If it's doing redundant work or missing files, adjust the workflow instructions.
  5. Look for repeated work — If all test runs independently wrote similar helper scripts or took the same approach, consider bundling that script in the skill's `scripts/` directory.

The iteration loop


  1. Apply improvements to
    SKILL.md
  2. Rerun all test cases into
    iteration-<N+1>/
  3. Launch viewer with
    --previous-workspace
    pointing to previous iteration
  4. Collect feedback, improve, repeat
Keep going until:
  • User is happy
  • All feedback is empty
  • Pass rates are consistently high
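One pass of that loop might look like the following shell sketch. The directory names follow the `iteration-<N>` convention above, but the exact workspace layout is an assumption:

```shell
# Advance to the next iteration directory (workspace layout is an assumption).
N=2
prev="workspace/iteration-$((N - 1))"
next="workspace/iteration-${N}"
mkdir -p "$next"
# ...rerun every test case into "$next", then launch the viewer with
# --previous-workspace "$prev" to compare against the last iteration.
echo "comparing $next against $prev"
```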


13. Advanced: Blind Comparison

For situations where you want a more rigorous comparison between two versions of a skill (e.g., "is the new version actually better?"), there's a blind comparison system.

How it works

  1. Give two outputs to an independent agent without telling it which is which
  2. Let it judge quality based purely on the outputs
  3. Analyze why the winner won
Read `agents/comparator.md` and `agents/analyzer.md` for the details.

When to use

  • Comparing a new skill version against the previous version
  • Deciding between two different approaches to the same problem
  • When quantitative metrics (pass rates) are similar but you sense a quality difference
This is optional and requires subagents. The human review loop is usually sufficient.


14. Description Optimization

The description field in SKILL.md frontmatter is the primary mechanism that determines whether Claude invokes a skill. After creating or improving a skill, offer to optimize the description for better triggering accuracy.

Step 1: Generate trigger eval queries

Create 20 eval queries — a mix of should-trigger (8-10) and should-not-trigger (8-10).
The queries must be realistic — the kind of thing a real Claude Code user would actually type. Include:
  • File paths and personal context
  • Different lengths and styles (formal, casual, typos)
  • Edge cases, not clear-cut examples
Bad examples:
"Format this data"
"Extract text from PDF"
"Create a skill"
Good examples:
"ok so I just cloned the merak-sdk repo to /tmp/merak and my tech lead wants me to turn the docs into something our team can use in their IDE. can you help?"

"I have the Baker framework documentation at ~/projects/ing-bank/baker/docs. Need to create a Copilot skill that covers all the recipe patterns and error handling strategies."

"we're using kingsroad-cli internally and the docs are scattered across like 5 different markdown files. can you consolidate them into a skill?"
For should-trigger queries, think about coverage:
  • Different phrasings of the same intent (formal, casual)
  • Cases where the user doesn't explicitly say "skill" but clearly needs one
  • Mentions of ING frameworks (Baker, Merak, Kingsroad)
  • References to documentation repos, docs/ folders
For should-not-trigger queries, the most valuable are near-misses:
  • Using the frameworks (not creating skills for them)
  • "How do I configure Baker retry policies?" — needs Baker skill, not skill generator
  • General Spring Boot/Java questions
  • Other types of skill creation (not ING-specific)
The key: don't make should-not-trigger queries obviously irrelevant. "Write a fibonacci function" is too easy — it doesn't test anything. Negative cases should be genuinely tricky.
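A quick balance check on the drafted set can be sketched as below. The two queries are illustrative stand-ins; a real set should have 8-10 of each kind:

```python
# Illustrative eval set; real queries come from the drafting guidance above.
eval_set = [
    {"query": "I have the Baker docs at ~/projects/baker, can you turn them "
              "into a Copilot skill for my team?", "should_trigger": True},
    {"query": "How do I configure Baker retry policies?", "should_trigger": False},
]

positives = sum(1 for e in eval_set if e["should_trigger"])
negatives = len(eval_set) - positives
print(f"{positives} should-trigger / {negatives} should-not-trigger")

# Aim for the roughly even 8-10 / 8-10 split described above.
balanced = abs(positives - negatives) <= 2
```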

Step 2: Review with user

Present the eval set for review using the HTML template:
  1. Read the template from `assets/eval_review.html`
  2. Replace the placeholders:
    • `__EVAL_DATA_PLACEHOLDER__` → the JSON array
    • `__SKILL_NAME_PLACEHOLDER__` → skill name
    • `__SKILL_DESCRIPTION_PLACEHOLDER__` → current description
  3. Write to a temp file and open it:
    open /tmp/eval_review_ing-skill-generator.html
  4. The user edits queries, toggles should-trigger, then clicks "Export Eval Set"
  5. The file downloads to `~/Downloads/eval_set.json` as a JSON array with this format:
    ```json
    [
      {"query": "I have the Baker docs at ~/projects/baker...", "should_trigger": true},
      {"query": "How do I configure Baker retry policies?", "should_trigger": false}
    ]
    ```
  6. Copy the downloaded file to the workspace:
    cp ~/Downloads/eval_set.json <workspace>/trigger-eval.json
This step matters — bad eval queries lead to bad descriptions.
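The placeholder substitution in step 2 amounts to three string replacements. A minimal sketch, using an inline stand-in for the template (the real one is read from `assets/eval_review.html`):

```python
import json

# Inline stand-in for assets/eval_review.html (the real template is read from disk).
template = (
    "<h1>__SKILL_NAME_PLACEHOLDER__</h1>"
    "<p>__SKILL_DESCRIPTION_PLACEHOLDER__</p>"
    "<script>const evals = __EVAL_DATA_PLACEHOLDER__;</script>"
)

eval_data = [
    {"query": "I have the Baker docs at ~/projects/baker...", "should_trigger": True},
]

html = (
    template
    .replace("__EVAL_DATA_PLACEHOLDER__", json.dumps(eval_data))
    .replace("__SKILL_NAME_PLACEHOLDER__", "ing-skill-generator")
    .replace("__SKILL_DESCRIPTION_PLACEHOLDER__", "the current description")
)
# The result would then be written to /tmp/eval_review_ing-skill-generator.html.
```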

Step 3: Run the optimization loop

Tell the user: "This will take some time — I'll run in the background and check periodically."
```bash
python -m scripts.run_loop \
  --eval-set <workspace>/trigger-eval.json \
  --skill-path <skill-path> \
  --model <model-id-powering-this-session> \
  --max-iterations 5 \
  --verbose
```
Use the model ID from your system prompt so triggering tests match what the user experiences.
The script:
  • Splits eval set into 60% train / 40% held-out test
  • Evaluates current description (3 runs per query for reliability)
  • Proposes improvements based on failures
  • Re-evaluates each new description on both train and test
  • Selects best by test score (not train) to avoid overfitting
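The split-and-select logic above can be illustrated with a small sketch. The candidate scores are made up for illustration; `run_loop.py` computes real ones:

```python
import random

# 60% train / 40% held-out split over 20 example queries.
queries = [f"query-{i}" for i in range(20)]
random.seed(0)
random.shuffle(queries)

cut = int(len(queries) * 0.6)
train, held_out = queries[:cut], queries[cut:]

# Candidate descriptions are scored on both splits; pick by the held-out
# score so a description can't win just by overfitting the train queries.
candidates = [
    {"description": "v1", "train_score": 0.85, "test_score": 0.70},
    {"description": "v2", "train_score": 0.95, "test_score": 0.65},  # overfit
    {"description": "v3", "train_score": 0.80, "test_score": 0.80},
]
best = max(candidates, key=lambda c: c["test_score"])
print(best["description"])
```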

How skill triggering works

Understanding this helps design better eval queries:
  • Skills appear in Claude's `available_skills` list with name + description
  • Claude decides whether to consult a skill based on that description
  • Important: Claude only consults skills for tasks it can't easily handle on its own
This means:
  • Simple queries like "read this file" may not trigger skills even if description matches
  • Complex, multi-step, or specialized queries reliably trigger when description matches
  • Your eval queries should be substantive enough that Claude would benefit from consulting a skill

Step 4: Apply results

Take `best_description` from the JSON output and update the SKILL.md frontmatter. Show the user a before/after and report the scores.
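Updating the frontmatter is a one-line substitution. A sketch with an inline stand-in for SKILL.md (the real file, and the loop's JSON output, live on disk):

```python
import re

# Inline stand-in for the SKILL.md frontmatter.
skill_md = """---
name: ing-skill-generator
description: Old description.
---
# ING Skill Generator
"""

# Would come from the best_description field of the run_loop JSON output.
best_description = "Generate Copilot skills from ING documentation repos."

# Replace only the first description: line in the frontmatter.
updated = re.sub(
    r"(?m)^description:.*$",
    f"description: {best_description}",
    skill_md,
    count=1,
)
print(updated)
```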

Package and Present

If you have access to the `present_files` tool, package the skill:
```bash
python -m scripts.package_skill <path/to/skill-folder>
```
This creates a `.skill` file the user can install.


15. Claude.ai-Specific Instructions

In Claude.ai, the core workflow is the same (analyze repo → generate skill → test → review → improve), but some mechanics change because Claude.ai doesn't have subagents.
Running test cases: No subagents means no parallel execution. For each test case:
  1. Read the skill's SKILL.md
  2. Follow its instructions to accomplish the test prompt yourself
  3. Do them one at a time
This is less rigorous than independent subagents (you wrote the skill and you're running it), but it's a useful sanity check — the human review step compensates.
Reviewing results: If you can't open a browser (no display), skip the browser reviewer. Instead, present results directly in the conversation:
  • Show the prompt and output for each test case
  • If output is a file, save it and tell the user where to download
  • Ask for feedback inline: "How does this look? Anything you'd change?"
Benchmarking: Skip quantitative benchmarking — it relies on baseline comparisons which aren't meaningful without subagents. Focus on qualitative feedback.
The iteration loop: Same as before — improve the skill, rerun test cases, ask for feedback — just without the browser reviewer.
Description optimization: Requires the `claude -p` CLI, which is only available in Claude Code. Skip it on Claude.ai.
Blind comparison: Requires subagents. Skip it.
Packaging: `package_skill.py` works anywhere with Python. The user can download the resulting `.skill` file.
Updating an existing skill: The user might want to update, not create. In this case:
  • Preserve the original name — use it unchanged
  • Copy to a writeable location before editing — installed paths may be read-only
  • Stage in `/tmp/` first if packaging manually


16. Cowork-Specific Instructions

If you're in Cowork:
  • Subagents work — the main workflow (spawn tests in parallel, run baselines, grade) all works. If timeouts are severe, run tests in series.
  • No browser/display — use `--static <output_path>` to write standalone HTML instead of starting a server, then offer the user a link to open.
  • IMPORTANT: Generate the eval viewer BEFORE evaluating yourself. Use `generate_review.py` (not custom HTML). Get results in front of the human ASAP!
  • Feedback via download — since there's no running server, "Submit All Reviews" downloads `feedback.json`. Read it from Downloads (you may need to request access).
  • Packaging works — `package_skill.py` just needs Python and a filesystem.
  • Description optimization — `run_loop.py` / `run_eval.py` should work fine since they use a `claude -p` subprocess, not a browser. Save this until the skill is fully finished and the user agrees it's good.
  • Updating existing skills — follow the update guidance in the Claude.ai section above.


17. Reference Files

The following files support evaluation and improvement:
agents/
  • `grader.md` — How to evaluate assertions against outputs
  • `comparator.md` — Blind A/B comparison between versions
  • `analyzer.md` — Analyze why one version beat another
references/
  • `schemas.md` — JSON structures for evals.json, grading.json, benchmark.json
scripts/
  • `aggregate_benchmark.py` — Combine grading results into benchmark stats
  • `generate_report.py` — Create summary reports
  • `improve_description.py` — Generate improved descriptions
  • `run_eval.py` — Run trigger evaluation
  • `run_loop.py` — Full optimization loop
  • `quick_validate.py` — Fast validation checks
eval-viewer/
  • `generate_review.py` — Generate the interactive review page
  • `viewer.html` — Template for the review interface