ING Skill Generator — Complete Knowledge Base
Generate production-ready GitHub Copilot skills from ING documentation repositories. This skill
transforms documentation-as-code into self-contained expert knowledge bases that senior engineers
can use in their Spring Boot / Java 21 projects.
This skill includes:
- Skill generation from local cloned repos
- Evaluation framework with with-skill vs baseline comparison
- Grading agents for automated assertion checking
- Benchmark aggregation and interactive review viewer
- Description optimization for better triggering
At a high level, the process goes like this:
- Identify the target repository — by default, the repository that contains this skill (three levels up from `.agents/skills/<name>/SKILL.md`); use a different path only if the user specifies one
- Analyze the repo structure, identify tool name, latest version, documentation files
- Extract and synthesize content following ING skill template (8 sections)
- Generate the SKILL.md with proper frontmatter and verbatim code examples
- Run test cases (with-skill vs baseline) to verify quality
- Review results, iterate based on feedback
- Optimize description for better triggering
Your job is to figure out where the user is in this process and help them progress. Maybe they have a freshly cloned repo and want a skill generated. Or maybe they already have a draft and want to improve it. Be flexible — if they say "just generate the skill, I don't need evals", do that instead.
Communicating with the User
The ING skill generator may be used by people across a range of familiarity with coding jargon. While most users are likely senior engineers, pay attention to context cues.
Default assumptions:
- "evaluation" and "benchmark" are OK
- "JSON" and "assertion" — look for cues that the user knows these before using them without explanation
- ING-specific terms (Baker, Merak, Kingsroad) — explain briefly if unclear
If you're in doubt, it's fine to briefly clarify a term with a short definition.
1. Process Overview
- Analyze the repository — identify the tool name, latest version, documentation structure
- Extract content — gather all relevant docs, configs, code examples, warnings
- Synthesize knowledge base — merge, dedupe, organize into standard sections
- Output the skill file — produce a valid SKILL.md with proper frontmatter
Creating a Skill from Repository
This is the core workflow for generating ING skills from documentation repositories.
Step 1: Capture Intent
Start by understanding what the user wants. Key questions:
- What repository? — Infer this automatically: the repository to analyze is the one that contains this skill. Since skills live at `.agents/skills/<skill-name>/SKILL.md`, the repository root is three levels up from the SKILL.md file. Confirm with the user only if this can't be determined or if they explicitly name a different path.
- What tool/framework? Confirm the tool name if not obvious from the repo
- Run test cases? Suggest yes for complex repos, optional for simple ones
If the conversation already contains this info (e.g., "generate a skill from /tmp/baker"), extract it and confirm.
Step 2: Analyze the Repository
Before writing anything, understand the documentation structure:

```bash
# Find all documentation files
find <repo-path> -name "*.md" -o -name "*.adoc" -o -name "*.rst" | head -50

# Check for docs directory
ls -la <repo-path>/docs/ 2>/dev/null || ls -la <repo-path>/

# Look for version info
cat <repo-path>/pom.xml 2>/dev/null | grep -A1 "<version>" | head -5
cat <repo-path>/package.json 2>/dev/null | grep "version"
cat <repo-path>/CHANGELOG.md 2>/dev/null | head -20
```
Identify:
- **Tool name** — from repo name, README title, or project config
- **Current version** — from pom.xml, package.json, build.gradle, or badges
- **Documentation structure** — where the main docs live, how they're organized
- **Code examples** — where sample code is located

Step 3: Map Documentation to Sections
Create a mental map of which source files feed into which output sections:
| Source Files | → Output Section |
|---|---|
| README.md, docs/overview.md, docs/intro.md | 1. Overview |
| docs/concepts.md, docs/architecture.md | 2. Core Concepts |
| docs/configuration.md, application.properties | 3. Configuration Reference |
| examples/, docs/tutorials/, docs/guides/ | 4. Code Examples |
| docs/integration.md, docs/other-tools.md | 5. Integration |
| docs/troubleshooting.md, docs/faq.md, comments in code | 6. Pitfalls & Anti-patterns |
| docs/faq.md (or generate from common questions) | 7. FAQ |
| Terminology in any doc | 8. Glossary |
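If you want the routing to be mechanical rather than mental, a glob-to-section table is one option; the globs and the `sections_for` helper below are illustrative assumptions, not a fixed convention:

```python
from fnmatch import fnmatch

# Section numbers follow the output template (1=Overview ... 8=Glossary).
SECTION_GLOBS = [
    ("README.md", 1), ("docs/overview.md", 1), ("docs/intro.md", 1),
    ("docs/concepts.md", 2), ("docs/architecture.md", 2),
    ("docs/configuration.md", 3), ("**/application.properties", 3),
    ("examples/*", 4), ("docs/tutorials/*", 4), ("docs/guides/*", 4),
    ("docs/integration.md", 5), ("docs/other-tools.md", 5),
    ("docs/troubleshooting.md", 6), ("docs/faq.md", 6),
]

def sections_for(path: str) -> list[int]:
    """Return every output section a source file feeds into (faq.md feeds 6 and 7)."""
    hits = [sec for glob, sec in SECTION_GLOBS if fnmatch(path, glob)]
    if path.endswith("faq.md"):
        hits.append(7)
    return sorted(set(hits))
```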
Step 4: Extract and Synthesize
Read each relevant file and extract content:
- Copy code verbatim — never summarize or paraphrase code blocks
- Merge duplicates — if the same concept appears in multiple places, combine into one section
- Capture tribal knowledge — look for comments like "WARNING", "NOTE", "IMPORTANT", gotchas in examples
- Mark gaps — if a section is sparse, include it anyway with ⚠️ marker
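A minimal sketch of the tribal-knowledge scan, assuming a simple keyword heuristic (the `tribal_knowledge` helper and its keyword list are illustrative):

```python
import re

# Keywords come from the guidance above; extend as the repo's conventions require.
MARKERS = re.compile(r"\b(WARNING|NOTE|IMPORTANT|TODO|gotcha)\b", re.IGNORECASE)

def tribal_knowledge(text: str) -> list[str]:
    """Return lines that look like warnings or implicit gotchas."""
    return [line.strip() for line in text.splitlines() if MARKERS.search(line)]
```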
Step 5: Generate the SKILL.md
Follow the exact template structure in Section 6 (Output Template). Key requirements:
- YAML frontmatter with `name` (kebab-case) and `description` (comprehensive, trigger-friendly)
- All 8 sections present, even if sparse
- Configuration tables with 4 columns: Property, Type, Default, Description
- No hyperlinks — all content inline
Skill Writing Guide
Anatomy of an ING Skill
skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter (name, description required)
│   └── Markdown instructions (8 sections)
└── Bundled Resources (optional)
    ├── scripts/    - Executable code for tasks
    ├── references/ - Additional docs loaded as needed
    └── assets/     - Templates, examples

Progressive Disclosure
Skills use a three-level loading system:
- Metadata (name + description) — Always in context (~100 words)
- SKILL.md body — In context when skill triggers (<500 lines ideal)
- Bundled resources — As needed (read explicitly when required)
Key patterns:
- Keep SKILL.md under 500 lines; if approaching the limit, move detail to `references/`
- Reference files clearly with guidance on when to read them
- For large reference files (>300 lines), include a table of contents
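A quick way to enforce the 500-line budget in a check script — a sketch only; `over_budget` is a hypothetical helper, not part of the skill:

```python
from pathlib import Path

def over_budget(skill_md: Path, limit: int = 500) -> bool:
    """True if SKILL.md exceeds the line budget and detail should move to references/."""
    return len(skill_md.read_text().splitlines()) > limit
```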
Writing Patterns
Prefer imperative form in instructions.

Defining output formats:

```markdown
Configuration Reference

ALWAYS use this exact table format:

| Property | Type | Default | Description |
|---|---|---|---|
```

**Examples pattern:**

````markdown
Code Examples

Example 1: Basic Recipe

```java
// Verbatim code from source
Recipe recipe = new Recipe("OrderProcess")
    .withInteraction(validateOrder)
    .withSensoryEvent(orderPlaced);
```
````

Writing Style

Explain why things are important rather than relying on heavy-handed MUSTs. Apply theory of mind and keep the skill general rather than narrow to specific examples. Write a draft, then review it with fresh eyes and improve it.
Step 6: Test Cases
After generating the skill, create 2-3 realistic test prompts. Save to `evals/evals.json`:

```json
{
  "skill_name": "baker-framework",
  "evals": [
    {
      "id": 1,
      "name": "basic-recipe-creation",
      "prompt": "Generate a skill from the Baker docs at /tmp/baker",
      "expected_output": "SKILL.md with 8 sections, proper frontmatter",
      "files": [],
      "expectations": []
    }
  ]
}
```

See `references/schemas.md` for the full schema.
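A light sanity check for the shape above can be sketched in Python (the `validate_evals` helper is hypothetical; `references/schemas.md` remains the authoritative schema):

```python
import json

REQUIRED_EVAL_KEYS = {"id", "name", "prompt", "expected_output", "files", "expectations"}

def validate_evals(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file matches the shape above."""
    problems = []
    data = json.loads(raw)
    if "skill_name" not in data:
        problems.append("missing skill_name")
    for i, ev in enumerate(data.get("evals", [])):
        missing = REQUIRED_EVAL_KEYS - ev.keys()
        if missing:
            problems.append(f"eval {i}: missing {sorted(missing)}")
    return problems
```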
2. Naming Rules
Derive the canonical name directly from the repository:
| Source | Priority |
|---|---|
| Repository name | Highest |
| Top-level README title | If repo name is generic |
| Project folder name | Fallback |
Critical rules:
- Use exactly what the project is called — no inventing, generalizing, or renaming
- Convert to kebab-case for the skill `name` field (e.g., `Baker Framework` → `baker-framework`)
- If the repo covers multiple tools, derive each tool's name from its module/subfolder/section title
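The kebab-case conversion can be sketched as a small helper; `kebab_case` is illustrative and assumes simple CamelCase/underscore/space inputs:

```python
import re

def kebab_case(name: str) -> str:
    """Normalize a project name to the skill `name` field (illustrative helper)."""
    # Insert a hyphen at CamelCase boundaries, then collapse everything
    # that is not a lowercase letter or digit into single hyphens.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    return re.sub(r"[^a-z0-9]+", "-", spaced.lower()).strip("-")
```

For example, `kebab_case("Baker Framework")` and `kebab_case("Baker_Framework")` both yield `baker-framework`.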
3. Versioning Strategy
When documentation contains multiple versions:
- Identify the latest version using:
  - Explicit version numbers (e.g., `v4.1.0` > `v3.2.0`)
  - Release dates (e.g., `2024` > `2023`)
  - Folder/file naming (e.g., `docs-v2/` > `docs-v1/`)
  - `CHANGELOG.md` or release notes
- Use the latest version as the source of truth for all:
  - Configuration properties
  - API signatures
  - Code examples
  - Behavioral descriptions
- Document version changes when relevant:
  📌 Changed in 4.0 — previous behavior was: synchronous execution only
- Discard deprecated content unless it explains a still-relevant migration path
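Picking the latest among explicit version numbers can be sketched as a numeric sort; `version_key`/`latest` are illustrative helpers and assume plain dotted versions:

```python
import re

def version_key(v: str) -> tuple[int, ...]:
    """Sort key for dotted versions; tolerates a leading 'v'."""
    return tuple(int(p) for p in re.findall(r"\d+", v))

def latest(versions: list[str]) -> str:
    """The highest version by numeric comparison (so 3.10 > 3.9)."""
    return max(versions, key=version_key)
```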
4. Content Extraction
4.1 What to Include
| Content Type | Handling |
|---|---|
| Code snippets | Copy verbatim — never summarize |
| Configuration blocks | Copy verbatim with all properties |
| API signatures | Copy verbatim with types and parameters |
| Architecture diagrams (textual) | Include as ASCII or describe structure |
| Warnings / gotchas | Always include, even if brief |
| Anti-patterns | Always include with explanations |
| Tribal knowledge | Capture implicit knowledge from comments, examples |
4.2 What to Exclude
- Hyperlinks (all content must be inline)
- File path references to the source repo
- Installation instructions for the docs themselves
- CI/CD pipeline configs for the docs repo
- Contributor guidelines (unless relevant to framework usage)
4.3 Merging and Deduplication
When a concept appears in multiple files:
- Identify all occurrences
- Merge into one coherent section
- Preserve all unique details from each source
- Remove redundant explanations
5. Handling Sparse Documentation
When documentation is incomplete or ambiguous:
- Add a clear marker: ⚠️ Documentation incomplete — verify with team
- Include whatever partial information exists
- Note specific gaps: ⚠️ Default value not documented — verify in source code
6. Output Template
CRITICAL: All 8 sections MUST be present in every generated skill. If documentation is sparse for a section, include it anyway with a ⚠️ marker noting what's missing.

The generated skill must follow this exact structure:

```markdown
---
name: [tool-name-kebab-case]
description: >
  Expert skill for [Tool Name] — an ING-internal framework for [one-line purpose].
  Use this skill when working in any ING Spring Boot / Java 21 project that integrates
  with [Tool Name]. Covers configuration, recipes, integration patterns, pitfalls,
  and verbatim code examples.
---
```

[Tool Name] — Complete Knowledge Base
Table of Contents
- Overview
- Core Concepts
- Configuration Reference
- Code Examples
- Integration with Other ING Tools
- Pitfalls & Anti-patterns
- FAQ
- Glossary
1. Overview
[What the tool does and why it exists inside ING. MUST include current version number.]
Current Version: X.Y.Z
[Version-specific notes if any]
2. Core Concepts
[Mental models, architecture decisions, key abstractions. Use tables for comparisons.]
3. Configuration Reference
MANDATORY: Configuration tables MUST have exactly 4 columns: Property, Type, Default, Description
| Property | Type | Default | Description |
|---|---|---|---|
| example.property | String | | Purpose of property |
| example.timeout | int | | Timeout in milliseconds ⚠️ NOT seconds |
If default is unknown, use `⚠️ verify` as the value.

4. Code Examples
[Verbatim snippets from source docs, organized by use case. NEVER summarize or paraphrase code.]

```java
// Example: Basic interaction definition
@RequiresIngredient("orderId")
@FiresEvent(OrderValidated.class)
public interface ValidateOrder {
  OrderValidated apply(String orderId);
}
```

5. Integration with Other ING Tools
[How this tool connects to Baker / Merak SDK / Kingsroad / other ING systems]
⚠️ If no integration docs exist, write: "No documented integrations. Check with the team for internal usage patterns."
6. Pitfalls & Anti-patterns
[Exact warnings from docs + implicit gotchas discovered in examples]
❌ Don't: [Anti-pattern description]
✅ Do: [Correct approach]
⚠️ If no pitfalls documented, write: "No pitfalls documented. Exercise standard caution with [relevant concerns]."
7. FAQ
Q: [Common question from docs or implied by content]
A: [Answer]
⚠️ If no FAQ exists, generate 2-3 questions based on likely user needs.
8. Glossary
| Term | Definition |
|---|---|
| [ING-specific term] | [Precise definition] |
⚠️ If no glossary exists, extract key terms from the documentation and define them.
---

7. Frontmatter Requirements
7.1 Name Field
- Kebab-case, lowercase
- Derived from repo/project name
- Example: `baker-framework`, `merak-sdk`, `kingsroad-cli`
7.2 Description Field
The description is the primary trigger mechanism. Make it comprehensive:
- Start with what the skill is for
- Include the framework's purpose
- List specific contexts when to use
- Mention related keywords that should trigger
Good example:
```yaml
description: >
  Expert skill for Baker — an ING-internal framework for orchestrating microservice-based
  process flows using a declarative recipe DSL. Use this skill when working in any ING
  Spring Boot / Java 21 project that integrates with Baker. Covers configuration, recipes,
  interactions, event handling, error strategies, testing, and verbatim code examples.
```

8. Target Audience
The generated skill targets:
- Senior engineers at ING working in Spring Boot / Java 21 projects on Kubernetes
- They know general software engineering
- They do not know ING-internal framework internals
- They need practical, actionable guidance
Write accordingly:
- Explain ING-specific concepts
- Don't explain basic Java/Spring concepts
- Include complete, working examples
- Highlight common mistakes
9. Quality Checklist
Before finalizing a generated skill, verify:

Structural Requirements (MANDATORY):
- YAML frontmatter present with `---` delimiters
- `name` field is kebab-case, derived from repo/project
- `description` field includes purpose and trigger keywords
- All 8 sections present (Overview through Glossary)
- Table of Contents matches section headings

Content Requirements:
- Latest version identified and stated in Overview
- All code snippets copied verbatim (no summarization)
- Configuration table has 4 columns: Property, Type, Default, Description
- No hyperlinks or external URLs anywhere
- Sparse sections marked with ⚠️ (not omitted)
- Version changes marked with 📌
- Content is self-contained and usable in isolation

Common Mistakes to Avoid:
- ❌ Omitting sections because docs are sparse (always include with ⚠️)
- ❌ Missing "Default" column in config tables
- ❌ Summarizing code instead of copying verbatim
- ❌ Using non-kebab-case names (e.g., "Baker_Framework" instead of "baker-framework")
- ❌ Including hyperlinks (convert to inline content)
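Part of the structural checklist can be automated with a rough Python check — a sketch only; the `check_skill` helper and its heuristics (substring section matching, a simple kebab-case regex) are assumptions:

```python
import re

# Short forms of the 8 section titles; substring matching is a deliberate heuristic.
SECTION_TITLES = [
    "Overview", "Core Concepts", "Configuration Reference", "Code Examples",
    "Integration", "Pitfalls", "FAQ", "Glossary",
]

def check_skill(text: str) -> list[str]:
    """Return failed structural checks for a generated SKILL.md."""
    problems = []
    if not text.startswith("---"):
        problems.append("missing YAML frontmatter")
    m = re.search(r"^name:\s*(\S+)", text, re.MULTILINE)
    if not m or not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", m.group(1)):
        problems.append("name missing or not kebab-case")
    for title in SECTION_TITLES:
        if title not in text:
            problems.append(f"section missing: {title}")
    if re.search(r"https?://", text):
        problems.append("contains external URLs")
    return problems
```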
10. Example Workflow
When asked to generate a skill from a repo:

- Read the repo structure
  ```bash
  find <repo-path> -name "*.md" -o -name "*.adoc" | head -50
  ls <repo-path>/docs/ 2>/dev/null || ls <repo-path>/
  ```
- Identify the tool name and version
  - Check README.md, pom.xml, build.gradle, package.json
  - Look for version badges, changelog, releases
- Map documentation to output sections
  - Overview/Introduction → Section 1
  - Concepts/Architecture → Section 2
  - Configuration/Properties → Section 3
  - Examples/Tutorials → Section 4
  - Integration guides → Section 5
  - Troubleshooting/Warnings → Section 6
  - FAQ (if exists) → Section 7
  - Glossary/Terms → Section 8
- Extract and synthesize
  - Read each relevant file
  - Copy code blocks verbatim
  - Merge duplicate explanations
  - Note gaps with ⚠️ markers
- Generate the SKILL.md
  - Use exact template structure
  - Validate frontmatter YAML
  - Ensure no broken references
- Save to appropriate location
  - Default: `.agents/skills/[tool-name]/SKILL.md`
11. Running and Evaluating Test Cases
After generating a skill, run test cases to verify quality. Put results in `<skill-name>-workspace/` as a sibling to the skill directory.

Step 1: Spawn all runs (with-skill AND baseline) in parallel
For each test case, spawn two subagents in the same turn — one with the skill, one without:

With-skill run:

Execute this task:
- Skill path: <path-to-skill>/SKILL.md
- Task: <eval prompt - e.g., "Generate a skill from the Baker docs at /tmp/baker">
- Input files: <path to cloned repo>
- Save outputs to: <workspace>/iteration-<N>/eval-<name>/with_skill/run-1/outputs/
- Outputs to save: The generated SKILL.md file
IMPORTANT: First read the skill, then follow its instructions.

Baseline run (no skill):

Execute this task (no skill guidance - baseline):
- Task: <same eval prompt>
- Input files: <same repo path>
- Save outputs to: <workspace>/iteration-<N>/eval-<name>/without_skill/run-1/outputs/
- Outputs to save: The generated SKILL.md file

Write an `eval_metadata.json` for each test case:

```json
{
  "eval_id": 1,
  "eval_name": "baker-repo-full",
  "prompt": "Generate a skill from the Baker docs at /tmp/baker",
  "assertions": [
    "Output is a valid SKILL.md file with YAML frontmatter",
    "Contains all 8 required sections",
    "Code examples are verbatim from source"
  ]
}
```
Step 2: Draft assertions while runs are in progress
Good assertions for ING skill generation:
Structural:
- "Output has YAML frontmatter with --- delimiters"
- "Frontmatter contains 'name' field in kebab-case"
- "Contains all 8 sections: Overview through Glossary"
- "Configuration table has 4 columns: Property, Type, Default, Description"
Content:
- "Version number X.Y.Z is mentioned in Overview"
- "Code examples are verbatim (not summarized)"
- "No hyperlinks or external URLs"
- "Sparse sections marked with ⚠️"
Version handling:
- "Uses only latest version content"
- "Deprecated content excluded"
- "Version changes marked with 📌"
Step 3: Capture timing data as runs complete
When each subagent completes, save timing to `timing.json`:

```json
{
  "total_tokens": 84852,
  "duration_ms": 23332,
  "total_duration_seconds": 23.3
}
```

Step 4: Grade, aggregate, and launch viewer
- Grade each run — spawn a grader subagent with the absolute path to `<skill-dir>/agents/grader.md` and have it evaluate the assertions. Each run gets its own flat `grading.json` saved as a sibling to `outputs/`:

```json
{
  "expectations": [
    {"text": "Has YAML frontmatter", "passed": true, "evidence": "File starts with ---"},
    {"text": "All 8 sections present", "passed": true, "evidence": "Found sections 1-8"}
  ],
  "summary": {
    "passed": 2,
    "failed": 0,
    "total": 2,
    "pass_rate": 1.0
  },
  "claims": [],
  "user_notes_summary": {"uncertainties": [], "needs_review": [], "workarounds": []},
  "eval_feedback": {"suggestions": [], "overall": "No suggestions, evals look solid"}
}
```

Save to `<workspace>/iteration-<N>/eval-<name>/with_skill/run-1/grading.json` (and the same pattern for `without_skill/run-1/grading.json`).

- Aggregate into benchmark:

```bash
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
```

- Launch the viewer:

```bash
python eval-viewer/generate_review.py <workspace>/iteration-N \
  --skill-name "ing-skill-generator" \
  --benchmark <workspace>/iteration-N/benchmark.json
```

For iteration 2+, add `--previous-workspace <workspace>/iteration-<N-1>`.

For headless environments, use `--static <output.html>` instead.
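The `summary` block of grading.json can be derived mechanically from the graded expectations; `summarize` below is an illustrative sketch of that arithmetic:

```python
def summarize(expectations: list[dict]) -> dict:
    """Build the grading.json summary block from graded expectations."""
    passed = sum(1 for e in expectations if e["passed"])
    total = len(expectations)
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        "pass_rate": passed / total if total else 0.0,
    }
```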
--static <output.html>- 为每个运行评分 — 启动评分子Agent,传入的绝对路径,让其评估断言。每个运行的评分结果保存为独立的
<skill-dir>/agents/grader.md,与grading.json目录同级:outputs/
json
{
"expectations": [
{"text": "包含YAML前置元数据", "passed": true, "evidence": "文件以---开头"},
{"text": "包含全部8个章节", "passed": true, "evidence": "找到第1-8章"}
],
"summary": {
"passed": 2,
"failed": 0,
"total": 2,
"pass_rate": 1.0
},
"claims": [],
"user_notes_summary": {"uncertainties": [], "needs_review": [], "workarounds": []},
"eval_feedback": {"suggestions": [], "overall": "无建议,评估结果可靠"}
}保存到(采用相同格式)。
<workspace>/iteration-<N>/eval-<name>/with_skill/run-1/grading.jsonwithout_skill/run-1/grading.json- 聚合为基准测试结果:
bash
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>- 启动查看器:
bash
python eval-viewer/generate_review.py <workspace>/iteration-N \
--skill-name "ing-skill-generator" \
--benchmark <workspace>/iteration-N/benchmark.json对于第2次及以后的迭代,添加参数。
--previous-workspace <workspace>/iteration-<N-1>对于无界面环境,使用参数替代。
--static <output.html>Step 5: Read feedback and improve
步骤5:读取反馈并改进
When the user reviews results, read `feedback.json`:

```json
{
  "reviews": [
    {"run_id": "eval-1-with_skill", "feedback": "missing version number in overview"},
    {"run_id": "eval-2-with_skill", "feedback": ""}
  ]
}
```

Empty feedback = user is satisfied. Focus improvements on cases with complaints.
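Triaging feedback.json can be sketched as a one-line filter; `runs_needing_work` is a hypothetical helper:

```python
import json

def runs_needing_work(raw: str) -> list[str]:
    """Run ids with non-empty feedback — these are the cases to focus on."""
    data = json.loads(raw)
    return [r["run_id"] for r in data.get("reviews", []) if r.get("feedback", "").strip()]
```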
12. Improving the Skill
After running test cases and collecting feedback:
How to improve ING skill generation
- Check structural compliance — If outputs are missing sections or using wrong formats, strengthen the template instructions with explicit requirements.
- Check content extraction — If code examples are summarized instead of verbatim, add more emphasis on copying exactly. If warnings/pitfalls are missed, add instructions to scan for keywords like "WARNING", "NOTE", "⚠️".
- Check version handling — If old content leaks in, add clearer instructions to identify and exclude deprecated versions.
- Look at transcripts — Read how the subagent processed the docs. If it's doing redundant work or missing files, adjust the workflow instructions.
- Look for repeated work — If all test runs independently wrote similar helper scripts or took the same approach, consider bundling that script in the skill's `scripts/` directory.
The iteration loop
- Apply improvements to `SKILL.md`
- Rerun all test cases into `iteration-<N+1>/`
- Launch the viewer with `--previous-workspace` pointing to the previous iteration
- Collect feedback, improve, repeat

Keep going until:
- User is happy
- All feedback is empty
- Pass rates are consistently high
13. Advanced: Blind Comparison
For situations where you want a more rigorous comparison between two versions of a skill (e.g., "is the new version actually better?"), there's a blind comparison system.
How it works
- Give two outputs to an independent agent without telling it which is which
- Let it judge quality based purely on the outputs
- Analyze why the winner won
Read `agents/comparator.md` and `agents/analyzer.md` for the details.

When to use
- Comparing a new skill version against the previous version
- Deciding between two different approaches to the same problem
- When quantitative metrics (pass rates) are similar but you sense a quality difference
This is optional and requires subagents. The human review loop is usually sufficient.
14. Description Optimization
The description field in SKILL.md frontmatter is the primary mechanism that determines whether Claude invokes a skill. After creating or improving a skill, offer to optimize the description for better triggering accuracy.
### Step 1: Generate trigger eval queries
Create 20 eval queries — a mix of should-trigger (8-10) and should-not-trigger (8-10).
The queries must be realistic — the kind of thing a real Claude Code user would actually type. Include:
- File paths and personal context
- Different lengths and styles (formal, casual, typos)
- Edge cases, not clear-cut examples
Bad examples:

- "Format this data"
- "Extract text from PDF"
- "Create a skill"

Good examples:

- "ok so I just cloned the merak-sdk repo to /tmp/merak and my tech lead wants me to turn the docs into something our team can use in their IDE. can you help?"
- "I have the Baker framework documentation at ~/projects/ing-bank/baker/docs. Need to create a Copilot skill that covers all the recipe patterns and error handling strategies."
- "we're using kingsroad-cli internally and the docs are scattered across like 5 different markdown files. can you consolidate them into a skill?"

For should-trigger queries, think about coverage:
- Different phrasings of the same intent (formal, casual)
- Cases where the user doesn't explicitly say "skill" but clearly needs one
- Mentions of ING frameworks (Baker, Merak, Kingsroad)
- References to documentation repos, docs/ folders
For should-not-trigger queries, the most valuable are near-misses:
- Using the frameworks (not creating skills for them)
- "How do I configure Baker retry policies?" — needs Baker skill, not skill generator
- General Spring Boot/Java questions
- Other types of skill creation (not ING-specific)
The key: don't make should-not-trigger queries obviously irrelevant. "Write a fibonacci function" is too easy — it doesn't test anything. Negative cases should be genuinely tricky.
### Step 2: Review with user
Present the eval set for review using the HTML template:

- Read the template from `assets/eval_review.html`
- Replace placeholders:
  - `__EVAL_DATA_PLACEHOLDER__` → the JSON array
  - `__SKILL_NAME_PLACEHOLDER__` → skill name
  - `__SKILL_DESCRIPTION_PLACEHOLDER__` → current description
- Write to a temp file and open: `open /tmp/eval_review_ing-skill-generator.html`
- User edits queries, toggles should-trigger, then clicks "Export Eval Set"
- File downloads to `~/Downloads/eval_set.json` as a JSON array with this format:

  ```json
  [
    {"query": "I have the Baker docs at ~/projects/baker...", "should_trigger": true},
    {"query": "How do I configure Baker retry policies?", "should_trigger": false}
  ]
  ```

- Copy the downloaded file to the workspace: `cp ~/Downloads/eval_set.json <workspace>/trigger-eval.json`
This step matters — bad eval queries lead to bad descriptions.
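Before starting the optimization loop, it's worth sanity-checking the exported file. A minimal check against the JSON format shown above (the balance numbers mirror Step 1's 8-10 guidance; the function name is illustrative):

```python
import json

def check_eval_set(path):
    """Validate trigger-eval.json: shape and positive/negative balance."""
    with open(path) as f:
        evals = json.load(f)
    assert isinstance(evals, list), "eval set must be a JSON array"
    for item in evals:
        assert set(item) == {"query", "should_trigger"}, f"unexpected keys: {item}"
        assert isinstance(item["should_trigger"], bool)
    pos = sum(1 for e in evals if e["should_trigger"])
    return {"total": len(evals), "should_trigger": pos, "should_not": len(evals) - pos}
```

A heavily imbalanced set (e.g. 18 positives, 2 negatives) will make the optimizer overfit toward always triggering, so catch it here.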
### Step 3: Run the optimization loop
Tell the user: "This will take some time — I'll run in the background and check periodically."
```bash
python -m scripts.run_loop \
  --eval-set <workspace>/trigger-eval.json \
  --skill-path <skill-path> \
  --model <model-id-powering-this-session> \
  --max-iterations 5 \
  --verbose
```

Use the model ID from your system prompt so triggering tests match what the user experiences.
The script:
- Splits eval set into 60% train / 40% held-out test
- Evaluates current description (3 runs per query for reliability)
- Proposes improvements based on failures
- Re-evaluates each new description on both train and test
- Selects best by test score (not train) to avoid overfitting
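The split-and-select logic can be sketched like this. It is a simplification of what `scripts/run_loop.py` is described as doing above; the function names and the `test_score` field are illustrative:

```python
import random

def split_eval_set(evals, train_frac=0.6, seed=0):
    """Shuffle and split eval queries into train / held-out test sets."""
    rng = random.Random(seed)
    shuffled = evals[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def select_best(candidates):
    """Pick the candidate description with the best held-out test score.

    Selecting by test score (not train score) is what guards against
    overfitting the description to the training queries.
    """
    return max(candidates, key=lambda c: c["test_score"])
```

With the recommended 20 queries, a 0.6 split yields 12 train / 8 test, which is small; the 3 runs per query help compensate for the noise.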
### How skill triggering works
Understanding this helps design better eval queries:
- Skills appear in Claude's `available_skills` list with name + description
- Claude decides whether to consult a skill based on that description
- Important: Claude only consults skills for tasks it can't easily handle on its own
This means:
- Simple queries like "read this file" may not trigger skills even if description matches
- Complex, multi-step, or specialized queries reliably trigger when description matches
- Your eval queries should be substantive enough that Claude would benefit from consulting a skill
### Step 4: Apply results
Take `best_description` from the JSON output and update SKILL.md frontmatter. Show the user before/after and report scores.
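Applying the winning description can be as simple as rewriting the `description:` line in the frontmatter. A sketch that assumes the description occupies a single YAML line; a multi-line description would need a real YAML parser instead:

```python
import re
from pathlib import Path

def apply_description(skill_md, best_description):
    """Replace the description field in SKILL.md's YAML frontmatter.

    Assumes a single-line `description:` entry; also note that re.sub
    treats backslashes in the replacement specially, so descriptions
    containing backslashes would need escaping.
    """
    path = Path(skill_md)
    text = path.read_text()
    updated = re.sub(
        r"^description:.*$",
        f"description: {best_description}",
        text,
        count=1,
        flags=re.MULTILINE,
    )
    path.write_text(updated)
```

After writing, show the user a diff of the old and new frontmatter rather than silently overwriting.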
## Package and Present
If you have access to the `present_files` tool, package the skill:

```bash
python -m scripts.package_skill <path/to/skill-folder>
```

This creates a `.skill` file the user can install.

## 15. Claude.ai-Specific Instructions
In Claude.ai, the core workflow is the same (analyze repo → generate skill → test → review → improve), but some mechanics change because Claude.ai doesn't have subagents.
Running test cases: No subagents means no parallel execution. For each test case:
- Read the skill's SKILL.md
- Follow its instructions to accomplish the test prompt yourself
- Do them one at a time
This is less rigorous than independent subagents (you wrote the skill and you're running it), but it's a useful sanity check — the human review step compensates.
Reviewing results: If you can't open a browser (no display), skip the browser reviewer. Instead, present results directly in the conversation:
- Show the prompt and output for each test case
- If output is a file, save it and tell the user where to download
- Ask for feedback inline: "How does this look? Anything you'd change?"
Benchmarking: Skip quantitative benchmarking — it relies on baseline comparisons which aren't meaningful without subagents. Focus on qualitative feedback.
The iteration loop: Same as before — improve the skill, rerun test cases, ask for feedback — just without the browser reviewer.
Description optimization: Requires the `claude -p` CLI, which is only in Claude Code. Skip it on Claude.ai.

Blind comparison: Requires subagents. Skip it.

Packaging: `package_skill.py` works anywhere with Python. User can download the resulting `.skill` file.

Updating an existing skill: The user might want to update, not create. In this case:
- Preserve the original name — use it unchanged
- Copy to a writeable location before editing — installed paths may be read-only
- Stage in `/tmp/` first if packaging manually
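The copy-before-edit step might look like this. It stages into a fresh temp directory (rather than literally `/tmp/`) and keeps the original directory name so the skill's name is preserved; the function name is illustrative:

```python
import shutil
import tempfile
from pathlib import Path

def stage_for_editing(installed_skill_dir):
    """Copy an installed (possibly read-only) skill to a writable staging dir.

    Keeps the original directory name so the skill's name is preserved.
    """
    src = Path(installed_skill_dir)
    staging = Path(tempfile.mkdtemp(prefix="skill-edit-")) / src.name
    shutil.copytree(src, staging)  # copies SKILL.md plus scripts/, assets/, etc.
    return staging
```

Edit and package from the staged copy, then hand the resulting `.skill` file back to the user.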
## 16. Cowork-Specific Instructions
If you're in Cowork:
- Subagents work — the main workflow (spawn tests in parallel, run baselines, grade) all works. If timeouts are severe, run tests in series.
- No browser/display — use `--static <output_path>` to write standalone HTML instead of starting a server. Then proffer a link for the user to open.
- IMPORTANT: Generate the eval viewer BEFORE evaluating yourself. Use `generate_review.py` (not custom HTML). Get results in front of the human ASAP!
- Feedback via download — since there's no running server, "Submit All Reviews" downloads `feedback.json`. Read it from Downloads (may need to request access).
- Packaging works — `package_skill.py` just needs Python and a filesystem.
- Description optimization — `run_loop.py` / `run_eval.py` should work fine since they use a `claude -p` subprocess, not a browser. Save this until the skill is fully finished and the user agrees it's good.
- Updating existing skills — follow the update guidance in the Claude.ai section above.
## 17. Reference Files
The following files support evaluation and improvement:
`agents/`
- `grader.md` — How to evaluate assertions against outputs
- `comparator.md` — Blind A/B comparison between versions
- `analyzer.md` — Analyze why one version beat another

`references/`
- `schemas.md` — JSON structures for evals.json, grading.json, benchmark.json

`scripts/`
- `aggregate_benchmark.py` — Combine grading results into benchmark stats
- `generate_report.py` — Create summary reports
- `improve_description.py` — Generate improved descriptions
- `run_eval.py` — Run trigger evaluation
- `run_loop.py` — Full optimization loop
- `quick_validate.py` — Fast validation checks

`eval-viewer/`
- `generate_review.py` — Generate interactive review page
- `viewer.html` — Template for review interface