skill-evaluation
Skill Evaluation
Evaluate skills as operational procedures. A good research skill changes future
agent behavior on realistic tasks and leaves inspectable artifacts.
Read First
- `references/skill-evaluation-policy.md`
- `references/external-skill-recommendations.md`
Workflow
- Define the failure mode the skill should prevent.
- Write pressure scenarios using real academic tasks: bad PDFs, a missing DOI, an unsupported claim, a messy repo, ambiguous SOTA, an unreliable notebook, or an MCP outage.
- Run or reason through the baseline behavior without assuming the skill helps.
- Revise the skill frontmatter so it triggers on the right user intents.
- Keep `SKILL.md` lean; move detailed policies to `references/`.
- Validate local structure with `python3 scripts/validate_skills.py`.
- Install-list the package with `npx -y skills add <repo> --list`.
- Record recommended external skills separately from custom internal skills.
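
The baseline comparison in the workflow above can be recorded in a small structure like the one below. This is only a sketch: the `PressureScenario` class, its field names, the verdict labels, and the three example rows are hypothetical placeholders for illustration, not part of this repository or results from a real run.

```python
from dataclasses import dataclass


@dataclass
class PressureScenario:
    """One realistic task used to probe a skill (all names are hypothetical)."""
    name: str
    failure_mode: str        # the failure the skill is supposed to prevent
    baseline_failed: bool    # did the agent fail WITHOUT the skill?
    with_skill_failed: bool  # did the agent fail WITH the skill?

    def verdict(self) -> str:
        # If the baseline already succeeds, the scenario says nothing
        # about whether the skill helps.
        if not self.baseline_failed:
            return "no-baseline-failure"
        if self.with_skill_failed:
            return "skill-ineffective"
        return "skill-helps"


# Placeholder rows, invented purely to show the three possible verdicts.
scenarios = [
    PressureScenario("missing DOI", "fabricated citation", True, False),
    PressureScenario("MCP outage", "silent fallback to guesses", True, True),
    PressureScenario("bad PDF", "unsupported claim", False, False),
]

for s in scenarios:
    print(f"{s.name}: {s.verdict()}")
```

Separating "no-baseline-failure" from the other verdicts keeps the evaluation honest: a scenario the agent already handles without the skill is not evidence that the skill works.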
Review Criteria
- trigger description starts with `Use when`
- no workflow summary in frontmatter
- references exist and are loaded only when needed
- outputs map to the research project contract
- citations and evidence rules are explicit
- skill does not duplicate a stronger external skill without a reason
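
Two of the criteria above are mechanical enough to check in code. The following is a minimal sketch, not the repository's `scripts/validate_skills.py` (whose contents are not shown here). It assumes the frontmatter is a block of simple `key: value` lines between `---` markers and that the trigger text lives under a `description` key; the function names `parse_frontmatter` and `review_skill` are hypothetical. Criteria such as "no workflow summary in frontmatter" still need human review.

```python
from pathlib import Path
import re
import tempfile


def parse_frontmatter(text: str) -> dict:
    """Parse simple `key: value` lines between leading `---` markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def review_skill(skill_dir: Path) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]
    problems = []
    text = skill_md.read_text(encoding="utf-8")
    # Assumption: the trigger description is stored under a `description` key.
    description = parse_frontmatter(text).get("description", "")
    if not description.startswith("Use when"):
        problems.append("description does not start with 'Use when'")
    # Every references/<file>.md mentioned in SKILL.md should exist on disk.
    for ref in sorted(set(re.findall(r"references/[\w.-]+\.md", text))):
        if not (skill_dir / ref).is_file():
            problems.append(f"missing reference: {ref}")
    return problems


# Usage: build a throwaway skill directory and review it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SKILL.md").write_text(
        "---\nname: demo\ndescription: Use when evaluating a skill\n---\n"
        "Read references/policy.md first.\n",
        encoding="utf-8",
    )
    print(review_skill(root))  # reports that references/policy.md is missing
```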
Do Not
- Create one giant "research agent" skill.
- Hide fragile procedures in prose when a script or test can check them.
- Copy external skills blindly without adapting output paths and repository contracts.