skill-comply


skill-comply: Automated Compliance Measurement


Measures whether coding agents actually follow skills, rules, or agent definitions by:
  1. Auto-generating expected behavioral sequences (specs) from any .md file
  2. Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)
  3. Running `claude -p` and capturing tool call traces via stream-json
  4. Classifying tool calls against spec steps using LLM (not regex)
  5. Checking temporal ordering deterministically
  6. Generating self-contained reports with spec, prompts, and timelines
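Step 5, the deterministic temporal-ordering check, can be sketched as follows. This is a minimal illustration under assumed data shapes, not the tool's actual implementation: the spec step IDs and the pre-classified call labels are hypothetical.

```python
def check_temporal_order(spec_steps, classified_calls):
    """Verify that classified tool calls respect the spec's step order.

    spec_steps: ordered step IDs from the generated spec,
        e.g. ["search", "write_test", "implement"] (hypothetical names).
    classified_calls: the step ID the LLM classifier assigned to each
        tool call, in chronological order (unmatched calls excluded).
    Returns the calls that occurred after a later spec step already ran.
    """
    order = {step: i for i, step in enumerate(spec_steps)}
    latest_seen = -1
    violations = []
    for call in classified_calls:
        idx = order[call]
        if idx < latest_seen:
            violations.append(call)  # this step came after a later one
        latest_seen = max(latest_seen, idx)
    return violations


# A run that implements before writing a test violates TDD ordering:
print(check_temporal_order(
    ["search", "write_test", "implement"],
    ["search", "implement", "write_test"],
))  # -> ['write_test']
```

Because this check is a pure comparison of indices, it needs no LLM call: only the classification of each tool call (step 4) is probabilistic, while the ordering verdict is reproducible.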

Supported Targets


  • Skills (`skills/*/SKILL.md`): Workflow skills like search-first, TDD guides
  • Rules (`rules/common/*.md`): Mandatory rules like testing.md, security.md, git-workflow.md
  • Agent definitions (`agents/*.md`): Whether an agent gets invoked when expected (internal workflow verification not yet supported)
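The three target kinds above are distinguishable from the file path alone. A minimal sketch of that dispatch, assuming the directory layout shown in the list (the function name is hypothetical, not part of the tool's API):

```python
from pathlib import Path


def detect_target_type(path: str) -> str:
    """Infer which compliance target a markdown file represents.

    Mirrors the documented layout: skills/*/SKILL.md, rules/common/*.md,
    and agents/*.md. Raises for anything outside those patterns.
    """
    p = Path(path)
    if p.name == "SKILL.md":
        return "skill"
    if "rules" in p.parts:
        return "rule"
    if "agents" in p.parts:
        return "agent"
    raise ValueError(f"unrecognized compliance target: {path}")


print(detect_target_type("skills/search-first/SKILL.md"))  # skill
print(detect_target_type("rules/common/testing.md"))       # rule
```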

When to Activate


  • User runs `/skill-comply <path>`
  • User asks "is this rule actually being followed?"
  • After adding new rules/skills, to verify agent compliance
  • Periodically as part of quality maintenance

Usage



Full run


```bash
uv run python -m scripts.run ~/.claude/rules/common/testing.md
```

Dry run (no cost, spec + scenarios only)


```bash
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md
```

Custom models


```bash
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
```

Key Concept: Prompt Independence


Measures whether a skill/rule is followed even when the prompt doesn't explicitly support it.
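One way to turn the three per-scenario scores into a single robustness figure is to take the minimum: a rule that is only followed when the prompt supports it scores high on the supportive scenario but low on the competing one, and the minimum exposes that gap. This aggregation is an assumption for illustration, not necessarily how the tool scores reports:

```python
def prompt_independence(scores: dict) -> float:
    """Summarize compliance across prompt-strictness levels.

    scores maps each scenario level to a compliance fraction in [0, 1].
    The minimum measures how robust compliance is to prompt pressure.
    """
    for level in ("supportive", "neutral", "competing"):
        if level not in scores:
            raise KeyError(f"missing scenario: {level}")
    return min(scores.values())


# High supportive compliance but low competing compliance means the
# rule is prompt-dependent, so the summary score stays low:
print(prompt_independence({"supportive": 0.9, "neutral": 0.8, "competing": 0.4}))  # 0.4
```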

Report Contents


Reports are self-contained and include:
  1. Expected behavioral sequence (auto-generated spec)
  2. Scenario prompts (what was asked at each strictness level)
  3. Compliance scores per scenario
  4. Tool call timelines with LLM classification labels
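The four report ingredients above map naturally onto a small data model. A sketch of one possible shape, with hypothetical field names (the tool's actual report schema may differ):

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioResult:
    strictness: str                  # "supportive" | "neutral" | "competing"
    prompt: str                      # the scenario prompt that was issued
    compliance: float                # fraction of spec steps satisfied
    timeline: list = field(default_factory=list)  # tool calls + LLM labels


@dataclass
class Report:
    target: str                      # path to the .md file under test
    spec: list                       # expected behavioral sequence
    scenarios: list = field(default_factory=list)  # one ScenarioResult each
```

Keeping the spec and the prompts inside the report object is what makes the report self-contained: a reader can judge each score without access to the original run.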

Advanced (optional)


For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. This is informational — the main value is the compliance visibility itself.