review-skill-improver

Review Skill Improver

Purpose

Analyzes structured feedback logs to:

Identify rules that produce false positives (high REJECT rate)
Identify missing rules (issues that should have been caught)
Suggest specific skill modifications

Input

Feedback log in enhanced schema format (see the review-feedback-schema skill).

Analysis Process

Step 1: Aggregate by Rule Source

For each unique rule_source:
  - Count total issues flagged
  - Count ACCEPT vs REJECT
  - Calculate rejection rate
  - Extract rejection rationales
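
The aggregation above can be sketched in Python. This is a minimal sketch, assuming each feedback entry is a dict with `rule_source`, `verdict`, and `rationale` keys (the field names used in the example data in this document). The function name `aggregate_by_rule` is hypothetical and not part of the skill.

```python
from collections import defaultdict


def aggregate_by_rule(entries):
    """Group feedback entries by rule_source and compute per-rule statistics."""
    stats = defaultdict(
        lambda: {"total": 0, "accepted": 0, "rejected": 0, "rationales": []}
    )
    for entry in entries:
        rule = stats[entry["rule_source"]]
        rule["total"] += 1
        if entry["verdict"] == "REJECT":
            rule["rejected"] += 1
            # Keep the rationale so rejection themes can be analyzed later.
            rule["rationales"].append(entry["rationale"])
        elif entry["verdict"] == "ACCEPT":
            rule["accepted"] += 1
    for rule in stats.values():
        # Every rule present has at least one entry, so total is never zero.
        rule["rejection_rate"] = rule["rejected"] / rule["total"]
    return dict(stats)
```

Only rejection rationales are collected, since those are what the later steps inspect.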

Step 2: Identify High-Rejection Rules

Rules with >30% rejection rate warrant investigation:

Read the rejection rationales
Identify common themes
Determine whether the rule needs refinement or an exception
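
The 30% threshold can be applied mechanically before any rationale is read. The sketch below assumes per-rule counts are available as `(total, rejected)` pairs; `high_rejection_rules` and `REJECTION_THRESHOLD` are hypothetical names chosen for illustration.

```python
REJECTION_THRESHOLD = 0.30  # rules above this rejection rate warrant investigation


def high_rejection_rules(counts, threshold=REJECTION_THRESHOLD):
    """Return (rule_source, rate) pairs above the threshold, highest rate first."""
    flagged = []
    for rule_source, (total, rejected) in counts.items():
        if total == 0:
            continue  # no feedback for this rule, nothing to compute
        rate = rejected / total
        if rate > threshold:
            flagged.append((rule_source, rate))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Note that this check does not consider sample size, so a rule with a single rejected entry is flagged at 100%. Whether to require a minimum number of entries is a judgment call left to the reviewer.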

Step 3: Pattern Analysis

Group rejections by rationale theme:

"Linter already handles this" -> Add linter verification step
"Framework supports this pattern" -> Add exception to skill
"Intentional design decision" -> Add codebase context check
"Wrong code path assumed" -> Add code tracing step
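
One rough way to group rationales is keyword matching. In the sketch below the theme-to-action mapping comes from the list above, but the keyword lists and function names are assumptions made for illustration. In practice rationales are free text, and reading them is likely more reliable than matching substrings.

```python
from collections import Counter

# Theme -> suggested action, taken from the list above.
THEME_ACTIONS = {
    "Linter already handles this": "Add linter verification step",
    "Framework supports this pattern": "Add exception to skill",
    "Intentional design decision": "Add codebase context check",
    "Wrong code path assumed": "Add code tracing step",
}

# Hypothetical keyword hints per theme; tune these to the rationales you see.
THEME_KEYWORDS = {
    "Linter already handles this": ("linter", "ruff", "violation"),
    "Framework supports this pattern": ("docs support", "framework"),
    "Intentional design decision": ("intentional", "by design"),
    "Wrong code path assumed": ("code path", "unreachable"),
}


def classify_rationale(rationale):
    """Return the first theme whose keywords appear in the rationale, else None."""
    text = rationale.lower()
    for theme, keywords in THEME_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return theme
    return None


def suggest_actions(rationales):
    """Count themes across rationales and pair each with its suggested action."""
    themes = [classify_rationale(r) for r in rationales]
    counts = Counter(theme for theme in themes if theme is not None)
    return [(theme, n, THEME_ACTIONS[theme]) for theme, n in counts.most_common()]
```

Rationales that match no keyword are dropped from the counts here, so they still need manual review.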

Step 4: Generate Improvement Recommendations

For each identified issue, produce:

Recommendation: [SHORT_TITLE]

Affected Skill: skill-name/SKILL.md or skill-name/references/file.md

Problem: [What's causing false positives]

Evidence:

[N] rejections with rationale "[common theme]"
Example: [file:line] - [issue] - [rationale]

Proposed Fix:

[Exact text to add/modify in the skill]

Expected Impact: Reduce false positive rate for [rule] from X% to Y%
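
If recommendations are generated programmatically, the template can be filled from a small record type. This is a sketch under assumptions: the `Recommendation` class and its field names are hypothetical, and the rendered text follows the template's labels in plain text.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """Hypothetical container for one improvement recommendation."""

    title: str
    affected_skill: str
    problem: str
    rejection_count: int
    theme: str
    example: str
    proposed_fix: str
    expected_impact: str

    def render(self):
        """Render the recommendation using the labels from the template."""
        noun = "rejection" if self.rejection_count == 1 else "rejections"
        sections = [
            f"Recommendation: {self.title}",
            f"Affected Skill: {self.affected_skill}",
            f"Problem: {self.problem}",
            "Evidence:\n"
            f'{self.rejection_count} {noun} with rationale "{self.theme}"\n'
            f"Example: {self.example}",
            f"Proposed Fix:\n{self.proposed_fix}",
            f"Expected Impact: {self.expected_impact}",
        ]
        return "\n\n".join(sections)
```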

Output Format

Review Skill Improvement Report

Summary

Feedback entries analyzed: [N]
Unique rules triggered: [N]
High-rejection rules identified: [N]
Recommendations generated: [N]

High-Rejection Rules

| Rule Source | Total | Rejected | Rate | Theme |
|-------------|-------|----------|------|-------|
| ... | ... | ... | ... | ... |

Recommendations

[Numbered list of recommendations in format above]

Rules Performing Well

[Rules with <10% rejection rate - preserve these]

Usage

Analyze feedback and generate improvement report

/review-skill-improver --output improvement-report.md

Example Analysis

Given this feedback data (CSV):

rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
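
The per-rule totals and rejection rates for this data can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import csv
import io
from collections import Counter

FEEDBACK_CSV = """\
rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
"""

rows = list(csv.DictReader(io.StringIO(FEEDBACK_CSV)))

# Count all entries and rejected entries per rule.
totals = Counter(row["rule_source"] for row in rows)
rejected = Counter(row["rule_source"] for row in rows if row["verdict"] == "REJECT")

for rule_source, total in totals.items():
    rate = rejected[rule_source] / total
    print(f"{rule_source}: {rejected[rule_source]}/{total} rejected ({rate:.0%})")
# python-code-review:line-length: 3/4 rejected (75%)
# pydantic-ai-common-pitfalls:tool-decorator: 1/1 rejected (100%)
# python-code-review:type-safety: 0/2 rejected (0%)
```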

Analysis output:

Review Skill Improvement Report

Summary

Feedback entries analyzed: 7
Unique rules triggered: 3
High-rejection rules identified: 2
Recommendations generated: 2

High-Rejection Rules

| Rule Source | Total | Rejected | Rate | Theme |
|-------------|-------|----------|------|-------|
| python-code-review:line-length | 4 | 3 | 75% | linter handles this |
| pydantic-ai-common-pitfalls:tool-decorator | 1 | 1 | 100% | framework supports pattern |

Recommendations

1. Add Linter Verification for Line Length

Affected Skill: commands/review-python.md

Problem: Flagging line length issues that linters confirm don't exist

Evidence:

3 rejections with rationale "linter passes/handles this"
Example: amelia/drivers/api/openai.py:102 - Line too long - ruff check passes

Proposed Fix: Add a step to run ruff check before manual review. If the linter passes for line length, do not flag the line manually.

Expected Impact: Reduce false positive rate for line-length from 75% to <10%

2. Add Raw Function Tool Registration Exception

Affected Skill: skills/pydantic-ai-common-pitfalls/SKILL.md

Problem: Flagging valid pydantic-ai pattern as error

Evidence:

1 rejection with rationale "docs support raw functions"

Proposed Fix: Add "Valid Patterns" section documenting that passing functions with RunContext to Agent(tools=[...]) is valid.

Expected Impact: Eliminate false positives for this pattern

Rules Performing Well

| Rule Source | Total | Accepted | Rate |
|-------------|-------|----------|------|
| python-code-review:type-safety | 2 | 2 | 100% |

Future: Automated Skill Updates

Once confidence in the generated recommendations is high, this skill can:

Generate PRs to beagle with skill improvements
Track improvement impact over time
A/B test rule variations

Feedback Loop

Review Code -> Log Outcomes -> Analyze Patterns -> Improve Skills -> Better Reviews
     ^                                                                    |
     +--------------------------------------------------------------------+

This creates a continuous improvement cycle where review quality improves based on empirical data rather than guesswork.
