review-skill-improver

Review Skill Improver

Purpose

Analyzes structured feedback logs to:
  1. Identify rules that produce false positives (high REJECT rate)
  2. Identify missing rules (issues that should have been caught)
  3. Suggest specific skill modifications

Input

Feedback log in enhanced schema format (see the review-feedback-schema skill).

Analysis Process

Step 1: Aggregate by Rule Source

For each unique rule_source:
  - Count total issues flagged
  - Count ACCEPT vs REJECT
  - Calculate rejection rate
  - Extract rejection rationales
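
The aggregation above could be sketched as follows. This is a minimal illustration, assuming each feedback entry is a dict with rule_source, verdict, and rationale keys (the column names used in the example data later in this document). The function name aggregate_by_rule and the output keys are hypothetical, not part of the skill.

```python
from collections import defaultdict


def aggregate_by_rule(entries):
    """Group feedback entries by rule_source and compute per-rule stats."""
    stats = defaultdict(
        lambda: {"total": 0, "accepted": 0, "rejected": 0, "rationales": []}
    )
    for entry in entries:
        rule = stats[entry["rule_source"]]
        rule["total"] += 1
        if entry["verdict"] == "REJECT":
            rule["rejected"] += 1
            # Keep the rationale so rejection themes can be studied later.
            rule["rationales"].append(entry["rationale"])
        elif entry["verdict"] == "ACCEPT":
            rule["accepted"] += 1
    for rule in stats.values():
        # total is at least 1 for every rule that appears in the log.
        rule["rejection_rate"] = rule["rejected"] / rule["total"]
    return dict(stats)


# Hypothetical sample entries for illustration only.
sample = [
    {"rule_source": "rule-a", "verdict": "REJECT", "rationale": "linter passes"},
    {"rule_source": "rule-a", "verdict": "ACCEPT", "rationale": "fixed"},
    {"rule_source": "rule-b", "verdict": "ACCEPT", "rationale": "fixed"},
]
summary = aggregate_by_rule(sample)
```

Keeping the raw rationales next to the counts means the later steps can work from one structure instead of re-reading the log.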

Step 2: Identify High-Rejection Rules

Rules with >30% rejection rate warrant investigation:
  • Read the rejection rationales
  • Identify common themes
  • Determine whether the rule needs refinement or an exception
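
As a rough sketch, the 30% cut-off can be applied to a mapping of rule source to rejection rate. The names REJECTION_THRESHOLD and high_rejection_rules are hypothetical, and the comparison is strict because the text says ">30%", so a rule at exactly 30% is not flagged.

```python
REJECTION_THRESHOLD = 0.30  # "Rules with >30% rejection rate" from the text


def high_rejection_rules(rates, threshold=REJECTION_THRESHOLD):
    """Return (rule, rate) pairs above the threshold, worst first."""
    flagged = [(rule, rate) for rule, rate in rates.items() if rate > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical rates for illustration only.
rates = {"rule-a": 0.75, "rule-b": 0.30, "rule-c": 0.05}
flagged = high_rejection_rules(rates)
```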

Step 3: Pattern Analysis

Group rejections by rationale theme:
  • "Linter already handles this" -> Add linter verification step
  • "Framework supports this pattern" -> Add exception to skill
  • "Intentional design decision" -> Add codebase context check
  • "Wrong code path assumed" -> Add code tracing step
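
One way to approximate this grouping is a keyword lookup. The theme-to-action mapping below is taken from the list above, while the keyword hints, the function names, and the "Review manually" fallback are assumptions made for illustration. Free-text rationales will often need a human read rather than a substring match.

```python
# Theme -> suggested action, taken from the list above.
THEME_ACTIONS = {
    "Linter already handles this": "Add linter verification step",
    "Framework supports this pattern": "Add exception to skill",
    "Intentional design decision": "Add codebase context check",
    "Wrong code path assumed": "Add code tracing step",
}

# Hypothetical keyword hints; not part of the skill itself.
THEME_KEYWORDS = {
    "Linter already handles this": ("linter", "ruff", "e501"),
    "Framework supports this pattern": ("docs support", "framework"),
    "Intentional design decision": ("intentional", "by design"),
    "Wrong code path assumed": ("code path", "never called", "unreachable"),
}


def classify_rationale(rationale):
    """Return the first theme whose keywords appear in the rationale, else None."""
    text = rationale.lower()
    for theme, keywords in THEME_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return theme
    return None


def suggest_action(rationale):
    """Map a rationale to the action for its theme, or fall back to manual review."""
    return THEME_ACTIONS.get(classify_rationale(rationale), "Review manually")
```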

Step 4: Generate Improvement Recommendations

For each identified issue, produce:

Recommendation: [SHORT_TITLE]

Affected Skill: skill-name/SKILL.md or skill-name/references/file.md
Problem: [What's causing false positives]
Evidence:
  • [N] rejections with rationale "[common theme]"
  • Example: [file:line] - [issue] - [rationale]
Proposed Fix: [Exact text to add/modify in the skill]
Expected Impact: Reduce false positive rate for [rule] from X% to Y%
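
A recommendation in this shape could be filled in programmatically. The sketch below is only an illustration: the Recommendation dataclass and its render method are hypothetical names, and the field values are copied from the worked example later in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """Hypothetical container mirroring the template fields above."""

    title: str
    affected_skill: str
    problem: str
    evidence: list = field(default_factory=list)
    proposed_fix: str = ""
    expected_impact: str = ""

    def render(self):
        """Return the recommendation as plain text in the template's layout."""
        lines = [
            f"Recommendation: {self.title}",
            "",
            f"Affected Skill: {self.affected_skill}",
            f"Problem: {self.problem}",
            "Evidence:",
        ]
        lines += [f"  • {item}" for item in self.evidence]
        lines.append(f"Proposed Fix: {self.proposed_fix}")
        lines.append(f"Expected Impact: {self.expected_impact}")
        return "\n".join(lines)


rec = Recommendation(
    title="Add Linter Verification for Line Length",
    affected_skill="commands/review-python.md",
    problem="Flagging line length issues that linters confirm don't exist",
    evidence=['3 rejections with rationale "linter passes/handles this"'],
    proposed_fix="Add step to run ruff check before manual review.",
    expected_impact="Reduce false positive rate for line-length from 75% to <10%",
)
report_text = rec.render()
```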

Output Format

Review Skill Improvement Report

Summary

  • Feedback entries analyzed: [N]
  • Unique rules triggered: [N]
  • High-rejection rules identified: [N]
  • Recommendations generated: [N]

High-Rejection Rules

| Rule Source | Total | Rejected | Rate | Theme |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |

Recommendations

[Numbered list of recommendations in format above]

Rules Performing Well

[Rules with <10% rejection rate - preserve these]

Usage

# Analyze feedback and generate improvement report
/review-skill-improver --output improvement-report.md

Example Analysis

Given this feedback data:
rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
Analysis output:

Review Skill Improvement Report

Summary

  • Feedback entries analyzed: 7
  • Unique rules triggered: 3
  • High-rejection rules identified: 2
  • Recommendations generated: 2

High-Rejection Rules

| Rule Source | Total | Rejected | Rate | Theme |
|---|---|---|---|---|
| python-code-review:line-length | 4 | 3 | 75% | linter handles this |
| pydantic-ai-common-pitfalls:tool-decorator | 1 | 1 | 100% | framework supports pattern |

Recommendations

1. Add Linter Verification for Line Length

Affected Skill: commands/review-python.md
Problem: Flagging line length issues that linters confirm don't exist
Evidence:
  • 3 rejections with rationale "linter passes/handles this"
  • Example: amelia/drivers/api/openai.py:102 - Line too long - ruff check passes
Proposed Fix: Add a step to run ruff check before manual review. If the linter passes for line length, do not flag manually.
Expected Impact: Reduce false positive rate for line-length from 75% to <10%

2. Add Raw Function Tool Registration Exception

Affected Skill: skills/pydantic-ai-common-pitfalls/SKILL.md
Problem: Flagging valid pydantic-ai pattern as error
Evidence:
  • 1 rejection with rationale "docs support raw functions"
Proposed Fix: Add "Valid Patterns" section documenting that passing functions with RunContext to Agent(tools=[...]) is valid.
Expected Impact: Eliminate false positives for this pattern

Rules Performing Well

| Rule Source | Total | Accepted | Rate |
|---|---|---|---|
| python-code-review:type-safety | 2 | 2 | 100% |
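
The figures in this example report can be reproduced from the feedback data with a short script. This is a sketch under stated assumptions rather than the skill's actual implementation: it assumes the log is available as CSV text with the three columns shown in the example, and it applies the >30% and <10% thresholds named earlier in this document. The variable names are illustrative.

```python
import csv
import io
from collections import Counter

# The example feedback data from this section, embedded as CSV text.
FEEDBACK_CSV = """\
rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
"""

rows = list(csv.DictReader(io.StringIO(FEEDBACK_CSV)))

# Count flagged issues and rejections per rule source.
totals = Counter(row["rule_source"] for row in rows)
rejected = Counter(row["rule_source"] for row in rows if row["verdict"] == "REJECT")
rates = {rule: rejected[rule] / totals[rule] for rule in totals}

# Apply the thresholds from the analysis process.
high_rejection = sorted(rule for rule, rate in rates.items() if rate > 0.30)
performing_well = sorted(rule for rule, rate in rates.items() if rate < 0.10)

for rule in sorted(rates):
    print(f"{rule}: {rejected[rule]}/{totals[rule]} rejected ({rates[rule]:.0%})")
```

The counts match the Summary above: 7 entries, 3 unique rules, and 2 rules over the 30% threshold.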

Future: Automated Skill Updates

Once confidence is high, this skill can:
  1. Generate PRs to beagle with skill improvements
  2. Track improvement impact over time
  3. A/B test rule variations

Feedback Loop

Review Code -> Log Outcomes -> Analyze Patterns -> Improve Skills -> Better Reviews
     ^                                                                    |
     +--------------------------------------------------------------------+
This creates a continuous improvement cycle where review quality improves based on empirical data rather than guesswork.