review-skill-improver
Review Skill Improver
Purpose
Analyzes structured feedback logs to:
- Identify rules that produce false positives (high REJECT rate)
- Identify missing rules (issues that should have been caught)
- Suggest specific skill modifications
Input
Feedback log in enhanced schema format (see the review-feedback-schema skill).
Analysis Process
Step 1: Aggregate by Rule Source
For each unique rule_source:
- Count total issues flagged
- Count ACCEPT vs REJECT
- Calculate rejection rate
- Extract rejection rationales
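The aggregation above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: it assumes each feedback entry is a dict carrying only the `rule_source`, `verdict`, and `rationale` fields shown in the example data later in this document, and the function name `aggregate_by_rule` is hypothetical. The enhanced schema may contain more fields than these.

```python
from collections import defaultdict


def aggregate_by_rule(entries):
    """Tally feedback entries per rule_source.

    Each entry is assumed to be a dict with "rule_source", "verdict"
    ("ACCEPT" or "REJECT"), and "rationale" keys.
    """
    stats = defaultdict(
        lambda: {"total": 0, "accept": 0, "reject": 0, "rationales": []}
    )
    for entry in entries:
        rule = stats[entry["rule_source"]]
        rule["total"] += 1
        if entry["verdict"] == "REJECT":
            rule["reject"] += 1
            # Only rejection rationales are kept; they feed the theme analysis.
            rule["rationales"].append(entry["rationale"])
        elif entry["verdict"] == "ACCEPT":
            rule["accept"] += 1

    # Every rule present here has at least one entry, so total is never zero.
    for rule in stats.values():
        rule["rejection_rate"] = rule["reject"] / rule["total"]
    return dict(stats)
```

The result maps each rule source to its counts, rejection rate, and the list of rejection rationales, which is the input the later steps work from.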
Step 2: Identify High-Rejection Rules
Rules with >30% rejection rate warrant investigation:
- Read the rejection rationales
- Identify common themes
- Determine whether the rule needs refinement or an exception
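As a rough sketch of the threshold check, the snippet below flags rules whose rejection rate exceeds 30%. The input shape (a mapping from rule source to a `(total, rejected)` pair) and the names `REJECTION_THRESHOLD` and `high_rejection_rules` are assumptions made for illustration.

```python
REJECTION_THRESHOLD = 0.30  # from the text: rules above 30% warrant investigation


def high_rejection_rules(counts, threshold=REJECTION_THRESHOLD):
    """Return (rule_source, rate) pairs above the threshold, highest rate first.

    counts maps rule_source -> (total, rejected); this shape is an assumption.
    """
    flagged = []
    for rule, (total, rejected) in counts.items():
        if total == 0:
            continue  # no data for this rule, nothing to compute
        rate = rejected / total
        if rate > threshold:
            flagged.append((rule, rate))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A rule with very few entries can cross the threshold on a single rejection, so the raw counts are worth reading alongside the rate before acting on it.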
Step 3: Pattern Analysis
Group rejections by rationale theme:
- "Linter already handles this" -> Add linter verification step
- "Framework supports this pattern" -> Add exception to skill
- "Intentional design decision" -> Add codebase context check
- "Wrong code path assumed" -> Add code tracing step
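One possible way to automate this grouping is simple keyword matching, sketched below. The theme-to-action mapping is taken from the list above. The keyword lists in `THEME_KEYWORDS` and the `"Unclassified"` bucket are hypothetical and would need tuning against real rationales; in practice the grouping may well be done by reading the rationales rather than by string matching.

```python
# Theme -> suggested skill change, as listed in the text above.
THEME_ACTIONS = {
    "Linter already handles this": "Add linter verification step",
    "Framework supports this pattern": "Add exception to skill",
    "Intentional design decision": "Add codebase context check",
    "Wrong code path assumed": "Add code tracing step",
}

# Hypothetical keyword hints for each theme (an assumption, not from the source).
THEME_KEYWORDS = {
    "Linter already handles this": ("lint", "ruff"),
    "Framework supports this pattern": ("framework", "docs support"),
    "Intentional design decision": ("intentional", "by design"),
    "Wrong code path assumed": ("code path", "never called"),
}


def classify_rationale(rationale):
    """Return the first theme whose keywords appear in the rationale."""
    text = rationale.lower()
    for theme, keywords in THEME_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return theme
    return "Unclassified"  # left for manual review


def group_by_theme(rationales):
    """Group rationale strings under their classified theme."""
    groups = {}
    for rationale in rationales:
        groups.setdefault(classify_rationale(rationale), []).append(rationale)
    return groups
```

Rationales that match no keyword land in the `"Unclassified"` group, which keeps them visible instead of forcing them into the wrong theme.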
Step 4: Generate Improvement Recommendations
For each identified issue, produce:
````markdown
Recommendation: [SHORT_TITLE]

Affected Skill: skill-name/SKILL.md or skill-name/references/file.md

Problem: [What's causing false positives]

Evidence:
- [N] rejections with rationale "[common theme]"
- Example: [file:line] - [issue] - [rationale]

Proposed Fix:
```markdown
[Exact text to add/modify in the skill]
```

Expected Impact: Reduce false positive rate for [rule] from X% to Y%
````
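If recommendations are generated programmatically, the template can be filled from a small record type. The sketch below is illustrative only: the `Recommendation` dataclass and its field names are hypothetical, chosen to mirror the placeholders in the template, and the inner code fence around the proposed fix is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """Hypothetical record holding the template's placeholders."""

    title: str
    affected_skill: str
    problem: str
    rejection_count: int
    theme: str
    example: str
    proposed_fix: str
    expected_impact: str

    def render(self) -> str:
        """Fill the recommendation template with this record's values."""
        noun = "rejection" if self.rejection_count == 1 else "rejections"
        lines = [
            f"Recommendation: {self.title}",
            "",
            f"Affected Skill: {self.affected_skill}",
            "",
            f"Problem: {self.problem}",
            "",
            "Evidence:",
            f'- {self.rejection_count} {noun} with rationale "{self.theme}"',
            f"- Example: {self.example}",
            "",
            "Proposed Fix:",
            self.proposed_fix,
            "",
            f"Expected Impact: {self.expected_impact}",
        ]
        return "\n".join(lines)
```

Keeping the evidence fields separate from the rendered text makes it easier to check that each recommendation is backed by actual rejection counts.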
Output Format
```markdown
Review Skill Improvement Report

Summary
- Feedback entries analyzed: [N]
- Unique rules triggered: [N]
- High-rejection rules identified: [N]
- Recommendations generated: [N]

High-Rejection Rules
| Rule Source | Total | Rejected | Rate | Theme |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |

Recommendations
[Numbered list of recommendations in format above]

Rules Performing Well
[Rules with <10% rejection rate - preserve these]
```
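As an illustration of how the High-Rejection Rules table might be produced, the sketch below renders rows into the pipe-table layout used in the report. The function name `render_rules_table` and the tuple layout of each row are assumptions made for this example.

```python
def render_rules_table(rows):
    """Render the High-Rejection Rules table.

    rows is a list of (rule_source, total, rejected, theme) tuples;
    this input shape is an assumption for illustration.
    """
    lines = [
        "| Rule Source | Total | Rejected | Rate | Theme |",
        "|---|---|---|---|---|",
    ]
    for rule, total, rejected, theme in rows:
        # Format the rate as a whole-number percentage; guard against no data.
        rate = f"{rejected / total:.0%}" if total else "n/a"
        lines.append(f"| {rule} | {total} | {rejected} | {rate} | {theme} |")
    return "\n".join(lines)
```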
Usage
```bash
# Analyze feedback and generate improvement report
/review-skill-improver --output improvement-report.md
```
Example Analysis
Given this feedback data:
```csv
rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
```
Analysis output:
```markdown
Review Skill Improvement Report

Summary
- Feedback entries analyzed: 7
- Unique rules triggered: 3
- High-rejection rules identified: 2
- Recommendations generated: 2

High-Rejection Rules
| Rule Source | Total | Rejected | Rate | Theme |
|---|---|---|---|---|
| python-code-review:line-length | 4 | 3 | 75% | linter handles this |
| pydantic-ai-common-pitfalls:tool-decorator | 1 | 1 | 100% | framework supports pattern |

Recommendations

1. Add Linter Verification for Line Length
Affected Skill: commands/review-python.md
Problem: Flagging line length issues that linters confirm don't exist
Evidence:
- 3 rejections with rationale "linter passes/handles this"
- Example: amelia/drivers/api/openai.py:102 - Line too long - ruff check passes
Proposed Fix:
Add step to run ruff check before manual review. If linter passes for line length, do not flag manually.
Expected Impact: Reduce false positive rate for line-length from 75% to <10%

2. Add Raw Function Tool Registration Exception
Affected Skill: skills/pydantic-ai-common-pitfalls/SKILL.md
Problem: Flagging valid pydantic-ai pattern as error
Evidence:
- 1 rejection with rationale "docs support raw functions"
Proposed Fix:
Add "Valid Patterns" section documenting that passing functions with RunContext to Agent(tools=[...]) is valid.
Expected Impact: Eliminate false positives for this pattern

Rules Performing Well
| Rule Source | Total | Accepted | Rate |
|---|---|---|---|
| python-code-review:type-safety | 2 | 2 | 100% |
```
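The summary figures in this example can be checked against the sample feedback data with a short script. The sketch below uses only the standard library; the function name `summarize` and the shape of its return value are illustrative choices, not part of the skill.

```python
import csv
import io
from collections import Counter

# The sample feedback data from the example above.
SAMPLE = """\
rule_source,verdict,rationale
python-code-review:line-length,REJECT,ruff check passes
python-code-review:line-length,REJECT,no E501 violation
python-code-review:line-length,REJECT,linter config allows 120
python-code-review:line-length,ACCEPT,fixed long line
pydantic-ai-common-pitfalls:tool-decorator,REJECT,docs support raw functions
python-code-review:type-safety,ACCEPT,added type annotation
python-code-review:type-safety,ACCEPT,fixed Any usage
"""


def summarize(csv_text, threshold=0.30):
    """Count entries and rules, and collect rules above the rejection threshold."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    totals = Counter(row["rule_source"] for row in rows)
    rejects = Counter(
        row["rule_source"] for row in rows if row["verdict"] == "REJECT"
    )
    high = {
        rule: rejects[rule] / totals[rule]
        for rule in totals
        if rejects[rule] / totals[rule] > threshold
    }
    return {
        "entries": len(rows),
        "unique_rules": len(totals),
        "high_rejection": high,
    }


summary = summarize(SAMPLE)
print(summary["entries"], summary["unique_rules"], len(summary["high_rejection"]))
# prints: 7 3 2
```

The two flagged rules come out at 75% (3 of 4) and 100% (1 of 1), matching the High-Rejection Rules table in the example report.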
Future: Automated Skill Updates
Once confidence is high, this skill can:
- Generate PRs to beagle with skill improvements
- Track improvement impact over time
- A/B test rule variations
Feedback Loop
```
Review Code -> Log Outcomes -> Analyze Patterns -> Improve Skills -> Better Reviews
     ^                                                                    |
     +--------------------------------------------------------------------+
```
This creates a continuous improvement cycle where review quality improves based on empirical data rather than guesswork.