Self-Improving Agent

"An AI agent that learns from every interaction, accumulating patterns and insights to continuously improve its own capabilities." — Based on 2025 lifelong learning research

Overview

This is a universal self-improvement system that learns from ALL skill experiences, not just PRDs. It implements a complete feedback loop with:
  • Multi-Memory Architecture: Semantic + Episodic + Working memory
  • Self-Correction: Detects and fixes skill guidance errors
  • Self-Validation: Periodically verifies skill accuracy
  • Hooks Integration: Auto-triggers on skill events (before_start, after_complete, on_error)
  • Evolution Markers: Traceable changes with source attribution

Research-Based Design

Based on 2025 research:

| Research | Key Insight | Application |
|---|---|---|
| SimpleMem | Efficient lifelong memory | Pattern accumulation system |
| Multi-Memory Survey | Semantic + Episodic memory | World knowledge + experiences |
| Lifelong Learning | Continuous task stream learning | Learn from every skill use |
| Evo-Memory | Test-time lifelong learning | Real-time adaptation |

The Self-Improvement Loop

┌─────────────────────────────────────────────────────────────────┐
│                    UNIVERSAL SELF-IMPROVEMENT                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Skill Event → Extract Experience → Abstract Pattern → Update  │
│        │                  │                │         │          │
│        ▼                  ▼                ▼         ▼          │
│   ┌─────────────────────────────────────────────────────┐       │
│   │              MULTI-MEMORY SYSTEM                      │       │
│   ├─────────────────────────────────────────────────────┤       │
│   │  Semantic Memory   │  Episodic Memory  │ Working Memory │  │
│   │  (Patterns/Rules)  │  (Experiences)    │  (Current)     │  │
│   │  memory/semantic/  │  memory/episodic/ │  memory/working/│  │
│   └─────────────────────────────────────────────────────┘       │
│                                                                 │
│   ┌─────────────────────────────────────────────────────┐       │
│   │              FEEDBACK LOOP                            │       │
│   │  User Feedback → Confidence Update → Pattern Adapt   │       │
│   └─────────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

When This Activates

Automatic Triggers (via hooks)

| Event | Trigger | Action |
|---|---|---|
| before_start | Any skill starts | Log session start |
| after_complete | Any skill completes | Extract patterns, update skills |
| on_error | Bash returns a non-zero exit code | Capture error context, trigger self-correction |
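
The event-to-action mapping in the table above could be wired up roughly as follows. This is a minimal Python sketch, not the skill's own implementation: the `dispatch` function, the handler names, and the action strings are hypothetical and chosen only to mirror the table.

```python
from typing import Callable, Dict, List

# Hypothetical handlers: each returns the actions the table assigns to its event.
def on_before_start(ctx: dict) -> List[str]:
    return ["log_session_start"]

def on_after_complete(ctx: dict) -> List[str]:
    return ["extract_patterns", "update_skills"]

def on_error(ctx: dict) -> List[str]:
    # Only a non-zero exit code counts as an error.
    if ctx.get("exit_code", 0) == 0:
        return []
    return ["capture_error_context", "trigger_self_correction"]

HANDLERS: Dict[str, Callable[[dict], List[str]]] = {
    "before_start": on_before_start,
    "after_complete": on_after_complete,
    "on_error": on_error,
}

def dispatch(event: str, ctx: dict) -> List[str]:
    """Return the actions for an event; unknown events produce no actions."""
    handler = HANDLERS.get(event)
    return handler(ctx) if handler else []
```

Keeping the mapping in a single dictionary makes it easy to see which events are handled and to add new ones without touching the dispatch logic.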

Manual Triggers

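  • User says "自我进化" ("self-evolve"), "self-improve", or "从经验中学习" ("learn from experience")
  • User says "分析今天的经验" ("analyze today's experience") or "总结教训" ("summarize the lessons learned")
  • User asks to improve a specific skill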
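
Detecting the manual trigger phrases could be as simple as a substring check. The sketch below assumes plain case-insensitive matching; the `is_manual_trigger` helper is hypothetical, and the real skill may rely on the model's own judgment rather than string matching.

```python
# Trigger phrases copied from the list above.
TRIGGER_PHRASES = (
    "自我进化",
    "self-improve",
    "从经验中学习",
    "分析今天的经验",
    "总结教训",
)

def is_manual_trigger(message: str) -> bool:
    """Return True if the message contains any trigger phrase (case-insensitive)."""
    text = message.casefold()
    return any(phrase.casefold() in text for phrase in TRIGGER_PHRASES)
```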

Evolution Priority Matrix

Trigger evolution when new reusable knowledge appears:

| Trigger | Target Skill | Priority | Action |
|---|---|---|---|
| New PRD pattern discovered | prd-planner | High | Add to quality checklist |
| Architecture tradeoff clarified | architecting-solutions | High | Add to decision patterns |
| API design rule learned | api-designer | High | Update template |
| Debugging fix discovered | debugger | High | Add to anti-patterns |
| Review checklist gap | code-reviewer | High | Add checklist item |
| Perf/security insight | performance-engineer, security-auditor | High | Add to patterns |
| UI/UX spec issue | prd-planner, architecting-solutions | High | Add visual spec requirements |
| React/state pattern | debugger, refactoring-specialist | Medium | Add to patterns |
| Test strategy improvement | test-automator, qa-expert | Medium | Update approach |
| CI/deploy fix | deployment-engineer | Medium | Add to troubleshooting |
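
One way to apply the matrix is as a lookup table that routes each trigger to its target skills and orders the resulting updates by priority. The sketch below copies three rows from the matrix; the dictionary keys and the `plan_updates` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple

class Route(NamedTuple):
    target_skills: List[str]
    priority: str
    action: str

# A few rows copied from the matrix; the keys are hypothetical identifiers.
MATRIX = {
    "debugging_fix": Route(["debugger"], "High", "Add to anti-patterns"),
    "perf_security_insight": Route(
        ["performance-engineer", "security-auditor"], "High", "Add to patterns"
    ),
    "ci_deploy_fix": Route(["deployment-engineer"], "Medium", "Add to troubleshooting"),
}

PRIORITY_ORDER = {"High": 0, "Medium": 1}

def plan_updates(triggers: List[str]) -> List[Route]:
    """Look up known triggers and order them so High-priority routes come first."""
    routes = [MATRIX[t] for t in triggers if t in MATRIX]
    return sorted(routes, key=lambda r: PRIORITY_ORDER[r.priority])
```

Unknown triggers are skipped silently here; a stricter variant could raise an error instead.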

Multi-Memory Architecture

1. Semantic Memory (`memory/semantic-patterns.json`)

Stores abstract patterns and rules reusable across contexts:

```json
{
  "patterns": {
    "pattern_id": {
      "id": "pat-2025-01-11-001",
      "name": "Pattern Name",
      "source": "user_feedback|implementation_review|retrospective",
      "confidence": 0.95,
      "applications": 5,
      "created": "2025-01-11",
      "category": "prd_structure|react_patterns|async_patterns|...",
      "pattern": "One-line summary",
      "problem": "What problem does this solve?",
      "solution": { ... },
      "quality_rules": [ ... ],
      "target_skills": [ ... ]
    }
  }
}
```
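
Reading and updating this file might look like the sketch below. It assumes the top-level `patterns` object shown above and treats a repeated key as another application of the same pattern; `record_pattern` is a hypothetical helper, not a function the skill is known to provide.

```python
import json
from pathlib import Path

def record_pattern(path: Path, key: str, pattern: dict) -> dict:
    """Insert a pattern under `key`, or bump `applications` if it already exists."""
    if path.exists():
        store = json.loads(path.read_text(encoding="utf-8"))
    else:
        store = {"patterns": {}}
    patterns = store.setdefault("patterns", {})
    if key in patterns:
        patterns[key]["applications"] = patterns[key].get("applications", 0) + 1
    else:
        # New patterns start with zero applications unless the caller says otherwise.
        patterns[key] = {"applications": 0, **pattern}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(store, indent=2, ensure_ascii=False), encoding="utf-8")
    return patterns[key]
```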

2. Episodic Memory (`memory/episodic/`)

Stores specific experiences and what happened:

```
memory/episodic/
├── 2025/
│   ├── 2025-01-11-prd-creation.json
│   ├── 2025-01-11-debug-session.json
│   └── 2025-01-12-refactoring.json
```

```json
{
  "id": "ep-2025-01-11-001",
  "timestamp": "2025-01-11T10:30:00Z",
  "skill": "debugger",
  "situation": "User reported data not refreshing after form submission",
  "root_cause": "Empty callback in onRefresh prop",
  "solution": "Implement actual refresh logic in callback",
  "lesson": "Always verify callbacks are not empty functions",
  "related_pattern": "callback_verification",
  "user_feedback": {
    "rating": 8,
    "comments": "This was exactly the issue"
  }
}
```
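
The directory layout above suggests that each episode file is named after its date plus a short label, grouped by year. A minimal sketch of that convention follows; the `slug` argument (for example `debug-session`) and both helper names are assumptions, since the source does not say how the label is chosen.

```python
import json
from datetime import datetime
from pathlib import Path

def episode_path(root: Path, timestamp: str, slug: str) -> Path:
    """Build <root>/<YYYY>/<YYYY-MM-DD>-<slug>.json from an ISO 8601 timestamp."""
    # Python versions before 3.11 do not parse a trailing "Z", so normalize it first.
    when = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return root / f"{when:%Y}" / f"{when:%Y-%m-%d}-{slug}.json"

def save_episode(root: Path, episode: dict, slug: str) -> Path:
    """Write the episode as JSON and return the path it was stored at."""
    path = episode_path(root, episode["timestamp"], slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(episode, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```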

3. Working Memory (`memory/working/`)

Stores current session context:

```
memory/working/
├── current_session.json   # Active session data
├── last_error.json        # Error context for self-correction
└── session_end.json       # Session end marker
```

Self-Improvement Process

Phase 1: Experience Extraction

After any skill completes, extract:

```yaml
What happened:
  skill_used: {which skill}
  task: {what was being done}
  outcome: {success|partial|failure}

Key Insights:
  what_went_well: [what worked]
  what_went_wrong: [what didn't work]
  root_cause: {underlying issue if applicable}

User Feedback:
  rating: {1-10 if provided}
  comments: {specific feedback}
```
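
The extraction template above maps naturally onto a small typed record. The sketch below is one possible shape: the `Experience` class is hypothetical, and the validation simply enforces the three outcomes and the 1-10 rating range that the template names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OUTCOMES = {"success", "partial", "failure"}

@dataclass
class Experience:
    skill_used: str
    task: str
    outcome: str
    what_went_well: List[str] = field(default_factory=list)
    what_went_wrong: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None
    rating: Optional[int] = None  # 1-10, only if the user provided one
    comments: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values the template does not allow.
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
        if self.rating is not None and not 1 <= self.rating <= 10:
            raise ValueError("rating must be between 1 and 10")
```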

Phase 2: Pattern Abstraction

Convert experiences to reusable patterns:

| Concrete Experience | Abstract Pattern | Target Skill |
|---|---|---|
| "User forgot to save PRD notes" | "Always persist thinking to files" | prd-planner |
| "Code review missed SQL injection" | "Add security checklist item" | code-reviewer |
| "Callback was empty, didn't work" | "Verify callback implementations" | debugger |
| "Net APY position ambiguous" | "UI specs need exact relative positions" | prd-planner |

Abstraction Rules:

```yaml
If experience_repeats 3+ times:
  pattern_level: critical
  action: Add to skill's "Critical Mistakes" section

If solution_was_effective:
  pattern_level: best_practice
  action: Add to skill's "Best Practices" section

If user_rating >= 7:
  pattern_level: strength
  action: Reinforce this approach

If user_rating <= 4:
  pattern_level: weakness
  action: Add to "What to Avoid" section
```
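
The abstraction rules above can be read as four independent checks. The sketch below assumes that more than one rule may match the same experience, which the source does not state either way, so it returns every matching level; `pattern_levels` is a hypothetical helper.

```python
from typing import List, Optional

def pattern_levels(
    repeats: int, solution_effective: bool, user_rating: Optional[int]
) -> List[str]:
    """Return every pattern level whose rule matches, in the order the rules are listed."""
    levels: List[str] = []
    if repeats >= 3:
        levels.append("critical")
    if solution_effective:
        levels.append("best_practice")
    if user_rating is not None:
        if user_rating >= 7:
            levels.append("strength")
        elif user_rating <= 4:
            levels.append("weakness")
    return levels
```

Ratings of 5 or 6 match neither rating rule, so they leave the pattern level unchanged.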

Phase 3: Skill Updates

Update the appropriate skill files with evolution markers:

```markdown
<!-- Evolution: 2025-01-12 | source: ep-2025-01-12-001 | skill: debugger -->

Pattern Added (2025-01-12)

Pattern: Always verify callbacks are not empty functions
Source: Episode ep-2025-01-12-001
Confidence: 0.95

Updated Checklist

- Verify all callbacks have implementations
- Test callback execution paths
```

**Correction Markers** (when fixing wrong guidance):

```markdown
<!-- Correction: 2025-01-12 | was: "Use callback chain" | reason: caused stale refresh -->

Corrected Guidance

Use direct state monitoring instead of callback chains:
```

```typescript
// ✅ Do: Direct state monitoring
const prevPendingCount = usePrevious(pendingCount);
```
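
Both marker formats are plain HTML comments, so they can be produced with simple string formatting. The sketch below follows the field order shown in the examples above; the two helper functions are hypothetical, and they do not guard against `-->` appearing inside a field.

```python
from datetime import date
from typing import Optional

def evolution_marker(source: str, skill: str, on: Optional[date] = None) -> str:
    """Format an evolution marker; defaults to today's date."""
    on = on or date.today()
    return f"<!-- Evolution: {on.isoformat()} | source: {source} | skill: {skill} -->"

def correction_marker(was: str, reason: str, on: Optional[date] = None) -> str:
    """Format a correction marker that records the replaced guidance and the reason."""
    on = on or date.today()
    return f'<!-- Correction: {on.isoformat()} | was: "{was}" | reason: {reason} -->'
```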

Phase 4: Memory Consolidation

  1. Update semantic memory (`memory/semantic-patterns.json`)
  2. Store episodic memory (`memory/episodic/YYYY-MM-DD-{skill}.json`)
  3. Update pattern confidence based on applications/feedback
  4. Prune outdated patterns (low confidence, no recent applications)
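
The source does not specify how confidence is updated or when a pattern counts as outdated, so the sketch below makes several assumptions: confidence moves toward `rating / 10` by a fixed weight, and a pattern is pruned only when it is both below a confidence threshold and idle for a set number of days. The `last_applied` field, the thresholds, and both function names are hypothetical.

```python
from datetime import date, timedelta

def update_confidence(confidence: float, rating: int, weight: float = 0.2) -> float:
    """Move confidence toward rating/10 by `weight` (an assumed update rule)."""
    target = rating / 10
    return round((1 - weight) * confidence + weight * target, 4)

def prune(
    patterns: dict, today: date, min_confidence: float = 0.3, max_idle_days: int = 90
) -> dict:
    """Drop patterns that are both low-confidence and not applied recently."""
    cutoff = today - timedelta(days=max_idle_days)
    kept = {}
    for key, p in patterns.items():
        # Fall back to the creation date when no application date was recorded.
        last = date.fromisoformat(p.get("last_applied", p["created"]))
        if p["confidence"] < min_confidence and last < cutoff:
            continue
        kept[key] = p
    return kept
```

Requiring both conditions keeps new, still-unproven patterns from being pruned before they have had a chance to be applied.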

Self-Correction (on_error hook)

Triggered when:
  • Bash command returns non-zero exit code
  • Tests fail after following skill guidance
  • User reports the guidance produced incorrect results

Process:

```markdown
Self-Correction Workflow

1. Detect Error
   - Capture error context from working/last_error.json
   - Identify which skill guidance was followed
2. Verify Root Cause
   - Was the skill guidance incorrect?
   - Was the guidance misinterpreted?
   - Was the guidance incomplete?
3. Apply Correction
   - Update skill file with corrected guidance
   - Add correction marker with reason
   - Update related patterns in semantic memory
4. Validate Fix
   - Test the corrected guidance
   - Ask user to verify
```

**Example:**

```markdown
<!-- Correction: 2025-01-12 | was: "useMemo for claimable ids" | reason: stale data at click time -->

Self-Correction: Click-Time Computation

Issue: Using useMemo for claimable IDs caused stale data
Fix: Compute at click time for always-fresh data
Pattern: click_time_vs_open_time_computation
```
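
Step 2 of the workflow asks three questions, and the answer plausibly changes what step 3 should do. The sketch below encodes one assumed policy: misinterpreted guidance gets its wording clarified, while incorrect or incomplete guidance goes through the full correction path. The enum, the step strings, and `correction_steps` are all hypothetical.

```python
from enum import Enum
from typing import List

class RootCause(Enum):
    INCORRECT = "guidance was incorrect"
    MISINTERPRETED = "guidance was misinterpreted"
    INCOMPLETE = "guidance was incomplete"

def correction_steps(cause: RootCause) -> List[str]:
    """Map the step 2 verdict to follow-up steps (an assumed policy, not the source's)."""
    if cause is RootCause.MISINTERPRETED:
        # The guidance itself may be fine; clarifying its wording is one option.
        return ["clarify wording in skill file", "ask user to verify"]
    steps = ["update skill file", "add correction marker", "update semantic memory"]
    return steps + ["test corrected guidance", "ask user to verify"]
```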

Self-Validation

Use the validation template in `references/appendix.md` when reviewing updates.

Hooks Integration

Wiring Hooks in Claude Code Settings

Add to Claude Code settings (`~/.claude/settings.json`):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "bash ${SKILLS_DIR}/self-improving-agent/hooks/pre-tool.sh \"$TOOL_NAME\" \"$TOOL_INPUT\""
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "bash ${SKILLS_DIR}/self-improving-agent/hooks/post-bash.sh \"$TOOL_OUTPUT\" \"$EXIT_CODE\""
          }
        ]
      }
    ],
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "bash ${SKILLS_DIR}/self-improving-agent/hooks/session-end.sh"
          }
        ]
      }
    ]
  }
}
```

Replace `${SKILLS_DIR}` with your actual skills path.
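
The contents of the hook scripts are not shown in the source. As an illustration of what the post-Bash step might do, the Python sketch below assumes the hook receives the tool output and exit code as two positional arguments, as the `PostToolUse` command above passes them; whether Claude Code actually populates those variables should be checked against its hooks documentation. The `handle_post_bash` function, the record fields, and the 2000-character cap are hypothetical.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

def handle_post_bash(argv: List[str], working_dir: Path) -> Optional[Path]:
    """Write last_error.json when the exit code argument is non-zero."""
    output = argv[0]
    exit_code = int(argv[1] or 0)  # treat an empty exit code as success
    if exit_code == 0:
        return None
    working_dir.mkdir(parents=True, exist_ok=True)
    path = working_dir / "last_error.json"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "exit_code": exit_code,
        "output": output[-2000:],  # keep only the tail of long output
    }
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path

if __name__ == "__main__" and len(sys.argv) >= 3:
    handle_post_bash(sys.argv[1:3], Path("memory/working"))
```

Writing the error context to `memory/working/last_error.json` matches the working-memory layout described earlier, so the self-correction workflow can pick it up later.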

Additional References

See `references/appendix.md` for memory structure, workflow diagrams, metrics, feedback templates, and research links.

Best Practices

DO

  • ✅ Learn from EVERY skill interaction
  • ✅ Extract patterns at the right abstraction level
  • ✅ Update multiple related skills
  • ✅ Track confidence and application counts
  • ✅ Ask for user feedback on improvements
  • ✅ Use evolution/correction markers for traceability
  • ✅ Validate guidance before applying broadly

DON'T

  • ❌ Over-generalize from single experiences
  • ❌ Update skills without confidence tracking
  • ❌ Ignore negative feedback
  • ❌ Make changes that break existing functionality
  • ❌ Create contradictory patterns
  • ❌ Update skills without understanding context

Quick Start

After any skill completes, this agent automatically:
  1. Analyzes what happened
  2. Extracts patterns and insights
  3. Updates relevant skill files
  4. Logs to memory for future reference
  5. Reports summary to user

References
