self-improving-agent
Self-Improving Agent - Autonomous Learning Patterns
Tier: POWERFUL
Category: Engineering
Tags: self-improvement, AI agents, feedback loops, auto-memory, meta-learning, performance tracking
Overview
Self-Improving Agent provides architectural patterns for AI agents that get better with use. Most agents are stateless -- they make the same mistakes repeatedly because they lack mechanisms to learn from their own execution. This skill addresses that gap with concrete patterns for feedback capture, memory curation, skill extraction, and regression detection.
The key insight: auto-memory captures everything, but curation is what turns noise into knowledge.
Core Architecture
The Improvement Loop
┌──────────────────────────────────────────────────────────┐
│                  SELF-IMPROVEMENT CYCLE                  │
│                                                          │
│   ┌─────────┐    ┌──────────┐    ┌─────────────┐         │
│   │ Execute │───▶│ Evaluate │───▶│   Extract   │         │
│   │  Task   │    │ Outcome  │    │  Learnings  │         │
│   └─────────┘    └──────────┘    └─────────────┘         │
│        ▲                                │                │
│        │                                ▼                │
│   ┌─────────┐    ┌──────────┐    ┌─────────────┐         │
│   │  Apply  │◀───│ Promote  │◀───│  Validate   │         │
│   │  Rules  │    │ to Rules │    │  Learnings  │         │
│   └─────────┘    └──────────┘    └─────────────┘         │
│                                                          │
└──────────────────────────────────────────────────────────┘
Improvement Maturity Levels
| Level | Name | Mechanism | Example |
|---|---|---|---|
| 0 | Stateless | No memory between sessions | Default agent behavior |
| 1 | Recording | Captures observations, no action | Auto-memory logging |
| 2 | Curating | Organizes and deduplicates observations | Memory review + cleanup |
| 3 | Promoting | Graduates patterns to enforced rules | MEMORY.md entries become CLAUDE.md rules |
| 4 | Extracting | Creates reusable skills from proven patterns | Recurring solutions become skill packages |
| 5 | Meta-Learning | Adapts learning strategy itself | Adjusts what to capture based on what proved useful |
Most agents operate at Level 0-1. This skill provides the machinery for Levels 2-5.
Core Capabilities
1. Memory Curation System
The Memory Stack
┌─────────────────────────────────────────────────┐
│  CLAUDE.md / .claude/rules/                     │
│  Highest authority. Enforced every session.     │
│  Capacity: Unlimited. Load: Full file.          │
├─────────────────────────────────────────────────┤
│  MEMORY.md (auto-memory)                        │
│  Project learnings. Auto-captured by Claude.    │
│  Capacity: First 200 lines loaded. Overflow to  │
│  topic files.                                   │
├─────────────────────────────────────────────────┤
│  Session Context                                │
│  Current conversation. Ephemeral.               │
│  Capacity: Context window.                      │
└─────────────────────────────────────────────────┘
Memory Review Protocol
Run periodically (weekly or after every 10 sessions):
Step 1: Read MEMORY.md and all topic files
Step 2: Classify each entry
Categories:
- PROMOTE: Pattern proven 3+ times, should be a rule
- CONSOLIDATE: Multiple entries saying the same thing
- STALE: References deleted files, old patterns, resolved issues
- KEEP: Still relevant, not yet proven enough to promote
- EXTRACT: Recurring solution that should be a reusable skill
Step 3: Execute actions
- PROMOTE entries → move to CLAUDE.md or .claude/rules/
- CONSOLIDATE entries → merge into single clear entry
- STALE entries → delete
- EXTRACT entries → create skill package (see Skill Extraction)
Step 4: Verify MEMORY.md is under 200 lines
- If over 200: move topic-specific entries to topic files
- Topic files: ~/.claude/projects/<path>/memory/<topic>.md
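
The classification and line-limit steps of the review protocol can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the `MemoryEntry` fields, the `classify` and `lines_over_limit` helpers, and the order in which categories take precedence are all assumptions made for the example.

```python
from dataclasses import dataclass

MEMORY_LINE_LIMIT = 200  # lines of MEMORY.md loaded per session, per the protocol


@dataclass
class MemoryEntry:
    """Hypothetical record for one MEMORY.md entry under review."""
    text: str
    times_proven: int = 0               # sessions in which the pattern held up
    duplicates_another: bool = False    # says the same thing as another entry
    is_stale: bool = False              # refers to deleted files or resolved issues
    is_reusable_solution: bool = False  # recurring solution worth packaging


def classify(entry: MemoryEntry) -> str:
    """Return one review category; the precedence order is an assumption."""
    if entry.is_stale:
        return "STALE"
    if entry.duplicates_another:
        return "CONSOLIDATE"
    if entry.is_reusable_solution:
        return "EXTRACT"
    if entry.times_proven >= 3:
        return "PROMOTE"
    return "KEEP"


def lines_over_limit(memory_text: str) -> int:
    """Number of lines that would need to move to topic files."""
    return max(0, len(memory_text.splitlines()) - MEMORY_LINE_LIMIT)
```

Checking staleness first means an outdated entry is deleted rather than promoted, even if it was proven many times in the past.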
Promotion Criteria
An entry is ready for promotion when:
| Criterion | Threshold | Why |
|---|---|---|
| Recurrence | Seen in 3+ sessions | Not a one-off |
| Consistency | Same solution every time | Not context-dependent |
| Impact | Prevented errors or saved significant time | Worth enforcing |
| Stability | Underlying code/system unchanged | Won't immediately become stale |
| Clarity | Can be stated in 1-2 sentences | Rules must be unambiguous |
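
The five criteria in the table translate naturally into a checklist function. The sketch below is illustrative only: the `Candidate` fields and the `unmet_criteria` helper are hypothetical names, and how each boolean is determined in practice is left to the reviewer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a memory entry being considered for promotion."""
    sessions_seen: int
    same_solution_each_time: bool
    prevented_errors_or_saved_time: bool
    underlying_system_unchanged: bool
    sentence_count: int  # length of the rule when written out


def unmet_criteria(c: Candidate) -> list[str]:
    """Return the names of promotion criteria the candidate fails."""
    checks = {
        "recurrence": c.sessions_seen >= 3,
        "consistency": c.same_solution_each_time,
        "impact": c.prevented_errors_or_saved_time,
        "stability": c.underlying_system_unchanged,
        "clarity": 1 <= c.sentence_count <= 2,
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list means the entry is ready for promotion; a non-empty list says which criteria still need evidence.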
Promotion Targets
| Pattern Type | Promote To | Example |
|---|---|---|
| Coding convention | | "Always use |
| Project architecture | | "All API routes go through middleware chain" |
| Tool preference | | "Use pnpm, not npm" |
| Debugging pattern | | "When tests fail, check env vars first" |
| File-scoped rule | | "In migrations/, always add down migration" |
2. Feedback Loop Design
Outcome Classification
Every agent task produces an outcome. Classify it:
SUCCESS - Task completed, user accepted result
PARTIAL - Task completed but required corrections
FAILURE - Task failed, user had to redo
REJECTION - User explicitly rejected approach
TIMEOUT - Task exceeded time/token budget
ERROR - Technical error (tool failure, API error)
Signal Extraction from Outcomes
| Outcome | Signal | Memory Action |
|---|---|---|
| SUCCESS (first try) | Approach works well | Reinforce (increment confidence) |
| SUCCESS (after correction) | Initial approach had gap | Log the correction pattern |
| PARTIAL (user edited result) | Output format or content gap | Log what user changed |
| FAILURE | Approach fundamentally wrong | Log anti-pattern with context |
| REJECTION | Misunderstood requirements | Log clarification pattern |
| Repeated ERROR | Tool or environment issue | Log workaround or fix |
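
One way to encode the outcome classes and the table above is an enum plus a lookup function. This is a rough sketch under stated assumptions: the `memory_action` name, its `corrected` and `repeated` flags, and the `"none"` fallback for cases the table does not cover (TIMEOUT, a one-off ERROR) are all invented for illustration.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"
    REJECTION = "rejection"
    TIMEOUT = "timeout"
    ERROR = "error"


def memory_action(outcome: Outcome, corrected: bool = False, repeated: bool = False) -> str:
    """Map a task outcome to the memory action listed in the table."""
    if outcome is Outcome.SUCCESS:
        return "log correction pattern" if corrected else "reinforce"
    if outcome is Outcome.PARTIAL:
        return "log what user changed"
    if outcome is Outcome.FAILURE:
        return "log anti-pattern with context"
    if outcome is Outcome.REJECTION:
        return "log clarification pattern"
    if outcome is Outcome.ERROR and repeated:
        return "log workaround or fix"
    # Assumption: the table defines no action for TIMEOUT or a single ERROR.
    return "none"
```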
Feedback Capture Template
Learning: [Short description]
Context: [What task was being performed]
What happened: [Outcome description]
Root cause: [Why the outcome occurred]
Correct approach: [What should have been done]
Confidence: [High/Medium/Low]
Recurrence: [First time / Seen N times]
Action: [KEEP / PROMOTE / EXTRACT]
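
If entries are produced programmatically, the template can be filled from a small record type. The `Learning` class and its `render` method below are hypothetical, shown only to make the field-to-line mapping concrete; the sample values in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass
class Learning:
    """Hypothetical in-memory form of one feedback entry."""
    title: str
    context: str
    what_happened: str
    root_cause: str
    correct_approach: str
    confidence: str = "Low"  # High / Medium / Low
    times_seen: int = 1
    action: str = "KEEP"     # KEEP / PROMOTE / EXTRACT

    def render(self) -> str:
        """Format the entry using the template's field labels."""
        recurrence = (
            "First time" if self.times_seen == 1 else f"Seen {self.times_seen} times"
        )
        return "\n".join([
            f"Learning: {self.title}",
            f"Context: {self.context}",
            f"What happened: {self.what_happened}",
            f"Root cause: {self.root_cause}",
            f"Correct approach: {self.correct_approach}",
            f"Confidence: {self.confidence}",
            f"Recurrence: {recurrence}",
            f"Action: {self.action}",
        ])
```

For example, `Learning("Check env vars first", "Fixing failing tests", "Tests failed twice", "Missing variable", "Inspect env before rerunning", times_seen=3).render()` yields an eight-line entry whose recurrence line reads `Recurrence: Seen 3 times`.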
3. Performance Regression Detection
Metrics to Track
| Metric | Measurement | Regression Signal |
|---|---|---|
| First-attempt success rate | Tasks accepted without correction | Dropping below 70% |
| Correction count per task | User edits after agent output | Rising above 2 per task |
| Tool error rate | Failed tool calls / total calls | Rising above 5% |
| Context relevance | Retrieved context actually used | Dropping below 60% |
| Task completion time | Turns to complete task | Rising trend over 5 sessions |
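
The thresholds in the table can be checked mechanically once the metrics are collected. The following sketch assumes the metrics arrive as a dictionary with the hypothetical keys shown, and interprets "rising trend over 5 sessions" as five strictly increasing values, which is one possible reading rather than a definition from the skill.

```python
def regression_signals(metrics: dict[str, float], turns_per_session: list[float]) -> list[str]:
    """Return the names of metrics that crossed their regression threshold."""
    signals = []
    if metrics["first_attempt_success_rate"] < 0.70:
        signals.append("first_attempt_success_rate")
    if metrics["corrections_per_task"] > 2:
        signals.append("corrections_per_task")
    if metrics["tool_error_rate"] > 0.05:
        signals.append("tool_error_rate")
    if metrics["context_relevance"] < 0.60:
        signals.append("context_relevance")
    # Assumption: a "rising trend" means each of the last 5 sessions took
    # more turns than the one before it.
    recent = turns_per_session[-5:]
    if len(recent) == 5 and all(a < b for a, b in zip(recent, recent[1:])):
        signals.append("task_completion_time")
    return signals
```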
Regression Response Protocol
1. DETECT: Metric crosses threshold
2. DIAGNOSE: Compare recent sessions vs baseline
- What changed? (New code? New patterns? New tools?)
- Which task types are affected?
- Is it a memory issue or a capability issue?
3. RESPOND:
- Memory issue → Review and curate MEMORY.md
- Stale rules → Update CLAUDE.md
- New code patterns → Add rules for new patterns
- Capability gap → Extract as skill request
4. VERIFY: Track metric for next 3 sessions
4. Skill Extraction
When a solution pattern is proven and reusable, extract it into a standalone skill.
Extraction Criteria
A pattern is ready for extraction when:
- Used successfully 5+ times across different contexts
- Solution is generalizable (not project-specific)
- Takes more than trivial effort to recreate from scratch
- Would benefit other projects/users
Extraction Process
Step 1: Document the pattern
- What problem does it solve?
- What's the step-by-step approach?
- What are the inputs and outputs?
- What are the edge cases?
Step 2: Generalize
- Remove project-specific details
- Identify configurable parameters
- Add handling for common variations
Step 3: Package as skill
- Create SKILL.md with frontmatter
- Add references/ for knowledge bases
- Add scripts/ if automatable
- Add assets/ for templates
Step 4: Validate
- Test on a different project
- Have another person/agent use it
- Iterate on unclear instructions
5. Meta-Learning Patterns
Adaptive Capture Strategy
Not all observations are equally valuable. Adjust what gets captured based on what proved useful:
Initial strategy: Capture everything
After 10 sessions: Analyze which captured items led to promotions
After 20 sessions: Adjust capture to focus on high-value categories
High-value categories (typically):
- Error resolutions (80% promotion rate)
- User corrections (70% promotion rate)
- Tool preferences (60% promotion rate)
Low-value categories (typically):
- File structure observations (10% promotion rate)
- One-off workarounds (5% promotion rate)
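
The "analyze which captured items led to promotions" step amounts to computing a promotion rate per category. A minimal sketch follows, assuming each captured and promoted item is tagged with a category string; the function names and the 0.5 cut-off for "high-value" are hypothetical choices, not values from the skill.

```python
from collections import Counter


def promotion_rates(captured: list[str], promoted: list[str]) -> dict[str, float]:
    """Fraction of captured items per category that were later promoted."""
    totals = Counter(captured)
    wins = Counter(promoted)
    return {category: wins[category] / count for category, count in totals.items()}


def focus_categories(rates: dict[str, float], min_rate: float = 0.5) -> list[str]:
    """Categories worth capturing going forward (min_rate is an assumed cut-off)."""
    return sorted(category for category, rate in rates.items() if rate >= min_rate)
```

With 5 captured error resolutions of which 4 were promoted, and 10 file-structure observations of which 1 was promoted, the rates are 0.8 and 0.1, so only error resolutions stay in focus.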
Anti-Pattern Detection
Beyond capturing what works, actively detect what fails:
| Anti-Pattern | Detection Signal | Response |
|---|---|---|
| Repeated wrong import path | Same correction 3+ times | Add to CLAUDE.md as rule |
| Wrong test framework used | User always changes test approach | Add testing rules |
| Incorrect API usage | Same API error pattern | Add API usage notes |
| Style guide violations | User reformats same patterns | Add style rules |
| Wrong branch workflow | User corrects git operations | Add git workflow rules |
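
The "same correction 3+ times" signal from the table can be detected by counting normalized correction notes. This sketch assumes corrections are logged as short free-text strings and that lowercasing plus trimming is enough to match repeats; real logs would likely need fuzzier matching. The function name and sample strings are hypothetical.

```python
from collections import Counter


def repeated_corrections(corrections: list[str], threshold: int = 3) -> list[str]:
    """Return corrections seen at least `threshold` times (case-insensitive)."""
    counts = Counter(note.strip().lower() for note in corrections)
    return [note for note, seen in counts.items() if seen >= threshold]
```

Each returned note is a candidate for a CLAUDE.md rule, subject to the promotion criteria.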
6. Continuous Calibration
Confidence Scoring
Every piece of learned knowledge carries a confidence score:
Confidence = base_score * recency_factor * consistency_factor
base_score:
- User explicitly stated: 1.0
- Observed from successful outcome: 0.8
- Inferred from pattern: 0.6
- Guessed from context: 0.3
recency_factor:
- Last 7 days: 1.0
- 7-30 days: 0.9
- 30-90 days: 0.7
- 90+ days: 0.5
consistency_factor:
- Never contradicted: 1.0
- Contradicted once, reaffirmed: 0.9
- Contradicted, not reaffirmed: 0.5
- Actively contradicted: 0.0 (delete)
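
The scoring formula is a product of three table lookups, so it is straightforward to compute. In the sketch below the dictionary keys and function names are hypothetical, and because the recency bands overlap at their edges (day 7, 30, and 90), the choice to put each boundary day in the more recent band is an assumption.

```python
BASE_SCORE = {"stated": 1.0, "observed": 0.8, "inferred": 0.6, "guessed": 0.3}
CONSISTENCY = {
    "never_contradicted": 1.0,
    "reaffirmed": 0.9,
    "not_reaffirmed": 0.5,
    "actively_contradicted": 0.0,  # a score of 0.0 means the entry is deleted
}


def recency_factor(age_days: int) -> float:
    """Decay factor by age; boundary days fall in the more recent band (assumed)."""
    if age_days <= 7:
        return 1.0
    if age_days <= 30:
        return 0.9
    if age_days <= 90:
        return 0.7
    return 0.5


def confidence(source: str, age_days: int, consistency: str) -> float:
    """Confidence = base_score * recency_factor * consistency_factor."""
    return BASE_SCORE[source] * recency_factor(age_days) * CONSISTENCY[consistency]
```

As a worked case, knowledge observed from a successful outcome 10 days ago and never contradicted scores 0.8 × 0.9 × 1.0 = 0.72.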
Belief Revision
When new information contradicts existing knowledge:
1. Compare confidence scores
2. If new info higher confidence → update knowledge
3. If roughly equal → flag for user confirmation
4. If new info lower confidence → keep existing, note conflict
5. Always log the conflict for review
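
The revision steps reduce to a three-way comparison plus an unconditional log entry. The sketch below is one possible reading: the `revise` function, the decision labels, and the 0.1 tolerance used for "roughly equal" are assumptions made for illustration.

```python
def revise(existing_conf: float, new_conf: float, conflict_log: list[str],
           tolerance: float = 0.1) -> str:
    """Decide how to handle new information that contradicts existing knowledge."""
    if abs(new_conf - existing_conf) <= tolerance:
        decision = "flag_for_user"   # roughly equal: ask the user to confirm
    elif new_conf > existing_conf:
        decision = "update"          # new information wins
    else:
        decision = "keep_existing"   # existing knowledge wins, conflict noted
    # The conflict is logged in every case so it can be reviewed later.
    conflict_log.append(f"existing={existing_conf} new={new_conf} -> {decision}")
    return decision
```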
Workflows
Workflow 1: Weekly Memory Health Check
1. Read all memory files (MEMORY.md + topic files)
2. Count total entries and lines
3. For each entry, classify: PROMOTE / CONSOLIDATE / STALE / KEEP / EXTRACT
4. Execute promotions (with user confirmation)
5. Execute consolidations
6. Delete stale entries
7. Verify under 200-line limit
8. Report: entries promoted, consolidated, deleted, remaining
Workflow 2: Post-Session Learning Capture
1. Review session outcomes (successes, corrections, failures)
2. For each correction: log what was wrong and what was right
3. For each failure: log root cause and correct approach
4. Check existing memory for related entries
5. If related entry exists: increment recurrence count
6. If new: add entry with context
7. If recurrence threshold met: flag for promotion
Workflow 3: Regression Investigation
1. Identify the degraded metric
2. Pull last 5 sessions' outcomes for that task type
3. Compare against baseline (first 5 sessions)
4. Identify what changed: memory, code, rules, environment
5. Propose fix: update rule, add rule, retrain pattern
6. Apply fix
7. Monitor next 3 sessions
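
Steps 2 and 3 of this workflow compare the last 5 sessions against the first 5. A minimal sketch of that comparison follows, assuming one metric value per session in chronological order; the function name and the requirement of at least ten sessions, so that the two windows do not overlap, are assumptions.

```python
from statistics import mean


def change_from_baseline(values: list[float], window: int = 5) -> float:
    """Mean of the last `window` sessions minus mean of the first `window`."""
    if len(values) < 2 * window:
        raise ValueError("need at least two non-overlapping windows of sessions")
    return mean(values[-window:]) - mean(values[:window])
```

A negative result for a metric such as first-attempt success rate indicates a drop relative to the baseline; for turn counts, a positive result indicates tasks are taking longer.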
Common Pitfalls
| Pitfall | Why It Happens | Fix |
|---|---|---|
| Memory bloat | Auto-capture without curation | Weekly review, enforce 200-line limit |
| Stale rules | Code changes, rules don't update | Timestamp rules, periodic re-verification |
| Over-promotion | Promoting one-off patterns as rules | Require 3+ recurrences before promotion |
| Silent regression | No metrics tracking | Implement outcome classification |
| Cargo cult rules | Copying rules without understanding | Each rule must have a "why" annotation |
| Contradiction spirals | New rules conflict with old rules | Belief revision protocol |
Integration Points
| Skill | Integration |
|---|---|
| context-engine | Context Engine manages what the agent sees; Self-Improving Agent manages what the agent remembers |
| agent-designer | Agent Designer defines agent architecture; Self-Improving Agent adds the learning layer |
| prompt-engineer-toolkit | Prompts that degrade over time are a regression; track and test them |
| observability-designer | Monitor agent performance metrics alongside system metrics |
References
- references/feedback-loop-patterns.md - Detailed feedback capture and analysis patterns
- references/memory-curation-guide.md - Step-by-step memory review and promotion procedures
- references/meta-learning-architectures.md - Advanced patterns for agents that learn how to learn