using-dynamic-architectures
Dynamic Architectures Meta-Skill
When to Use This Skill
Invoke this meta-skill when you encounter:
- Growing Networks: Adding capacity during training (new layers, neurons, modules)
- Pruning Networks: Removing capacity that isn't contributing
- Continual Learning: Training on new tasks without forgetting old ones
- Gradient Isolation: Training new modules without destabilizing existing weights
- Modular Composition: Building networks from graftable, composable components
- Lifecycle Management: State machines controlling when to grow, train, integrate, prune
- Progressive Training: Staged capability expansion with warmup and cooldown
This is the entry point for dynamic/morphogenetic neural network patterns. It routes to 7 specialized reference sheets.
How to Access Reference Sheets
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-dynamic-architectures/SKILL.md
Reference sheets like continual-learning-foundations.md are at:
skills/using-dynamic-architectures/continual-learning-foundations.md
NOT at:
skills/continual-learning-foundations.md (WRONG PATH)
Core Principle
Dynamic architectures grow capability, not just tune weights.
Static networks are a guess about capacity. Dynamic networks let training signal drive structure. The challenge is growing without forgetting, integrating without destabilizing, and knowing when to act.
Key tensions:
- Stability vs. Plasticity: Preserve existing knowledge while adding new capacity
- Isolation vs. Integration: Train new modules separately, then merge carefully
- Exploration vs. Exploitation: When to add capacity vs. when to stabilize
The 7 Dynamic Architecture Skills
- continual-learning-foundations - EWC, PackNet, rehearsal strategies, catastrophic forgetting theory
- gradient-isolation-techniques - Freezing, gradient masking, stop_grad patterns, alpha blending
- peft-adapter-techniques - LoRA, QLoRA, DoRA, adapter placement, merging strategies
- dynamic-architecture-patterns - Grow/prune patterns, slot-based expansion, capacity scheduling
- modular-neural-composition - MoE, gating, grafting semantics, interface contracts
- ml-lifecycle-orchestration - State machines, quality gates, transition triggers, controllers
- progressive-training-strategies - Staged expansion, warmup/cooldown, knowledge transfer
Routing Decision Framework
Step 1: Identify the Core Problem
Diagnostic Questions:
- "Are you trying to prevent forgetting when training on new data/tasks?"
- "Are you trying to add new capacity to an existing trained network?"
- "Are you designing how multiple modules combine?"
- "Are you deciding WHEN to grow, prune, or integrate?"
Quick Routing:
| Problem | Primary Skill |
|---|---|
| "Model forgets old tasks when I train new ones" | continual-learning-foundations |
| "New module destabilizes existing weights" | gradient-isolation-techniques |
| "Fine-tune LLM efficiently without full training" | peft-adapter-techniques |
| "When should I add more capacity?" | dynamic-architecture-patterns |
| "How do module outputs combine?" | modular-neural-composition |
| "How do I manage the grow/train/integrate cycle?" | ml-lifecycle-orchestration |
| "How do I warm up new modules safely?" | progressive-training-strategies |
Step 2: Catastrophic Forgetting (Continual Learning)
Symptoms:
- Performance on old tasks drops when training on new tasks
- Model "forgets" previous capabilities
- Fine-tuning overwrites learned features
Route to: continual-learning-foundations.md
Covers:
- Why SGD causes forgetting (loss landscape geometry)
- EWC, SI, MAS (regularization approaches)
- Progressive Neural Networks, PackNet (architectural approaches)
- Experience replay, generative replay (rehearsal approaches)
- Measuring forgetting (backward/forward transfer)
When to Use:
- Training sequentially on multiple tasks
- Fine-tuning without forgetting base capabilities
- Designing systems that accumulate knowledge over time
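The EWC approach listed above reduces to a simple regularizer: each parameter is anchored to its old-task value, weighted by an estimate of how important it was to that task. A minimal pure-Python sketch (the λ value and toy numbers are illustrative, not taken from any specific library):

```python
def ewc_penalty(params, old_params, fisher, lam=100.0):
    """Elastic Weight Consolidation penalty:
    (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.
    Parameters important to the old task (high F_i) are anchored;
    unimportant ones stay free to learn the new task."""
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, fisher)
    )

# Toy example: the first parameter mattered to the old task, the second didn't.
params     = [1.5, 0.2]   # current weights, drifting during new-task training
old_params = [1.0, 0.0]   # weights snapshot after the old task
fisher     = [4.0, 0.01]  # diagonal Fisher information estimates

total_loss = 0.3          # hypothetical new-task loss
total_loss += ewc_penalty(params, old_params, fisher, lam=1.0)
```

The penalty mostly charges for moving the high-Fisher parameter, which is exactly the stability/plasticity trade described in the Core Principle section.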
Step 3: Gradient Isolation
Symptoms:
- New module training affects host network stability
- Want to train on host errors without backprop flowing to host
- Need gradual integration of new capacity
Route to: gradient-isolation-techniques.md
Covers:
- Freezing strategies (full, partial, scheduled)
- detach() vs no_grad() semantics
- Dual-path training (residual learning on errors)
- Alpha blending for gradual integration
- Hook-based gradient surgery
When to Use:
- Training "seed" modules that learn from host errors
- Preventing catastrophic interference during growth
- Implementing safe module grafting
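Alpha blending, listed above, is the simplest gradual-integration mechanism: the host output passes through unchanged while the new module's contribution is scaled in over time. A minimal sketch with scalar outputs (the ramp length and linear schedule are illustrative assumptions; in a real framework the seed's gradients would additionally be cut off from the host, e.g. via detach()):

```python
def blended_output(host_out, seed_out, step, ramp_steps=1000):
    """Gradual integration: y = host + alpha * seed, with alpha
    ramping linearly from 0 to 1 over ramp_steps. At alpha = 0 the
    seed is invisible to the forward pass, so grafting starts as a
    no-op and the host's behavior is preserved."""
    alpha = min(1.0, step / ramp_steps)
    return host_out + alpha * seed_out

# Early in the ramp the seed barely perturbs the host output...
early = blended_output(2.0, 0.5, step=10)     # alpha = 0.01
# ...and by the end of the ramp it contributes fully.
late = blended_output(2.0, 0.5, step=1000)    # alpha = 1.0
```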
Step 4: PEFT Adapters (LoRA, QLoRA)
Symptoms:
- Want to fine-tune large pretrained models efficiently
- Memory constraints prevent full fine-tuning
- Need task-specific adaptation without modifying base weights
Route to: peft-adapter-techniques.md
Covers:
- LoRA (low-rank adaptation) fundamentals
- QLoRA (quantized base + LoRA adapters)
- DoRA (weight-decomposed adaptation)
- Adapter placement strategies
- Merging adapters into base model
- Multiple adapter management
When to Use:
- Fine-tuning LLMs on limited compute
- Creating task-specific model variants
- Memory-efficient adaptation of large models
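LoRA's core idea fits in a few lines: the frozen weight W is augmented by a low-rank update (alpha/r) · B · A, so only A and B — far fewer parameters — are trained. A pure-Python sketch with tiny 2x2 shapes (the dimensions, scaling, and matrix values are illustrative):

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=2.0, r=1):
    """y = W x + (alpha / r) * B (A x).
    W (d x d) is frozen; A (r x d) and B (d x r) are trainable.
    For d = 1000 and r = 8 this trains 2 * 8 * 1000 parameters
    instead of the full 10^6 in W."""
    h = matvec(A, x)        # down-project input to rank r
    delta = matvec(B, h)    # up-project back to d
    base = matvec(W, x)     # frozen base path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity, for clarity)
A = [[0.5, 0.5]]               # rank-1 down-projection
B = [[1.0], [0.0]]             # rank-1 up-projection
y = lora_forward(W, A, B, [2.0, 4.0])
```

Merging an adapter, covered in the reference sheet, amounts to folding (alpha/r) · B · A into W once training is done, so inference pays no extra cost.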
Step 5: Dynamic Architecture Patterns
Symptoms:
- Need to add capacity during training (not just before)
- Want to prune underperforming components
- Deciding when/where to grow the network
Route to: dynamic-architecture-patterns.md
Covers:
- Growth patterns (slot-based, layer widening, depth extension)
- Pruning patterns (magnitude, gradient-based, lottery ticket)
- Trigger conditions (loss plateau, contribution metrics, budgets)
- Capacity scheduling (grow-as-needed vs overparameterize-then-prune)
When to Use:
- Building networks that expand during training
- Implementing neural architecture search lite
- Managing parameter budgets with dynamic allocation
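A growth trigger typically combines a loss-plateau check with a budget check, since plateau alone is a weak signal. A minimal sketch (the window size, tolerance, and budget heuristic are illustrative assumptions, not a prescribed recipe):

```python
def should_grow(loss_history, window=5, min_improve=0.01,
                params_used=0, param_budget=10_000):
    """Fire a growth trigger only when (a) relative loss improvement
    over the last `window` recorded steps has fallen below
    min_improve, and (b) there is parameter budget left to spend.
    A contribution check on existing modules would normally gate
    this further (see ml-lifecycle-orchestration)."""
    if len(loss_history) < window or params_used >= param_budget:
        return False
    old, new = loss_history[-window], loss_history[-1]
    rel_improvement = (old - new) / max(old, 1e-12)
    return rel_improvement < min_improve

plateaued = [1.0, 0.5, 0.499, 0.498, 0.498, 0.498]   # trigger fires
improving = [1.0, 0.8, 0.6, 0.45, 0.3, 0.2]          # keep training instead
```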
Step 6: Modular Composition
Symptoms:
- Combining outputs from multiple modules
- Designing gating/routing mechanisms
- Need graftable, replaceable components
Route to: modular-neural-composition.md
Covers:
- Combination mechanisms (additive, multiplicative, selective)
- Mixture of Experts (sparse gating, load balancing)
- Grafting semantics (input/output attachment points)
- Interface contracts (shape matching, normalization boundaries)
- Multi-module coordination (independent, competitive, cooperative)
When to Use:
- Building modular architectures with interchangeable parts
- Implementing MoE or gated architectures
- Designing residual streams as module communication
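Sparse top-k gating, the core of MoE, can be shown in pure Python: softmax over gate logits, keep only the k largest weights, renormalize, and mix the selected expert outputs. (Expert outputs here are scalars for brevity; the shapes and renormalization choice are illustrative assumptions.)

```python
import math

def top_k_gate(logits, expert_outputs, k=2):
    """Sparse mixture-of-experts combination: route to the k
    highest-scoring experts, renormalize their softmax weights,
    and return the weighted sum. In a real system the unselected
    experts are never evaluated at all, which is where the
    compute savings come from."""
    exp = [math.exp(l) for l in logits]
    total = sum(exp)
    weights = [e / total for e in exp]                    # dense softmax
    top = sorted(range(len(weights)), key=lambda i: weights[i])[-k:]
    mass = sum(weights[i] for i in top)                   # renormalize over top-k
    return sum(weights[i] / mass * expert_outputs[i] for i in top)

# Four experts; the gate strongly prefers experts 0 and 2,
# so experts 1 and 3 contribute nothing to the output.
y = top_k_gate([2.0, -1.0, 1.5, -2.0], [10.0, 99.0, 20.0, 99.0], k=2)
```

Load balancing, covered in the reference sheet, exists because without an auxiliary loss this gate tends to collapse onto a few favored experts.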
Step 7: Lifecycle Orchestration
Symptoms:
- Need to decide WHEN to grow, train, integrate, prune
- Building state machines for module lifecycle
- Want quality gates before integration decisions
Route to: ml-lifecycle-orchestration.md
Covers:
- State machine fundamentals (states, transitions, terminals)
- Gate design patterns (structural, performance, stability, contribution)
- Transition triggers (metric-based, time-based, budget-based)
- Rollback and recovery (cooldown, hysteresis)
- Controller patterns (heuristic, learned/RL, hybrid)
When to Use:
- Designing grow/train/integrate/prune workflows
- Implementing quality gates for safe integration
- Building RL-controlled architecture decisions
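A module lifecycle state machine can be tiny: a set of legal transitions plus a quality gate guarding promotion. A minimal sketch (the state names, gate predicate, and threshold are illustrative; real controllers add cooldown, hysteresis, and rollback paths):

```python
class SeedLifecycle:
    """Minimal state machine for a grafted module. Moves are only
    allowed along declared edges, and the TRAINING -> BLENDING edge
    is guarded by a performance gate rather than by elapsed time."""
    TRANSITIONS = {
        "DORMANT":    {"TRAINING"},
        "TRAINING":   {"BLENDING", "CULLED"},
        "BLENDING":   {"FOSSILIZED", "CULLED"},
        "FOSSILIZED": set(),   # terminal: permanently integrated
        "CULLED":     set(),   # terminal: removed
    }

    def __init__(self):
        self.state = "DORMANT"

    def advance(self, target, contribution=0.0, gate=0.05):
        if target not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        # Quality gate: only promote a trained seed that measurably helps;
        # otherwise route it to the CULLED terminal instead.
        if self.state == "TRAINING" and target == "BLENDING" and contribution < gate:
            self.state = "CULLED"
            return self.state
        self.state = target
        return self.state

m = SeedLifecycle()
m.advance("TRAINING")
m.advance("BLENDING", contribution=0.12)   # passes the gate
m.advance("FOSSILIZED")
```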
Step 8: Progressive Training
Symptoms:
- New modules cause instability when integrated
- Need warmup/cooldown for safe capacity addition
- Planning multi-stage training schedules
Route to: progressive-training-strategies.md
Covers:
- Staged capacity expansion strategies
- Warmup patterns (zero-init, LR warmup, alpha ramp)
- Cooldown and stabilization (settling periods, consolidation)
- Multi-stage schedules (sequential, overlapping, budget-aware)
- Knowledge transfer between stages (inheritance, distillation)
When to Use:
- Ramping new modules safely into production
- Designing curriculum over architecture (not just data)
- Preventing stage transition shock
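The warmup/cooldown pattern above is just a schedule over the blend coefficient: hold at zero while the module trains in isolation, ramp up, then hold flat so the host can consolidate before the next stage. A minimal sketch (the phase lengths and linear ramp are illustrative assumptions):

```python
def stage_alpha(step, hold=100, ramp=400):
    """Warmup schedule for a newly grafted module's blend factor:
    - steps [0, hold):        alpha = 0 (module trains in isolation)
    - steps [hold, hold+ramp): alpha ramps linearly from 0 to 1
    - afterwards:             alpha = 1 held flat (cooldown /
                              consolidation before the next stage)."""
    if step < hold:
        return 0.0
    if step < hold + ramp:
        return (step - hold) / ramp
    return 1.0

schedule = [stage_alpha(s) for s in (0, 99, 100, 300, 500, 1000)]
```

Zero-init output layers and LR warmup, also listed above, serve the same purpose at a different level: they guarantee the moment of integration is a near no-op for the host.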
Common Multi-Skill Scenarios
Scenario: Building a Morphogenetic System
Need: Network that grows seeds, trains them in isolation, and grafts successful ones
Routing sequence:
- dynamic-architecture-patterns - Slot-based expansion, where seeds attach
- gradient-isolation-techniques - Train seeds on host errors without destabilizing host
- modular-neural-composition - How seed outputs blend into host stream
- ml-lifecycle-orchestration - State machine for seed lifecycle
- progressive-training-strategies - Warmup/cooldown for grafting
Scenario: Continual Learning Without Forgetting
Need: Train on sequence of tasks without catastrophic forgetting
Routing sequence:
- continual-learning-foundations - Understand forgetting, choose approach
- gradient-isolation-techniques - If using architectural approach (columns, modules)
- progressive-training-strategies - Staged training across tasks
Scenario: Neural Architecture Search (Lite)
Need: Grow/prune network based on training signal
Routing sequence:
- dynamic-architecture-patterns - Growth/pruning triggers and patterns
- ml-lifecycle-orchestration - Automation via heuristics or RL
- progressive-training-strategies - Stabilization between changes
Scenario: RL-Controlled Architecture
Need: RL agent deciding when to grow, prune, integrate
Routing sequence:
- ml-lifecycle-orchestration - Learned controller patterns
- dynamic-architecture-patterns - What actions the RL agent can take
- gradient-isolation-techniques - Safe exploration during training
Rationalization Resistance Table
| Rationalization | Reality | Counter-Guidance |
|---|---|---|
| "Just train a bigger model from scratch" | Transfer + growth often beats from-scratch | "Check continual-learning-foundations for why" |
| "I'll freeze everything except the new layer" | Full freeze may be too restrictive | "Check gradient-isolation-techniques for partial strategies" |
| "I'll add capacity whenever loss plateaus" | Need more than loss plateau (contribution check) | "Check ml-lifecycle-orchestration for proper gates" |
| "Modules can just sum their outputs" | Naive summation can cause interference | "Check modular-neural-composition for combination mechanisms" |
| "I'll integrate immediately when training finishes" | Need warmup/holding period | "Check progressive-training-strategies for safe integration" |
| "EWC solves all forgetting problems" | EWC has limitations, may need architectural approach | "Check continual-learning-foundations for trade-offs" |
Red Flags Checklist
Watch for these signs of incorrect approach:
- No Isolation: Training new modules without gradient isolation from host
- No Warmup: Integrating new capacity at full amplitude immediately
- No Gates: Integrating based only on time, not performance metrics
- Naive Combination: Summing module outputs without gating or blending
- Ignoring Forgetting: Adding new tasks without measuring old task performance
- No Rollback: No plan for what happens if integration fails
Relationship to Other Packs
| Request | Primary Pack | Why |
|---|---|---|
| "Implement PPO for architecture decisions" | yzmir-deep-rl | RL algorithm implementation |
| "Evaluate architecture changes without mutation" | yzmir-deep-rl/counterfactual-reasoning | Counterfactual simulation |
| "Debug PyTorch gradient flow" | yzmir-pytorch-engineering | Low-level PyTorch debugging |
| "Optimize training loop performance" | yzmir-training-optimization | General training optimization |
| "Design transformer architecture" | yzmir-neural-architectures | Static architecture design |
| "Deploy morphogenetic model" | yzmir-ml-production | Production deployment |
Intersection with deep-rl: If using RL to control architecture decisions (when to grow/prune), combine this pack's lifecycle orchestration with deep-rl's policy gradient or actor-critic methods.
Counterfactual evaluation: Before committing to a live mutation (grow/prune), use deep-rl's counterfactual-reasoning.md to simulate the change and evaluate outcomes without risk. This is critical for production morphogenetic systems.
Diagnostic Question Templates
Use these to route users:
Problem Classification
- "Are you training on multiple tasks sequentially, or growing a single-task network?"
- "Do you have an existing trained model you want to extend, or starting fresh?"
- "Is the issue forgetting (old performance drops) or instability (training explodes)?"
Architectural Questions
- "Where do new modules attach to the existing network?"
- "How should new module outputs combine with existing outputs?"
- "What triggers growth? Loss plateau, manual, or learned?"
Lifecycle Questions
- "What states can a module be in? (training, integrating, permanent, removed)"
- "What conditions must be met before integration?"
- "What happens if a module fails to improve performance?"
Summary: Routing Decision Tree
START: Dynamic architecture problem
├─ Forgetting old tasks?
│ └─ → continual-learning-foundations
├─ New module destabilizes existing?
│ └─ → gradient-isolation-techniques
├─ Fine-tuning LLM efficiently?
│ └─ → peft-adapter-techniques
├─ When/where to add capacity?
│ └─ → dynamic-architecture-patterns
├─ How modules combine?
│ └─ → modular-neural-composition
├─ Managing grow/train/integrate cycle?
│ └─ → ml-lifecycle-orchestration
├─ Warmup/cooldown for new capacity?
│ └─ → progressive-training-strategies
└─ Building complete morphogenetic system?
└─ → Start with dynamic-architecture-patterns
→ Then gradient-isolation-techniques
→ Then ml-lifecycle-orchestration
Reference Sheets
After routing, load the appropriate reference sheet:
- continual-learning-foundations.md - EWC, PackNet, rehearsal, forgetting theory
- gradient-isolation-techniques.md - Freezing, detach, alpha blending, hook surgery
- peft-adapter-techniques.md - LoRA, QLoRA, DoRA, adapter merging
- dynamic-architecture-patterns.md - Grow/prune patterns, triggers, scheduling
- modular-neural-composition.md - MoE, gating, grafting, interface contracts
- ml-lifecycle-orchestration.md - State machines, gates, controllers
- progressive-training-strategies.md - Staged expansion, warmup/cooldown