using-dynamic-architectures
Dynamic Architectures Meta-Skill
When to Use This Skill
Invoke this meta-skill when you encounter:
- Growing Networks: Adding capacity during training (new layers, neurons, modules)
- Pruning Networks: Removing capacity that isn't contributing
- Continual Learning: Training on new tasks without forgetting old ones
- Gradient Isolation: Training new modules without destabilizing existing weights
- Modular Composition: Building networks from graftable, composable components
- Lifecycle Management: State machines controlling when to grow, train, integrate, prune
- Progressive Training: Staged capability expansion with warmup and cooldown
This is the entry point for dynamic/morphogenetic neural network patterns. It routes to 7 specialized reference sheets.
How to Access Reference Sheets
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-dynamic-architectures/SKILL.md
Reference sheets like continual-learning-foundations.md are at:
skills/using-dynamic-architectures/continual-learning-foundations.md
NOT at:
skills/continual-learning-foundations.md (WRONG PATH)
Core Principle
Dynamic architectures grow capability, not just tune weights.
Static networks are a guess about capacity. Dynamic networks let training signal drive structure. The challenge is growing without forgetting, integrating without destabilizing, and knowing when to act.
Key tensions:
- Stability vs. Plasticity: Preserve existing knowledge while adding new capacity
- Isolation vs. Integration: Train new modules separately, then merge carefully
- Exploration vs. Exploitation: When to add capacity vs. when to stabilize
The 7 Dynamic Architecture Skills
- continual-learning-foundations - EWC, PackNet, rehearsal strategies, catastrophic forgetting theory
- gradient-isolation-techniques - Freezing, gradient masking, stop_grad patterns, alpha blending
- peft-adapter-techniques - LoRA, QLoRA, DoRA, adapter placement, merging strategies
- dynamic-architecture-patterns - Grow/prune patterns, slot-based expansion, capacity scheduling
- modular-neural-composition - MoE, gating, grafting semantics, interface contracts
- ml-lifecycle-orchestration - State machines, quality gates, transition triggers, controllers
- progressive-training-strategies - Staged expansion, warmup/cooldown, knowledge transfer
Routing Decision Framework
Step 1: Identify the Core Problem
Diagnostic Questions:
- "Are you trying to prevent forgetting when training on new data/tasks?"
- "Are you trying to add new capacity to an existing trained network?"
- "Are you designing how multiple modules combine?"
- "Are you deciding WHEN to grow, prune, or integrate?"
Quick Routing:
| Problem | Primary Skill |
|---|---|
| "Model forgets old tasks when I train new ones" | continual-learning-foundations |
| "New module destabilizes existing weights" | gradient-isolation-techniques |
| "Fine-tune LLM efficiently without full training" | peft-adapter-techniques |
| "When should I add more capacity?" | dynamic-architecture-patterns |
| "How do module outputs combine?" | modular-neural-composition |
| "How do I manage the grow/train/integrate cycle?" | ml-lifecycle-orchestration |
| "How do I warm up new modules safely?" | progressive-training-strategies |
Step 2: Catastrophic Forgetting (Continual Learning)
Symptoms:
- Performance on old tasks drops when training on new tasks
- Model "forgets" previous capabilities
- Fine-tuning overwrites learned features
Route to: continual-learning-foundations.md
Covers:
- Why SGD causes forgetting (loss landscape geometry)
- EWC, SI, MAS (regularization approaches)
- Progressive Neural Networks, PackNet (architectural approaches)
- Experience replay, generative replay (rehearsal approaches)
- Measuring forgetting (backward/forward transfer)
When to Use:
- Training sequentially on multiple tasks
- Fine-tuning without forgetting base capabilities
- Designing systems that accumulate knowledge over time
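The EWC approach listed above reduces to a simple regularizer: each parameter is anchored to its old-task value, weighted by an estimate of how important it was to that task. A minimal pure-Python sketch (the λ value and toy numbers are illustrative, not taken from any specific library):

```python
def ewc_penalty(params, old_params, fisher, lam=100.0):
    """Elastic Weight Consolidation penalty:
    (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.
    Parameters important to the old task (high F_i) are anchored;
    unimportant ones stay free to learn the new task."""
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, fisher)
    )

# Toy example: the first parameter mattered to the old task, the second didn't.
params     = [1.5, 0.2]   # current weights, drifting during new-task training
old_params = [1.0, 0.0]   # weights snapshot after the old task
fisher     = [4.0, 0.01]  # diagonal Fisher information estimates

total_loss = 0.3          # hypothetical new-task loss
total_loss += ewc_penalty(params, old_params, fisher, lam=1.0)
```

The penalty mostly charges for moving the high-Fisher parameter, which is exactly the stability/plasticity trade described in the Core Principle section.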
Step 3: Gradient Isolation
Symptoms:
- New module training affects host network stability
- Want to train on host errors without backprop flowing to host
- Need gradual integration of new capacity
Route to: gradient-isolation-techniques.md
Covers:
- Freezing strategies (full, partial, scheduled)
- detach() vs no_grad() semantics
- Dual-path training (residual learning on errors)
- Alpha blending for gradual integration
- Hook-based gradient surgery
When to Use:
- Training "seed" modules that learn from host errors
- Preventing catastrophic interference during growth
- Implementing safe module grafting
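Alpha blending, listed above, is the simplest gradual-integration mechanism: the host output passes through unchanged while the new module's contribution is scaled in over time. A minimal sketch with scalar outputs (the ramp length and linear schedule are illustrative assumptions; in a real framework the seed's gradients would additionally be cut off from the host, e.g. via detach()):

```python
def blended_output(host_out, seed_out, step, ramp_steps=1000):
    """Gradual integration: y = host + alpha * seed, with alpha
    ramping linearly from 0 to 1 over ramp_steps. At alpha = 0 the
    seed is invisible to the forward pass, so grafting starts as a
    no-op and the host's behavior is preserved."""
    alpha = min(1.0, step / ramp_steps)
    return host_out + alpha * seed_out

# Early in the ramp the seed barely perturbs the host output...
early = blended_output(2.0, 0.5, step=10)     # alpha = 0.01
# ...and by the end of the ramp it contributes fully.
late = blended_output(2.0, 0.5, step=1000)    # alpha = 1.0
```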
Step 4: PEFT Adapters (LoRA, QLoRA)
Symptoms:
- Want to fine-tune large pretrained models efficiently
- Memory constraints prevent full fine-tuning
- Need task-specific adaptation without modifying base weights
Route to: peft-adapter-techniques.md
Covers:
- LoRA (low-rank adaptation) fundamentals
- QLoRA (quantized base + LoRA adapters)
- DoRA (weight-decomposed adaptation)
- Adapter placement strategies
- Merging adapters into base model
- Multiple adapter management
When to Use:
- Fine-tuning LLMs on limited compute
- Creating task-specific model variants
- Memory-efficient adaptation of large models
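LoRA's core idea fits in a few lines: the frozen weight W is augmented by a low-rank update (alpha/r) · B · A, so only A and B — far fewer parameters — are trained. A pure-Python sketch with tiny 2x2 shapes (the dimensions, scaling, and matrix values are illustrative):

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=2.0, r=1):
    """y = W x + (alpha / r) * B (A x).
    W (d x d) is frozen; A (r x d) and B (d x r) are trainable.
    For d = 1000 and r = 8 this trains 2 * 8 * 1000 parameters
    instead of the full 10^6 in W."""
    h = matvec(A, x)        # down-project input to rank r
    delta = matvec(B, h)    # up-project back to d
    base = matvec(W, x)     # frozen base path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity, for clarity)
A = [[0.5, 0.5]]               # rank-1 down-projection
B = [[1.0], [0.0]]             # rank-1 up-projection
y = lora_forward(W, A, B, [2.0, 4.0])
```

Merging an adapter, covered in the reference sheet, amounts to folding (alpha/r) · B · A into W once training is done, so inference pays no extra cost.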
Step 5: Dynamic Architecture Patterns
Symptoms:
- Need to add capacity during training (not just before)
- Want to prune underperforming components
- Deciding when/where to grow the network
Route to: dynamic-architecture-patterns.md
Covers:
- Growth patterns (slot-based, layer widening, depth extension)
- Pruning patterns (magnitude, gradient-based, lottery ticket)
- Trigger conditions (loss plateau, contribution metrics, budgets)
- Capacity scheduling (grow-as-needed vs overparameterize-then-prune)
When to Use:
- Building networks that expand during training
- Implementing neural architecture search lite
- Managing parameter budgets with dynamic allocation
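A growth trigger typically combines a loss-plateau check with a budget check, since plateau alone is a weak signal. A minimal sketch (the window size, tolerance, and budget heuristic are illustrative assumptions, not a prescribed recipe):

```python
def should_grow(loss_history, window=5, min_improve=0.01,
                params_used=0, param_budget=10_000):
    """Fire a growth trigger only when (a) relative loss improvement
    over the last `window` recorded steps has fallen below
    min_improve, and (b) there is parameter budget left to spend.
    A contribution check on existing modules would normally gate
    this further (see ml-lifecycle-orchestration)."""
    if len(loss_history) < window or params_used >= param_budget:
        return False
    old, new = loss_history[-window], loss_history[-1]
    rel_improvement = (old - new) / max(old, 1e-12)
    return rel_improvement < min_improve

plateaued = [1.0, 0.5, 0.499, 0.498, 0.498, 0.498]   # trigger fires
improving = [1.0, 0.8, 0.6, 0.45, 0.3, 0.2]          # keep training instead
```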
Step 6: Modular Composition
Symptoms:
- Combining outputs from multiple modules
- Designing gating/routing mechanisms
- Need graftable, replaceable components
Route to: modular-neural-composition.md
Covers:
- Combination mechanisms (additive, multiplicative, selective)
- Mixture of Experts (sparse gating, load balancing)
- Grafting semantics (input/output attachment points)
- Interface contracts (shape matching, normalization boundaries)
- Multi-module coordination (independent, competitive, cooperative)
When to Use:
- Building modular architectures with interchangeable parts
- Implementing MoE or gated architectures
- Designing residual streams as module communication
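Sparse top-k gating, the core of MoE, can be shown in pure Python: softmax over gate logits, keep only the k largest weights, renormalize, and mix the selected expert outputs. (Expert outputs here are scalars for brevity; the shapes and renormalization choice are illustrative assumptions.)

```python
import math

def top_k_gate(logits, expert_outputs, k=2):
    """Sparse mixture-of-experts combination: route to the k
    highest-scoring experts, renormalize their softmax weights,
    and return the weighted sum. In a real system the unselected
    experts are never evaluated at all, which is where the
    compute savings come from."""
    exp = [math.exp(l) for l in logits]
    total = sum(exp)
    weights = [e / total for e in exp]                    # dense softmax
    top = sorted(range(len(weights)), key=lambda i: weights[i])[-k:]
    mass = sum(weights[i] for i in top)                   # renormalize over top-k
    return sum(weights[i] / mass * expert_outputs[i] for i in top)

# Four experts; the gate strongly prefers experts 0 and 2,
# so experts 1 and 3 contribute nothing to the output.
y = top_k_gate([2.0, -1.0, 1.5, -2.0], [10.0, 99.0, 20.0, 99.0], k=2)
```

Load balancing, covered in the reference sheet, exists because without an auxiliary loss this gate tends to collapse onto a few favored experts.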
Step 7: Lifecycle Orchestration
Symptoms:
- Need to decide WHEN to grow, train, integrate, prune
- Building state machines for module lifecycle
- Want quality gates before integration decisions
Route to: ml-lifecycle-orchestration.md
Covers:
- State machine fundamentals (states, transitions, terminals)
- Gate design patterns (structural, performance, stability, contribution)
- Transition triggers (metric-based, time-based, budget-based)
- Rollback and recovery (cooldown, hysteresis)
- Controller patterns (heuristic, learned/RL, hybrid)
When to Use:
- Designing grow/train/integrate/prune workflows
- Implementing quality gates for safe integration
- Building RL-controlled architecture decisions
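A module lifecycle state machine can be tiny: a set of legal transitions plus a quality gate guarding promotion. A minimal sketch (the state names, gate predicate, and threshold are illustrative; real controllers add cooldown, hysteresis, and rollback paths):

```python
class SeedLifecycle:
    """Minimal state machine for a grafted module. Moves are only
    allowed along declared edges, and the TRAINING -> BLENDING edge
    is guarded by a performance gate rather than by elapsed time."""
    TRANSITIONS = {
        "DORMANT":    {"TRAINING"},
        "TRAINING":   {"BLENDING", "CULLED"},
        "BLENDING":   {"FOSSILIZED", "CULLED"},
        "FOSSILIZED": set(),   # terminal: permanently integrated
        "CULLED":     set(),   # terminal: removed
    }

    def __init__(self):
        self.state = "DORMANT"

    def advance(self, target, contribution=0.0, gate=0.05):
        if target not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        # Quality gate: only promote a trained seed that measurably helps;
        # otherwise route it to the CULLED terminal instead.
        if self.state == "TRAINING" and target == "BLENDING" and contribution < gate:
            self.state = "CULLED"
            return self.state
        self.state = target
        return self.state

m = SeedLifecycle()
m.advance("TRAINING")
m.advance("BLENDING", contribution=0.12)   # passes the gate
m.advance("FOSSILIZED")
```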
Step 8: Progressive Training
Symptoms:
- New modules cause instability when integrated
- Need warmup/cooldown for safe capacity addition
- Planning multi-stage training schedules
Route to: progressive-training-strategies.md
Covers:
- Staged capacity expansion strategies
- Warmup patterns (zero-init, LR warmup, alpha ramp)
- Cooldown and stabilization (settling periods, consolidation)
- Multi-stage schedules (sequential, overlapping, budget-aware)
- Knowledge transfer between stages (inheritance, distillation)
When to Use:
- Ramping new modules safely into production
- Designing curriculum over architecture (not just data)
- Preventing stage transition shock
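The warmup/cooldown pattern above is just a schedule over the blend coefficient: hold at zero while the module trains in isolation, ramp up, then hold flat so the host can consolidate before the next stage. A minimal sketch (the phase lengths and linear ramp are illustrative assumptions):

```python
def stage_alpha(step, hold=100, ramp=400):
    """Warmup schedule for a newly grafted module's blend factor:
    - steps [0, hold):        alpha = 0 (module trains in isolation)
    - steps [hold, hold+ramp): alpha ramps linearly from 0 to 1
    - afterwards:             alpha = 1 held flat (cooldown /
                              consolidation before the next stage)."""
    if step < hold:
        return 0.0
    if step < hold + ramp:
        return (step - hold) / ramp
    return 1.0

schedule = [stage_alpha(s) for s in (0, 99, 100, 300, 500, 1000)]
```

Zero-init output layers and LR warmup, also listed above, serve the same purpose at a different level: they guarantee the moment of integration is a near no-op for the host.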
Common Multi-Skill Scenarios
Scenario: Building a Morphogenetic System
Need: Network that grows seeds, trains them in isolation, and grafts successful ones
Routing sequence:
- dynamic-architecture-patterns - Slot-based expansion, where seeds attach
- gradient-isolation-techniques - Train seeds on host errors without destabilizing host
- modular-neural-composition - How seed outputs blend into host stream
- ml-lifecycle-orchestration - State machine for seed lifecycle
- progressive-training-strategies - Warmup/cooldown for grafting
Scenario: Continual Learning Without Forgetting
Need: Train on sequence of tasks without catastrophic forgetting
Routing sequence:
- continual-learning-foundations - Understand forgetting, choose approach
- gradient-isolation-techniques - If using architectural approach (columns, modules)
- progressive-training-strategies - Staged training across tasks
Scenario: Neural Architecture Search (Lite)
Need: Grow/prune network based on training signal
Routing sequence:
- dynamic-architecture-patterns - Growth/pruning triggers and patterns
- ml-lifecycle-orchestration - Automation via heuristics or RL
- progressive-training-strategies - Stabilization between changes
Scenario: RL-Controlled Architecture
Need: RL agent deciding when to grow, prune, integrate
Routing sequence:
- ml-lifecycle-orchestration - Learned controller patterns
- dynamic-architecture-patterns - What actions the RL agent can take
- gradient-isolation-techniques - Safe exploration during training
Rationalization Resistance Table
| Rationalization | Reality | Counter-Guidance |
|---|---|---|
| "Just train a bigger model from scratch" | Transfer + growth often beats from-scratch | "Check continual-learning-foundations for why" |
| "I'll freeze everything except the new layer" | Full freeze may be too restrictive | "Check gradient-isolation-techniques for partial strategies" |
| "I'll add capacity whenever loss plateaus" | Need more than loss plateau (contribution check) | "Check ml-lifecycle-orchestration for proper gates" |
| "Modules can just sum their outputs" | Naive summation can cause interference | "Check modular-neural-composition for combination mechanisms" |
| "I'll integrate immediately when training finishes" | Need warmup/holding period | "Check progressive-training-strategies for safe integration" |
| "EWC solves all forgetting problems" | EWC has limitations, may need architectural approach | "Check continual-learning-foundations for trade-offs" |
Red Flags Checklist
Watch for these signs of incorrect approach:
- No Isolation: Training new modules without gradient isolation from host
- No Warmup: Integrating new capacity at full amplitude immediately
- No Gates: Integrating based only on time, not performance metrics
- Naive Combination: Summing module outputs without gating or blending
- Ignoring Forgetting: Adding new tasks without measuring old task performance
- No Rollback: No plan for what happens if integration fails
Relationship to Other Packs
| Request | Primary Pack | Why |
|---|---|---|
| "Implement PPO for architecture decisions" | yzmir-deep-rl | RL algorithm implementation |
| "Evaluate architecture changes without mutation" | yzmir-deep-rl/counterfactual-reasoning | Counterfactual simulation |
| "Debug PyTorch gradient flow" | yzmir-pytorch-engineering | Low-level PyTorch debugging |
| "Optimize training loop performance" | yzmir-training-optimization | General training optimization |
| "Design transformer architecture" | yzmir-neural-architectures | Static architecture design |
| "Deploy morphogenetic model" | yzmir-ml-production | Production deployment |
Intersection with deep-rl: If using RL to control architecture decisions (when to grow/prune), combine this pack's lifecycle orchestration with deep-rl's policy gradient or actor-critic methods.
Counterfactual evaluation: Before committing to a live mutation (grow/prune), use deep-rl's counterfactual-reasoning.md to simulate the change and evaluate outcomes without risk. This is critical for production morphogenetic systems.
Diagnostic Question Templates
Use these to route users:
Problem Classification
- "Are you training on multiple tasks sequentially, or growing a single-task network?"
- "Do you have an existing trained model you want to extend, or starting fresh?"
- "Is the issue forgetting (old performance drops) or instability (training explodes)?"
Architectural Questions
- "Where do new modules attach to the existing network?"
- "How should new module outputs combine with existing outputs?"
- "What triggers growth? Loss plateau, manual, or learned?"
Lifecycle Questions
- "What states can a module be in? (training, integrating, permanent, removed)"
- "What conditions must be met before integration?"
- "What happens if a module fails to improve performance?"
Summary: Routing Decision Tree
START: Dynamic architecture problem
├─ Forgetting old tasks?
│ └─ → continual-learning-foundations
├─ New module destabilizes existing?
│ └─ → gradient-isolation-techniques
├─ Fine-tuning LLM efficiently?
│ └─ → peft-adapter-techniques
├─ When/where to add capacity?
│ └─ → dynamic-architecture-patterns
├─ How modules combine?
│ └─ → modular-neural-composition
├─ Managing grow/train/integrate cycle?
│ └─ → ml-lifecycle-orchestration
├─ Warmup/cooldown for new capacity?
│ └─ → progressive-training-strategies
└─ Building complete morphogenetic system?
└─ → Start with dynamic-architecture-patterns
→ Then gradient-isolation-techniques
→ Then ml-lifecycle-orchestration
Reference Sheets
After routing, load the appropriate reference sheet:
- continual-learning-foundations.md - EWC, PackNet, rehearsal, forgetting theory
- gradient-isolation-techniques.md - Freezing, detach, alpha blending, hook surgery
- peft-adapter-techniques.md - LoRA, QLoRA, DoRA, adapter merging
- dynamic-architecture-patterns.md - Grow/prune patterns, triggers, scheduling
- modular-neural-composition.md - MoE, gating, grafting, interface contracts
- ml-lifecycle-orchestration.md - State machines, gates, controllers
- progressive-training-strategies.md - Staged expansion, warmup/cooldown