# Architecture Design - ML Project Template
This skill defines the standard code architecture for machine learning projects based on the template structure. When modifying or extending code, follow these patterns to maintain consistency.
## Overview
The project follows a modular, extensible architecture with clear separation of concerns. Each module (data, model, trainer, analysis) is independently organized using factory and registry patterns for maximum flexibility.
## Core Design Patterns
### Factory Pattern
Each module uses a factory to create instances dynamically:

```python
# Example from data_module/dataset/__init__.py
DATASET_FACTORY: Dict = {}

def DatasetFactory(data_name: str):
    dataset = DATASET_FACTORY.get(data_name, None)
    if dataset is None:
        print(f"{data_name} dataset is not implemented, falling back to the simple dataset")
        dataset = DATASET_FACTORY.get('simple')
    return dataset
```

For detailed guidance, refer to `references/factory_pattern.md`.
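To see the fallback behavior in action, the factory can be exercised with placeholder registrations. This sketch is self-contained; `EEGDataset` is an invented name used only for the demo, not a class in the template:

```python
from typing import Dict

DATASET_FACTORY: Dict[str, type] = {}

def DatasetFactory(data_name: str):
    dataset = DATASET_FACTORY.get(data_name, None)
    if dataset is None:
        print(f"{data_name} dataset is not implemented, falling back to the simple dataset")
        dataset = DATASET_FACTORY.get('simple')
    return dataset

# Placeholder registrations for the demo only.
class SimpleDataset: ...
class EEGDataset: ...

DATASET_FACTORY['simple'] = SimpleDataset
DATASET_FACTORY['eeg'] = EEGDataset

assert DatasetFactory('eeg') is EEGDataset          # exact name match
assert DatasetFactory('missing') is SimpleDataset   # fallback path
```

Unknown names never raise; they print a warning and degrade to the `'simple'` implementation.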
### Registry Pattern
Components register themselves via decorators:

```python
# Example from data_module/dataset/simple_dataset.py
@register_dataset("simple")
class SimpleDataset(Dataset):
    def __init__(self, data):
        self.data = data
```

For detailed guidance, refer to `references/registry_pattern.md`.
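The `register_dataset` decorator itself is not shown in the excerpt; a plausible implementation, consistent with the factory dict above, is:

```python
from typing import Callable, Dict

DATASET_FACTORY: Dict[str, type] = {}

def register_dataset(name: str) -> Callable[[type], type]:
    """Record the decorated class in the factory dict under `name`."""
    def decorator(cls: type) -> type:
        DATASET_FACTORY[name] = cls
        return cls
    return decorator

@register_dataset("simple")
class SimpleDataset:
    def __init__(self, data):
        self.data = data

assert DATASET_FACTORY["simple"] is SimpleDataset
```

Because the decorator returns the class unchanged, registration has no effect on how the class is defined or used directly.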
### Auto-Import Pattern
Modules automatically discover and import submodules:

```python
# Example from data_module/dataset/__init__.py
models_dir = os.path.dirname(__file__)
import_modules(models_dir, "src.data_module.dataset")
```

For detailed guidance, refer to `references/auto_import.md`.
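The `import_modules` helper is not shown in the excerpt; a plausible implementation (the template's actual helper may differ), demonstrated on a throwaway package created at runtime, might be:

```python
import importlib
import os
import pkgutil
import sys
import tempfile

def import_modules(directory: str, package: str) -> None:
    """Import every module found in `directory` under `package`,
    so that registration decorators in those files execute."""
    for _, module_name, _ in pkgutil.iter_modules([directory]):
        importlib.import_module(f"{package}.{module_name}")

# Demo: build a tiny package on disk, then auto-import its submodules.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "plugin_a.py"), "w") as f:
    f.write("LOADED = True\n")

sys.path.insert(0, root)
import_modules(pkg_dir, "demo_pkg")
assert sys.modules["demo_pkg.plugin_a"].LOADED
```

This is what makes the registry pattern work: simply placing a file in the directory is enough for its `@register_*` decorators to run at import time.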
## Directory Structure

```
project/
├── run/
│   ├── pipeline/              # Main workflow scripts
│   │   ├── training/          # Training pipelines
│   │   ├── prepare_data/      # Data preparation pipelines
│   │   └── analysis/          # Analysis pipelines
│   └── conf/                  # Hydra configuration files
│       ├── training/          # Training configs
│       ├── dataset/           # Dataset configs
│       ├── model/             # Model configs
│       ├── prepare_data/      # Data prep configs
│       └── analysis/          # Analysis configs
│
├── src/
│   ├── data_module/           # Data processing module
│   │   ├── dataset/           # Dataset implementations
│   │   ├── augmentation/      # Data augmentation
│   │   ├── collate_fn/        # Collate functions
│   │   ├── compute_metrics/   # Metrics computation
│   │   ├── prepare_data/      # Data preparation logic
│   │   ├── data_func/         # Data utility functions
│   │   └── utils.py           # Module-specific utilities
│   │
│   ├── model_module/          # Model implementations
│   │   ├── brain_decoder/     # Brain decoder models
│   │   └── model/             # Alternative model location
│   │
│   ├── trainer_module/        # Training logic
│   ├── analysis_module/       # Analysis and evaluation
│   ├── llm/                   # LLM-related code
│   └── utils/                 # Shared utilities
│
├── data/
│   ├── raw/                   # Original, immutable data
│   ├── processed/             # Cleaned, transformed data
│   └── external/              # Third-party data
│
├── outputs/
│   ├── logs/                  # Training and evaluation logs
│   ├── checkpoints/           # Model checkpoints
│   ├── tables/                # Result tables
│   └── figures/               # Plots and visualizations
│
├── pyproject.toml             # Project configuration
├── uv.lock                    # Dependency lock file
├── TODO.md                    # Task tracking
├── README.md                  # Project documentation
└── .gitignore                 # Git ignore rules
```

For a detailed directory structure with file descriptions, refer to `references/structure.md`.

## Module Organization
### Creating a New Dataset
When adding a new dataset:

- Create the file in `src/data_module/dataset/`
- Use the decorator `@register_dataset("name")`
- Inherit from `torch.utils.data.Dataset`
- Implement `__init__`, `__len__`, and `__getitem__`

```python
from typing import Dict

import torch
from torch.utils.data import Dataset

from src.data_module.dataset import register_dataset


@register_dataset("custom")
class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i: int) -> Dict[str, torch.Tensor]:
        return self.data[i]
```
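The full register-then-resolve workflow can be exercised end to end without the torch dependency; in this self-contained sketch the registration plumbing is a minimal stand-in for the template's, and `"custom"` is an illustrative name:

```python
from typing import Dict

DATASET_FACTORY: Dict[str, type] = {}

def register_dataset(name: str):
    """Minimal stand-in for the template's registration decorator."""
    def decorator(cls):
        DATASET_FACTORY[name] = cls
        return cls
    return decorator

@register_dataset("custom")
class CustomDataset:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

# Resolve the class by name, then use it like any indexable dataset.
ds = DATASET_FACTORY["custom"]([{"x": 1}, {"x": 2}])
assert len(ds) == 2 and ds[1] == {"x": 2}
```

In the real project, the class would subclass `torch.utils.data.Dataset` and the lookup would go through `DatasetFactory`, but the mechanics are identical.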
### Creating a New Model
CRITICAL: Models use a config-driven pattern.

When adding a new model:

- Create the file in `src/model_module/model/` or the appropriate module subdirectory
- Use the decorator `@register_model('ModelName')`
- `__init__` accepts ONLY the `cfg` parameter; all hyperparameters come from the config
- `forward()` returns a dict: `{"loss": loss, "labels": labels, "logits": logits}`
- Handle training vs inference modes using `self.training`

```python
import torch.nn as nn

from src.model_module.brain_decoder import register_model


@register_model('MyModel')
class MyModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        self.task = cfg.dataset.task
        # ALL parameters come from cfg
        self.hidden_dim = cfg.model.hidden_dim
        self.output_dim = cfg.dataset.target_size[cfg.dataset.task]

    def forward(self, x, labels=None, **kwargs):
        loss, logits = None, None
        if self.training:
            # Training logic: compute loss and logits here
            ...
        else:
            # Inference logic: compute logits here
            ...
        return {"loss": loss, "labels": labels, "logits": logits}
```
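The config-driven rule can be illustrated without torch by using `types.SimpleNamespace` as a stand-in for the Hydra config object; all values below are invented for the demo:

```python
from types import SimpleNamespace

class MyModel:
    def __init__(self, cfg):
        # ALL hyperparameters are read from cfg, never passed individually.
        self.task = cfg.dataset.task
        self.hidden_dim = cfg.model.hidden_dim
        self.output_dim = cfg.dataset.target_size[cfg.dataset.task]

cfg = SimpleNamespace(
    model=SimpleNamespace(hidden_dim=128),
    dataset=SimpleNamespace(task="cls", target_size={"cls": 10}),
)

m = MyModel(cfg)
assert m.hidden_dim == 128 and m.output_dim == 10
```

The payoff is that a model can be reconfigured entirely from YAML: changing `cfg.dataset.task` re-derives the output dimension with no code edits.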
### Adding Data Augmentation
When adding augmentation:

- Create the file in `src/data_module/augmentation/`
- Implement the transformation function
- Register it with the factory if needed
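As a sketch of what such a component might look like, here is a framework-free augmentation with its own registry; `register_augmentation` and `jitter` are invented names for illustration, not part of the template:

```python
import random
from typing import Callable, Dict, List

AUGMENTATION_FACTORY: Dict[str, Callable] = {}

def register_augmentation(name: str):
    """Hypothetical registration hook, mirroring the dataset registry."""
    def decorator(fn):
        AUGMENTATION_FACTORY[name] = fn
        return fn
    return decorator

@register_augmentation("jitter")
def jitter(sample: List[float], std: float = 0.01) -> List[float]:
    """Add small Gaussian noise to each value in the sample."""
    return [x + random.gauss(0.0, std) for x in sample]

# With std=0 the transform is the identity, which makes it easy to test.
out = AUGMENTATION_FACTORY["jitter"]([1.0, 2.0], std=0.0)
assert out == [1.0, 2.0]
```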
## Code Style Guidelines

For comprehensive style guidelines, refer to `references/code_style.md`.

Key principles:

- Always use type hints for function signatures
- Follow import order: standard library → third-party → local
- Module `__init__.py` files contain the factory/registry logic
- Model classes must be config-driven
## Configuration Management
The project uses Hydra for configuration management:

- Config files in `run/conf/` are organized by module
- Each stage (training, analysis) has its own config structure
- Use YAML files for all configuration
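For illustration, a hypothetical `run/conf/training/default.yaml` could compose per-module configs through a Hydra defaults list; all file and key names below are assumptions, not files shipped with the template:

```yaml
defaults:
  - dataset: simple     # resolves to run/conf/dataset/simple.yaml
  - model: my_model     # resolves to run/conf/model/my_model.yaml
  - _self_

trainer:
  max_epochs: 50
  lr: 1.0e-4
```

The defaults list is what lets the same training pipeline swap datasets or models from the command line (e.g. `model=other_model`) without touching code.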
## When Working on This Project
### Before Modifying Code
- Read the relevant module's factory/registry pattern
- Check existing implementations for consistency
- Follow the established directory structure
- Use registration decorators for new components
### Adding New Features
- Determine which module the feature belongs to
- Check if similar functionality exists
- Follow factory/registry pattern if creating new component types
- Add configuration files if needed
- Update documentation
### Code Review Checklist
- Uses factory/registry pattern appropriately
- Follows module directory structure
- Has proper type annotations
- Imports are correctly ordered
- Registration decorator is used
- Configuration files are added if needed
## Additional Resources
### Reference Files
For detailed information, consult:

- `references/structure.md` - Detailed directory structure with file descriptions
- `references/factory_pattern.md` - Factory pattern in-depth explanation
- `references/registry_pattern.md` - Registry pattern in-depth explanation
- `references/auto_import.md` - Auto-import pattern in-depth explanation
- `references/code_style.md` - Comprehensive code style guidelines
### Example Files
Working examples in `examples/`:

- `examples/custom_dataset.py` - Custom dataset implementation
- `examples/custom_model.py` - Custom model implementation
- `examples/augmentation_example.py` - Data augmentation example
- `examples/config_example.yaml` - Configuration file example
- `examples/pipeline_example.sh` - Pipeline script example