folder-organization

Folder Organization Best Practices

Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.

When to Use This Skill

  • Setting up new projects
  • Reorganizing existing projects
  • Establishing team conventions
  • Creating reproducible research structures
  • Managing data-intensive projects

Core Principles

  1. Predictability - Standard locations for common file types
  2. Scalability - Structure grows gracefully with project
  3. Discoverability - Easy for others (and future you) to navigate
  4. Separation of Concerns - Code, data, documentation, outputs separated
  5. Version Control Friendly - Large/generated files excluded appropriately

Standard Project Structure

Research/Analysis Projects

project-name/
├── README.md                 # Project overview and getting started
├── .gitignore               # Exclude data, outputs, env files
├── environment.yml          # Conda environment (or requirements.txt)
├── data/                    # Input data (often gitignored)
│   ├── raw/                # Original, immutable data
│   ├── processed/          # Cleaned, transformed data
│   └── external/           # Third-party data
├── notebooks/               # Jupyter notebooks for exploration
│   ├── 01-exploration.ipynb
│   ├── 02-analysis.ipynb
│   └── figures/            # Notebook-generated figures
├── src/                     # Source code (reusable modules)
│   ├── __init__.py
│   ├── data_processing.py
│   ├── analysis.py
│   └── visualization.py
├── scripts/                 # Standalone scripts and workflows
│   ├── download_data.sh
│   └── run_pipeline.py
├── tests/                   # Unit tests
│   └── test_analysis.py
├── docs/                    # Documentation
│   ├── methods.md
│   └── references.md
├── results/                 # Analysis outputs (gitignored)
│   ├── figures/
│   ├── tables/
│   └── models/
└── config/                  # Configuration files
    └── analysis_config.yaml

Development Projects

project-name/
├── README.md
├── .gitignore
├── setup.py                 # Package configuration
├── requirements.txt         # or pyproject.toml
├── src/
│   └── package_name/
│       ├── __init__.py
│       ├── core.py
│       └── utils.py
├── tests/
│   ├── test_core.py
│   └── test_utils.py
├── docs/
│   ├── api.md
│   └── usage.md
├── examples/                # Example usage
│   └── example_workflow.py
└── .github/                 # CI/CD workflows
    └── workflows/
        └── tests.yml

Bioinformatics/Workflow Projects

project-name/
├── README.md
├── data/
│   ├── raw/                # Raw sequencing data
│   ├── reference/          # Reference genomes, annotations
│   └── processed/          # Workflow outputs
├── workflows/               # Galaxy .ga or Snakemake files
│   ├── preprocessing.ga
│   └── assembly.ga
├── config/
│   ├── workflow_params.yaml
│   └── sample_sheet.tsv
├── scripts/                # Helper scripts
│   ├── submit_workflow.py
│   └── quality_check.py
├── results/                # Final outputs
│   ├── figures/
│   ├── tables/
│   └── reports/
└── logs/                   # Workflow execution logs

File Naming Conventions

General Rules

  1. Use lowercase with hyphens or underscores
    • Good: data-analysis.py or data_analysis.py
    • Bad: DataAnalysis.py or data analysis.py
  2. Be descriptive but concise
    • Good: process-telomere-data.py
    • Bad: script.py or process_all_the_telomere_sequencing_data_from_experiments.py
  3. Use consistent separators
    • Choose either hyphens or underscores and stick with it
    • Convention: hyphens for file names, underscores for Python modules
  4. Include version/date for important outputs
    • Good: report-2026-01-23.pdf or model-v2.pkl
    • Bad: report-final-final-v3.pdf
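
The general rules above can be checked mechanically. The following is a minimal sketch, not a standard tool: the function name `check_filename` and the 40-character limit are assumptions made for illustration, since the rules do not give a number for "concise".

```python
# Assumed limit for "descriptive but concise"; the rules above give no number.
MAX_STEM_LENGTH = 40


def check_filename(name):
    """Return a list of rule violations for one file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if name != name.lower():
        problems.append("contains uppercase letters")
    # Everything before the first dot, so "reads.fastq.gz" has the stem "reads".
    stem = name.split(".", 1)[0]
    if "-" in stem and "_" in stem:
        problems.append("mixes hyphens and underscores")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"stem is longer than {MAX_STEM_LENGTH} characters")
    return problems


if __name__ == "__main__":
    for candidate in ("data-analysis.py", "DataAnalysis.py", "data analysis.py"):
        print(candidate, check_filename(candidate) or "ok")
```

A check like this could run in a pre-commit hook or CI job, although whether to enforce the separator rule strictly is a team decision.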

Numbered Sequences

For sequential files (notebooks, scripts), use zero-padded numbers:
notebooks/
├── 01-data-exploration.ipynb
├── 02-quality-control.ipynb
├── 03-statistical-analysis.ipynb
└── 04-visualization.ipynb
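
Zero-padding matters because most file listings sort names as text, so an unpadded 10 sorts before 2. A small sketch of a helper that builds padded names follows; `numbered_name` and its defaults are hypothetical, chosen only to match the listing above.

```python
def numbered_name(index, title, width=2, ext=".ipynb"):
    """Build a zero-padded, hyphen-separated file name from a step number and title."""
    slug = "-".join(title.lower().split())
    return f"{index:0{width}d}-{slug}{ext}"


if __name__ == "__main__":
    steps = ["Data Exploration", "Quality Control", "Statistical Analysis"]
    for number, title in enumerate(steps, start=1):
        print(numbered_name(number, title))
    # Text sorting puts the unpadded "10-" before "2-", which padding avoids.
    print(sorted(["2-qc.ipynb", "10-plots.ipynb"]))
    print(sorted(["02-qc.ipynb", "10-plots.ipynb"]))
```

A width of 2 covers up to 99 files; a project expecting more could pass `width=3` from the start to avoid renumbering later.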

Data Files

Include metadata in filename when possible:
data/raw/
├── sample-A_hifi_reads_2026-01-15.fastq.gz
├── sample-B_hifi_reads_2026-01-15.fastq.gz
└── reference_genome_v3.fasta
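
One benefit of putting metadata in the filename is that scripts can recover it without a separate lookup table. The sketch below assumes a `<sample>_<data type>_<YYYY-MM-DD>.<extension>` layout inferred from the two sample files above; the pattern and the field names are illustrative assumptions, not a fixed convention.

```python
import re
from datetime import date

# Assumed layout: <sample>_<data type>_<YYYY-MM-DD>.<extension>
DATA_FILE_PATTERN = re.compile(
    r"^(?P<sample>[^_]+)_(?P<kind>.+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>.+)$"
)


def parse_data_filename(name):
    """Return the metadata fields encoded in a file name, or None if it does not match."""
    match = DATA_FILE_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["date"] = date.fromisoformat(fields["date"])
    return fields


if __name__ == "__main__":
    print(parse_data_filename("sample-A_hifi_reads_2026-01-15.fastq.gz"))
    print(parse_data_filename("reference_genome_v3.fasta"))
```

Files that follow a different scheme, such as the versioned reference genome, simply return `None` and would need their own pattern.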

Directory Management Best Practices

What to Version Control

DO commit:
  • Source code
  • Documentation
  • Configuration files
  • Small test datasets (<1MB)
  • Requirements/environment files
  • README files
DON'T commit:
  • Large data files (use .gitignore)
  • Generated outputs
  • Environment directories (venv/, conda-env/)
  • Logs
  • Temporary files
  • API keys/secrets
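
To apply the size guideline above, it can help to list files at or over the limit before committing. This is a rough sketch using only the standard library; `find_large_files` is a hypothetical helper, and the 1,000,000-byte threshold is one reading of the "<1MB" figure.

```python
from pathlib import Path

# One interpretation of the "<1MB" guideline for committed test data.
SIZE_LIMIT_BYTES = 1_000_000


def find_large_files(root, limit=SIZE_LIMIT_BYTES):
    """Return paths (relative to root) of files at or above the size limit."""
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip Git's own object store.
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.stat().st_size >= limit:
            large.append(relative)
    return sorted(large)


if __name__ == "__main__":
    for path in find_large_files("."):
        print(path)
```

Each reported path is a candidate for a .gitignore entry or for storage outside the repository. The script does not check whether a file is already ignored or tracked.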

.gitignore Template

# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
*.egg-info/

# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints

# Data
data/raw/
data/processed/
*.fastq.gz
*.bam
*.vcf.gz

# Outputs
results/
outputs/
*.png
*.pdf
*.html

# Logs
logs/
*.log

# Environment
.env
environment.local.yml

# OS
.DS_Store
Thumbs.db

Data Organization

Raw Data is Sacred

  • Never modify raw data - Always keep originals untouched
  • Store in data/raw/ and make it read-only if possible
  • Document data provenance (where it came from, when downloaded)
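
The read-only and provenance points above can be combined in one step: record a checksum for each raw file, then clear its write permission. The sketch below uses only the standard library; `protect_raw_data` and `sha256_of` are hypothetical names, and the choice of SHA-256 is an assumption.

```python
import hashlib
import stat
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large sequencing files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_raw_data(raw_dir):
    """Record a checksum for every file under raw_dir, then clear its write bits."""
    raw_dir = Path(raw_dir)
    checksums = {}
    for path in sorted(raw_dir.rglob("*")):
        if not path.is_file():
            continue
        checksums[path.relative_to(raw_dir).as_posix()] = sha256_of(path)
        write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
        path.chmod(path.stat().st_mode & ~write_bits)
    return checksums


if __name__ == "__main__":
    for name, checksum in protect_raw_data("data/raw").items():
        print(checksum, name)
```

Saving the returned checksums next to a note on where and when the data was downloaded gives a simple provenance record. Re-running the hash later shows whether a raw file has changed. On Windows, `chmod` only toggles the read-only flag, so the protection is weaker there.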

Processed Data Hierarchy

data/
├── raw/                    # Original, immutable
├── interim/                # Intermediate processing steps
├── processed/              # Final, analysis-ready data
└── external/               # Third-party data

Documentation Standards

README.md Essentials

Every project should have a README with:

# Project Name
Brief description

## Installation
How to set up the environment

## Usage
How to run the analysis/code

## Project Structure
Brief overview of directories

## Data
Where data lives and how to access it

## Results
Where to find outputs

Code Documentation

  • Docstrings for all functions/classes
  • Comments for complex logic
  • CHANGELOG.md for tracking changes
  • TODO.md for tracking work (gitignored or removed before merge)

Common Anti-Patterns to Avoid

Flat structure with everything in root
project/
├── script1.py
├── script2.py
├── data.csv
├── output1.png
├── output2.png
└── final_really_final_v3.xlsx
Ambiguous naming
notebooks/
├── notebook1.ipynb
├── test.ipynb
├── analysis.ipynb
└── analysis_new.ipynb
Mixed concerns
project/
├── src/
│   ├── analysis.py
│   ├── data.csv          # Data in source code directory
│   └── figure1.png       # Output in source code directory

Cleanup and Maintenance

Regular Maintenance Tasks

  1. Prune old branches - Delete merged feature branches
  2. Clean temp files - Remove TODO.md and NOTES.md from completed work
  3. Update documentation - Keep README current with changes
  4. Review .gitignore - Ensure large files aren't tracked
  5. Organize notebooks - Rename/renumber as project evolves

End-of-Project Checklist

  • README complete and accurate
  • Code documented
  • Tests passing
  • Large files gitignored
  • Working files removed (TODO.md, scratch notebooks)
  • Final outputs in results/
  • Environment files current
  • License added (if applicable)
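
Some checklist items can be verified by looking at the file system. The sketch below covers only those; `check_project` is a hypothetical helper, and the accepted file names are assumptions drawn from the structures shown earlier. Items such as passing tests or accurate documentation still need a human or a CI run.

```python
from pathlib import Path

# Assumed file names, taken from the project layouts described earlier.
WORKING_FILES = ("TODO.md", "NOTES.md")
ENVIRONMENT_FILES = ("environment.yml", "requirements.txt", "pyproject.toml")


def check_project(root):
    """Return a mapping of checklist item to True/False for the file-based checks."""
    root = Path(root)
    return {
        "README present": (root / "README.md").is_file(),
        "results/ directory present": (root / "results").is_dir(),
        "working files removed": not any((root / name).exists() for name in WORKING_FILES),
        "environment file present": any((root / name).is_file() for name in ENVIRONMENT_FILES),
        "license present": any(root.glob("LICENSE*")),
    }


if __name__ == "__main__":
    for item, passed in check_project(".").items():
        print("[x]" if passed else "[ ]", item)
```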

Integration with Other Skills

This skill works well with:
  • python-environment - Environment setup and management
  • claude-collaboration - Team workflow best practices
  • jupyter-notebook-analysis - Notebook organization standards

Templates and Tools

Quick Project Setup

# Create standard research project structure
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
touch README.md .gitignore environment.yml

Cookiecutter Templates

Consider using cookiecutter for standardized project templates:
  • cookiecutter-data-science - Data science projects
  • cookiecutter-research - Research projects
  • Custom team templates

References and Resources
