Project Sharing and Output Preparation
Expert guidance for preparing project outputs for sharing with collaborators, reviewers, or repositories. Creates organized packages at different sharing levels while preserving your working directory.
When to Use This Skill
- Sharing analysis results with collaborators
- Preparing supplementary materials for publications
- Creating reproducible research packages
- Archiving completed projects
- Handoff to other researchers
- Submitting to data repositories
Core Principles
- Work on copies - Never modify the working directory
- Choose appropriate level - Match sharing depth to audience needs
- Document everything - Include clear guides and metadata
- Clean before sharing - Remove debug code, clear outputs, anonymize if needed
- Make it reproducible - Include dependencies and instructions
- ⚠️ CRITICAL: After creating sharing folder, all future work happens in the main project directory, NOT in the sharing folder - Sharing folders are read-only snapshots
Three Sharing Levels
Level 1: Summary Only
Purpose: Quick sharing for presentations, reports, or high-level review
What to include:
- PDF export of final notebook(s)
- Final data/results (CSV, Excel, figures) - optional
- Brief README
Use when:
- Sharing results with non-technical stakeholders
- Presentations or talks
- Quick review without reproduction needs
- Space/time constraints
Structure:
```
shared-summary/
├── README.md                      # Brief overview
├── analysis-YYYY-MM-DD.pdf        # Notebook as PDF
└── results/
    ├── figures/
    │   ├── fig1-main-result.png
    │   └── fig2-comparison.png
    └── tables/
        └── summary-statistics.csv
```
Level 2: Reproducible
Purpose: Enable others to reproduce your analysis from processed data
What to include:
- Analysis notebooks (.ipynb) - cleaned
- Scripts for figure generation
- Processed/analysis-ready data
- Requirements file (requirements.txt or environment.yml)
- Detailed README with instructions
Use when:
- Sharing with collaborating researchers
- Peer review / manuscript supplementary materials
- Teaching or tutorials
- Standard collaboration needs
Structure:
```
shared-reproducible/
├── README.md                      # Setup and reproduction instructions
├── MANIFEST.md                    # File descriptions
├── environment.yml                # Conda environment OR requirements.txt
├── notebooks/
│   ├── 01-data-processing.ipynb   # Cleaned, outputs cleared
│   ├── 02-analysis.ipynb
│   └── 03-visualization.ipynb
├── scripts/
│   ├── generate_figures.py        # Standalone scripts
│   └── utils.py
└── data/
    ├── processed/
    │   ├── cleaned_data.csv
    │   └── processed_results.tsv
    └── README.md                  # Data provenance
```
Level 3: Full Traceability
Purpose: Complete transparency from raw data through all processing steps
What to include:
- Starting/raw data
- All processing scripts and notebooks
- All intermediate files
- Final results
- Complete documentation
- Full dependency specification
Use when:
- Archiving for future reference
- Regulatory compliance
- High-stakes reproducibility (clinical, policy)
- Data repository submission (Zenodo, Dryad, etc.)
- Complete project handoff
Structure:
```
shared-complete/
├── README.md                      # Complete project guide
├── MANIFEST.md                    # Comprehensive file listing
├── environment.yml
├── data/
│   ├── raw/                       # Original, unmodified data
│   │   ├── sample_A_reads.fastq.gz
│   │   └── README.md              # Data source, download date
│   ├── intermediate/              # Processing steps
│   │   ├── 01-filtered/
│   │   ├── 02-aligned/
│   │   └── README.md
│   └── processed/                 # Final analysis-ready
│       └── final_dataset.csv
├── scripts/
│   ├── 01-download-data.sh
│   ├── 02-quality-control.py
│   ├── 03-filtering.py
│   ├── 04-analysis.py
│   └── utils/
├── notebooks/
│   ├── exploratory/               # Early exploration
│   └── final/                     # Publication analyses
├── results/
│   ├── figures/
│   ├── tables/
│   └── supplementary/
└── documentation/
    ├── methods.md                 # Detailed methodology
    ├── changelog.md               # Processing decisions
    └── data-dictionary.md         # Variable definitions
```
Preparation Workflow
Step 1: Ask User for Sharing Level
Questions to determine level:
Which sharing level do you need?
1. Summary Only - PDF + final results (quick sharing)
2. Reproducible - Notebooks + scripts + data (standard sharing)
3. Full Traceability - Everything from raw data (archival/compliance)
Additional questions:
- Who is the audience? (colleagues, reviewers, public)
- Are there size constraints?
- Any sensitive data to handle?
- Timeline for sharing?
Step 2: Identify Files to Include
For each level, identify:
Level 1 - Summary:
- Main analysis notebook(s)
- Key figures (publication-quality)
- Summary tables/statistics
- Optional: Final processed dataset
Level 2 - Reproducible:
- All analysis notebooks (not exploratory)
- Figure generation scripts
- Processed/cleaned data
- Environment specification
- Any utility functions/modules
Level 3 - Full:
- Raw data (or links if too large)
- All processing scripts
- All notebooks (including exploratory)
- All intermediate files
- Complete documentation
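The per-level file lists above can be expressed as glob patterns. The following is a minimal sketch, not a definitive implementation: `LEVEL_PATTERNS` and `select_files` are hypothetical names, and the patterns assume a project laid out like the directory trees shown earlier. The PDF export for Level 1 needs a separate conversion step and is not covered here.

```python
from pathlib import Path

# Hypothetical glob patterns per sharing level; adjust to your project layout.
LEVEL_PATTERNS = {
    "summary": ["results/figures/*", "results/tables/*"],
    "reproducible": ["notebooks/*.ipynb", "scripts/*.py", "data/processed/*",
                     "environment.yml", "requirements.txt"],
    "full": ["data/**/*", "scripts/**/*", "notebooks/**/*", "results/**/*",
             "documentation/**/*", "environment.yml", "requirements.txt"],
}


def select_files(project_root, level):
    """Return sorted relative paths of files matching the level's patterns."""
    root = Path(project_root)
    if level not in LEVEL_PATTERNS:
        raise ValueError(f"Unknown sharing level: {level!r}")
    selected = set()
    for pattern in LEVEL_PATTERNS[level]:
        for path in root.glob(pattern):
            if path.is_file():  # skip directories matched by the pattern
                selected.add(path.relative_to(root).as_posix())
    return sorted(selected)
```

Reviewing the returned list with the user before copying anything keeps the working directory untouched and makes exclusions explicit.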
Step 3: Create Sharing Directory
```bash
# Create dated directory (replace [level] with summary, reproducible, or full)
SHARE_DIR="shared-$(date +%Y%m%d)-[level]"
mkdir -p "$SHARE_DIR"

# Create subdirectories based on level
# ... appropriate structure from above
```
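The "appropriate structure" step can also be scripted. This sketch assumes the subdirectory names from the three structures shown earlier; `LEVEL_DIRS` and `create_share_skeleton` are hypothetical names introduced for illustration.

```python
from datetime import date
from pathlib import Path

# Subdirectories per level, taken from the structures shown earlier.
LEVEL_DIRS = {
    "summary": ["results/figures", "results/tables"],
    "reproducible": ["notebooks", "scripts", "data/processed"],
    "full": ["data/raw", "data/intermediate", "data/processed", "scripts/utils",
             "notebooks/exploratory", "notebooks/final", "results/figures",
             "results/tables", "results/supplementary", "documentation"],
}


def create_share_skeleton(level, parent="."):
    """Create shared-YYYYMMDD-<level>/ with the subdirectories for that level."""
    if level not in LEVEL_DIRS:
        raise ValueError(f"Unknown sharing level: {level!r}")
    share_dir = Path(parent) / f"shared-{date.today():%Y%m%d}-{level}"
    for sub in LEVEL_DIRS[level]:
        (share_dir / sub).mkdir(parents=True, exist_ok=True)
    return share_dir
```

Calling `create_share_skeleton("reproducible")` from the project root creates only empty directories; copying and cleaning files is a separate step.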
Step 4: Copy and Clean Files
For notebooks (.ipynb):
```python
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor


def clean_notebook(input_path, output_path):
    """Clean notebook: clear outputs, remove debug cells."""
    # Read notebook (notebook files are UTF-8 encoded JSON)
    with open(input_path, 'r', encoding='utf-8') as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear_output = ClearOutputPreprocessor()
    nb, _ = clear_output.preprocess(nb, {})

    # Remove cells tagged as 'debug' or 'remove'
    nb.cells = [cell for cell in nb.cells
                if 'debug' not in cell.metadata.get('tags', [])
                and 'remove' not in cell.metadata.get('tags', [])]

    # Write cleaned notebook
    with open(output_path, 'w', encoding='utf-8') as f:
        nbformat.write(nb, f)
```
For data files:
- Copy as-is for small files
- Consider compression for large files
- Check for sensitive information
For scripts:
- Remove debugging code
- Add docstrings if missing
- Ensure paths are relative
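Checking that paths are relative can be partly automated. The sketch below is a heuristic only: `find_absolute_paths` and the prefix list (`/Users/`, `/home/`, Windows drive letters) are assumptions chosen for illustration, and the check will miss paths built at runtime.

```python
import re

# Heuristic: quoted strings that start with a common absolute-path prefix.
ABSOLUTE_PATH_RE = re.compile(
    r"""["']((?:/(?:Users|home)/|[A-Za-z]:[\\/])[^"']*)["']"""
)


def find_absolute_paths(source):
    """Return (line_number, path) pairs for absolute-looking paths in source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ABSOLUTE_PATH_RE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Running it over each script's text before copying gives a short list of lines to review by hand.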
Step 5: Generate Documentation
README.md Template
````markdown
# Project: [Project Name]

Date: YYYY-MM-DD
Author: [Your Name]
Sharing Level: [Summary/Reproducible/Full]

## Overview

Brief description of the project and analysis.

## Contents

See MANIFEST.md for detailed file descriptions.

## Requirements

[For Reproducible/Full levels]
- Python 3.X
- See environment.yml for dependencies

## Setup

```bash
# Create environment
conda env create -f environment.yml
conda activate project-name
```

## Reproduction Steps

[For Reproducible/Full levels]

1. [Description of first step]
   ```bash
   jupyter notebook notebooks/01-analysis.ipynb
   ```
2. [Description of second step]

## Data Sources

[For Full level]
- Dataset A: [Source, download date, version]
- Dataset B: [Source, download date, version]

## Contact

[Your email or preferred contact]

## License

[If applicable - e.g., CC BY 4.0, MIT]
````
MANIFEST.md Template
````markdown
# File Manifest

Generated: YYYY-MM-DD

## Directory Structure

```
shared-YYYYMMDD/
├── README.md       - Project overview and setup
├── MANIFEST.md     - This file
[... complete tree ...]
```

## File Descriptions

### Notebooks

- `notebooks/01-data-processing.ipynb` - Initial data loading and cleaning
- `notebooks/02-analysis.ipynb` - Main statistical analysis
- `notebooks/03-visualization.ipynb` - Figure generation for publication

### Data

- `data/processed/cleaned_data.csv` - Quality-controlled dataset (N=XXX samples)
  - Columns: [list key columns]
  - Missing values handled by [method]

### Scripts

- `scripts/generate_figures.py` - Automated figure generation
  - Usage (from the package root): `python scripts/generate_figures.py --input data/processed/cleaned_data.csv`

### Results

- `results/figures/fig1-main.png` - Main result showing [description]
- `results/tables/summary_stats.csv` - Descriptive statistics

[Continue for all files...]
````
Step 6: Handle Sensitive Data
Check for sensitive information:
- Personal identifiable information (PII)
- Access credentials (API keys, passwords)
- Proprietary data
- Institutional data with sharing restrictions
- Patient/subject identifiers
Strategies:
- Anonymize - Remove or hash identifiers
- Exclude - Don't include sensitive files
- Aggregate - Share summary statistics only
- Document restrictions - Note what's excluded and why
Example anonymization:
```python
import hashlib


def anonymize_ids(df, id_column='subject_id'):
    """Replace IDs with hashed values."""
    # Truncated SHA-256 digest of the string form of each ID
    df[id_column] = df[id_column].apply(
        lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:8]
    )
    return df
```
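A plain hash of a short or guessable identifier can be reversed by hashing every candidate ID and comparing the results. One common mitigation is a keyed hash (HMAC) whose secret key stays out of the shared package. The sketch below illustrates that idea under stated assumptions: `pseudonymize_id`, its `secret_key` parameter, and the 16-character length are hypothetical choices, and the result is pseudonymization, which may not satisfy every institutional anonymization policy.

```python
import hashlib
import hmac


def pseudonymize_id(value, secret_key, length=16):
    """Return a keyed, truncated HMAC-SHA256 hex digest of an identifier.

    secret_key must be bytes and must NOT be included in the sharing package.
    """
    digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]
```

With pandas, this could be applied as `df[id_column].map(lambda x: pseudonymize_id(x, key))`, where `key` is loaded from outside the project tree.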
Step 7: Package and Compress
For smaller packages (<100MB):
```bash
# Create zip archive
zip -r shared-YYYYMMDD.zip shared-YYYYMMDD/
```

For larger packages:

```bash
# Create tar.gz (better compression)
tar -czf shared-YYYYMMDD.tar.gz shared-YYYYMMDD/

# Or split into parts if very large
tar -czf - shared-YYYYMMDD/ | split -b 1G - shared-YYYYMMDD.tar.gz.part

# Recipients reassemble and extract the parts with:
#   cat shared-YYYYMMDD.tar.gz.part* | tar -xzf -
```

Document package contents:
- Total size
- Number of files
- Compression method
- How to extract
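The size and file-count items can be gathered with a short script. The sketch below also records SHA-256 checksums so recipients can verify their copy; that is an addition beyond the list above. `summarize_package` is a hypothetical helper, and it reads each file fully into memory, which is fine for illustration but should be replaced by streamed hashing for very large files.

```python
import hashlib
from pathlib import Path


def summarize_package(package_dir):
    """Return file count, total size in bytes, and SHA-256 checksums per file."""
    root = Path(package_dir)
    checksums = {}
    total_bytes = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()  # whole-file read; stream large files in practice
        total_bytes += len(data)
        checksums[path.relative_to(root).as_posix()] = hashlib.sha256(data).hexdigest()
    return {"n_files": len(checksums), "total_bytes": total_bytes, "sha256": checksums}
```

The returned dictionary can be pasted into the README or MANIFEST alongside the compression method and extraction command.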
Step 8: Return to Working Directory
⚠️ IMPORTANT: After creating the sharing package, always work in the main project directory.
The sharing folder is a snapshot for distribution only. Any future development, analysis, or modifications should happen in your original working directory, not in the `shared-*/` folder.
Claude should:
- Change directory back to main project: `cd ..` (if needed)
- Confirm working directory: `pwd`
- Continue all work in the original project location
- Treat sharing folders as read-only archives
Example:
```bash
# After creating sharing package
cd /path/to/main/project  # Return to working directory
pwd                       # Verify location

# Continue work here, NOT in shared-YYYYMMDD/
```
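The same check can be made from Python before any file is modified. This is a small sketch that assumes sharing folders always start with the `shared-` prefix used in this document; `inside_sharing_folder` is a hypothetical helper name.

```python
from pathlib import Path


def inside_sharing_folder(path=None):
    """Return True if path (default: current directory) is in or under a shared-* folder."""
    resolved = Path(path).resolve() if path is not None else Path.cwd()
    return any(part.startswith("shared-") for part in resolved.parts)
```

If it returns `True` for the current directory, change back to the main project directory before editing anything.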
---
Best Practices
Notebook Cleaning
Before sharing notebooks:
- Clear all outputs
```bash
jupyter nbconvert --clear-output --inplace notebook.ipynb
```
- Remove debug cells
  - Tag cells for removal: Cell → Cell Tags → add "remove"
  - Filter during copy
- Add markdown explanations
  - Ensure each code cell has context
  - Add section headers
  - Document assumptions
- Check cell execution order
  - Run "Restart & Run All" to verify
  - Fix any out-of-order dependencies
- Remove absolute paths
```python
# ❌ Bad
data = pd.read_csv('/Users/yourname/project/data.csv')

# ✅ Good
data = pd.read_csv('../data/data.csv')

# or, in a standalone script (__file__ is not defined inside a notebook)
from pathlib import Path
data_dir = Path(__file__).parent / 'data'
```
File Organization
Naming conventions for shared files:
- Use descriptive names: `telomere_analysis_results.csv` not `results.csv`
- Include dates for time-sensitive data: `data_2024-01-15.csv`
- Version if applicable: `analysis_v2.ipynb`
- No spaces: use `-` or `_`
Size considerations:
- Document large files in README
- Consider hosting large data separately (institutional storage, Zenodo)
- Provide download links instead of including in package
- Use `.gitattributes` for large file tracking if using Git
Documentation Requirements
Minimum documentation for each level:
Level 1 - Summary:
- What the results show
- Key findings
- Date and author
Level 2 - Reproducible:
- Setup instructions
- How to run the analysis
- Software dependencies
- Expected runtime
- Data source information
Level 3 - Full:
- Complete methodology
- All data sources with versions
- Processing decisions and rationale
- Known issues or limitations
- Contact information
Dependency Management
Create requirements file:
For pip:
```bash
# From active environment
pip freeze > requirements.txt

# Or manually curated (better)
cat > requirements.txt << EOF
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
scipy>=1.9.0
EOF
```

For conda:

```bash
# Export current environment
conda env export > environment.yml

# Or minimal (recommended)
conda env export --from-history > environment.yml

# Then edit to remove build-specific details
```
---
Common Scenarios
Scenario 1: Sharing with Lab Collaborators
Level: Reproducible
Include:
- Cleaned analysis notebooks
- Processed data
- Figure generation scripts
- environment.yml
- README with reproduction steps
Don't include:
- Exploratory notebooks
- Failed analysis attempts
- Debug outputs
- Personal notes
Scenario 2: Manuscript Supplementary Material
Level: Reproducible or Full (depending on journal)
Include:
- All notebooks used for figures in paper
- Scripts for each figure panel
- Processed data (or instructions to obtain)
- Complete environment specification
- Detailed methods document
Best practices:
- Number notebooks to match paper sections
- Export key figures in publication formats (PDF, high-res PNG)
- Include data dictionary for all variables
- Test reproduction on clean environment
Scenario 3: Project Archival
Level: Full Traceability
Include:
- Complete data pipeline from raw to processed
- All versions of analysis
- Meeting notes or decision logs
- External tool versions
- System information
Organization tips:
- Use dates in directory names
- Keep chronological changelog
- Document all external dependencies
- Include contact info for questions
Scenario 4: Data Repository Submission (Zenodo, Figshare)
Level: Full Traceability
Additional considerations:
- Add LICENSE file (CC BY 4.0, MIT, etc.)
- Include CITATION.cff or CITATION.txt
- Comprehensive metadata
- README with DOI/reference instructions
- Consider maximum file sizes
- Review repository-specific guidelines
Quality Checklist
Before finalizing the sharing package:
File Quality
- All notebooks run without errors
- Notebook outputs cleared
- No absolute paths in code
- No hardcoded credentials or API keys
- File sizes documented
- Large files compressed or linked
Documentation
- README explains setup and usage
- MANIFEST describes all files
- Data sources documented
- Dependencies specified
- Contact information included
- License specified (if applicable)
Reproducibility
- Requirements file tested in clean environment
- All data accessible (included or linked)
- Scripts run in documented order
- Expected outputs match actual outputs
- Processing time documented
Privacy & Sensitivity
- No sensitive data included
- Identifiers anonymized if needed
- Institutional policies checked
- Collaborator permissions obtained
Organization
- Clear directory structure
- Consistent naming conventions
- Files logically grouped
- No duplicate files
- No unnecessary files (cache, .DS_Store, etc.)
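Two checklist items, unnecessary files and hardcoded credentials, lend themselves to a quick automated pass. The sketch below is a heuristic, not a complete audit: `audit_package`, `JUNK_NAMES`, and the credential pattern are assumptions made for illustration. It only scans a few plain-text file types, so notebooks and binary files still need a manual review.

```python
import re
from pathlib import Path

# File and directory names that usually should not be shared.
JUNK_NAMES = {".DS_Store", "Thumbs.db", "__pycache__", ".ipynb_checkpoints"}
TEXT_SUFFIXES = {".py", ".sh", ".yml", ".yaml", ".txt", ".md"}
# Heuristic for assignments such as api_key = "..." or PASSWORD: '...'
SECRET_RE = re.compile(
    r"(api[_-]?key|password|secret|token)\s*[:=]\s*['\"][^'\"]+['\"]",
    re.IGNORECASE,
)


def audit_package(package_dir):
    """Return (junk_paths, secret_hits) found under package_dir."""
    root = Path(package_dir)
    junk, secrets = [], []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.name in JUNK_NAMES:
            junk.append(rel)
        elif path.is_file() and path.suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if SECRET_RE.search(line):
                    secrets.append((rel, lineno))
    return junk, secrets
```

An empty result from both lists does not prove the package is clean; it only means these particular patterns were not found.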
Integration with Other Skills
Works well with:
- folder-organization - Ensures source project is well-organized before sharing
- jupyter-notebook-analysis - Creates notebooks that are share-ready
- managing-environments - Documents dependencies properly
Before using this skill:
- Organize working directory (folder-organization)
- Finalize analysis (jupyter-notebook-analysis)
- Document environment (managing-environments)
After using this skill:
- Test package in clean environment
- Share via appropriate channel (email, repository, cloud storage)
- Keep archived copy for reference
Example Scripts
Create Sharing Package Script
```python
#!/usr/bin/env python3
"""Create sharing package for project."""

import shutil
import sys
from datetime import date
from pathlib import Path

import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor


def create_sharing_package(level='reproducible', output_dir=None):
    """
    Create sharing package.

    Args:
        level: 'summary', 'reproducible', or 'full'
        output_dir: Output directory name (auto-generated if None)
    """
    # Reject unknown levels before creating anything on disk
    if level not in ('summary', 'reproducible', 'full'):
        raise ValueError(f"Unknown sharing level: {level!r}")

    # Create output directory
    if output_dir is None:
        output_dir = f"shared-{date.today():%Y%m%d}-{level}"

    share_path = Path(output_dir)
    share_path.mkdir(parents=True, exist_ok=True)

    print(f"Creating {level} sharing package in {share_path}")

    # Create structure based on level
    if level == 'summary':
        create_summary_package(share_path)
    elif level == 'reproducible':
        create_reproducible_package(share_path)
    elif level == 'full':
        create_full_package(share_path)

    print(f"✓ Package created: {share_path}")
    print(f"  Review and compress: tar -czf {share_path}.tar.gz {share_path}")


def clean_notebook(input_path, output_path):
    """Clean notebook outputs and debug cells."""
    with open(input_path, encoding='utf-8') as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear = ClearOutputPreprocessor()
    nb, _ = clear.preprocess(nb, {})

    # Remove cells tagged as 'debug' or 'remove'
    nb.cells = [c for c in nb.cells
                if 'debug' not in c.metadata.get('tags', [])
                and 'remove' not in c.metadata.get('tags', [])]

    with open(output_path, 'w', encoding='utf-8') as f:
        nbformat.write(nb, f)


# ... implement level-specific functions ...
# (create_summary_package, create_reproducible_package, and create_full_package
# are called above and must be defined before this script can run)


if __name__ == '__main__':
    level = sys.argv[1] if len(sys.argv) > 1 else 'reproducible'
    create_sharing_package(level)
```
---
Summary
Key principles for project sharing:
- 🎯 Choose the right level - Match sharing depth to audience needs
- 📋 Copy, don't move - Preserve your working directory
- 🧹 Clean thoroughly - Remove debug code, clear outputs
- 📝 Document everything - README + MANIFEST minimum
- 🔒 Check sensitivity - Anonymize or exclude as needed
- ✅ Test before sharing - Run in clean environment
- 📦 Package properly - Compress and document contents
- ⚠️ Work in main directory - After creating sharing package, ALL future work happens in the original project directory, NOT in the sharing folder
Remember: Good sharing practices benefit both collaborators and your future self!
⚠️ Critical Reminder for Claude
After creating any sharing package:
- Always return to the main project directory
- Never work in `shared-*/` directories - These are read-only snapshots
- All future edits, analysis, and development happen in the original working directory
- Sharing folders are for distribution only, not active development
If the user asks to modify files, always check the current directory and ensure you're working in the main project location, not in a sharing package.