project-sharing

Project Sharing and Output Preparation

Expert guidance for preparing project outputs for sharing with collaborators, reviewers, or repositories. Creates organized packages at different sharing levels while preserving your working directory.

When to Use This Skill

  • Sharing analysis results with collaborators
  • Preparing supplementary materials for publications
  • Creating reproducible research packages
  • Archiving completed projects
  • Handoff to other researchers
  • Submitting to data repositories

Core Principles

  1. Work on copies - Never modify the working directory
  2. Choose appropriate level - Match sharing depth to audience needs
  3. Document everything - Include clear guides and metadata
  4. Clean before sharing - Remove debug code, clear outputs, anonymize if needed
  5. Make it reproducible - Include dependencies and instructions
  6. ⚠️ CRITICAL: After creating a sharing folder, all future work happens in the main project directory, NOT in the sharing folder - sharing folders are read-only snapshots

Three Sharing Levels

Level 1: Summary Only

Purpose: Quick sharing for presentations, reports, or high-level review
What to include:
  • PDF export of final notebook(s)
  • Final data/results (CSV, Excel, figures) - optional
  • Brief README
Use when:
  • Sharing results with non-technical stakeholders
  • Presentations or talks
  • Quick review without reproduction needs
  • Space/time constraints
Structure:
shared-summary/
├── README.md                          # Brief overview
├── analysis-YYYY-MM-DD.pdf           # Notebook as PDF
└── results/
    ├── figures/
    │   ├── fig1-main-result.png
    │   └── fig2-comparison.png
    └── tables/
        └── summary-statistics.csv
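
A sketch of creating this Level 1 skeleton with Python's standard library; the folder names follow the tree above, and `make_summary_skeleton` is an illustrative helper, not part of any existing tooling:

```python
from pathlib import Path

def make_summary_skeleton(root="shared-summary"):
    """Create the Level 1 directory skeleton shown above."""
    base = Path(root)
    for sub in ("results/figures", "results/tables"):
        # parents=True also creates the base directory on first pass
        (base / sub).mkdir(parents=True, exist_ok=True)
    (base / "README.md").touch()  # fill in with a brief overview
    return base
```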

Level 2: Reproducible

Purpose: Enable others to reproduce your analysis from processed data
What to include:
  • Analysis notebooks (.ipynb) - cleaned
  • Scripts for figure generation
  • Processed/analysis-ready data
  • Requirements file (requirements.txt or environment.yml)
  • Detailed README with instructions
Use when:
  • Sharing with collaborating researchers
  • Peer review / manuscript supplementary materials
  • Teaching or tutorials
  • Standard collaboration needs
Structure:
shared-reproducible/
├── README.md                          # Setup and reproduction instructions
├── MANIFEST.md                        # File descriptions
├── environment.yml                    # Conda environment OR requirements.txt
├── notebooks/
│   ├── 01-data-processing.ipynb      # Cleaned, outputs cleared
│   ├── 02-analysis.ipynb
│   └── 03-visualization.ipynb
├── scripts/
│   ├── generate_figures.py           # Standalone scripts
│   └── utils.py
└── data/
    ├── processed/
    │   ├── cleaned_data.csv
    │   └── processed_results.tsv
    └── README.md                      # Data provenance
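
Before sending a Level 2 package, it can help to verify that the promised entries actually exist. A sketch; the `REQUIRED` list is illustrative and should be trimmed to what your package commits to:

```python
from pathlib import Path

# Entries the Level 2 structure above promises; adjust per project
REQUIRED = ["README.md", "MANIFEST.md", "environment.yml",
            "notebooks", "data/processed"]

def missing_items(root):
    """Return required entries absent from the sharing folder."""
    return [r for r in REQUIRED if not (Path(root) / r).exists()]
```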

Level 3: Full Traceability

Purpose: Complete transparency from raw data through all processing steps
What to include:
  • Starting/raw data
  • All processing scripts and notebooks
  • All intermediate files
  • Final results
  • Complete documentation
  • Full dependency specification
Use when:
  • Archiving for future reference
  • Regulatory compliance
  • High-stakes reproducibility (clinical, policy)
  • Data repository submission (Zenodo, Dryad, etc.)
  • Complete project handoff
Structure:
shared-complete/
├── README.md                          # Complete project guide
├── MANIFEST.md                        # Comprehensive file listing
├── environment.yml
├── data/
│   ├── raw/                          # Original, unmodified data
│   │   ├── sample_A_reads.fastq.gz
│   │   └── README.md                 # Data source, download date
│   ├── intermediate/                 # Processing steps
│   │   ├── 01-filtered/
│   │   ├── 02-aligned/
│   │   └── README.md
│   └── processed/                    # Final analysis-ready
│       └── final_dataset.csv
├── scripts/
│   ├── 01-download-data.sh
│   ├── 02-quality-control.py
│   ├── 03-filtering.py
│   ├── 04-analysis.py
│   └── utils/
├── notebooks/
│   ├── exploratory/                  # Early exploration
│   └── final/                        # Publication analyses
├── results/
│   ├── figures/
│   ├── tables/
│   └── supplementary/
└── documentation/
    ├── methods.md                    # Detailed methodology
    ├── changelog.md                  # Processing decisions
    └── data-dictionary.md            # Variable definitions
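
A file listing like the one that goes into MANIFEST.md can be generated rather than typed by hand; a sketch, where `tree_listing` is a hypothetical helper producing one relative path per line:

```python
from pathlib import Path

def tree_listing(root):
    """Flat, sorted listing of a package tree, e.g. to paste into MANIFEST.md."""
    root = Path(root)
    return [str(p.relative_to(root)) for p in sorted(root.rglob("*"))]
```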

Preparation Workflow

Step 1: Ask User for Sharing Level

Questions to determine level:
```
Which sharing level do you need?

1. Summary Only - PDF + final results (quick sharing)
2. Reproducible - Notebooks + scripts + data (standard sharing)
3. Full Traceability - Everything from raw data (archival/compliance)

Additional questions:
- Who is the audience? (colleagues, reviewers, public)
- Are there size constraints?
- Any sensitive data to handle?
- Timeline for sharing?
```

Step 2: Identify Files to Include

For each level, identify:
Level 1 - Summary:
  • Main analysis notebook(s)
  • Key figures (publication-quality)
  • Summary tables/statistics
  • Optional: Final processed dataset
Level 2 - Reproducible:
  • All analysis notebooks (not exploratory)
  • Figure generation scripts
  • Processed/cleaned data
  • Environment specification
  • Any utility functions/modules
Level 3 - Full:
  • Raw data (or links if too large)
  • All processing scripts
  • All notebooks (including exploratory)
  • All intermediate files
  • Complete documentation
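
The per-level lists above can be encoded as glob patterns for an automated copy step. A sketch; the patterns are illustrative and assume the directory layout from the structures shown earlier:

```python
from pathlib import Path

# Illustrative glob patterns per sharing level; adjust to your layout
LEVEL_PATTERNS = {
    "summary": ["results/figures/*.png", "results/tables/*.csv"],
    "reproducible": ["notebooks/*.ipynb", "scripts/*.py",
                     "data/processed/*", "environment.yml"],
    "full": ["data/**/*", "scripts/**/*", "notebooks/**/*",
             "results/**/*", "documentation/**/*"],
}

def files_for_level(project_root, level):
    """Resolve which files a given sharing level would include."""
    root = Path(project_root)
    matches = []
    for pattern in LEVEL_PATTERNS[level]:
        matches.extend(p for p in root.glob(pattern) if p.is_file())
    return sorted(matches)
```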

Step 3: Create Sharing Directory

```bash
# Create dated directory
SHARE_DIR="shared-$(date +%Y%m%d)-[level]"
mkdir -p "$SHARE_DIR"

# Create subdirectories based on level
# ... appropriate structure from above
```

Step 4: Copy and Clean Files

For notebooks (.ipynb):
```python
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

def clean_notebook(input_path, output_path):
    """Clean notebook: clear outputs, remove debug cells."""

    # Read notebook
    with open(input_path, 'r') as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear_output = ClearOutputPreprocessor()
    nb, _ = clear_output.preprocess(nb, {})

    # Remove cells tagged as 'debug' or 'remove'
    nb.cells = [cell for cell in nb.cells
                if 'debug' not in cell.metadata.get('tags', [])
                and 'remove' not in cell.metadata.get('tags', [])]

    # Write cleaned notebook
    with open(output_path, 'w') as f:
        nbformat.write(nb, f)
```
For data files:
  • Copy as-is for small files
  • Consider compression for large files
  • Check for sensitive information
For scripts:
  • Remove debugging code
  • Add docstrings if missing
  • Ensure paths are relative
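
For the "ensure paths are relative" check, a rough heuristic that flags hard-coded home-directory paths in scripts; the regex is deliberately narrow, not exhaustive:

```python
import re

# Heuristic: quoted strings starting with /Users/... or /home/...
ABS_PATH_RE = re.compile(r"""['"](/(?:Users|home)/[^'"]+)['"]""")

def find_absolute_paths(source_text):
    """Return hard-coded absolute paths that will break on other machines."""
    return ABS_PATH_RE.findall(source_text)
```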

Step 5: Generate Documentation

README.md Template

````markdown
# Project: [Project Name]

**Date**: YYYY-MM-DD
**Author**: [Your Name]
**Sharing Level**: [Summary/Reproducible/Full]

## Overview

Brief description of the project and analysis.

## Contents

See MANIFEST.md for detailed file descriptions.

## Requirements

[For Reproducible/Full levels]
- Python 3.X
- See environment.yml for dependencies

## Setup

```bash
# Create environment
conda env create -f environment.yml
conda activate project-name
```

## Reproduction Steps

[For Reproducible/Full levels]
1. [Description of first step]
   ```bash
   jupyter notebook notebooks/01-analysis.ipynb
   ```
2. [Description of second step]

## Data Sources

[For Full level]
- Dataset A: [Source, download date, version]
- Dataset B: [Source, download date, version]

## Contact

[Your email or preferred contact]

## License

[If applicable - e.g., CC BY 4.0, MIT]
````

MANIFEST.md Template

````markdown
# File Manifest

Generated: YYYY-MM-DD

## Directory Structure

```
shared-YYYYMMDD/
├── README.md   - Project overview and setup
├── MANIFEST.md - This file
[... complete tree ...]
```

## File Descriptions

### Notebooks
- `notebooks/01-data-processing.ipynb` - Initial data loading and cleaning
- `notebooks/02-analysis.ipynb` - Main statistical analysis
- `notebooks/03-visualization.ipynb` - Figure generation for publication

### Data
- `data/processed/cleaned_data.csv` - Quality-controlled dataset (N=XXX samples)
  - Columns: [list key columns]
  - Missing values handled by [method]

### Scripts
- `scripts/generate_figures.py` - Automated figure generation
  - Usage: `python generate_figures.py --input data/processed/cleaned_data.csv`

### Results
- `results/figures/fig1-main.png` - Main result showing [description]
- `results/tables/summary_stats.csv` - Descriptive statistics

[Continue for all files...]
````

Step 6: Handle Sensitive Data

Check for sensitive information:
  • Personal identifiable information (PII)
  • Access credentials (API keys, passwords)
  • Proprietary data
  • Institutional data with sharing restrictions
  • Patient/subject identifiers
Strategies:
  1. Anonymize - Remove or hash identifiers
  2. Exclude - Don't include sensitive files
  3. Aggregate - Share summary statistics only
  4. Document restrictions - Note what's excluded and why
Example anonymization:
```python
import hashlib

def anonymize_ids(df, id_column='subject_id'):
    """Replace IDs with hashed values."""
    df[id_column] = df[id_column].apply(
        lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:8]
    )
    return df
```
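
A similarly rough sweep for leftover credentials before packaging; the patterns here are illustrative, and a dedicated scanner will catch far more:

```python
import re

# Illustrative patterns; real secret scanners cover many more cases
SECRET_PATTERNS = [
    re.compile(r"(?i)(?:api[_-]?key|secret|password|token)\s*[=:]\s*['\"][^'\"]+['\"]"),
]

def scan_for_secrets(text):
    """Return suspicious credential-looking assignments found in text."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```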

Step 7: Package and Compress

For smaller packages (<100MB):

```bash
# Create zip archive
zip -r shared-YYYYMMDD.zip shared-YYYYMMDD/
```

For larger packages:

```bash
# Create tar.gz (better compression)
tar -czf shared-YYYYMMDD.tar.gz shared-YYYYMMDD/

# Or split into parts if very large
tar -czf - shared-YYYYMMDD/ | split -b 1G - shared-YYYYMMDD.tar.gz.part
```

Document package contents:
  • Total size
  • Number of files
  • Compression method
  • How to extract
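
The size and file count to document can be gathered automatically; a small sketch:

```python
from pathlib import Path

def package_stats(root):
    """Total size and file count for a sharing folder, for the README."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return {"files": len(files),
            "bytes": sum(p.stat().st_size for p in files)}
```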

Step 8: Return to Working Directory

⚠️ IMPORTANT: After creating the sharing package, always work in the main project directory.
The sharing folder is a snapshot for distribution only. Any future development, analysis, or modifications should happen in your original working directory, not in the shared-*/ folder.
Claude should:
  • Change back to the main project directory: cd .. (if needed)
  • Confirm the working directory: pwd
  • Continue all work in the original project location
  • Treat sharing folders as read-only archives
Example:

```bash
# After creating the sharing package
cd /path/to/main/project   # Return to working directory
pwd                        # Verify location

# Continue work here, NOT in shared-YYYYMMDD/
```


---

Best Practices

Notebook Cleaning

Before sharing notebooks:
  1. Clear all outputs
     ```bash
     jupyter nbconvert --clear-output --inplace notebook.ipynb
     ```
  2. Remove debug cells
     • Tag cells for removal: Cell → Cell Tags → add "remove"
     • Filter during copy
  3. Add markdown explanations
     • Ensure each code cell has context
     • Add section headers
     • Document assumptions
  4. Check cell execution order
     • Run "Restart & Run All" to verify
     • Fix any out-of-order dependencies
  5. Remove absolute paths
     ```python
     # ❌ Bad
     data = pd.read_csv('/Users/yourname/project/data.csv')

     # ✅ Good
     data = pd.read_csv('../data/data.csv')
     # or
     from pathlib import Path
     data_dir = Path(__file__).parent / 'data'
     ```
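
Item 4 (cell execution order) can also be checked programmatically: after "Restart & Run All", code cells should carry strictly increasing execution counts. A sketch that works on the raw notebook JSON, with cells as plain dicts so no nbformat dependency is needed:

```python
def cells_in_order(cells):
    """True if executed code cells have strictly increasing execution counts."""
    counts = [c.get("execution_count") for c in cells
              if c.get("cell_type") == "code"
              and c.get("execution_count") is not None]
    return all(a < b for a, b in zip(counts, counts[1:]))
```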

File Organization

Naming conventions for shared files:
  • Use descriptive names: telomere_analysis_results.csv, not results.csv
  • Include dates for time-sensitive data: data_2024-01-15.csv
  • Version if applicable: analysis_v2.ipynb
  • No spaces: use - or _
Size considerations:
  • Document large files in README
  • Consider hosting large data separately (institutional storage, Zenodo)
  • Provide download links instead of including in package
  • Use .gitattributes for large file tracking if using Git
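
The no-spaces rule lends itself to a small validator; a sketch with an illustrative character whitelist:

```python
import re

# Letters, digits, and ., -, _ separators; no spaces
SHARE_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._\-]*")

def is_share_friendly(filename):
    """True if the name has no spaces and uses only -, _, . separators."""
    return SHARE_NAME_RE.fullmatch(filename) is not None
```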

Documentation Requirements

Minimum documentation for each level:
Level 1 - Summary:
  • What the results show
  • Key findings
  • Date and author
Level 2 - Reproducible:
  • Setup instructions
  • How to run the analysis
  • Software dependencies
  • Expected runtime
  • Data source information
Level 3 - Full:
  • Complete methodology
  • All data sources with versions
  • Processing decisions and rationale
  • Known issues or limitations
  • Contact information

Dependency Management

Create requirements file:
For pip:

```bash
# From active environment
pip freeze > requirements.txt

# Or manually curated (better)
cat > requirements.txt << EOF
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
scipy>=1.9.0
EOF
```

For conda:

```bash
# Export current environment
conda env export > environment.yml

# Or minimal (recommended)
conda env export --from-history > environment.yml
# Then edit to remove build-specific details
```


---

Common Scenarios

Scenario 1: Sharing with Lab Collaborators

Level: Reproducible
Include:
  • Cleaned analysis notebooks
  • Processed data
  • Figure generation scripts
  • environment.yml
  • README with reproduction steps
Don't include:
  • Exploratory notebooks
  • Failed analysis attempts
  • Debug outputs
  • Personal notes

Scenario 2: Manuscript Supplementary Material

Level: Reproducible or Full (depending on journal)
Include:
  • All notebooks used for figures in paper
  • Scripts for each figure panel
  • Processed data (or instructions to obtain)
  • Complete environment specification
  • Detailed methods document
Best practices:
  • Number notebooks to match paper sections
  • Export key figures in publication formats (PDF, high-res PNG)
  • Include data dictionary for all variables
  • Test reproduction on clean environment

Scenario 3: Project Archival

Level: Full Traceability
Include:
  • Complete data pipeline from raw to processed
  • All versions of analysis
  • Meeting notes or decision logs
  • External tool versions
  • System information
Organization tips:
  • Use dates in directory names
  • Keep chronological changelog
  • Document all external dependencies
  • Include contact info for questions

Scenario 4: Data Repository Submission (Zenodo, Figshare)

Level: Full Traceability
Additional considerations:
  • Add LICENSE file (CC BY 4.0, MIT, etc.)
  • Include CITATION.cff or CITATION.txt
  • Comprehensive metadata
  • README with DOI/reference instructions
  • Consider maximum file sizes
  • Review repository-specific guidelines


Quality Checklist

Before finalizing the sharing package:

File Quality

  • All notebooks run without errors
  • Notebook outputs cleared
  • No absolute paths in code
  • No hardcoded credentials or API keys
  • File sizes documented
  • Large files compressed or linked

Documentation

  • README explains setup and usage
  • MANIFEST describes all files
  • Data sources documented
  • Dependencies specified
  • Contact information included
  • License specified (if applicable)

Reproducibility

  • Requirements file tested in clean environment
  • All data accessible (included or linked)
  • Scripts run in documented order
  • Expected outputs match actual outputs
  • Processing time documented

Privacy & Sensitivity

  • No sensitive data included
  • Identifiers anonymized if needed
  • Institutional policies checked
  • Collaborator permissions obtained

Organization

  • Clear directory structure
  • Consistent naming conventions
  • Files logically grouped
  • No duplicate files
  • No unnecessary files (cache, .DS_Store, etc.)
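
The last item can be automated with a sweep for well-known junk names; the `JUNK_NAMES` set is illustrative, extend it for your platform and tools:

```python
from pathlib import Path

# Common junk names; extend for your platform/tools
JUNK_NAMES = {".DS_Store", "__pycache__", ".ipynb_checkpoints", "Thumbs.db"}

def find_junk(root):
    """Return junk files/dirs that should not ship in a sharing package."""
    return [p for p in Path(root).rglob("*") if p.name in JUNK_NAMES]
```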

Integration with Other Skills

Works well with:
  • folder-organization - Ensures source project is well-organized before sharing
  • jupyter-notebook-analysis - Creates notebooks that are share-ready
  • managing-environments - Documents dependencies properly
Before using this skill:
  1. Organize working directory (folder-organization)
  2. Finalize analysis (jupyter-notebook-analysis)
  3. Document environment (managing-environments)
After using this skill:
  1. Test package in clean environment
  2. Share via appropriate channel (email, repository, cloud storage)
  3. Keep archived copy for reference

Example Scripts

Create Sharing Package Script

```python
#!/usr/bin/env python3
"""Create sharing package for project."""

import shutil
from pathlib import Path
from datetime import date
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

def create_sharing_package(level='reproducible', output_dir=None):
    """
    Create sharing package.

    Args:
        level: 'summary', 'reproducible', or 'full'
        output_dir: Output directory name (auto-generated if None)
    """

    # Create output directory
    if output_dir is None:
        output_dir = f"shared-{date.today():%Y%m%d}-{level}"

    share_path = Path(output_dir)
    share_path.mkdir(exist_ok=True)

    print(f"Creating {level} sharing package in {share_path}")

    # Create structure based on level
    if level == 'summary':
        create_summary_package(share_path)
    elif level == 'reproducible':
        create_reproducible_package(share_path)
    elif level == 'full':
        create_full_package(share_path)

    print(f"✓ Package created: {share_path}")
    print(f"  Review and compress: tar -czf {share_path}.tar.gz {share_path}")

def clean_notebook(input_path, output_path):
    """Clean notebook outputs and debug cells."""
    with open(input_path) as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear = ClearOutputPreprocessor()
    nb, _ = clear.preprocess(nb, {})

    # Remove debug cells
    nb.cells = [c for c in nb.cells
                if 'debug' not in c.metadata.get('tags', [])]

    with open(output_path, 'w') as f:
        nbformat.write(nb, f)

# ... implement level-specific functions ...

if __name__ == '__main__':
    import sys
    level = sys.argv[1] if len(sys.argv) > 1 else 'reproducible'
    create_sharing_package(level)
```

---

Summary

Key principles for project sharing:
  1. 🎯 Choose the right level - Match sharing depth to audience needs
  2. 📋 Copy, don't move - Preserve your working directory
  3. 🧹 Clean thoroughly - Remove debug code, clear outputs
  4. 📝 Document everything - README + MANIFEST minimum
  5. 🔒 Check sensitivity - Anonymize or exclude as needed
  6. Test before sharing - Run in clean environment
  7. 📦 Package properly - Compress and document contents
  8. ⚠️ Work in main directory - After creating a sharing package, ALL future work happens in the original project directory, NOT in the sharing folder
Remember: Good sharing practices benefit both collaborators and your future self!


⚠️ Critical Reminder for Claude

After creating any sharing package:
  1. Always return to the main project directory
  2. Never work in shared-*/ directories - these are read-only snapshots
  3. All future edits, analysis, and development happen in the original working directory
  4. Sharing folders are for distribution only, not active development
If the user asks to modify files, always check the current directory and ensure you're working in the main project location, not in a sharing package.