Prepare organized packages of project files for sharing at different levels, from summary PDFs to fully reproducible archives. The skill creates copies with cleaned notebooks, documentation, and a level-appropriate file selection. After creating a sharing package, all work continues in the main project directory.
Install:

```bash
npx skill4agent add delphine-l/claude_global project-sharing
```

## Sharing Levels

### Level 1: Summary Only

```
shared-summary/
├── README.md                 # Brief overview
├── analysis-YYYY-MM-DD.pdf   # Notebook as PDF
└── results/
    ├── figures/
    │   ├── fig1-main-result.png
    │   └── fig2-comparison.png
    └── tables/
        └── summary-statistics.csv
```

### Level 2: Reproducible

```
shared-reproducible/
├── README.md                      # Setup and reproduction instructions
├── MANIFEST.md                    # File descriptions
├── environment.yml                # Conda environment OR requirements.txt
├── notebooks/
│   ├── 01-data-processing.ipynb   # Cleaned, outputs cleared
│   ├── 02-analysis.ipynb
│   └── 03-visualization.ipynb
├── scripts/
│   ├── generate_figures.py        # Standalone scripts
│   └── utils.py
└── data/
    ├── processed/
    │   ├── cleaned_data.csv
    │   └── processed_results.tsv
    └── README.md                  # Data provenance
```

### Level 3: Full Traceability

```
shared-complete/
├── README.md                  # Complete project guide
├── MANIFEST.md                # Comprehensive file listing
├── environment.yml
├── data/
│   ├── raw/                   # Original, unmodified data
│   │   ├── sample_A_reads.fastq.gz
│   │   └── README.md          # Data source, download date
│   ├── intermediate/          # Processing steps
│   │   ├── 01-filtered/
│   │   ├── 02-aligned/
│   │   └── README.md
│   └── processed/             # Final analysis-ready
│       └── final_dataset.csv
├── scripts/
│   ├── 01-download-data.sh
│   ├── 02-quality-control.py
│   ├── 03-filtering.py
│   ├── 04-analysis.py
│   └── utils/
├── notebooks/
│   ├── exploratory/           # Early exploration
│   └── final/                 # Publication analyses
├── results/
│   ├── figures/
│   ├── tables/
│   └── supplementary/
└── documentation/
    ├── methods.md             # Detailed methodology
    ├── changelog.md           # Processing decisions
    └── data-dictionary.md     # Variable definitions
```

## Which sharing level do you need?
1. Summary Only - PDF + final results (quick sharing)
2. Reproducible - Notebooks + scripts + data (standard sharing)
3. Full Traceability - Everything from raw data (archival/compliance)
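For scripting, the level choice can be sketched as a small helper. The function name and decision rules below are illustrative assumptions, not part of the skill:

```python
def choose_level(audience: str, need_raw_data: bool, quick: bool) -> str:
    """Map the sharing questions to a level (illustrative rules only)."""
    if quick and not need_raw_data:
        return 'summary'        # PDF + final results
    if need_raw_data or audience == 'archival':
        return 'full'           # everything from raw data
    return 'reproducible'       # notebooks + scripts + processed data
```

For example, `choose_level('reviewers', need_raw_data=False, quick=False)` returns `'reproducible'`.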
Additional questions:
- Who is the audience? (colleagues, reviewers, public)
- Are there size constraints?
- Any sensitive data to handle?
- Timeline for sharing?

## Creating the Package

### Create a dated directory

```bash
# Create dated directory
SHARE_DIR="shared-$(date +%Y%m%d)-[level]"
mkdir -p "$SHARE_DIR"

# Create subdirectories based on level
# ... appropriate structure from above
```

### Clean notebooks

```python
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor


def clean_notebook(input_path, output_path):
    """Clean notebook: clear outputs, remove debug cells."""
    # Read notebook
    with open(input_path, 'r') as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear_output = ClearOutputPreprocessor()
    nb, _ = clear_output.preprocess(nb, {})

    # Remove cells tagged as 'debug' or 'remove'
    nb.cells = [cell for cell in nb.cells
                if 'debug' not in cell.metadata.get('tags', [])
                and 'remove' not in cell.metadata.get('tags', [])]

    # Write cleaned notebook
    with open(output_path, 'w') as f:
        nbformat.write(nb, f)
```

### README.md template

```markdown
# Project: [Project Name]

**Date:** YYYY-MM-DD
**Author:** [Your Name]
**Sharing Level:** [Summary/Reproducible/Full]

## Overview
Brief description of the project and analysis.

## Contents
See MANIFEST.md for detailed file descriptions.

## Requirements
[For Reproducible/Full levels]
- Python 3.X
- See environment.yml for dependencies

## Setup
\`\`\`bash
# Create environment
conda env create -f environment.yml
conda activate project-name
\`\`\`

## Reproduction Steps
[For Reproducible/Full levels]
1. [Description of first step]
   \`\`\`bash
   jupyter notebook notebooks/01-analysis.ipynb
   \`\`\`
2. [Description of second step]

## Data Sources
[For Full level]
- Dataset A: [Source, download date, version]
- Dataset B: [Source, download date, version]

## Contact
[Your email or preferred contact]

## License
[If applicable - e.g., CC BY 4.0, MIT]
```

### MANIFEST.md template

```markdown
# File Manifest
Generated: YYYY-MM-DD

## Directory Structure
\`\`\`
shared-YYYYMMDD/
├── README.md   - Project overview and setup
├── MANIFEST.md - This file
[... complete tree ...]
\`\`\`

## File Descriptions

### Notebooks
- \`notebooks/01-data-processing.ipynb\` - Initial data loading and cleaning
- \`notebooks/02-analysis.ipynb\` - Main statistical analysis
- \`notebooks/03-visualization.ipynb\` - Figure generation for publication

### Data
- \`data/processed/cleaned_data.csv\` - Quality-controlled dataset (N=XXX samples)
  - Columns: [list key columns]
  - Missing values handled by [method]

### Scripts
- \`scripts/generate_figures.py\` - Automated figure generation
  - Usage: \`python generate_figures.py --input data/processed/cleaned_data.csv\`

### Results
- \`results/figures/fig1-main.png\` - Main result showing [description]
- \`results/tables/summary_stats.csv\` - Descriptive statistics

[Continue for all files...]
```

### Anonymize sensitive identifiers

```python
import hashlib


def anonymize_ids(df, id_column='subject_id'):
    """Replace IDs with hashed values."""
    df[id_column] = df[id_column].apply(
        lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:8]
    )
    return df
```

### Compress the package

```bash
# Create zip archive
zip -r shared-YYYYMMDD.zip shared-YYYYMMDD/

# Create tar.gz (better compression)
tar -czf shared-YYYYMMDD.tar.gz shared-YYYYMMDD/

# Or split into parts if very large
tar -czf - shared-YYYYMMDD/ | split -b 1G - shared-YYYYMMDD.tar.gz.part
```
### Return to the main project directory

Add `shared-*/` to `.gitignore` so package copies are never committed. Then:

```bash
# After creating sharing package
cd /path/to/main/project   # Return to working directory
pwd                        # Verify location
# Continue work here, NOT in shared-YYYYMMDD/
```

## Best Practices

### Clear notebook outputs from the command line

```bash
jupyter nbconvert --clear-output --inplace notebook.ipynb
```

### Use relative paths

```python
import pandas as pd

# ❌ Bad: absolute, machine-specific path
data = pd.read_csv('/Users/yourname/project/data.csv')

# ✅ Good: relative path
data = pd.read_csv('../data/data.csv')
# or anchor paths to the script location
from pathlib import Path
data_dir = Path(__file__).parent / 'data'
```

### Use descriptive file names

- ✅ `telomere_analysis_results.csv` rather than ❌ `results.csv`
- ✅ `data_2024-01-15.csv` (ISO dates) rather than ❌ `analysis_v2.ipynb` (version suffixes)
- Separate words with `-` or `_`, never spaces
- A `.gitattributes` file can normalize line endings across platforms
### Pin the environment

```bash
# From active environment
pip freeze > requirements.txt

# Or manually curated (better)
cat > requirements.txt << EOF
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
scipy>=1.9.0
EOF
```

For conda environments:

```bash
# Export current environment
conda env export > environment.yml

# Or minimal (recommended)
conda env export --from-history > environment.yml
# Then edit to remove build-specific details
```
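A curated requirements file can be sanity-checked by parsing each line into name, operator, and version. A small sketch; real tooling like pip implements the full PEP 508 grammar, while this handles only the simple `name>=x.y` form shown above:

```python
import re


def parse_requirement(line):
    """Split 'pandas>=1.5.0' into ('pandas', '>=', '1.5.0'); None for comments/blanks."""
    line = line.split('#')[0].strip()   # drop inline comments
    if not line:
        return None
    m = re.match(r'^([A-Za-z0-9._-]+)\s*(==|>=|<=|~=|>|<)\s*([\w.]+)$', line)
    if not m:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return m.groups()
```

For example, `parse_requirement('pandas>=1.5.0')` returns `('pandas', '>=', '1.5.0')`.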
"""Create sharing package for project."""
import shutil
from pathlib import Path
from datetime import date
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor
def create_sharing_package(level='reproducible', output_dir=None):
"""
Create sharing package.
Args:
level: 'summary', 'reproducible', or 'full'
output_dir: Output directory name (auto-generated if None)
"""
# Create output directory
if output_dir is None:
output_dir = f"shared-{date.today():%Y%m%d}-{level}"
share_path = Path(output_dir)
share_path.mkdir(exist_ok=True)
print(f"Creating {level} sharing package in {share_path}")
# Create structure based on level
if level == 'summary':
create_summary_package(share_path)
elif level == 'reproducible':
create_reproducible_package(share_path)
elif level == 'full':
create_full_package(share_path)
print(f"✓ Package created: {share_path}")
print(f" Review and compress: tar -czf {share_path}.tar.gz {share_path}")
def clean_notebook(input_path, output_path):
"""Clean notebook outputs and debug cells."""
with open(input_path) as f:
nb = nbformat.read(f, as_version=4)
# Clear outputs
clear = ClearOutputPreprocessor()
nb, _ = clear.preprocess(nb, {})
# Remove debug cells
nb.cells = [c for c in nb.cells
if 'debug' not in c.metadata.get('tags', [])]
with open(output_path, 'w') as f:
nbformat.write(nb, f)
# ... implement level-specific functions ...
if __name__ == '__main__':
import sys
level = sys.argv[1] if len(sys.argv) > 1 else 'reproducible'
create_sharing_package(level)shared-*/