# IPYNB Notebook (.ipynb)
## Overview
This skill guides you to operate `.ipynb` files and notebook projects in an engineered manner (not limited to Jupyter; also applicable to environments like Google Colab / VS Code notebooks):
- Clear file structure: the notebook serves as the interface, with logic sunk into reusable `scripts/` and `lib/`
- Token-efficient workflow: when AI reads or writes notebooks, read only structure and code where possible, not large outputs
- Presentable mode: structure and output conventions for demos, team sharing, and documentation
- Reproducible environment: prefer `uv`, or fall back to `venv`, to ensure repeatable execution
## Applicable Scenarios
Use this skill in the following scenarios:
- Creating a new notebook project or single notebook
- Reviewing / editing existing `.ipynb` files (especially large files with many outputs and unreadable diffs)
- Organizing notebook project structures, extracting "reusable logic" from notebooks into modules/scripts
- Organizing "runnable, reproducible, exportable" notebooks for demos, sharing, and archiving
- Improving long-term maintainability and version control experience of notebooks
## Core Principles
Notebook is an interface, not a library.
Notebooks are suitable for interactive exploration and narrative presentation; reusable, testable, automatable logic should be placed in:
- `scripts/`: directly runnable scripts (no dependency on the notebook UI)
- `lib/`: reusable modules (imported by both notebooks and scripts)
Benefits of this approach:
- Reuse the same logic across multiple notebooks
- Test key logic without running notebooks
- Easier automation in CI/CD (e.g., export, scheduled data processing)
- Cleaner diffs and more friendly version control
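As a minimal sketch of this split (the module and function names here are hypothetical), a notebook cell imports shared logic rather than defining it inline:

```python
# lib/cleaning.py -- reusable logic shared by notebooks and scripts
def normalize_columns(columns):
    """Lower-case and snake_case column names so every consumer agrees."""
    return [c.strip().lower().replace(" ", "_") for c in columns]


# In a notebook cell (or equally in scripts/build_report.py):
# from lib.cleaning import normalize_columns
print(normalize_columns(["Alarm ID", " Raised At "]))  # -> ['alarm_id', 'raised_at']
```

Because the function lives in `lib/`, it can be unit-tested and reused without ever opening the notebook.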
## Quick Start
### Create a new notebook project (uv recommended)
- Initialize project (uv)

```bash
# Create project directory
mkdir notebook-project && cd notebook-project

# Initialize uv project
uv init

# Add dependencies (pick what you need)
uv add jupyterlab pandas plotly
```
- Set up directory structure

```bash
mkdir -p scripts lib data/{raw,processed} reports docs .archive
touch data/.gitkeep data/raw/.gitkeep data/processed/.gitkeep reports/.gitkeep
```
- Configure `.gitignore`

```gitignore
# Virtual environments
.venv/

# Data and outputs (keep .gitkeep)
data/**
!data/**/
!data/**/.gitkeep
reports/**
!reports/**/
!reports/**/.gitkeep

# Jupyter
.ipynb_checkpoints/

# Python
__pycache__/
*.pyc

# Environment
.env
```
- Start the notebook environment: `uv run jupyter lab`
- Load reference documents when more detailed patterns are needed:
  - `references/file-structure.md`: directory structure and project organization
  - `references/presentation-patterns.md`: demonstration/sharing structure and output conventions
  - `references/token-efficiency.md`: token-efficiency strategies for AI reading/writing notebooks
### Review / compare an existing notebook (structure and code first)
Recommended workflow:
- Check structure first; don't read outputs

```bash
# Cell types and counts
jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb

# Code cells with outputs
jq '[.cells[] | select(.cell_type == "code") | select(.outputs | length > 0)] | length' notebook.ipynb
```
- Compare only code cells

```bash
# Extract code sources to compare
jq '.cells[] | select(.cell_type == "code") | .source' notebook1.ipynb > /tmp/code1.json
jq '.cells[] | select(.cell_type == "code") | .source' notebook2.ipynb > /tmp/code2.json
diff /tmp/code1.json /tmp/code2.json
```
- Read notebook content only when necessary
  - Clarify which section or cell type to read before accessing
  - For large notebooks, prefer segmented reading by cell range/topic
  - Details are in `references/token-efficiency.md`
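Since an `.ipynb` file is plain JSON, segmented reading needs nothing beyond the standard library. This sketch (function name and demo notebook are illustrative) returns a range of code-cell sources while ignoring outputs entirely:

```python
import json

def code_sources(nb_json, start=0, stop=None):
    """Return only code-cell sources from notebook JSON, skipping outputs."""
    nb = json.loads(nb_json)
    cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
    return ["".join(c["source"]) for c in cells[start:stop]]

# Minimal fabricated notebook for illustration:
demo = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["x = 1\n"], "outputs": [{"text": "huge output"}]},
    {"cell_type": "code", "source": ["print(x)\n"], "outputs": []},
]})
print(code_sources(demo, 0, 1))  # -> ['x = 1\n']
```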
### Organize a notebook project (extract logic, control outputs, make it reproducible)
Directory organization suggestions are in `references/file-structure.md`. Here are minimal executable migration steps:
- Count root-directory files (e.g. `ls -1 | wc -l`)
- Move scripts to `scripts/`, documents to `docs/`, old notebooks to `.archive/`
- Update imports in notebooks: `from lib import module_name`
- Verify everything still runs (e.g. Restart & Run All)
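The verification step can be partly automated. As a small sketch (`lib.module_name` stands in for the hypothetical moved module from the step above), check that modules still resolve before re-running notebooks:

```python
import importlib.util

def importable(module_name):
    """Return True if the module can be found on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False

print(importable("json"))             # stdlib sanity check -> True
print(importable("lib.module_name"))  # hypothetical module moved into lib/
```

Run this from the project root so `lib/` is on the import path.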
## Reproducible Environment (uv / venv)
### Why prefer uv?

uv is a good fit for:
- Fast, reproducible dependency management
- Running tools inside the project's dependency environment (e.g., `uv run jupyter lab`)
- No pollution of the global Python installation
- Better cross-platform consistency
### Common command patterns
Add dependencies:

```bash
uv add plotly pandas duckdb
```
Install tools (optional):

```bash
uv tool install jupyterlab
```
Run in the project environment: `uv run <command>` (e.g., `uv run jupyter lab`)
Single-file script dependency declaration (for `uv run script.py`):

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas",
#     "plotly",
# ]
# ///
import pandas as pd
import plotly.express as px

# Script code here
```
If you can't use uv, you can also use `venv` + `pip`, but ensure one-click reproducibility (e.g., with a pinned `requirements.txt` lockfile).
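A minimal venv fallback might look like this (assuming a POSIX shell; the lockfile name is conventional):

```shell
# Create and activate an isolated environment
python3 -m venv .venv
. .venv/bin/activate

# Pin the exact installed versions to a lockfile...
python -m pip freeze > requirements.txt

# ...so the environment can be recreated elsewhere in one step
python -m pip install -r requirements.txt
```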
## Token-Efficient Workflow (for AI and Version Control)
### Default strategy: clean outputs before committing
Recommended pre-commit hook:

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
```
When outputs must be retained (not recommended):

```bash
SKIP=nbstripout git commit -m "Add notebook with visualization outputs"
```
A more common practice: save outputs to `reports/` and keep notebooks in a state where re-running reproduces the outputs (see `references/token-efficiency.md`).
### Query before reading (structure first)
Check structure first:

```bash
jq '.cells | group_by(.cell_type) | map({type: .[0].cell_type, count: length})' notebook.ipynb
```
View only code:

```bash
jq '.cells[] | select(.cell_type == "code") | .source' notebook.ipynb
```
### Outputs should be controllable and reproducible
Prefer output summaries; don't dump large objects directly:

```python
print(f"[OK] Loaded {len(df_alarms):,} rows")
print(f"Columns: {', '.join(df_alarms.columns)}")
print(f"Date range: {df_alarms['timestamp'].min()} to {df_alarms['timestamp'].max()}")
```
Save large outputs to files:

```python
fig.write_html(report_dir / "visualization.html")
print(f"[OK] Saved visualization to {report_dir}/visualization.html")
```
Complete strategies are in `references/token-efficiency.md`.
## Demonstration / Sharing Mode
### Recommended notebook structure
- Title & Overview - Background and objectives
- Preparation - Imports and configuration
- Data Loading - With feedback and error handling
- Summary - High-level statistics
- Visualization - With explanations and usage tips
- Conclusion - Key findings
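The "Data Loading - With feedback and error handling" step above can be sketched like this (the file path and stdlib `csv` usage are illustrative; a real notebook would likely use pandas):

```python
import csv
from pathlib import Path

DATA_PATH = Path("data/raw/alarms.csv")  # hypothetical input file

def load_rows(path):
    """Load CSV rows with explicit status feedback instead of a raw traceback."""
    if not path.exists():
        print(f"[ERR] Missing input: {path} -- generate it first, then re-run")
        return []
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    print(f"[OK] Loaded {len(rows):,} rows from {path}")
    return rows

rows = load_rows(DATA_PATH)
```

The explicit `[ERR]` branch tells the audience exactly what to fix, which matters far more in a live demo than in a private scratch notebook.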
### More "professional" output habits
Unified status output:

```python
print("[OK] Success")
print("[WARN] Warning")
print("[ERR] Error")
print("[INFO] Note")
```
Number formatting:

```python
print(f"Total: {count:,}")  # 2,055 instead of 2055
```
Save to reports by date:

```python
from datetime import datetime
from pathlib import Path

today = datetime.now().strftime('%Y-%m-%d')
report_dir = Path("reports") / today
report_dir.mkdir(parents=True, exist_ok=True)
fig.write_html(report_dir / "chart.html")

# Keep a stable "latest" pointer to the newest report
latest = Path("reports/latest")
if latest.is_symlink() or latest.exists():
    latest.unlink()
latest.symlink_to(today, target_is_directory=True)
```
Complete patterns and templates are in `references/presentation-patterns.md`.
## Resource Index
### `references/file-structure.md`

Includes:
- Recommended directory structure
- File organization rules and naming conventions
- Git-friendly practices (ignore rules, diffs, output cleaning)
- Migration steps for existing projects
- Example structures

Suitable for: loading when creating new projects, refactoring directories, or unifying conventions.
### `references/token-efficiency.md`

Includes:
- Output cleaning and version-control strategies
- Structured query methods that avoid reading outputs
- Segmented reading and diff approaches for large notebooks
- Common `jq` / CLI patterns
- Cell output management

Suitable for: loading when token saving is needed, reviewing large notebooks, or performing automated processing.
### `references/presentation-patterns.md`

Includes:
- Structure templates for demonstration notebooks
- Readability and narrative pacing
- Interactive elements and export strategies
- Error handling and reproducibility checkpoints
- Division of labor between Markdown and code cells
- Notes on exporting to HTML/PDF

Suitable for: loading before creating demos, team sharing, or publishing documentation.
## Best Practices Cheat Sheet
- Structure: notebook as interface, logic sunk into `lib/` / `scripts/`
- Dependencies: prefer uv to ensure one-click reproducibility
- Version control: clean outputs by default (pre-commit / nbstripout / nbconvert)
- Token saving: query structure before reading; save large outputs to files
- Presentation: clear narrative, restrained outputs, explicit error handling
- Reproducibility: ensure "Restart & Run All" works
- Data flow: raw → processed → reports
- Git-friendly: ignore data and artifacts, keep the directory skeleton (`.gitkeep`)
## Example Workflow
```bash
# 1. Create project
mkdir my-analysis && cd my-analysis
uv init
uv add jupyterlab pandas plotly

# 2. Set up structure
mkdir -p scripts lib data/{raw,processed} reports
touch data/.gitkeep data/raw/.gitkeep data/processed/.gitkeep reports/.gitkeep

# 3. Create notebook
uv run jupyter lab

# 4. As you work:
#    - Keep logic in lib/ and scripts/
#    - Save outputs to reports/ with dates
#    - Keep outputs minimal
#    - Strip outputs before committing

# 5. Before presenting:
#    - Run "Restart & Run All" to test
#    - Add context and documentation
#    - Consider exporting to HTML
jupyter nbconvert --to html --execute notebook.ipynb
```
## Cheat Sheet
Directory Organization:
- Notebooks: project root (or a `notebooks/` directory at larger scale)
- Scripts: `scripts/`
- Modules: `lib/`
- Data: `data/raw/`, `data/processed/`
- Reports: `reports/`
- Archive: `.archive/`
Common uv Commands:
- `uv init`: initialize project
- `uv add <pkg>`: add dependencies
- `uv run <cmd>`: run command in the project environment
- `uvx <tool>`: run a temporary tool (not written to project dependencies)
Token Saving:
- Clean outputs: pre-commit hook, or `jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace notebook.ipynb`
- Query structure: `jq '.cells | group_by(.cell_type)'`
- Compare code: `jq '.cells[] | select(.cell_type == "code") | .source'`
Presentation:
- Number formatting: `f"{count:,}"`
- Save files by date: `reports/YYYY-MM-DD/`
- Execution verification: `jupyter nbconvert --execute`