# Literate Programming Skill

**CRITICAL: This skill MUST be activated BEFORE making any changes to .nw files!**

You are an expert in literate programming using the noweb system.
## Reference Files

This skill includes detailed references in `references/`:

| File | Content |
|---|---|
| `references/noweb-commands.md` | Tangling, weaving, flags, troubleshooting |
| `references/testing-patterns.md` | Test organization, placement, dependency testing |
| `references/git-workflow.md` | Version control, .gitignore, pre-commit |
| `references/multi-directory-projects.md` | Large project organization, makefiles |
| `references/project-initialization.md` | New project setup, templates, checklist |
| *(preamble reference)* | Standard LaTeX preamble for documentation |
## When to Use This Skill

### Correct Workflow

1. User asks to modify a .nw file
2. YOU ACTIVATE THIS SKILL IMMEDIATELY
3. You plan the changes with literate programming principles
4. You make the changes following the principles
5. You regenerate code with make/notangle

### Anti-pattern (NEVER do this)

1. User asks to modify a .nw file
2. You directly edit the .nw file ← WRONG
3. Later review finds literate quality problems
4. You have to redo everything
### Remember
- .nw files are NOT regular source code files
- They combine documentation and code for human readers
- Literate quality is AS IMPORTANT as code correctness
- Bad literate quality = failed task, even if code works
## Planning Changes

When making changes to a .nw file:

1. Read the existing file to understand structure and narrative
2. Plan with literate programming in mind:
   - What is the "why" behind this change?
   - How does this fit into the existing narrative?
   - What new chunks are needed? What are their meaningful names?
   - Where in the pedagogical order should this be explained?
3. Design documentation BEFORE writing code:
   - Write prose explaining the problem and solution
   - Use subsections to structure complex explanations
4. Decompose code into well-named chunks:
   - Each chunk = one coherent concept
   - Names describe purpose, not syntax (like pseudocode)
5. Write the code chunks
6. Regenerate and test
**Key principle**: If you find yourself writing code comments to explain logic, that explanation belongs in the documentation chunks instead.
## Reviewing Literate Programs

When reviewing, evaluate:

- **Narrative flow**: Coherent story? Pedagogical order?
- **Variation theory**: Contrasts used? "Whole, parts, whole" structure?
- **Chunk quality**: Meaningful names? Focused on single concepts?
- **Explanation quality**: Explains "why" not just "what"? Red flags:
  prose that begins "We [verb] the [noun]" matching a function name;
  prose that describes parameter types visible in the signature;
  prose that restates conditionals without explaining why they matter.
- **Test organization**: Tests after implementation, not before?
- **Proper noweb syntax**: Correct `@` and `<<...>>=` notation? Valid chunk references?
## Core Philosophy

Literate programming (Knuth) has two goals:

1. Explain to human beings what we want a computer to do
2. Present concepts in the order best for human understanding (psychological order, not compiler order)
### Variation Theory

Apply the variation theory skill when structuring explanations:

- **Contrast**: Show what something IS vs what it is NOT
- **Separation**: Start with the whole (module outline), then the parts (chunks)
- **Generalization**: Show the pattern across different contexts
- **Fusion**: Integrate the parts back into a coherent whole

**CRITICAL**: Show concrete examples FIRST, then state general principles. Readers cannot discern a pattern without first experiencing variation.
## Noweb File Format

### Documentation Chunks

- Begin with `@` followed by a space or newline
- Contain explanatory text (LaTeX, Markdown, etc.)
- Copied verbatim by noweave

### Code Chunks

- Begin with `<<chunk name>>=` on a line by itself (column 1)
- End when another chunk begins or at end of file
- Reference other chunks using `<<chunk name>>`
- Multiple chunks with the same name are concatenated

### Syntax Rules

- Quote code in documentation using `[[...]]` (escapes LaTeX special chars)
- Escape: `@<<` for a literal `<<`; `@@` in column 1 for a literal `@`
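To see how these pieces fit together, here is a minimal hypothetical .nw file (the module name, chunk names, and greeting function are invented purely for illustration):

```noweb
A short greeting module. The root chunk pulls in the
greeting function defined below, then uses it.
<<hello.py>>=
<<define greeting>>
print(greeting("world"))
@
The greeting interpolates [[name]] into a fixed template.
<<define greeting>>=
def greeting(name):
    return f"Hello, {name}!"
@
```

Here `@` opens the documentation chunks, `<<hello.py>>=` and `<<define greeting>>=` open code chunks, and `<<define greeting>>` inside the root chunk is a reference that notangle expands in place.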
## Writing Guidelines

1. **Start with the human story** - problem, approach, design decisions

2. **Introduce concepts in pedagogical order** - not compiler order

3. **Use meaningful chunk names** - a 2-5 word summary of purpose (like pseudocode)

4. **Reference variables in chunk names** - when a chunk operates on a
   specific variable, use `[[...]]` notation in the chunk name to make the
   connection explicit (e.g., `<<add graders to [[graders]] list>>`)

5. **Decompose by concept, not syntax**

6. **Explain the "why"** - don't just describe what the code does.
   Prose that merely restates the code in English teaches nothing. Good
   prose explains why a design choice was made: what alternative was
   rejected, what would break without this approach, or what constraint
   drives the implementation.

   Self-test: If your prose could be mechanically generated from the
   function signature, it's "what" not "why." Ask yourself: What design
   decision does this paragraph justify? What alternative did we reject
   and why? If the paragraph doesn't answer either question, rewrite it.
BAD — prose restates the code in English:

```noweb
\subsection{Counting $n$-grams}
We count overlapping $n$-grams.
If $n$ is larger than the input, the result is empty.
<<functions>>=
def ngram_counts(text, *, n):
    ...
@
```
GOOD — prose explains why this design was chosen:

```noweb
\subsection{Counting $n$-grams}
We use overlapping $n$-grams because they capture all positional
contexts---in \enquote{THE}, overlapping bigrams yield TH and HE,
whereas non-overlapping would only yield TH. This matches the
standard definition used in cryptanalysis.
<<functions>>=
def ngram_counts(text, *, n):
    ...
@
```
Red flags that prose is "what" not "why":

- Begins "We [verb] the [noun]" where the verb matches a function name
- Describes parameter types or return values already in the signature
- Restates conditional logic ("If X, we do Y") without explaining why X matters
7. **Keep chunks focused** — one function per chunk with prose before it.
   Each function (or small group of tightly related functions) gets its
   own code chunk preceded by explanatory prose. Never put multiple
   unrelated functions in a single chunk.
BAD — four functions crammed into one chunk with minimal prose:

```noweb
\subsection{Helper Functions}
We provide several utility functions.
<<functions>>=
def normalize_text(text): ...
def letters_only(text): ...
def key_shifts(key): ...
def index_of_coincidence(text): ...
@
```
GOOD — each function with its own subsection and prose:

```noweb
\subsection{Text Normalization}
Before analysis, we strip non-alphabetic characters and
convert to lowercase so that frequency counts are meaningful.
<<functions>>=
def normalize_text(text): ...
@
\subsection{Index of Coincidence}
The index of coincidence measures how likely two randomly
chosen letters from a text are identical ...
<<functions>>=
def index_of_coincidence(text): ...
@
```
8. **Decompose long functions into named sub-chunks** — if a function has
   more than ~25 lines and contains two or more distinct algorithmic
   phases, decompose it into named sub-chunks. Each sub-chunk name
   should read like a step in an algorithm description. The prose before
   each sub-chunk explains why that phase works the way it does. This
   is the classic Knuth technique.
BAD — an 80-line function with one line of prose:

```noweb
We generate plaintext by concatenating sentences.
<<functions>>=
def generate_plaintext(size, *, sources, seed=None):
    """..."""
    if size <= 0:
        raise ValueError(...)
    paragraphs = extract_paragraphs(sources, ...)
    ...  # 75 more lines
    return normalize(prefix, options)
@
```
GOOD — the function body decomposed into named sub-chunks with prose:

```noweb
<<functions>>=
def generate_plaintext(size, *, sources, seed=None):
    """..."""
    <<prepare filtered paragraphs>>
    <<pick random starting point>>
    <<collect sentences until target length>>
    <<select closest sentence boundary>>
@
We extract paragraphs from the corpus, removing headings and ToC
entries. Paragraphs lacking sentence-ending punctuation are
discarded---they are typically list items or table rows.
<<prepare filtered paragraphs>>=
if size <= 0:
    raise ValueError("size must be positive")
...
@
To avoid always starting at the beginning of the corpus, we
rotate to a random paragraph.
<<pick random starting point>>=
rng = random.Random(seed)
...
@
```
9. **Use bucket chunks — distribute constants near their relevant code** -
   define each constant in the section where it is conceptually relevant.
   Never group all constants into a single definition.
BAD — all constants dumped in one subsection:

```noweb
\subsection{Constants}
<<constants>>=
DATA_DIR = ...         # used in loading section
GUTENBERG_START = ...  # used in extraction section
SENTENCE_RE = ...      # used in sentence splitting section
KEEP_PUNCT = ...       # used in normalization section
@
```
GOOD — each constant near the code that uses it:

```noweb
\subsection{Loading Texts}
<<constants>>=
DATA_DIR = Path(__file__).parent / "data"
@
<<functions>>=
def load_text(path): ...
@
\subsection{Extracting Body Text}
<<constants>>=
GUTENBERG_START = "*** START OF"
GUTENBERG_END = "*** END OF"
@
<<functions>>=
def extract_body(text): ...
@
```
10. **Define constants for magic numbers** - never hardcode values

11. **Co-locate dependencies with features** - a feature's imports belong in that feature's section

12. **Prefer public functions** - default to making functions public with
    docstrings. Only use underscore-prefixed private functions for true
    internal helpers tightly coupled to a single caller. Public utilities
    are reusable across modules and discoverable via introspection.
    Duplicated private helpers across modules are a sign the function
    should be public in a shared module.

13. **Keep lines under 80 characters** - both prose and code
## LaTeX Documentation Quality

Apply the LaTeX documentation skill. The most common anti-patterns in .nw files:

- **Lists with bold labels**: Use `description` with `\item[...]`, NOT `itemize` with `\textbf{...}`
- **Code with manual escaping**: Use `[[...]]`, NOT `\texttt{...}` with hand-escaped characters
- **Manual quotes**: Use `\enquote{...}`, NOT `"..."` or ` ``...'' `
- **Manual cross-references**: Use labelled references (`\cref`/`\ref`), NOT hardcoded section numbers
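As a concrete illustration of the first anti-pattern, here is a sketch of the rewrite (the label text is invented for the example):

```latex
% BAD: itemize with manual bold labels
\begin{itemize}
  \item \textbf{Contrast:} show what something is and is not.
\end{itemize}

% GOOD: description list with proper labels
\begin{description}
  \item[Contrast] show what something is and is not.
\end{description}
```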
## Progressive Disclosure Pattern

When introducing high-level structure, use abstract placeholder chunks that defer specifics:

```noweb
def cli_show(user_regex,
             <<options for filtering>>):
    <<implementation>>
@
[... later, explain each option ...]
\paragraph{The --all option}
<<options for filtering>>=
all: Annotated[bool, all_opt] = False,
@
```

Benefits: readable high-level structure, pedagogical ordering, maintainability.
The same technique applies to function bodies: long functions can use
named sub-chunks to present algorithmic steps in pedagogical order with
prose between them (see Writing Guideline 8, "Decompose long functions").
## Chunk Concatenation Patterns

Use multiple definitions when building up a parameter list pedagogically:

```noweb
\subsection{Adding the diff flag}
<<args for diff>>=
diff=args.diff,
@
[... later ...]
\subsection{Fine-tuning thresholds}
<<args for diff>>=
threshold=args.threshold
@
```

Use separate chunks when contexts differ (different scopes):

```noweb
<<args from command line>>=  # Has args object
diff=args.diff,
@
<<params for recursion>>=    # No args, only parameters
diff=diff,
@
```
## Test Organization

**CRITICAL**: Tests MUST appear AFTER implementation, distributed throughout
the file near the code they verify.

NEVER create a `\section{Tests}` or a single trailing chunk that groups all
tests at the end of the file.

See `references/testing-patterns.md` for detailed patterns.

Key rules:

- Each implementation section is followed by its `<<test functions>>` chunk
- Use a single `<<test functions>>` chunk name — noweb concatenates the pieces
- Use a root chunk such as `<<test [[module.py]]>>` in the test file header
- Frame tests pedagogically: "Let's verify this works..."
BAD — all tests collected at the end:

```noweb
\section{Encryption}
<<functions>>=
def encrypt(text, key): ...
@
\section{Decryption}
<<functions>>=
def decrypt(text, key): ...
@
\section{Tests} % ← NEVER do this
<<test functions>>=
def test_encrypt(): ...
def test_decrypt(): ...
@
```
GOOD — each test immediately after its implementation:

```noweb
\section{Encryption}
<<functions>>=
def encrypt(text, key): ...
@
Let's verify that encryption produces the expected ciphertext:
<<test functions>>=
def test_encrypt(): ...
@
\section{Decryption}
<<functions>>=
def decrypt(text, key): ...
@
We can verify that decryption inverts encryption:
<<test functions>>=
def test_decrypt(): ...
@
```
## Multi-Directory Projects

For large projects (5+ .nw files), see `references/multi-directory-projects.md`.

Key structure:

```
project/
├── Makefile         # Root orchestrator (compile → test → docs)
├── pyproject.toml   # Poetry packaging configuration
├── src/             # .nw files → .py + .tex
├── doc/             # Document wrapper (.nw), preamble.tex
├── tests/           # Extracted test files (unit/ subdir)
└── makefiles/       # Shared build rules (noweb.mk, subdir.mk)
```
## Initializing a New Project

See `references/project-initialization.md` for full details. Quick checklist:

- Create `pyproject.toml` with packages/include/exclude
- Create the shared makefiles (`noweb.mk`, `subdir.mk`)
- Create a Makefile with an explicit rule for each tangled file
- Create `src/packagename/packagename.nw` with `<<[[packagename.py]]>>` and
  `<<test [[packagename.py]]>>` chunks
- Create test extraction with auto-discovery (`unit/` subdirectory)
- Create the `doc/` wrapper (.nw) and `preamble.tex`
- Create a root `Makefile` orchestrating compile → test → docs
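The checklist's last item can be sketched as follows (target names and recursive-make layout are illustrative assumptions, not a prescribed structure):

```make
# Hypothetical root Makefile: tangle code, run tests, then build docs.
all: compile test docs

compile:
	$(MAKE) -C src    # tangle .nw files into .py

test: compile
	$(MAKE) -C tests  # run the extracted test files

docs: compile
	$(MAKE) -C doc    # weave .nw files and build the PDF

.PHONY: all compile test docs
```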
## LaTeX-Safe Chunk Names

Use `<<[[...]]>>` notation for Python chunks with underscores:

```noweb
<<[[module_name.py]]>>=
def my_function():
    pass
@
```

Extract with:

```bash
notangle -R"[[module_name.py]]" file.nw > module_name.py
```
## Best Practices Summary

1. **Write documentation first** - then add code

2. **Keep lines under 80 characters**

3. **Check for unused chunks** - run `noroots` to find typos

4. **Keep tangled code in .gitignore** - .nw is the source of truth

5. **NEVER commit generated files** - .py and .tex from .nw are build artifacts

6. **Test your tangles** - ensure extracted code runs

7. **Require PEP-257 docstrings on all public functions** - prose in the
   .nw file is for maintainers reading the literate source; docstrings
   are for users of the compiled .py module who never see the .nw file.
   Both are needed. Private functions (prefixed with an underscore) may
   omit docstrings. Never use `[[...]]` or other LaTeX markup inside
   docstrings.
BAD — function with prose but no docstring:

```noweb
We convert text to lowercase ASCII for uniform comparison.
<<functions>>=
def normalize_text(text):
    return text.lower().encode("ascii", "ignore").decode()
@
```
GOOD — prose for maintainers AND a docstring for users:

```noweb
We convert text to lowercase ASCII for uniform comparison.
<<functions>>=
def normalize_text(text):
    """Return the lowercase ASCII version of ``text``.

    Non-ASCII characters are silently dropped.
    """
    return text.lower().encode("ascii", "ignore").decode()
@
```
8. **Include a table of contents** - add `\tableofcontents` in the documentation
## Git Workflow

See `references/git-workflow.md` for details.

Core rules:

- Only commit .nw files to git
- Add generated files to .gitignore immediately
- Regenerate code with `make` after checkout/pull
- Never commit generated .py or .tex files
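A sketch of the corresponding `.gitignore` entries (the paths are illustrative; adjust them to your actual layout):

```
# Generated from .nw sources -- never commit these
src/**/*.py
src/**/*.tex
tests/**/*.py
doc/*.tex
*.pdf
```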
## Noweb Commands Quick Reference

See `references/noweb-commands.md` for details.

```bash
# Tangling
notangle -R"[[module.py]]" file.nw > module.py
noroots file.nw                               # List root chunks

# Weaving
noweave -n -delay -x -t2 file.nw > file.tex   # For inclusion
noweave -latex -x file.nw > file.tex          # Standalone
```
## When Literate Programming Is Valuable
- Complex algorithms requiring detailed explanation
- Educational code where understanding is paramount
- Code maintained by others
- Programs where design decisions need documentation
- Projects combining multiple languages/tools