# Literate Programming Skill

**CRITICAL: This skill MUST be activated BEFORE making any changes to .nw files!**

You are an expert in literate programming using the noweb system.
## Reference Files

This skill includes detailed references in `references/`:

| File | Content |
|---|---|
| `references/noweb-commands.md` | Tangling, weaving, flags, troubleshooting |
| `references/testing-patterns.md` | Test organization, placement, dependency testing |
| `references/git-workflow.md` | Version control, .gitignore, pre-commit |
| `references/multi-directory-projects.md` | Large project organization, makefiles |
| `references/project-initialization.md` | New project setup, templates, checklist |
| *(preamble reference)* | Standard LaTeX preamble for documentation |
## When to Use This Skill

### Correct Workflow

1. User asks to modify a .nw file
2. YOU ACTIVATE THIS SKILL IMMEDIATELY
3. You plan the changes with literate programming principles
4. You make the changes following the principles
5. You regenerate code with make/notangle

### Anti-pattern (NEVER do this)

1. User asks to modify a .nw file
2. You directly edit the .nw file ← WRONG
3. Later review finds literate quality problems
4. You have to redo everything
### Remember
- .nw files are NOT regular source code files
- They combine documentation and code for human readers
- Literate quality is AS IMPORTANT as code correctness
- Bad literate quality = failed task, even if code works
## Planning Changes

When making changes to a .nw file:

1. Read the existing file to understand structure and narrative
2. Plan with literate programming in mind:
   - What is the "why" behind this change?
   - How does this fit into the existing narrative?
   - What new chunks are needed? What are their meaningful names?
   - Where in the pedagogical order should this be explained?
3. Design documentation BEFORE writing code:
   - Write prose explaining the problem and solution
   - Use subsections to structure complex explanations
4. Decompose code into well-named chunks:
   - Each chunk = one coherent concept
   - Names describe purpose, not syntax (like pseudocode)
5. Write the code chunks
6. Regenerate and test
**Key principle**: If you find yourself writing code comments to explain logic, that explanation belongs in the documentation chunks instead.
## Reviewing Literate Programs

When reviewing, evaluate:

- **Narrative flow**: Coherent story? Pedagogical order?
- **Variation theory**: Contrasts used? "Whole, parts, whole" structure?
- **Chunk quality**: Meaningful names? Focused on single concepts?
- **Explanation quality**: Explains "why" not just "what"? Red flags:
  prose that begins "We [verb] the [noun]" matching a function name;
  prose that describes parameter types visible in the signature;
  prose that restates conditionals without explaining why they matter.
- **Test organization**: Tests after implementation, not before?
- **Proper noweb syntax**: Correct `@` and `<<...>>=` notation? Valid chunk references?
## Core Philosophy

Literate programming (Knuth) has two goals:

1. Explain to human beings what we want a computer to do
2. Present concepts in the order best for human understanding (psychological order, not compiler order)
### Variation Theory

Apply the variation theory skill when structuring explanations:

- **Contrast**: Show what something IS vs what it is NOT
- **Separation**: Start with the whole (module outline), then the parts (chunks)
- **Generalization**: Show the pattern across different contexts
- **Fusion**: Integrate the parts back into a coherent whole

**CRITICAL**: Show concrete examples FIRST, then state general principles. Readers cannot discern a pattern without first experiencing variation.
## Noweb File Format

### Documentation Chunks

- Begin with `@` followed by a space or newline
- Contain explanatory text (LaTeX, Markdown, etc.)
- Copied verbatim by noweave

### Code Chunks

- Begin with `<<chunk name>>=` on a line by itself (column 1)
- End when another chunk begins or at end of file
- Reference other chunks using `<<chunk name>>`
- Multiple chunks with the same name are concatenated

### Syntax Rules

- Quote code in documentation using `[[...]]` (escapes LaTeX special chars)
- Escape: `@<<` for a literal `<<`; `@@` in column 1 for a literal `@`
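To see how these pieces fit together, here is a minimal hypothetical .nw file (the module name, chunk names, and greeting function are invented purely for illustration):

```noweb
A short greeting module. The root chunk pulls in the
greeting function defined below, then uses it.
<<hello.py>>=
<<define greeting>>
print(greeting("world"))
@
The greeting interpolates [[name]] into a fixed template.
<<define greeting>>=
def greeting(name):
    return f"Hello, {name}!"
@
```

Here `@` opens the documentation chunks, `<<hello.py>>=` and `<<define greeting>>=` open code chunks, and `<<define greeting>>` inside the root chunk is a reference that notangle expands in place.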
## Writing Guidelines

1. **Start with the human story** - problem, approach, design decisions

2. **Introduce concepts in pedagogical order** - not compiler order

3. **Use meaningful chunk names** - a 2-5 word summary of purpose (like pseudocode)

4. **Reference variables in chunk names** - when a chunk operates on a
   specific variable, use `[[...]]` notation in the chunk name to make the
   connection explicit (e.g., `<<add graders to [[graders]] list>>`)

5. **Decompose by concept, not syntax**

6. **Explain the "why"** - don't just describe what the code does.
   Prose that merely restates the code in English teaches nothing. Good
   prose explains why a design choice was made: what alternative was
   rejected, what would break without this approach, or what constraint
   drives the implementation.

   Self-test: If your prose could be mechanically generated from the
   function signature, it's "what" not "why." Ask yourself: What design
   decision does this paragraph justify? What alternative did we reject
   and why? If the paragraph doesn't answer either question, rewrite it.
BAD — prose restates the code in English:

```noweb
\subsection{Counting $n$-grams}
We count overlapping $n$-grams.
If $n$ is larger than the input, the result is empty.
<<functions>>=
def ngram_counts(text, *, n):
    ...
@
```
GOOD — prose explains why this design was chosen:

```noweb
\subsection{Counting $n$-grams}
We use overlapping $n$-grams because they capture all positional
contexts---in \enquote{THE}, overlapping bigrams yield TH and HE,
whereas non-overlapping would only yield TH. This matches the
standard definition used in cryptanalysis.
<<functions>>=
def ngram_counts(text, *, n):
    ...
@
```
Red flags that prose is "what" not "why":

- Begins "We [verb] the [noun]" where the verb matches a function name
- Describes parameter types or return values already in the signature
- Restates conditional logic ("If X, we do Y") without explaining why X matters
7. **Keep chunks focused** — one function per chunk with prose before it.
   Each function (or small group of tightly related functions) gets its
   own code chunk preceded by explanatory prose. Never put multiple
   unrelated functions in a single chunk.
BAD — four functions crammed into one chunk with minimal prose:

```noweb
\subsection{Helper Functions}
We provide several utility functions.
<<functions>>=
def normalize_text(text): ...
def letters_only(text): ...
def key_shifts(key): ...
def index_of_coincidence(text): ...
@
```
GOOD — each function with its own subsection and prose:

```noweb
\subsection{Text Normalization}
Before analysis, we strip non-alphabetic characters and
convert to lowercase so that frequency counts are meaningful.
<<functions>>=
def normalize_text(text): ...
@
\subsection{Index of Coincidence}
The index of coincidence measures how likely two randomly
chosen letters from a text are identical ...
<<functions>>=
def index_of_coincidence(text): ...
@
```
8. **Decompose long functions into named sub-chunks** — if a function has
   more than ~25 lines and contains two or more distinct algorithmic
   phases, decompose it into named sub-chunks. Each sub-chunk name
   should read like a step in an algorithm description. The prose before
   each sub-chunk explains why that phase works the way it does. This
   is the classic Knuth technique.
BAD — an 80-line function with one line of prose:

```noweb
We generate plaintext by concatenating sentences.
<<functions>>=
def generate_plaintext(size, *, sources, seed=None):
    """..."""
    if size <= 0:
        raise ValueError(...)
    paragraphs = extract_paragraphs(sources, ...)
    ...  # 75 more lines
    return normalize(prefix, options)
@
```
GOOD — the function body decomposed into named sub-chunks with prose:

```noweb
<<functions>>=
def generate_plaintext(size, *, sources, seed=None):
    """..."""
    <<prepare filtered paragraphs>>
    <<pick random starting point>>
    <<collect sentences until target length>>
    <<select closest sentence boundary>>
@
We extract paragraphs from the corpus, removing headings and ToC
entries. Paragraphs lacking sentence-ending punctuation are
discarded---they are typically list items or table rows.
<<prepare filtered paragraphs>>=
if size <= 0:
    raise ValueError("size must be positive")
...
@
To avoid always starting at the beginning of the corpus, we
rotate to a random paragraph.
<<pick random starting point>>=
rng = random.Random(seed)
...
@
```
9. **Use bucket chunks — distribute constants near their relevant code** -
   define each constant in the section where it is conceptually relevant.
   Never group all constants into a single definition.
BAD — all constants dumped in one subsection:

```noweb
\subsection{Constants}
<<constants>>=
DATA_DIR = ...         # used in loading section
GUTENBERG_START = ...  # used in extraction section
SENTENCE_RE = ...      # used in sentence splitting section
KEEP_PUNCT = ...       # used in normalization section
@
```
GOOD — each constant near the code that uses it:

```noweb
\subsection{Loading Texts}
<<constants>>=
DATA_DIR = Path(__file__).parent / "data"
@
<<functions>>=
def load_text(path): ...
@
\subsection{Extracting Body Text}
<<constants>>=
GUTENBERG_START = "*** START OF"
GUTENBERG_END = "*** END OF"
@
<<functions>>=
def extract_body(text): ...
@
```
10. **Define constants for magic numbers** - never hardcode values

11. **Co-locate dependencies with features** - a feature's imports belong in that feature's section

12. **Prefer public functions** - default to making functions public with
    docstrings. Only use underscore-prefixed private functions for true
    internal helpers tightly coupled to a single caller. Public utilities
    are reusable across modules and discoverable via introspection.
    Duplicated private helpers across modules are a sign the function
    should be public in a shared module.

13. **Keep lines under 80 characters** - both prose and code
## LaTeX Documentation Quality

Apply the LaTeX documentation skill. The most common anti-patterns in .nw files:

- **Lists with bold labels**: Use `description` with `\item[...]`, NOT `itemize` with `\textbf{...}`
- **Code with manual escaping**: Use `[[...]]`, NOT `\texttt{...}` with hand-escaped characters
- **Manual quotes**: Use `\enquote{...}`, NOT `"..."` or ` ``...'' `
- **Manual cross-references**: Use labelled references (`\cref`/`\ref`), NOT hardcoded section numbers
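As a concrete illustration of the first anti-pattern, here is a sketch of the rewrite (the label text is invented for the example):

```latex
% BAD: itemize with manual bold labels
\begin{itemize}
  \item \textbf{Contrast:} show what something is and is not.
\end{itemize}

% GOOD: description list with proper labels
\begin{description}
  \item[Contrast] show what something is and is not.
\end{description}
```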
## Progressive Disclosure Pattern

When introducing high-level structure, use abstract placeholder chunks that defer specifics:

```noweb
def cli_show(user_regex,
             <<options for filtering>>):
    <<implementation>>
@
[... later, explain each option ...]
\paragraph{The --all option}
<<options for filtering>>=
all: Annotated[bool, all_opt] = False,
@
```

Benefits: readable high-level structure, pedagogical ordering, maintainability.
The same technique applies to function bodies: long functions can use
named sub-chunks to present algorithmic steps in pedagogical order with
prose between them (see Writing Guideline 8, "Decompose long functions").
## Chunk Concatenation Patterns

Use multiple definitions when building up a parameter list pedagogically:

```noweb
\subsection{Adding the diff flag}
<<args for diff>>=
diff=args.diff,
@
[... later ...]
\subsection{Fine-tuning thresholds}
<<args for diff>>=
threshold=args.threshold
@
```

Use separate chunks when contexts differ (different scopes):

```noweb
<<args from command line>>=  # Has args object
diff=args.diff,
@
<<params for recursion>>=    # No args, only parameters
diff=diff,
@
```
## Test Organization

**CRITICAL**: Tests MUST appear AFTER implementation, distributed throughout
the file near the code they verify.

NEVER create a `\section{Tests}` or a single trailing chunk that groups all
tests at the end of the file.

See `references/testing-patterns.md` for detailed patterns.

Key rules:

- Each implementation section is followed by its `<<test functions>>` chunk
- Use a single `<<test functions>>` chunk name — noweb concatenates the pieces
- Use a root chunk such as `<<test [[module.py]]>>` in the test file header
- Frame tests pedagogically: "Let's verify this works..."
BAD — all tests collected at the end:

```noweb
\section{Encryption}
<<functions>>=
def encrypt(text, key): ...
@
\section{Decryption}
<<functions>>=
def decrypt(text, key): ...
@
\section{Tests} % ← NEVER do this
<<test functions>>=
def test_encrypt(): ...
def test_decrypt(): ...
@
```
GOOD — each test immediately after its implementation:

```noweb
\section{Encryption}
<<functions>>=
def encrypt(text, key): ...
@
Let's verify that encryption produces the expected ciphertext:
<<test functions>>=
def test_encrypt(): ...
@
\section{Decryption}
<<functions>>=
def decrypt(text, key): ...
@
We can verify that decryption inverts encryption:
<<test functions>>=
def test_decrypt(): ...
@
```
## Multi-Directory Projects

For large projects (5+ .nw files), see `references/multi-directory-projects.md`.

Key structure:

```
project/
├── Makefile         # Root orchestrator (compile → test → docs)
├── pyproject.toml   # Poetry packaging configuration
├── src/             # .nw files → .py + .tex
├── doc/             # Document wrapper (.nw), preamble.tex
├── tests/           # Extracted test files (unit/ subdir)
└── makefiles/       # Shared build rules (noweb.mk, subdir.mk)
```
## Initializing a New Project

See `references/project-initialization.md` for full details. Quick checklist:

- Create `pyproject.toml` with packages/include/exclude
- Create the shared makefiles (`noweb.mk`, `subdir.mk`)
- Create a Makefile with an explicit rule for each tangled file
- Create `src/packagename/packagename.nw` with `<<[[packagename.py]]>>` and
  `<<test [[packagename.py]]>>` chunks
- Create test extraction with auto-discovery (`unit/` subdirectory)
- Create the `doc/` wrapper (.nw) and `preamble.tex`
- Create a root `Makefile` orchestrating compile → test → docs
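The checklist's last item can be sketched as follows (target names and recursive-make layout are illustrative assumptions, not a prescribed structure):

```make
# Hypothetical root Makefile: tangle code, run tests, then build docs.
all: compile test docs

compile:
	$(MAKE) -C src    # tangle .nw files into .py

test: compile
	$(MAKE) -C tests  # run the extracted test files

docs: compile
	$(MAKE) -C doc    # weave .nw files and build the PDF

.PHONY: all compile test docs
```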
## LaTeX-Safe Chunk Names

Use `<<[[...]]>>` notation for Python chunks with underscores:

```noweb
<<[[module_name.py]]>>=
def my_function():
    pass
@
```

Extract with:

```bash
notangle -R"[[module_name.py]]" file.nw > module_name.py
```
## Best Practices Summary

1. **Write documentation first** - then add code

2. **Keep lines under 80 characters**

3. **Check for unused chunks** - run `noroots` to find typos

4. **Keep tangled code in .gitignore** - .nw is the source of truth

5. **NEVER commit generated files** - .py and .tex from .nw are build artifacts

6. **Test your tangles** - ensure extracted code runs

7. **Require PEP-257 docstrings on all public functions** - prose in the
   .nw file is for maintainers reading the literate source; docstrings
   are for users of the compiled .py module who never see the .nw file.
   Both are needed. Private functions (prefixed with an underscore) may
   omit docstrings. Never use `[[...]]` or other LaTeX markup inside
   docstrings.
BAD — function with prose but no docstring:

```noweb
We convert text to lowercase ASCII for uniform comparison.
<<functions>>=
def normalize_text(text):
    return text.lower().encode("ascii", "ignore").decode()
@
```
GOOD — prose for maintainers AND a docstring for users:

```noweb
We convert text to lowercase ASCII for uniform comparison.
<<functions>>=
def normalize_text(text):
    """Return the lowercase ASCII version of ``text``.

    Non-ASCII characters are silently dropped.
    """
    return text.lower().encode("ascii", "ignore").decode()
@
```
8. **Include a table of contents** - add `\tableofcontents` in the documentation
## Git Workflow

See `references/git-workflow.md` for details.

Core rules:

- Only commit .nw files to git
- Add generated files to .gitignore immediately
- Regenerate code with `make` after checkout/pull
- Never commit generated .py or .tex files
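A sketch of the corresponding `.gitignore` entries (the paths are illustrative; adjust them to your actual layout):

```
# Generated from .nw sources -- never commit these
src/**/*.py
src/**/*.tex
tests/**/*.py
doc/*.tex
*.pdf
```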
## Noweb Commands Quick Reference

See `references/noweb-commands.md` for details.

```bash
# Tangling
notangle -R"[[module.py]]" file.nw > module.py
noroots file.nw                               # List root chunks

# Weaving
noweave -n -delay -x -t2 file.nw > file.tex   # For inclusion
noweave -latex -x file.nw > file.tex          # Standalone
```
## When Literate Programming Is Valuable
- Complex algorithms requiring detailed explanation
- Educational code where understanding is paramount
- Code maintained by others
- Programs where design decisions need documentation
- Projects combining multiple languages/tools