split-pdf

Download, split, and deeply read academic PDFs for teaching preparation or research. Works with Cowork to split PDFs into 4-page chunks, read in batches, and produce structured reading notes on research methodology and contributions.

NPX Install

npx skill4agent add kaligi/lecture-pipeline split-pdf

Split-PDF: Download, Split, and Deep-Read Academic Papers

CRITICAL RULE: Never read a full PDF, apart from the exceptions listed under "When NOT to Split". Only read the 4-page split files, and only 3 splits at a time (~12 pages). Reading a full PDF risks either crashing the session with a "context limit exceeded" error or producing shallow, hallucinated output.

When This Skill Is Invoked

This skill applies when an academic paper needs to be read, reviewed, or analyzed for one of two workflows:
  • Teaching workflow: Reading papers to prepare lectures, understand context, extract key findings
  • Research workflow: Reading papers for a research project, building background, identifying methodology
The input is either:
  • A file path to a local PDF (e.g., /path/to/articles/smith_2024.pdf)
  • A search query or paper title (e.g., "Gentzkow Shapiro Sinkinson 2014 competition newspapers")
Important: You cannot search for a paper you don't know exists. The user MUST provide either a file path or a specific search query — an author name, a title, keywords, a year, or some combination that identifies the paper. If invoked without specifying what paper to read, ask the user.

Step 1: Acquire the PDF

If a local file path is provided:
  • Verify the file exists
  • If the file is NOT already inside articles/, copy it there (preserve the original location)
  • Proceed to Step 2
If a search query or paper title is provided:
  1. Use WebSearch to find the paper
  2. Download the PDF (request user permission if required)
  3. Save it to articles/ in the project directory (create the directory if needed)
  4. Proceed to Step 2
CRITICAL: Always preserve the original PDF. The PDF in articles/ must NEVER be deleted, moved, or overwritten. Split files are derivatives; the original is permanent.
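The local-file branch of this step can be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper name `ensure_in_articles` is hypothetical, and reusing an existing file of the same name instead of overwriting it is an assumption made to honor the rule that originals are never overwritten.

```python
import shutil
from pathlib import Path


def ensure_in_articles(pdf_path, articles_dir="articles"):
    """Return the path of the PDF inside articles_dir, copying it there if needed.

    Hypothetical helper: the source file is copied, never moved, and an
    existing file in articles_dir is never overwritten.
    """
    src = Path(pdf_path).resolve()
    if not src.is_file():
        raise FileNotFoundError(f"No such PDF: {src}")

    dest_dir = Path(articles_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Already somewhere inside articles/: nothing to copy.
    if dest_dir in src.parents:
        return src

    dest = dest_dir / src.name
    if dest.exists():
        # Assumption: reuse the existing copy rather than overwrite it.
        return dest

    shutil.copy2(src, dest)  # copy, not move: the source stays where it was
    return dest
```

Calling `ensure_in_articles("/path/to/articles/smith_2024.pdf")` from the project directory would then give the path to hand to Step 2.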

Step 2: Split the PDF into 4-Page Chunks

Create a subdirectory for the splits and run the splitting script:
python
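import os

from PyPDF2 import PdfReader, PdfWriter


def split_pdf(input_path, output_dir, pages_per_chunk=4):
    """Split a PDF into chunks of pages_per_chunk pages (default 4).

    The original file is only read, never modified or deleted.
    """
    os.makedirs(output_dir, exist_ok=True)
    reader = PdfReader(input_path)
    total = len(reader.pages)
    # Chunk files reuse the original's base name, e.g. smith_2024_pp1-4.pdf
    prefix = os.path.splitext(os.path.basename(input_path))[0]

    for start in range(0, total, pages_per_chunk):
        # The final chunk may hold fewer than pages_per_chunk pages
        end = min(start + pages_per_chunk, total)
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])

        # Page numbers in the file name are 1-based and inclusive
        out_name = f"{prefix}_pp{start+1}-{end}.pdf"
        out_path = os.path.join(output_dir, out_name)
        with open(out_path, "wb") as f:
            writer.write(f)

    # -(-a // b) is ceiling division: the number of chunks written
    print(f"Split {total} pages into {-(-total // pages_per_chunk)} chunks in {output_dir}")


if __name__ == "__main__":
    # Example paths matching the directory convention below
    split_pdf("articles/smith_2024.pdf", "articles/split_smith_2024")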
Directory convention:
articles/
├── smith_2024.pdf                    # original — NEVER DELETE
└── split_smith_2024/                 # split subdirectory
    ├── smith_2024_pp1-4.pdf
    ├── smith_2024_pp5-8.pdf
    ├── smith_2024_pp9-12.pdf
    └── notes.md                      # structured notes
If PyPDF2 is not installed:
pip install PyPDF2
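To check a split directory against the naming convention without opening any PDF, the file names can be derived from the page count alone. The sketch below mirrors the naming logic of the splitting script using only the standard library; `chunk_names` is a hypothetical helper name, and the 10-page paper is an invented example.

```python
import os


def chunk_names(input_path, total_pages, pages_per_chunk=4):
    """List the split file names expected for a PDF with total_pages pages."""
    prefix = os.path.splitext(os.path.basename(input_path))[0]
    names = []
    for start in range(0, total_pages, pages_per_chunk):
        # The final chunk ends at the last page, so it may be shorter
        end = min(start + pages_per_chunk, total_pages)
        names.append(f"{prefix}_pp{start + 1}-{end}.pdf")
    return names


print(chunk_names("articles/smith_2024.pdf", 10))
# ['smith_2024_pp1-4.pdf', 'smith_2024_pp5-8.pdf', 'smith_2024_pp9-10.pdf']
```

Note that the last file covers pages 9-10 only: a page count that is not a multiple of 4 yields a shorter final chunk.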

Step 3: Read in Batches of 3 Splits

Read at most 3 split files at a time (~12 pages); the final batch may contain fewer. For each batch:
  1. Read the batch's split PDFs using Cowork's Read tool
  2. Update the running notes file (notes.md in the split subdirectory)
  3. Report to the user: "I have finished reading splits [X-Y] and updated the notes. I have [N] more splits remaining. Would you like me to continue?"
  4. Wait for user confirmation before reading the next batch
Do NOT read ahead. Do NOT read all splits at once.
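One practical detail when forming batches: sorting the split file names alphabetically puts pp13-16 before pp5-8, so the files should be ordered by their starting page number instead. The following is a minimal sketch under that assumption; the helper names `start_page` and `batches` are hypothetical and the file names are examples.

```python
import re

# Matches the page range at the end of a split file name, e.g. _pp13-16.pdf
_PAGE_RANGE = re.compile(r"_pp(\d+)-(\d+)\.pdf$")


def start_page(name):
    """Extract the first page number from a name like smith_2024_pp13-16.pdf."""
    match = _PAGE_RANGE.search(name)
    if match is None:
        raise ValueError(f"Not a split file name: {name}")
    return int(match.group(1))


def batches(split_names, batch_size=3):
    """Group split file names into reading batches, ordered by page number."""
    ordered = sorted(split_names, key=start_page)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


names = [
    "smith_2024_pp13-16.pdf",
    "smith_2024_pp1-4.pdf",
    "smith_2024_pp5-8.pdf",
    "smith_2024_pp9-12.pdf",
]
for batch in batches(names):
    print(batch)
# ['smith_2024_pp1-4.pdf', 'smith_2024_pp5-8.pdf', 'smith_2024_pp9-12.pdf']
# ['smith_2024_pp13-16.pdf']
```

Each inner list is one batch; the pause for user confirmation belongs between batches.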

Step 4: Structured Extraction

As you read, collect information along 8 dimensions and write them into notes.md:
  1. Research Question — What is the paper asking? Why does it matter?
  2. Audience — Which research community cares about this work?
  3. Method — How do they answer the question? Identification strategy?
  4. Data Sources — What data? Where from? Unit of observation? Sample size? Time period?
  5. Statistical Methods — What econometric or statistical techniques? Key specifications?
  6. Findings — Main results? Key coefficient estimates and standard errors?
  7. Contributions — What is new? What did we learn?
  8. Replication Feasibility — Is data public? Replication archive? Data appendix? URLs?
Together, these 8 dimensions capture what a researcher needs to build on or replicate the work.

Step 5: The Notes File

The output is notes.md in the split subdirectory:
articles/split_smith_2024/notes.md
This file is updated incrementally after each batch. Structure it with headers for each of the 8 dimensions. After each batch, update whichever dimensions have new information — do not rewrite from scratch.
By the final batch, the notes should contain specific data sources, variable names, equation references, sample sizes, coefficient estimates, and standard errors. A structured extraction, not a summary.
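A notes file with one header per dimension could be initialized as in the sketch below. The heading layout, the placeholder text, and the helper name `init_notes` are assumptions for illustration; the skill only requires a header for each of the 8 dimensions. An existing file is left untouched, in line with the rule that notes are updated incrementally and not rewritten from scratch.

```python
from pathlib import Path

# The 8 dimensions from Step 4, in order
DIMENSIONS = [
    "Research Question",
    "Audience",
    "Method",
    "Data Sources",
    "Statistical Methods",
    "Findings",
    "Contributions",
    "Replication Feasibility",
]


def init_notes(split_dir, paper_name):
    """Create notes.md with one header per dimension; keep an existing file as-is."""
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    notes_path = split_dir / "notes.md"
    if notes_path.exists():
        return notes_path  # never rewrite existing notes from scratch

    lines = [f"# Notes: {paper_name}", ""]
    for number, dimension in enumerate(DIMENSIONS, start=1):
        # Placeholder text is replaced as batches bring in new information
        lines += [f"## {number}. {dimension}", "", "_No information yet._", ""]
    notes_path.write_text("\n".join(lines), encoding="utf-8")
    return notes_path
```

For the running example, `init_notes("articles/split_smith_2024", "smith_2024")` would create the file once, before the first batch is read.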

When NOT to Split

  • Papers shorter than ~15 pages: Read directly using Cowork's Read tool
  • Policy briefs or non-technical documents: A rough summary is acceptable
  • Triage only: Read just the first split (pages 1-4, abstract + introduction)
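The exceptions above amount to a small decision rule, sketched here for illustration only. The function name, the mode labels, and the order in which the checks are applied are assumptions; the 15-page threshold is approximate in the skill, so it is exposed as a parameter.

```python
def reading_mode(page_count, technical=True, triage_only=False, short_paper_pages=15):
    """Pick a reading mode from the exceptions listed above (hypothetical labels)."""
    if triage_only:
        return "first-split-only"   # pages 1-4: abstract + introduction
    if not technical:
        return "rough-summary"      # policy briefs, non-technical documents
    if page_count < short_paper_pages:
        return "read-directly"      # short papers skip the split step
    return "split-and-batch"        # default: 4-page splits, 3 per batch


print(reading_mode(10))   # read-directly
print(reading_mode(40))   # split-and-batch
```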

Quick Reference

Step      Action
Acquire   Download to articles/ or use existing file
Split     4-page chunks into articles/split_<name>/
Read      3 splits at a time; pause after each batch
Write     Update notes.md with 8 dimensions
Confirm   Ask user before continuing to next batch

Key Differences from Original

  • Cowork compatible: No .claude/commands, no slash commands. Works with Cowork's file system and tools.
  • Dual workflow: Explicitly supports both teaching (lecture prep) and research (project work).
  • PyPDF2-based splitting: Uses the PyPDF2 library's PdfReader and PdfWriter classes to write the chunks.
  • Preserved originals: Split files saved to articles/split_<name>/, originals never deleted.
  • Structured 8-dimension extraction: Methodical note-taking across research dimensions.