# Split-PDF: Download, Split, and Deep-Read Academic Papers
CRITICAL RULE: Never read a full PDF. Only read the 4-page split files, and only 3 splits at a time (~12 pages). Reading a full PDF will either crash the session with a "context limit exceeded" error or produce shallow, hallucinated output.
## When This Skill Is Invoked
You want to read, review, or analyze an academic paper in one of two workflows:
- Teaching workflow: Reading papers to prepare lectures, understand context, extract key findings
- Research workflow: Reading papers for a research project, building background, identifying methodology
The input is either:
- A file path to a local PDF (e.g., `/path/to/articles/smith_2024.pdf`)
- A search query or paper title (e.g., "Gentzkow Shapiro Sinkinson 2014 competition newspapers")
Important: You cannot search for a paper you don't know exists. The user MUST provide either a file path or a specific search query — an author name, a title, keywords, a year, or some combination that identifies the paper. If invoked without specifying what paper to read, ask the user.
## Step 1: Acquire the PDF
If a local file path is provided:
- Verify the file exists
- If the file is NOT already inside `articles/`, copy it there (preserve the original location)
- Proceed to Step 2
If a search query or paper title is provided:
- Use WebSearch to find the paper
- Download the PDF (request user permission if required)
- Save it to `articles/` in the project directory (create the directory if needed)
- Proceed to Step 2
CRITICAL: Always preserve the original PDF. The PDF in `articles/` must NEVER be deleted, moved, or overwritten. Split files are derivatives; the original is permanent.
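The acquire-and-preserve logic can be sketched as follows. This is a minimal sketch, not part of the skill itself; `ensure_in_articles` is a hypothetical helper name, and the `articles/` default follows the directory convention shown below.

```python
import os
import shutil

def ensure_in_articles(pdf_path, articles_dir="articles"):
    # Hypothetical helper: make sure the PDF lives in articles/
    # without disturbing the original copy.
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(pdf_path)
    os.makedirs(articles_dir, exist_ok=True)
    dest = os.path.join(articles_dir, os.path.basename(pdf_path))
    if os.path.abspath(pdf_path) != os.path.abspath(dest):
        shutil.copy2(pdf_path, dest)  # copy, never move
    return dest
```

`shutil.copy2` copies rather than moves, so the file at the original location is untouched.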
## Step 2: Split the PDF into 4-Page Chunks
Create a subdirectory for the splits and run the splitting script:
```python
from PyPDF2 import PdfReader, PdfWriter
import os

def split_pdf(input_path, output_dir, pages_per_chunk=4):
    """Split PDF into 4-page chunks. Preserves original."""
    os.makedirs(output_dir, exist_ok=True)
    reader = PdfReader(input_path)
    total = len(reader.pages)
    prefix = os.path.splitext(os.path.basename(input_path))[0]
    for start in range(0, total, pages_per_chunk):
        end = min(start + pages_per_chunk, total)
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])
        out_name = f"{prefix}_pp{start+1}-{end}.pdf"
        out_path = os.path.join(output_dir, out_name)
        with open(out_path, "wb") as f:
            writer.write(f)
    print(f"Split {total} pages into {-(-total // pages_per_chunk)} chunks in {output_dir}")
```
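The naming and chunk-count conventions above can be sanity-checked in isolation. The helper names here are illustrative, not part of the skill:

```python
import os

def split_dir_for(pdf_path):
    # articles/smith_2024.pdf -> articles/split_smith_2024
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(os.path.dirname(pdf_path), f"split_{base}")

def num_chunks(total_pages, pages_per_chunk=4):
    # Ceiling division, matching the count printed by split_pdf
    return -(-total_pages // pages_per_chunk)

print(split_dir_for("articles/smith_2024.pdf"))  # articles/split_smith_2024
print(num_chunks(10))  # 3 chunks: pp1-4, pp5-8, pp9-10
```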
Directory convention:

```
articles/
├── smith_2024.pdf             # original — NEVER DELETE
└── split_smith_2024/          # split subdirectory
    ├── smith_2024_pp1-4.pdf
    ├── smith_2024_pp5-8.pdf
    ├── smith_2024_pp9-12.pdf
    └── notes.md               # structured notes
```
If PyPDF2 is not installed, install it first: `pip install PyPDF2`
## Step 3: Read in Batches of 3 Splits
Read exactly 3 split files at a time (~12 pages). For each batch:
- Read the 3 split PDFs using Cowork's Read tool
- Update the running notes file (`notes.md` in the split subdirectory)
- Report to the user:
"I have finished reading splits [X-Y] and updated the notes. I have [N] more splits remaining. Would you like me to continue?"
- Wait for user confirmation before reading the next batch
Do NOT read ahead. Do NOT read all splits at once.
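The batching rule reduces to a simple grouping. This is a sketch only; the actual reading happens through Cowork's Read tool, and the filenames are illustrative:

```python
def batch_splits(split_files, batch_size=3):
    # Group the ordered split files into batches of 3 (~12 pages each);
    # pause for user confirmation between batches.
    return [split_files[i:i + batch_size]
            for i in range(0, len(split_files), batch_size)]

splits = [f"smith_2024_pp{4*i+1}-{4*i+4}.pdf" for i in range(5)]
for n, batch in enumerate(batch_splits(splits), start=1):
    remaining = max(len(splits) - n * 3, 0)
    # Report format after each batch:
    print(f"Finished batch {n}; {remaining} splits remaining.")
```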
## Step 4: Structured Extraction
As you read, collect information along 8 dimensions and write them into `notes.md`:
- Research Question — What is the paper asking? Why does it matter?
- Audience — Which research community cares about this work?
- Method — How do they answer the question? Identification strategy?
- Data Sources — What data? Where from? Unit of observation? Sample size? Time period?
- Statistical Methods — What econometric or statistical techniques? Key specifications?
- Findings — Main results? Key coefficient estimates and standard errors?
- Contributions — What is new? What did we learn?
- Replication Feasibility — Is data public? Replication archive? Data appendix? URLs?
These 8 dimensions extract what a researcher needs to build on or replicate the work.
## Step 5: The Notes File
The output is `notes.md` in the split subdirectory:
articles/split_smith_2024/notes.md
This file is updated incrementally after each batch. Structure it with headers for each of the 8 dimensions. After each batch, update whichever dimensions have new information — do not rewrite from scratch.
By the final batch, the notes should contain specific data sources, variable names, equation references, sample sizes, coefficient estimates, and standard errors. A structured extraction, not a summary.
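One way to initialize the notes file so each batch can append under the right header. This is a sketch under the conventions above; `init_notes` is a hypothetical helper name, and the dimension list mirrors Step 4:

```python
import os

DIMENSIONS = [
    "Research Question", "Audience", "Method", "Data Sources",
    "Statistical Methods", "Findings", "Contributions",
    "Replication Feasibility",
]

def init_notes(notes_path):
    # Create notes.md with one header per extraction dimension;
    # later batches update sections in place rather than rewriting.
    if os.path.exists(notes_path):
        return
    with open(notes_path, "w") as f:
        f.write("# Notes\n\n")
        for dim in DIMENSIONS:
            f.write(f"## {dim}\n\n")
```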
## When NOT to Split
- Papers shorter than ~15 pages: Read directly using Cowork's Read tool
- Policy briefs or non-technical documents: A rough summary is acceptable
- Triage only: Read just the first split (pages 1-4, abstract + introduction)
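These rules collapse into a small decision function. Illustrative only; the ~15-page threshold is the one stated above:

```python
def reading_plan(total_pages, triage_only=False, split_threshold=15):
    # Decide how to approach the paper, per the rules above.
    if triage_only:
        return "read first split only (pp1-4)"
    if total_pages < split_threshold:
        return "read directly"
    return "split and batch-read"
```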
## Quick Reference
| Step | Action |
|---|---|
| Acquire | Download to `articles/` or use existing file |
| Split | 4-page chunks into `split_<name>/` |
| Read | 3 splits at a time; pause after each batch |
| Write | Update `notes.md` with 8 dimensions |
| Confirm | Ask user before continuing to next batch |
## Key Differences from Original
- Cowork compatible: No , no slash commands. Works with Cowork's file system and tools.
- Dual workflow: Explicitly supports both teaching (lecture prep) and research (project work).
- PyPDF2-based splitting: Uses industry-standard PDF library.
- Preserved originals: Split files saved to a `split_<name>/` subdirectory; originals never deleted.
- Structured 8-dimension extraction: Methodical note-taking across research dimensions.