Novel Reader - Intelligent Long Text Novel Reader

Uses Python to read long novels safely and reliably, working around the limited context window of LLMs. It supports Chinese novels encoded in UTF-8, filters out irrelevant content automatically, and extracts detailed character, item, and scene information as reading progresses.

Core Features

Multi-format support: If the input file is a PDF, DOC, or DOCX, first convert it to plain-text TXT with the doc-to-txt skill, then read the converted TXT file
Read by character position: Python reads by character position rather than byte offset, so multi-byte characters are never split and no garbled text is produced
Segmented reading: Read at most 3000 characters per call (3000 is also the default) to stay within the tool output limit. Skipping is strictly prohibited; fragments must be read consecutively
Flexible positioning: Reading can start from any character position
Encoding safety: Native UTF-8 support handles Chinese characters correctly
Smart content filtering: The large model identifies and skips content unrelated to the main text of the novel (launch testimonials, author thanks, interviews, advertisements, etc.)
Real-time asset extraction: After content is read, identify and record detailed information about the characters, items, and scenes in the novel right away
Outline recording: Record a summary of each section of content to build a complete outline
Progress tracking: Record progress information such as the current reading position and the number of characters read
Context management: Rely on the Agent's context compaction mechanism, which compresses automatically when the context is nearly full
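
The character-position reading described above can be sketched as follows. This is a minimal illustration, not the source of `read_novel.py` (which is not shown here); the function name `read_fragment` and its parameters are assumptions made for this example.

```python
from pathlib import Path

CHUNK_SIZE = 3000  # documented per-read limit, in characters


def read_fragment(path, start=0, length=CHUNK_SIZE):
    """Return up to `length` characters of `path`, beginning at character index `start`."""
    # Decoding the whole file first means `start` counts characters, not bytes,
    # so a multi-byte Chinese character is never split in half.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text[start:start + length]
```

This sketch decodes the whole file on every call, which keeps the indexing simple at the cost of re-reading the file each time.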

Important Requirements

Skip reading is strictly prohibited

Read consecutively, fragment by fragment: After each 3000-character read, the next read must start where the previous one ended (i.e. start = last start + 3000)
No skipping: Never skip intermediate content and jump to a later position
No for loops: Never use a for loop to read the file in batches (e.g. `for i in $(seq ...)`). The Agent tool compresses output, so a for loop causes content to be truncated. If you see "(some characters truncated)", the content has been truncated and must be re-read manually, fragment by fragment
One Python call per tool call: Each tool call can output only 3000 characters and anything beyond that is truncated, so a single tool call may invoke the Python command at most once
Efficiency: Because the text is very long, issue the read calls back to back (still without a for loop), with no thinking or commentary in between
Update loop: After every 10 fragments (3000 characters each, 30000 characters in total), update the outline, progress, and asset files once, then continue with the next loop
No preset tasks: Do not use TODO list or task list tools
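
The continuity rule (start = last start + 3000) can be expressed as a small check. The helper names `next_start` and `first_gap` are hypothetical and exist only for this sketch; the Python loop below inspects a list of positions and reads no file, so it is unrelated to the prohibited shell for loop.

```python
CHUNK_SIZE = 3000  # characters per read, as documented


def next_start(last_start, chunk_size=CHUNK_SIZE):
    """Start position of the read that must follow a read at `last_start`."""
    return last_start + chunk_size


def first_gap(starts, chunk_size=CHUNK_SIZE):
    """Return the first start position that skips content, or None if reading is continuous."""
    for previous, current in zip(starts, starts[1:]):
        if current != next_start(previous, chunk_size):
            return current
    return None


print(first_gap([0, 3000, 6000]))  # None: every read continues from the previous one
print(first_gap([0, 3000, 9000]))  # 9000: characters 6000-8999 were skipped
```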

Usage

Core Command

bash

python3 read_novel.py <novel file path> [--info] [--start <start position>]

Parameter Description

| Parameter | Description | Example |
| --- | --- | --- |
| `<novel file path>` | Path of the novel file (required). Supports TXT, PDF, DOC, and DOCX; PDF/DOC/DOCX files must first be converted to TXT with the doc-to-txt skill | `test-files/novel.txt` or `test-files/novel.pdf` |
| `--info` | Print novel information (total characters, total lines, non-empty lines) | `--info` |
| `--start` | Start position (optional; character index starting from 0; default: 0) | `--start 10000` |
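
As a rough illustration of how the documented parameters could be parsed, here is a sketch using the standard `argparse` module. The actual argument handling inside `read_novel.py` is not shown in this document, so the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented parameters (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="read_novel.py",
        description="Read a long novel in fixed-size character fragments.",
    )
    parser.add_argument("path", help="path of the novel file (TXT)")
    parser.add_argument("--info", action="store_true",
                        help="print total characters, total lines and non-empty lines")
    parser.add_argument("--start", type=int, default=0,
                        help="character index to start reading from (0-based)")
    return parser


args = build_parser().parse_args(["./novel.txt", "--start", "3000"])
print(args.path, args.info, args.start)  # ./novel.txt False 3000
```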

Examples

Get novel information:

bash

python3 read_novel.py ./novel.txt --info

Read the first 3000 characters of the novel (`--start 0` is the default and may be omitted):

bash

python3 read_novel.py ./novel.txt --start 0

Read 3000 characters of the novel starting at character index 3000:

bash

python3 read_novel.py ./novel.txt --start 3000

View help:

bash

python3 read_novel.py --help

Workflow

The complete workflow consists of a pre-step and three stages: file format processing → initialization → loop reading → completion summary

Pre-step: File format processing

Check input file format
If it is PDF, DOC, DOCX format, first use the doc-to-txt skill to convert to TXT plain text format
After the conversion is completed, use the converted TXT file for subsequent operations

Stage 1: Initialization (only once)

Step 1: Get novel information

bash

python3 read_novel.py novel.txt --info

Output example:

Total characters: 5512508
Total lines: 158108
Non-empty lines: 79053
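
The three figures reported by `--info` could be computed along these lines. This is a sketch under assumptions: the function name `novel_info` is hypothetical, and whether the real script treats whitespace-only lines as empty is not stated in this document.

```python
def novel_info(text):
    """Return (total characters, total lines, non-empty lines) for decoded text."""
    lines = text.splitlines()
    # Assumption: a line containing only whitespace counts as empty.
    non_empty = sum(1 for line in lines if line.strip())
    return len(text), len(lines), non_empty


print(novel_info("第一章\n\n正文\n"))  # (8, 3, 2)
```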

Step 2: Check reading progress

Check whether the `reading_progress.txt` file exists
If it exists: Read the current position and continue from there (resume from the breakpoint)
If it does not exist: Start a new read from position 0
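
The resume check can be sketched as below. It assumes the `Current position: <character index>` line format that this document specifies for `reading_progress.txt`; the function name `load_start_position` is hypothetical.

```python
from pathlib import Path


def load_start_position(progress_file="reading_progress.txt"):
    """Return the saved character index, or 0 when no usable progress file exists."""
    path = Path(progress_file)
    if not path.exists():
        return 0  # no progress file: start a new read from the beginning
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("Current position:"):
            return int(line.split(":", 1)[1].strip())
    return 0  # file exists but has no position line
```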

Stage 2: Loop reading (core stage)

Each loop reads 10 fragments (30000 characters in total) and is divided into three sub-stages:

Sub-stage A: Batch reading (continuous execution, no analysis in between)

Call the read command 10 times in a row, reading 3000 characters each time:

1st time: `--start <current position>`
2nd time: `--start <current position + 3000>`
...
10th time: `--start <current position + 27000>`

Example (first loop starting from 0):

Read fragment 1: --start 0
Read fragment 2: --start 3000
...
Read fragment 10: --start 27000

Example (third loop starting from 60000):

Read fragment 21: --start 60000
Read fragment 22: --start 63000
...
Read fragment 30: --start 87000

⚠️ Key rule: 10 reads must be executed continuously, no analysis, no summary, no file update in between
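
The ten start positions of a batch follow directly from the current position. A minimal sketch, with the hypothetical helper name `batch_starts`:

```python
CHUNK_SIZE = 3000        # characters per read
FRAGMENTS_PER_LOOP = 10  # reads per batch


def batch_starts(current_position):
    """Start positions for the 10 consecutive reads of one batch."""
    return [current_position + i * CHUNK_SIZE for i in range(FRAGMENTS_PER_LOOP)]


print(batch_starts(60000))
# [60000, 63000, 66000, 69000, 72000, 75000, 78000, 81000, 84000, 87000]
```

This only computes the offsets. Each position is still passed to its own tool call, one `python3 read_novel.py ... --start <position>` command at a time.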

Sub-stage B: Analysis and update (executed after reading 10 fragments)

Analyze the content of these 10 fragments
Update `outline.txt`: Add summaries of the content covered by these 10 fragments
Update `reading_progress.txt`: Record the current position (e.g. 30000)
Create or update asset files: Extract newly appeared characters, items, and scenes

Sub-stage C: Judge whether to continue

Calculate the reading progress (characters read / total characters)
If the user's requirement is not yet met (e.g. "read 5%"): Return to sub-stage A and read the next batch of 10 fragments
If the user's requirement is met: Proceed to Stage 3
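
The decision in sub-stage C reduces to one comparison. In this sketch the function names are hypothetical, and stopping once the end of the file is reached is an added assumption that the document does not state explicitly.

```python
def progress_percent(chars_read, total_chars):
    """Percentage of the novel read so far."""
    return chars_read / total_chars * 100


def should_continue(chars_read, total_chars, target_percent):
    """True while the user's target has not been reached and text remains."""
    if chars_read >= total_chars:
        return False  # assumption: nothing left to read, so stop
    return progress_percent(chars_read, total_chars) < target_percent


print(should_continue(270000, 5512508, 5))  # True: about 4.90% read
print(should_continue(300000, 5512508, 5))  # False: about 5.44% read
```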

Stage 3: Completion summary

Output final reading progress
Summarize the extracted asset statistics (number of characters, number of items, number of scenes)

Execution sequence example

Suppose the user asks to read the first 5% of the novel, and the novel file is `novel.pdf`.

Pre-step: File format processing

Check file format: .pdf
Use doc-to-txt skill to convert novel.pdf to novel.txt
Subsequent steps use novel.txt file

Step 1: Initialization

Get novel information → total characters: 5512508
Check `reading_progress.txt` → does not exist, start from position 0

Step 2: Loop reading

Loop 1 (fragments 1-10, 0-30000 characters):

Batch reading: Execute `--start 0` through `--start 27000` consecutively
Analysis and update: Update outline, progress (30000), assets
Check progress: 30000/5512508 = 0.54%, not reaching 5%, continue

Loop 2 (fragments 11-20, 30000-60000 characters):

Batch reading: Execute `--start 30000` through `--start 57000` consecutively
Analysis and update: Update outline, progress (60000), assets
Check progress: 60000/5512508 = 1.09%, not reaching 5%, continue

Loop 3 (fragments 21-30, 60000-90000 characters):

Batch reading: Execute `--start 60000` through `--start 87000` consecutively
Analysis and update: Update outline, progress (90000), assets
Check progress: 90000/5512508 = 1.63%, not reaching 5%, continue

... Continue looping ...

Loop 10 (fragments 91-100, 270000-300000 characters):

Batch reading: Execute `--start 270000` through `--start 297000` consecutively
Analysis and update: Update outline, progress (300000), assets
Check progress: 300000/5512508 = 5.44%, reached 5%, stop

Step 3: Completion summary

Output final progress: 5.44%
Count assets: X characters, Y items, Z scenes
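
The arithmetic behind this example can be checked with a few lines. The helper name `loops_needed` is hypothetical; the figures are the ones used in the example above.

```python
import math

CHARS_PER_LOOP = 10 * 3000  # 10 fragments of 3000 characters


def loops_needed(total_chars, target_percent):
    """Number of full 30000-character loops needed to reach the target percentage."""
    target_chars = total_chars * target_percent / 100
    return math.ceil(target_chars / CHARS_PER_LOOP)


total = 5512508
loops = loops_needed(total, 5)
chars_read = loops * CHARS_PER_LOOP
print(loops, chars_read, f"{chars_read / total:.2%}")  # 10 300000 5.44%
```

Because reading always stops on a loop boundary, the final progress (5.44%) overshoots the requested 5% slightly.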

Key Rules

| Rule | Description |
| --- | --- |
| No analysis while reading | Analyzing or updating files after each single fragment is incorrect |
| Read in batches | Analyze and update only after all 10 fragments have been read |
| Breakpoint resume | Resume after an interruption using `reading_progress.txt` |
| Progress calculation | Each loop reads 30000 characters (10 × 3000) |

Asset Extraction

Smart Content Filtering

After the Python script returns a fragment, the large model reads it, identifies and ignores content unrelated to the main text of the novel, and keeps only the main chapter content for analysis and asset extraction. Filtering uses no rules or regular expressions; it relies entirely on the model's understanding.

Real-time Asset Extraction

Identify and extract newly appeared assets (characters, items, scenes) from every fragment that is read (each no more than 3000 characters), and write them to the corresponding files at the update step that follows each batch of 10 fragments. Do not wait until the whole novel has been read. The large model relies on the context compaction mechanism, which compresses automatically when the context is nearly full.

Extracted Elements

From each piece of read content, identify and extract the following three types of assets, and collect detailed information as much as possible:

Characters - Characters appearing in the novel, including:
- Basic information: name, age, gender, identity, role positioning (protagonist/supporting role/villain/second male/second female, etc.)
- Appearance features: face, height, body shape, dress
- Personality characteristics: mantra, habitual actions, behavior patterns
- Background relationships: family, friends, enemies, teachers
- Ability and cultivation: strength level, special abilities, weapons and equipment
- Appearance plot: first appearance, important events
Items - Items, magic weapons, weapons, etc. appearing in the novel, including:
- Basic information: name, type, source
- Appearance features: shape, color, material, size
- Functional characteristics: special abilities, usage methods, effects
- Historical background: origin, former owner, important events
- Related characters: owner, user, contender
Scenes - Locations, environments, places, etc. appearing in the novel, including:
- Basic information: name, type, geographical location
- Environmental characteristics: terrain, climate, architectural style, atmosphere
- Functional purposes: residence, cultivation, trading, combat
- Related forces: affiliated forces, managers, permanent personnel
- Important events: key plots that have occurred

Directory Structure

Create the following structure in the directory where the novel file is located:

Novel name or project name/
├── outline.txt
├── reading_progress.txt
├── characters/
│   ├── <character name 1>.txt
│   ├── <character name 2>.txt
│   └── ...
├── items/
│   ├── <item name 1>.txt
│   ├── <item name 2>.txt
│   └── ...
└── scenes/
    ├── <scene name 1>.txt
    ├── <scene name 2>.txt
    └── ...
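
The layout above could be created with `pathlib` as sketched below. The function name `create_project_layout` is an assumption. The sketch creates only the three asset directories; `outline.txt` and `reading_progress.txt` are left to be written at the first update, so that an empty progress file is never mistaken for saved progress.

```python
from pathlib import Path


def create_project_layout(novel_path, project_name):
    """Create the asset directories next to the novel file and return the project root."""
    root = Path(novel_path).resolve().parent / project_name
    for sub in ("characters", "items", "scenes"):
        # exist_ok=True makes the call safe to repeat when resuming a read.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```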

File Content Format

outline.txt

1. <Summary of the first paragraph>

2. <Summary of the second paragraph>

...

Description:

Add summaries according to the logical segmentation of the novel content, not according to the fixed number of words
The large model naturally decides when to add a new summary entry according to the content
Each summary is identified by a serial number, arranged in reading order
Only summarize the main content of the novel; skip unrelated content such as launch testimonials, author thanks, interviews, and advertisements

reading_progress.txt

Current position: <character index>
Words read: <number>
Total words: <number>
Progress: <percentage>%
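
A sketch of rendering this file follows. The function name `format_progress` is hypothetical, and two points are assumptions: that "Words read" equals the current position (true when reading started at 0 and nothing was skipped), and that the percentage is shown with two decimals.

```python
def format_progress(position, total_chars):
    """Render the four lines of reading_progress.txt."""
    percent = position / total_chars * 100
    return (
        f"Current position: {position}\n"
        f"Words read: {position}\n"
        f"Total words: {total_chars}\n"
        f"Progress: {percent:.2f}%\n"
    )


print(format_progress(30000, 5512508), end="")
# Current position: 30000
# Words read: 30000
# Total words: 5512508
# Progress: 0.54%
```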

Character file (characters/<character name>.txt)

Each character file contains a detailed description of the character, refer to the following format:

<Character name>

【Basic Information】
Name:
Age:
Gender:
Identity:
Role positioning:

【Appearance Features】
Face:
Height:
Body shape:
Dress:

【Personality Characteristics】
Personality:
Mantra:
Habitual action:
Behavior pattern:

【Background Relationships】
Family:
Friends:
Enemies:
Teachers:

【Ability and Cultivation】
Strength level:
Special ability:
Weapons and equipment:

【Appearance Plot】
First appearance:

Item file (items/<item name>.txt)

Each item file contains a detailed description of the item, refer to the following format:

<Item name>

【Basic Information】
Name:
Type:
Source:

【Appearance Features】
Shape:
Color:
Material:
Size:

【Functional Characteristics】
Special ability:
Usage method:
Effect:

【Historical Background】
Origin:
Important events:

【Related Characters】
Owner:
User:
Contender:

Scene file (scenes/<scene name>.txt)

Each scene file contains a detailed description of the scene, refer to the following format:

<Scene name>

【Basic Information】
Name:
Type:
Geographical location:

【Environmental Characteristics】
Terrain:
Climate:
Architectural style:
Atmosphere:

【Functional Purposes】
Residence:
Cultivation:
Trading:
Combat:

【Related Forces】
Affiliated force:
Manager:
Permanent personnel:

【Important Events】
Key plots that occurred:

Extraction Rules

Deduplication: If the asset file already exists, only append new information, do not create it repeatedly
Accuracy: Ensure that the extracted asset name is accurate and avoid typos
Completeness: Record all relevant information of the asset as much as possible, use the above detailed format
Natural language: Use natural language description to let the large model understand the extraction requirements
Content filtering: Before extracting assets, first filter out irrelevant content, only extract from the main text
Real-time update: Extract newly appeared assets from every fragment read (no more than 3000 characters each) and update the files after each batch of 10 fragments; do not wait until all reading is completed
No rules/regex: Completely obtain content by executing Python scripts, let the large model understand and analyze by itself, do not use any rules or regular expressions for matching
Importance screening: Only record important assets, unimportant characters, scenes, items can be ignored. The judgment criteria include:
- Characters: Main characters, key supporting roles, important villains, etc. that promote the plot; passers-by, extras, secondary characters that appear once can be ignored
- Items: Magic weapons, weapons, important items that play a key role in the plot; ordinary items, disposable props can be ignored
- Scenes: Main locations and important places where the story takes place; temporary scenes, locations mentioned in passing can be ignored
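
The deduplication rule (append to an existing asset file instead of recreating it) can be sketched as follows. The function name `record_asset` and its return values are assumptions for illustration; deciding which details count as new and important stays with the large model, as the rules above require.

```python
from pathlib import Path


def record_asset(root, kind, name, new_info):
    """Append `new_info` to an asset file, creating the file on first sight."""
    if kind not in ("characters", "items", "scenes"):
        raise ValueError(f"unknown asset kind: {kind}")
    path = Path(root) / kind / f"{name}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    body = new_info.rstrip("\n") + "\n"
    if path.exists():
        # Existing asset: keep what is already recorded and add only the new details.
        with path.open("a", encoding="utf-8") as f:
            f.write(body)
        return "appended"
    # New asset: start the file with the asset name as its title line.
    path.write_text(f"{name}\n\n{body}", encoding="utf-8")
    return "created"
```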

Tips

Get the total number of characters in the file:

bash

python3 -c "
with open('test-files/novel.txt', 'r', encoding='utf-8', errors='replace') as f:
    print(len(f.read()))
"

Segmented reading: Read at most 3000 characters per call (the default) to avoid exceeding the tool output limit
Context management: Many large models have a context window of around 128k tokens. The Agent compresses automatically when the context is nearly full, so use the context sparingly

Why use Python?

✅ Encoding security: Natively supports UTF-8, no garbled characters
✅ Read by character: Not by byte, correctly handle Chinese characters (each Chinese character counts as 1)
✅ Cross-platform: Same on Windows/Mac/Linux
✅ Widely available: Python 3 ships with most Linux distributions and is easy to install on Windows and macOS
✅ Simple syntax: One pattern handles all scenarios
