Novel Reader - Intelligent Long Text Novel Reader
Use Python to read long text novels safely and reliably, working around the limited LLM context window. Chinese novels encoded in UTF-8 are supported. Irrelevant content is filtered out automatically, and detailed character, item, and scene information is extracted as reading proceeds.
Core Features
- Multi-format support: If the input file is in PDF, DOC, DOCX format, first use the doc-to-txt skill to convert it to TXT plain text format, then read the converted TXT file
- Read by character position: Use Python to read accurately by character position, no garbled characters will occur
- Segmented reading: Read up to 3000 characters each time (3000 is also the default) to avoid exceeding the tool output limit. Skipping is strictly prohibited; read continuously, segment by segment
- Flexible positioning: Can start reading from any position
- Encoding security: Natively supports UTF-8 encoding, correctly handles Chinese characters
- Smart content filtering: Automatically identifies and skips content irrelevant to the main text of the novel (launch testimonials, author thanks, interviews, advertisements, etc.) through the large model
- Real-time asset extraction: After reading a piece of content, immediately identify and record detailed information about characters, items, and scenes in the novel
- Outline recording: Record the summary of each paragraph of content to form a complete outline
- Progress tracking: Record progress information such as current reading position, number of words read, etc.
- Context management: Make full use of the Agent's context compaction mechanism, which compresses automatically when the context is nearly exhausted
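The source of read_novel.py is not shown in this document. As a rough sketch of what reading by character position could look like, assuming the whole file fits in memory, the following may help. The names `slice_chunk` and `read_chunk` are hypothetical and not part of the actual script.

```python
def slice_chunk(text: str, start: int = 0, size: int = 3000) -> str:
    """Return up to `size` characters of `text`, beginning at character index `start`."""
    if start < 0:
        raise ValueError("start must be non-negative")
    # str slicing counts Unicode code points, so a Chinese character is never split.
    return text[start:start + size]


def read_chunk(path: str, start: int = 0, size: int = 3000) -> str:
    """Decode the file as UTF-8 and return one chunk by character position."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return slice_chunk(f.read(), start, size)
```

A start position beyond the end of the text simply yields an empty string, which is one way to detect that the novel has been read completely.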
Important Requirements
Skip reading is strictly prohibited
- Must read continuously paragraph by paragraph: After reading 3000 characters each time, the next time must continue reading from the current end position (i.e. start = last start + 3000)
- Skip reading is prohibited: It is absolutely not allowed to skip the middle content and jump directly to the later position
- For-loop reading is absolutely prohibited: Never use a for loop to read the file in batches, because the Agent tool compresses its output and the loop causes the content to be truncated. If you see (some characters truncated), the content has been truncated and you must re-read it manually, segment by segment.
- Do not call the Python script more than once in a single command: Each tool call can output only 3000 characters and anything beyond that is truncated, so one tool call may run the Python command at most once!
- Efficiency: Because the text is extremely long, call the reading script back to back (never in a for loop), without pausing to think or comment between calls.
- Update loop: Every time you read 10 novel fragments (3000 characters each, a total of 30000 characters), update the outline, progress, and asset files once, and continue to the next loop.
- No preset tasks: Do not use TODO-list or task-list tools!
Usage
Core Command
bash
python3 read_novel.py <novel file path> [--start <start position>]
Parameter Description
| Parameter | Description | Example |
|---|---|---|
| <novel file path> | Path of the novel file (required). Supports TXT, PDF, DOC, DOCX formats; PDF/DOC/DOCX files must first be converted to TXT with the doc-to-txt skill | ./novel.txt |
| --info | Print novel information (total number of characters, total number of lines, non-empty lines) | --info |
| --start | Start position (optional; character index starting from 0, default: 0) | --start 3000 |
Examples
Get novel information:
bash
python3 read_novel.py ./novel.txt --info
Read the first 3000 characters of the novel from the beginning (use default parameters):
bash
python3 read_novel.py ./novel.txt --start 0
Read 3000 characters of the novel starting from the 3000th character (only specify start):
bash
python3 read_novel.py ./novel.txt --start 3000
View help:
bash
python3 read_novel.py --help
Workflow
The complete workflow consists of a pre-step followed by three stages: file format processing → initialization → loop reading → completion summary
Pre-step: File format processing
- Check input file format
- If it is PDF, DOC, DOCX format, first use the doc-to-txt skill to convert to TXT plain text format
- After the conversion is completed, use the converted TXT file for subsequent operations
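The format check in the pre-step can be sketched as a simple extension test. This is only an illustration: `needs_conversion` and `CONVERT_FIRST` are hypothetical names, and the conversion itself is still done by the doc-to-txt skill, not by this code.

```python
from pathlib import Path

# Formats that, per the steps above, must be converted to TXT before reading.
CONVERT_FIRST = {".pdf", ".doc", ".docx"}


def needs_conversion(path: str) -> bool:
    """Return True if the file extension indicates a format that must be converted first."""
    return Path(path).suffix.lower() in CONVERT_FIRST
```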
Stage 1: Initialization (only once)
Step 1: Get novel information
bash
python3 read_novel.py novel.txt --info
Output example:
Total characters: 5512508
Total lines: 158108
Non-empty lines: 79053
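How read_novel.py computes these three numbers is not documented here. One plausible way, assuming the text is already decoded and that a line containing only whitespace counts as empty, is sketched below. `novel_info` is a hypothetical name.

```python
def novel_info(text: str) -> dict:
    """Count characters, lines, and non-empty lines in already-decoded text."""
    lines = text.splitlines()
    return {
        "total_characters": len(text),  # code points, not bytes
        "total_lines": len(lines),
        "non_empty_lines": sum(1 for line in lines if line.strip()),
    }
```

The real script may count line breaks differently, so small discrepancies in the line totals would not be surprising.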
Step 2: Check reading progress
- Check whether reading_progress.txt exists
- If exists: Read the current position and continue from that position (resume reading from breakpoint)
- If not exists: Start a new read from position 0
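The resume check above can be sketched as follows, assuming the progress file uses the "Current position: <character index>" line described in the file format section. `parse_position` and `resume_position` are hypothetical helper names.

```python
from pathlib import Path


def parse_position(progress_text: str) -> int:
    """Return the value of the 'Current position' line, or 0 if it is missing."""
    for line in progress_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Current position":
            return int(value.strip())
    return 0


def resume_position(progress_file: str) -> int:
    """Return the saved position, or 0 when no progress file exists yet."""
    path = Path(progress_file)
    if not path.exists():
        return 0  # new read: start from the beginning
    return parse_position(path.read_text(encoding="utf-8"))
```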
Stage 2: Loop reading (core stage)
Each loop reads 10 fragments (total 30000 characters), divided into three sub-stages:
Sub-stage A: Batch reading (continuous execution, no analysis in between)
Call the read command 10 times continuously, read 3000 characters each time:
- 1st time: --start <current position>
- 2nd time: --start <current position+3000>
- ...
- 10th time: --start <current position+27000>
Example (first loop starting from 0):
Read fragment 1: --start 0
Read fragment 2: --start 3000
...
Read fragment 10: --start 27000
Example (third loop starting from 60000):
Read fragment 21: --start 60000
Read fragment 22: --start 63000
...
Read fragment 30: --start 87000
⚠️ Key rule: 10 reads must be executed continuously, no analysis, no summary, no file update in between
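The start positions for one batch follow directly from the rule start = last start + 3000. The sketch below only computes the ten positions and the matching command strings; it does not run anything, so each command must still be issued as its own tool call. `batch_starts` and `batch_commands` are hypothetical names.

```python
def batch_starts(current: int, fragments: int = 10, size: int = 3000) -> list[int]:
    """Return the start position of each fragment in one loop."""
    return [current + i * size for i in range(fragments)]


def batch_commands(path: str, current: int) -> list[str]:
    """Return the ten read commands for one loop, one per tool call."""
    return [f"python3 read_novel.py {path} --start {s}" for s in batch_starts(current)]
```

For the third loop, `batch_starts(60000)` runs from 60000 to 87000 in steps of 3000, matching the example above.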
Sub-stage B: Analysis and update (executed after reading 10 fragments)
- Analyze the content of these 10 fragments
- Update outline.txt: Add chapter summaries of these 10 fragments
- Update reading_progress.txt: Record the current position (e.g. 30000)
- Create or update asset files: Extract newly appeared characters, items, scenes
Sub-stage C: Judge whether to continue
- Calculate the read progress (number of words read / total number of words)
- If the user's requirement is not met (e.g. "read 5%"): Return to sub-stage A, continue the next batch of 10 fragments
- If the user's requirement is met: Proceed to Stage 3
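The decision in sub-stage C is a simple ratio check. A minimal sketch, assuming the user's requirement is expressed as a target percentage, might look like this. `progress_percent` and `should_continue` are hypothetical names.

```python
def progress_percent(position: int, total: int) -> float:
    """Return the share of the novel read so far, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    # The last fragment may be shorter than 3000 characters, so cap at the total.
    return min(position, total) / total * 100


def should_continue(position: int, total: int, target_percent: float) -> bool:
    """Return True while the target has not been reached and text remains."""
    return position < total and progress_percent(position, total) < target_percent
```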
Stage 3: Completion summary
- Output final reading progress
- Summarize the extracted asset statistics (number of characters, number of items, number of scenes)
Execution sequence example
Suppose the user asks to read the first 5% of the novel, and the novel file is novel.pdf:
Pre-step: File format processing
- Check file format: .pdf
- Use doc-to-txt skill to convert novel.pdf to novel.txt
- Subsequent steps use novel.txt file
Step 1: Initialization
- Get novel information → total number of words 5512508
- Check reading_progress.txt → does not exist, start from position 0
Step 2: Loop reading
Loop 1 (fragments 1-10, 0-30000 characters):
- Batch reading: Continuously execute --start 0 through --start 27000
- Analysis and update: Update outline, progress (30000), assets
- Check progress: 30000/5512508 = 0.54%, not reaching 5%, continue
Loop 2 (fragments 11-20, 30000-60000 characters):
- Batch reading: Continuously execute --start 30000 through --start 57000
- Analysis and update: Update outline, progress (60000), assets
- Check progress: 60000/5512508 = 1.09%, not reaching 5%, continue
Loop 3 (fragments 21-30, 60000-90000 characters):
- Batch reading: Continuously execute --start 60000 through --start 87000
- Analysis and update: Update outline, progress (90000), assets
- Check progress: 90000/5512508 = 1.63%, not reaching 5%, continue
... Continue looping ...
Loop 10 (fragments 91-100, 270000-300000 characters):
- Batch reading: Continuously execute --start 270000 through --start 297000
- Analysis and update: Update outline, progress (300000), assets
- Check progress: 300000/5512508 = 5.44%, reached 5%, stop
Step 3: Completion summary
- Output final progress: 5.44%
- Count assets: X characters, Y items, Z scenes
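The number of loops in this example can be checked with a little arithmetic: 5% of 5512508 characters is about 275625 characters, and at 30000 characters per loop that takes 10 loops. The sketch below reproduces the calculation; `loops_needed` is a hypothetical name.

```python
import math


def loops_needed(total: int, target_percent: float, per_loop: int = 30000) -> int:
    """Return how many full loops are required to reach the target percentage."""
    target_chars = total * target_percent / 100
    return math.ceil(target_chars / per_loop)


loops = loops_needed(5512508, 5)                 # 10
final_position = loops * 30000                   # 300000
final_percent = final_position / 5512508 * 100   # about 5.44
```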
Key Rules
| Rule | Description |
|---|---|
| Never analyze while reading | Analyzing and updating after each single fragment is wrong |
| Must read in batches | Analyze and update once, after all 10 fragments have been read |
| Breakpoint resume supported | Resume after an interruption via reading_progress.txt |
| Progress calculation | Each loop reads 30000 characters (10 × 3000) |
Asset Extraction
Smart Content Filtering
After reading the novel content by executing the Python script, let the large model understand the content, automatically identify and ignore the content irrelevant to the main text of the novel, and only retain the main chapter content of the novel for analysis and asset extraction. No rules or regular expressions are used for filtering, completely relying on the understanding ability of the large model.
Real-time Asset Extraction
Extraction is incremental rather than deferred: in the analysis step of each loop, go through the fragments just read (each no more than 3000 characters), identify and extract newly appeared assets (characters, items, scenes), and update the corresponding files right away. Do not wait until the whole novel has been read. The large model makes full use of the context compaction mechanism, which compresses automatically when the context is nearly exhausted.
Extracted Elements
From each piece of read content, identify and extract the following three types of assets, and collect detailed information as much as possible:
- Characters: Characters appearing in the novel, including:
  - Basic information: name, age, gender, identity, role positioning (protagonist/supporting role/villain/second male/second female, etc.)
  - Appearance features: face, height, body shape, dress
  - Personality characteristics: mantra, habitual actions, behavior patterns
  - Background relationships: family, friends, enemies, teachers
  - Ability and cultivation: strength level, special abilities, weapons and equipment
  - Appearance plot: first appearance, important events
- Items: Items, magic weapons, weapons, etc. appearing in the novel, including:
  - Basic information: name, type, source
  - Appearance features: shape, color, material, size
  - Functional characteristics: special abilities, usage methods, effects
  - Historical background: origin, former owner, important events
  - Related characters: owner, user, contender
- Scenes: Locations, environments, places, etc. appearing in the novel, including:
  - Basic information: name, type, geographical location
  - Environmental characteristics: terrain, climate, architectural style, atmosphere
  - Functional purposes: residence, cultivation, trading, combat
  - Related forces: affiliated forces, managers, permanent personnel
  - Important events: key plots that have occurred
Directory Structure
Create the following structure in the directory where the novel file is located:
Novel name or project name/
├── outline.txt
├── reading_progress.txt
├── characters/
│   ├── <character name 1>.txt
│   ├── <character name 2>.txt
│   └── ...
├── items/
│   ├── <item name 1>.txt
│   ├── <item name 2>.txt
│   └── ...
└── scenes/
    ├── <scene name 1>.txt
    ├── <scene name 2>.txt
    └── ...
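Creating this layout takes only a few lines with pathlib. The sketch below assumes the project directory is placed next to the novel file, as stated above. `create_project_dirs` is a hypothetical helper, and the project name is whatever the user or the Agent chooses.

```python
from pathlib import Path


def create_project_dirs(novel_path: str, project_name: str) -> Path:
    """Create the asset subdirectories next to the novel file and return the project root."""
    root = Path(novel_path).resolve().parent / project_name
    for sub in ("characters", "items", "scenes"):
        # exist_ok=True makes the call safe to repeat when resuming from a breakpoint.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

outline.txt and reading_progress.txt are not created here; they appear the first time they are written.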
File Content Format
outline.txt
1. <Summary of the first paragraph>
2. <Summary of the second paragraph>
...
Description:
- Add summaries according to the logical segmentation of the novel content, not according to the fixed number of words
- The large model naturally decides when to add a new summary entry according to the content
- Each summary is identified by a serial number, arranged in reading order
- Only record summaries of the main content of the novel; skip irrelevant content such as launch testimonials, author thanks, interviews, advertisements, etc.
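Keeping the serial numbers consistent when entries are appended across loops can be sketched as below. This assumes each summary occupies exactly one line of outline.txt, which the format above suggests but does not guarantee. `append_outline_entry` is a hypothetical name, and the summary text itself still comes from the large model.

```python
def append_outline_entry(outline_text: str, summary: str) -> str:
    """Return the outline with one more numbered entry appended."""
    entries = [line for line in outline_text.splitlines() if line.strip()]
    next_number = len(entries) + 1  # one entry per non-empty line
    entries.append(f"{next_number}. {summary.strip()}")
    return "\n".join(entries) + "\n"
```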
reading_progress.txt
Current position: <character index>
Words read: <number>
Total words: <number>
Progress: <percentage>%
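Rendering this file from the current position is straightforward. The sketch below assumes reading started at position 0 with nothing skipped, so the number of words read equals the current position. `format_progress` is a hypothetical name.

```python
def format_progress(position: int, total: int) -> str:
    """Render the contents of reading_progress.txt for the given position."""
    percent = position / total * 100
    return (
        f"Current position: {position}\n"
        f"Words read: {position}\n"  # assumption: continuous reading from position 0
        f"Total words: {total}\n"
        f"Progress: {percent:.2f}%\n"
    )
```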
Character file (characters/<character name>.txt)
Each character file contains a detailed description of the character, refer to the following format:
<Character name>
【Basic Information】
Name:
Age:
Gender:
Identity:
Role positioning:
【Appearance Features】
Face:
Height:
Body shape:
Dress:
【Personality Characteristics】
Personality:
Mantra:
Habitual action:
Behavior pattern:
【Background Relationships】
Family:
Friends:
Enemies:
Teachers:
【Ability and Cultivation】
Strength level:
Special ability:
Weapons and equipment:
【Appearance Plot】
First appearance:
Item file (items/<item name>.txt)
Each item file contains a detailed description of the item, refer to the following format:
<Item name>
【Basic Information】
Name:
Type:
Source:
【Appearance Features】
Shape:
Color:
Material:
Size:
【Functional Characteristics】
Special ability:
Usage method:
Effect:
【Historical Background】
Origin:
Important events:
【Related Characters】
Owner:
User:
Contender:
Scene file (scenes/<scene name>.txt)
Each scene file contains a detailed description of the scene, refer to the following format:
<Scene name>
【Basic Information】
Name:
Type:
Geographical location:
【Environmental Characteristics】
Terrain:
Climate:
Architectural style:
Atmosphere:
【Functional Purposes】
Residence:
Cultivation:
Trading:
Combat:
【Related Forces】
Affiliated force:
Manager:
Permanent personnel:
【Important Events】
Key plots that occurred:
Extraction Rules
- Deduplication: If the asset file already exists, only append new information, do not create it repeatedly
- Accuracy: Ensure that the extracted asset name is accurate and avoid typos
- Completeness: Record all relevant information of the asset as much as possible, use the above detailed format
- Natural language: Use natural language description to let the large model understand the extraction requirements
- Content filtering: Before extracting assets, first filter out irrelevant content, only extract from the main text
- Real-time update: In the analysis step of each loop, identify and extract newly appeared assets from the fragments just read (each no more than 3000 characters) and write them out right away; don't wait until all reading is completed to update
- No rules/regex: Completely obtain content by executing Python scripts, let the large model understand and analyze by itself, do not use any rules or regular expressions for matching
- Importance screening: Only record important assets; unimportant characters, scenes, and items can be ignored. The judgment criteria include:
  - Characters: Main characters, key supporting roles, important villains, etc. that drive the plot; passers-by, extras, and secondary characters that appear once can be ignored
  - Items: Magic weapons, weapons, and important items that play a key role in the plot; ordinary items and disposable props can be ignored
  - Scenes: Main locations and important places where the story takes place; temporary scenes and locations mentioned in passing can be ignored
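The deduplication rule (append to an existing asset file instead of creating it again) can be sketched as follows. `record_asset` is a hypothetical helper, and the names in the usage note are placeholders rather than anything from a real novel. Deciding what counts as new information remains the large model's job.

```python
from pathlib import Path

ASSET_KINDS = ("characters", "items", "scenes")


def record_asset(root: str, kind: str, name: str, details: str) -> Path:
    """Create the asset file on first sight, otherwise append the new details."""
    if kind not in ASSET_KINDS:
        raise ValueError(f"unknown asset kind: {kind}")
    path = Path(root) / kind / f"{name}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        # Existing asset: keep what is there and append only the new information.
        with path.open("a", encoding="utf-8") as f:
            f.write("\n" + details.strip() + "\n")
    else:
        path.write_text(f"{name}\n{details.strip()}\n", encoding="utf-8")
    return path
```

Calling `record_asset(root, "characters", "Li Wei", "Age: 16")` twice with different details leaves a single file containing both pieces of information. Asset names containing path separators would need sanitizing first, which this sketch does not attempt.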
Tips
- Get the total number of characters in the file:
bash
python3 -c "
with open('test-files/novel.txt', 'r', encoding='utf-8', errors='replace') as f:
    print(len(f.read()))
"
- Segmented reading: Read up to 3000 characters each time (3000 is also the default) to avoid exceeding the tool output limit
- Context management: Large models often have a context window of around 128k tokens, and the Agent compresses automatically when the context is nearly exhausted, so use the window sparingly
Why use Python?
- ✅ Encoding security: Natively supports UTF-8, no garbled characters
- ✅ Read by character: Not by byte, correctly handle Chinese characters (each Chinese character counts as 1)
- ✅ Cross-platform: Same on Windows/Mac/Linux
- ✅ Widely available: Python 3 ships with most Linux distributions and is easy to install on macOS and Windows
- ✅ Simple syntax: One pattern handles all scenarios