text-summarizer

Text Summarizer

Create concise summaries from long text documents using extractive summarization. Identifies and extracts the most important sentences while preserving meaning.

Quick Start

python

from scripts.text_summarizer import TextSummarizer

# Summarize text
summarizer = TextSummarizer()
summary = summarizer.summarize(long_text, ratio=0.2)  # 20% of original
print(summary)

# Summarize file
summary = summarizer.summarize_file("article.txt", num_sentences=5)

Features

Extractive Summarization: Selects key sentences from original text
Length Control: By ratio, sentence count, or word count
Multiple Algorithms: TextRank, LSA, frequency-based
Key Points: Extract bullet-point summaries
Batch Processing: Summarize multiple documents
Preserve Structure: Optionally keeps the selected sentences in their original order

API Reference

Initialization

python

summarizer = TextSummarizer(
    method="textrank",    # textrank, lsa, frequency
    language="english"
)
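
The document does not describe how each `method` works internally. As a rough illustration of the `frequency` option only, the sketch below scores each sentence by the average frequency of its content words and keeps the top ones. The function names, the tiny stopword set, and the regex sentence splitter are all assumptions made for this example; they are not taken from `TextSummarizer`.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); a real tool would likely use a fuller list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "for", "on", "was", "at", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def frequency_summarize(text, num_sentences=2):
    """Hypothetical helper: keep the sentences with the highest average word frequency."""
    sentences = split_sentences(text)
    tokenized = [
        [w for w in re.findall(r"[a-z0-9']+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    # Count how often each content word appears in the whole text.
    freq = Counter(w for words in tokenized for w in words)
    # Average the counts so long sentences are not favored automatically.
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    # Emit the chosen sentences in document order.
    return " ".join(sentences[i] for i in sorted(top))


print(frequency_summarize("Cats sleep a lot. Cats and dogs are pets. Birds fly.", num_sentences=1))
```

Averaging rather than summing the word counts is one possible design choice; summing would bias the ranking toward longer sentences.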

Summarization

python

# By ratio (20% of original length)
summary = summarizer.summarize(text, ratio=0.2)

# By sentence count
summary = summarizer.summarize(text, num_sentences=5)

# By word count
summary = summarizer.summarize(text, max_words=100)
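
How the three length controls are resolved internally is not stated. One plausible reading, sketched below, is that `ratio` and `num_sentences` both reduce to a target sentence count, while `max_words` acts as a budget filled greedily from the ranked sentences. Both helper names are hypothetical, and treating `ratio` as a fraction of the sentence count (rather than of words or characters) is an assumption.

```python
def target_sentence_count(total_sentences, ratio=None, num_sentences=None):
    """Hypothetical helper: turn ratio / num_sentences into a sentence count."""
    if total_sentences == 0:
        return 0
    if num_sentences is not None:
        count = num_sentences
    elif ratio is not None:
        # Assumption: the ratio applies to the number of sentences.
        count = round(total_sentences * ratio)
    else:
        count = round(total_sentences * 0.2)  # mirrors the 0.2 default ratio listed for the CLI
    # Always return at least one sentence, never more than the text contains.
    return max(1, min(count, total_sentences))


def trim_to_max_words(ranked_sentences, max_words):
    """Hypothetical helper: keep ranked sentences while they fit the word budget."""
    kept, used = [], 0
    for sentence in ranked_sentences:
        n_words = len(sentence.split())
        if used + n_words > max_words:
            continue  # too long for the remaining budget; try the next sentence
        kept.append(sentence)
        used += n_words
    return kept


print(target_sentence_count(10, ratio=0.2))
print(trim_to_max_words(["a b c", "d e f g", "h i"], max_words=5))
```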

Key Points Extraction

python

# Get bullet points
points = summarizer.extract_key_points(text, num_points=5)
for point in points:
    print(f"• {point}")

Batch Processing

python

# Summarize multiple texts
texts = [text1, text2, text3]
summaries = summarizer.summarize_batch(texts, ratio=0.2)

# Summarize files in directory
summaries = summarizer.summarize_directory("./articles/", ratio=0.3)

Options

python

# Preserve original sentence order
summary = summarizer.summarize(text, preserve_order=True)

# Include title/first sentence
summary = summarizer.summarize(text, include_first=True)

# Minimum sentence length filter
summarizer.min_sentence_length = 10
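
To make the effect of `preserve_order` and `include_first` concrete, here is a minimal sketch of a selection step that takes already-scored sentences. The function `select_sentences` is hypothetical and only illustrates the idea; the library's actual selection logic is not shown in this document.

```python
def select_sentences(sentences, scores, k, preserve_order=True, include_first=False):
    """Hypothetical helper: pick k sentences by score, optionally restoring order."""
    # Indices sorted from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    if include_first and k > 0 and sentences and 0 not in chosen:
        # Drop the weakest pick so the summary still has k sentences.
        chosen = [0] + chosen[: k - 1]
    if preserve_order:
        # Emit sentences in the order they appear in the source text.
        chosen = sorted(chosen)
    return [sentences[i] for i in chosen]


sentences = ["A.", "B.", "C.", "D."]
scores = [0.1, 0.5, 0.3, 0.9]
print(select_sentences(sentences, scores, k=2, preserve_order=False))
print(select_sentences(sentences, scores, k=2, preserve_order=True))
```

Without order preservation the result follows the ranking, which can read disjointedly; restoring source order usually produces a more coherent summary at no extra cost.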

CLI Usage

bash

# Summarize text file
python text_summarizer.py --input article.txt --ratio 0.2

# Specific sentence count
python text_summarizer.py --input article.txt --sentences 5

# Extract key points
python text_summarizer.py --input article.txt --points 5

# Batch process
python text_summarizer.py --input-dir ./docs --output-dir ./summaries --ratio 0.3

# Output to file
python text_summarizer.py --input article.txt --output summary.txt --ratio 0.2

CLI Arguments

Argument	Description	Default
`--input`	Input file path	Required unless `--input-dir` is used
`--output`	Output file path	stdout
`--input-dir`	Directory of files	-
`--output-dir`	Output directory	-
`--ratio`	Summary ratio (0.0-1.0)	0.2
`--sentences`	Number of sentences	-
`--words`	Maximum words	-
`--points`	Extract N key points	-
`--method`	Algorithm to use	textrank
`--preserve-order`	Keep sentence order	False
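
The table above maps naturally onto Python's standard `argparse` module. The sketch below shows one way such a parser could be declared. The flag names and defaults come from the table, but the `build_parser` function, the `choices` restriction on `--method`, and the argument types are assumptions rather than the script's actual code.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented CLI arguments."""
    parser = argparse.ArgumentParser(description="Extractive text summarizer")
    parser.add_argument("--input", help="Input file path")
    parser.add_argument("--output", help="Output file path (default: stdout)")
    parser.add_argument("--input-dir", help="Directory of files")
    parser.add_argument("--output-dir", help="Output directory")
    parser.add_argument("--ratio", type=float, default=0.2, help="Summary ratio (0.0-1.0)")
    parser.add_argument("--sentences", type=int, help="Number of sentences")
    parser.add_argument("--words", type=int, help="Maximum words")
    parser.add_argument("--points", type=int, help="Extract N key points")
    parser.add_argument(
        "--method",
        default="textrank",
        choices=["textrank", "lsa", "frequency"],  # assumed to match the API's method names
        help="Algorithm to use",
    )
    parser.add_argument("--preserve-order", action="store_true", help="Keep sentence order")
    return parser


# Parse an explicit argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(["--input", "article.txt", "--sentences", "5"])
print(args.input, args.sentences, args.ratio, args.method, args.preserve_order)
```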

Examples

News Article Summary

python

summarizer = TextSummarizer()

article = """
[Long news article text...]
"""

# Get a 3-sentence summary
summary = summarizer.summarize(article, num_sentences=3)
print("Summary:")
print(summary)

# Get key points
points = summarizer.extract_key_points(article, num_points=5)
print("\nKey Points:")
for i, point in enumerate(points, 1):
    print(f"{i}. {point}")

Research Paper Abstract

python

summarizer = TextSummarizer(method="lsa")

# Read the paper; the with-block closes the file afterwards
with open("research_paper.txt") as f:
    paper = f.read()

# Create abstract-length summary
abstract = summarizer.summarize(paper, max_words=250)
print(abstract)

Meeting Notes Summary

python

summarizer = TextSummarizer()

notes = """
Meeting started at 2pm. John presented Q3 results showing 15% growth.
Sarah raised concerns about supply chain delays affecting Q4 projections.
The team discussed mitigation strategies including dual-sourcing.
Budget allocation for marketing was approved at $50k.
Next steps include vendor outreach by Friday.
Follow-up meeting scheduled for next Tuesday.
"""

summary = summarizer.summarize(notes, num_sentences=3)
points = summarizer.extract_key_points(notes, num_points=4)

print("Summary:", summary)
print("\nAction Items:")
for point in points:
    print(f"• {point}")

Batch Document Summarization

python

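import os

summarizer = TextSummarizer()

# Make sure the output directory exists before writing summaries
os.makedirs("./summaries", exist_ok=True)

for filename in os.listdir("./documents"):
    if filename.endswith(".txt"):
        # Read each document; the with-block closes the file afterwards
        with open(f"./documents/{filename}") as f:
            text = f.read()
        summary = summarizer.summarize(text, ratio=0.2)

        with open(f"./summaries/{filename}", "w") as f:
            f.write(summary)

        print(f"Summarized: {filename}")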

Algorithm Comparison

Algorithm	Speed	Quality	Best For
TextRank	Medium	High	General text
LSA	Fast	Good	Technical docs
Frequency	Fast	Medium	Quick summaries
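
For readers unfamiliar with TextRank, the following is a small pure-Python sketch of the general idea: build a graph whose edges are sentence similarities, then run a PageRank-style iteration to score each sentence. It is not the implementation behind `method="textrank"`. The function names, the word-overlap similarity, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def similarity(words_a, words_b):
    """Shared-word count, normalized by the log of both sentence lengths."""
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:
        return 0.0
    return len(set(words_a) & set(words_b)) / denom


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    words = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    weights = [
        [similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbors' scores,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summarize(text, num_sentences=2):
    """Hypothetical helper: return the top-scoring sentences in document order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


print(textrank_summarize(
    "The cat sat on the mat. The cat ate the fish. Stocks fell sharply today.",
    num_sentences=2,
))
```

Because every pair of sentences is compared, the graph construction grows quadratically with the number of sentences, which is consistent with TextRank being rated slower than the frequency-based method in the table above.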

Dependencies

nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.2.0

Limitations

Extractive only (doesn't paraphrase or generate new text)
Works best with well-structured text (paragraphs, clear sentences)
Very short texts may not summarize well
Doesn't understand context deeply (may miss nuance)
