text-summarizer

Text Summarizer

Create concise summaries from long text documents using extractive summarization. Identifies and extracts the most important sentences while preserving meaning.
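To make "extractive" concrete, here is a minimal sketch of a frequency-based approach that uses only the standard library. It is not the code inside `scripts.text_summarizer`; the helper names (`split_sentences`, `frequency_summary`), the small stopword list, and the averaging rule are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration; not part of the text-summarizer API.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def frequency_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


text = "Cats sleep a lot. Cats hunt mice at night. The weather was mild."
print(frequency_summary(text, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of extractive summarization.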

Quick Start

python
from scripts.text_summarizer import TextSummarizer

# Summarize text
summarizer = TextSummarizer()
summary = summarizer.summarize(long_text, ratio=0.2)  # 20% of original
print(summary)

# Summarize file
summary = summarizer.summarize_file("article.txt", num_sentences=5)

Features

功能特性

  • Extractive Summarization: Selects key sentences from original text
  • Length Control: By ratio, sentence count, or word count
  • Multiple Algorithms: TextRank, LSA, frequency-based
  • Key Points: Extract bullet-point summaries
  • Batch Processing: Summarize multiple documents
  • Preserve Structure: Optionally keeps the selected sentences in their original order
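The three length controls can be pictured as two small steps: turn a ratio or sentence count into a sentence budget, or keep ranked sentences until a word budget runs out. The sketch below is an assumption about how such controls might work, not the library's actual logic; `sentence_budget` and `apply_word_limit` are hypothetical names, and the rounding rule is a guess.

```python
def sentence_budget(total_sentences, ratio=None, num_sentences=None):
    """Return how many sentences to keep (hypothetical helper)."""
    if total_sentences <= 0:
        return 0
    if num_sentences is not None:
        # An explicit count wins, capped at what the document contains.
        return max(0, min(num_sentences, total_sentences))
    if ratio is not None:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0.0 and 1.0")
        # Assumption: round to the nearest count and keep at least one sentence.
        return min(total_sentences, max(1, round(total_sentences * ratio)))
    return total_sentences


def apply_word_limit(ranked_sentences, max_words):
    """Keep sentences in ranked order, skipping any that would exceed max_words."""
    kept, used = [], 0
    for sentence in ranked_sentences:
        n = len(sentence.split())
        if used + n > max_words:
            continue  # too long for the remaining budget; try the next one
        kept.append(sentence)
        used += n
    return kept


print(sentence_budget(10, ratio=0.2))
print(apply_word_limit(["a b c", "d e f g", "h"], max_words=4))
```

Skipping an oversized sentence instead of stopping lets a shorter, lower-ranked sentence fill the remaining budget; stopping at the first overflow would be an equally reasonable design.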

API Reference

Initialization

python
summarizer = TextSummarizer(
    method="textrank",    # textrank, lsa, frequency
    language="english"
)

Summarization

python
# By ratio (20% of original length)
summary = summarizer.summarize(text, ratio=0.2)

# By sentence count
summary = summarizer.summarize(text, num_sentences=5)

# By word count
summary = summarizer.summarize(text, max_words=100)

Key Points Extraction

python
# Get bullet points
points = summarizer.extract_key_points(text, num_points=5)
for point in points:
    print(f"• {point}")

Batch Processing

python
# Summarize multiple texts
texts = [text1, text2, text3]
summaries = summarizer.summarize_batch(texts, ratio=0.2)

# Summarize files in directory
summaries = summarizer.summarize_directory("./articles/", ratio=0.3)

Options

python
# Preserve original sentence order
summary = summarizer.summarize(text, preserve_order=True)

# Include title/first sentence
summary = summarizer.summarize(text, include_first=True)

# Minimum sentence length filter
summarizer.min_sentence_length = 10
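The effect of `preserve_order` and a minimum-length filter is easiest to see on pre-scored sentences. The following is a sketch under stated assumptions: `select_sentences` is a hypothetical helper, the scores are made up, and the length filter is assumed to count words (the source does not say whether `min_sentence_length` is measured in words or characters).

```python
def select_sentences(sentences, scores, k, preserve_order=True, min_words=0):
    """Pick the k best-scoring sentences (hypothetical helper)."""
    # Assumption: the length filter counts whitespace-separated words.
    candidates = [i for i, s in enumerate(sentences) if len(s.split()) >= min_words]
    top = sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]
    if preserve_order:
        top.sort()  # back to document order instead of rank order
    return [sentences[i] for i in top]


sentences = ["Intro here.", "Ok.", "The main finding is large.", "A minor detail follows."]
scores = [0.4, 0.9, 0.8, 0.1]  # made-up importance scores

print(select_sentences(sentences, scores, k=2, preserve_order=False, min_words=2))
print(select_sentences(sentences, scores, k=2, preserve_order=True, min_words=2))
```

With the filter set to two words, the high-scoring but uninformative "Ok." is dropped before ranking, and the two calls return the same sentences in rank order and in document order respectively.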

CLI Usage

bash
# Summarize text file
python text_summarizer.py --input article.txt --ratio 0.2

# Specific sentence count
python text_summarizer.py --input article.txt --sentences 5

# Extract key points
python text_summarizer.py --input article.txt --points 5

# Batch process
python text_summarizer.py --input-dir ./docs --output-dir ./summaries --ratio 0.3

# Output to file
python text_summarizer.py --input article.txt --output summary.txt --ratio 0.2

CLI Arguments

| Argument | Description | Default |
|---|---|---|
| --input | Input file path | Required |
| --output | Output file path | stdout |
| --input-dir | Directory of files | - |
| --output-dir | Output directory | - |
| --ratio | Summary ratio (0.0-1.0) | 0.2 |
| --sentences | Number of sentences | - |
| --words | Maximum words | - |
| --points | Extract N key points | - |
| --method | Algorithm to use | textrank |
| --preserve-order | Keep sentence order | False |
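For readers who want to see how these flags fit together, the sketch below declares them with the standard library's `argparse`. Whether `text_summarizer.py` actually uses `argparse` is an assumption, and `build_parser` is a hypothetical name. The flag names and defaults come from the arguments listed above; `--input` is left optional here because the batch example passes only `--input-dir`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        prog="text_summarizer.py",
        description="Extractive text summarization",
    )
    parser.add_argument("--input", help="Input file path")
    parser.add_argument("--output", help="Output file path (default: stdout)")
    parser.add_argument("--input-dir", help="Directory of files")
    parser.add_argument("--output-dir", help="Output directory")
    parser.add_argument("--ratio", type=float, default=0.2, help="Summary ratio (0.0-1.0)")
    parser.add_argument("--sentences", type=int, help="Number of sentences")
    parser.add_argument("--words", type=int, help="Maximum words")
    parser.add_argument("--points", type=int, help="Extract N key points")
    parser.add_argument("--method", default="textrank",
                        choices=["textrank", "lsa", "frequency"],
                        help="Algorithm to use")
    parser.add_argument("--preserve-order", action="store_true", help="Keep sentence order")
    return parser


# Parse an explicit argument list rather than sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--input", "article.txt", "--sentences", "5"])
print(args.input, args.sentences, args.ratio, args.method, args.preserve_order)
```

Note that `argparse` converts dashes in flag names to underscores, so `--input-dir` is read back as `args.input_dir`.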

Examples

News Article Summary

python
summarizer = TextSummarizer()

article = """
[Long news article text...]
"""

# Get a 3-sentence summary
summary = summarizer.summarize(article, num_sentences=3)
print("Summary:")
print(summary)

# Get key points
points = summarizer.extract_key_points(article, num_points=5)
print("\nKey Points:")
for i, point in enumerate(points, 1):
    print(f"{i}. {point}")

Research Paper Abstract

python
summarizer = TextSummarizer(method="lsa")

# Read the paper and close the file handle when done
with open("research_paper.txt") as f:
    paper = f.read()

# Create abstract-length summary
abstract = summarizer.summarize(paper, max_words=250)
print(abstract)

Meeting Notes Summary

python
summarizer = TextSummarizer()

notes = """
Meeting started at 2pm. John presented Q3 results showing 15% growth.
Sarah raised concerns about supply chain delays affecting Q4 projections.
The team discussed mitigation strategies including dual-sourcing.
Budget allocation for marketing was approved at $50k.
Next steps include vendor outreach by Friday.
Follow-up meeting scheduled for next Tuesday.
"""

summary = summarizer.summarize(notes, num_sentences=3)
points = summarizer.extract_key_points(notes, num_points=4)

print("Summary:", summary)
print("\nAction Items:")
for point in points:
    print(f"• {point}")

Batch Document Summarization

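python
import os

summarizer = TextSummarizer()

# Make sure the output directory exists before writing to it
os.makedirs("./summaries", exist_ok=True)

for filename in os.listdir("./documents"):
    if filename.endswith(".txt"):
        # Read the document and close the file handle promptly
        with open(f"./documents/{filename}") as f:
            text = f.read()
        summary = summarizer.summarize(text, ratio=0.2)

        with open(f"./summaries/{filename}", "w") as f:
            f.write(summary)

        print(f"Summarized: {filename}")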

Algorithm Comparison

| Algorithm | Speed | Quality | Best For |
|---|---|---|---|
| TextRank | Medium | High | General text |
| LSA | Fast | Good | Technical docs |
| Frequency | Fast | Medium | Quick summaries |
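As background for the comparison, TextRank treats sentences as nodes in a graph, weights edges by how many words two sentences share, and runs a PageRank-style iteration to score each sentence. The sketch below is a simplified pure-Python version of that idea, not the implementation used by this tool; the function names, the tokenizer, and the fixed iteration count are assumptions.

```python
import math
import re


def tokenize(sentence):
    """Lowercase word set for one sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Shared-word count normalized by the log of each sentence's size."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0, so one-word sentences are skipped
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbors' scores,
        # proportional to edge weight; isolated sentences keep the base score.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock prices fell sharply today.",
]
for sentence, score in zip(sentences, textrank_scores(sentences)):
    print(f"{score:.3f}  {sentence}")
```

The two sentences about the cat reinforce each other and score higher than the unrelated one. The pairwise similarity step is quadratic in the number of sentences, which is consistent with TextRank being listed as slower than the frequency-based method.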

Dependencies

nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.2.0

Limitations

  • Extractive only (doesn't paraphrase or generate new text)
  • Works best with well-structured text (paragraphs, clear sentences)
  • Very short texts may not summarize well
  • Doesn't understand context deeply (may miss nuance)