text-summarizer
Text Summarizer
Create concise summaries from long text documents using extractive summarization. Identifies and extracts the most important sentences while preserving meaning.
Quick Start
```python
from scripts.text_summarizer import TextSummarizer

# Summarize text
summarizer = TextSummarizer()
summary = summarizer.summarize(long_text, ratio=0.2)  # 20% of original
print(summary)

# Summarize file
summary = summarizer.summarize_file("article.txt", num_sentences=5)
```
Features
- Extractive Summarization: Selects key sentences from original text
- Length Control: By ratio, sentence count, or word count
- Multiple Algorithms: TextRank, LSA, frequency-based
- Key Points: Extract bullet-point summaries
- Batch Processing: Summarize multiple documents
- Preserve Structure: Optionally keeps selected sentences in their original order
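To make the extractive idea concrete, here is a minimal sketch of frequency-based sentence selection using only the standard library. It is not the skill's actual implementation: the function names, the regex sentence splitter, and the tiny stop-word list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the skill's real list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text):
    """Lowercased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def frequency_summarize(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Every sentence in the output is copied verbatim from the input, which is what distinguishes extractive from abstractive summarization.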
API Reference
Initialization
```python
summarizer = TextSummarizer(
    method="textrank",  # textrank, lsa, frequency
    language="english"
)
```
Summarization
```python
# By ratio (20% of original length)
summary = summarizer.summarize(text, ratio=0.2)

# By sentence count
summary = summarizer.summarize(text, num_sentences=5)

# By word count
summary = summarizer.summarize(text, max_words=100)
```
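The three length controls can be thought of as different ways of deciding how many ranked sentences to keep. The sketch below shows one plausible way to resolve them. The helper names and the precedence (`num_sentences` over `ratio`) are assumptions for illustration, not the documented behavior of `TextSummarizer`.

```python
import math


def target_sentence_count(total_sentences, ratio=None, num_sentences=None):
    """Resolve a length setting into a sentence count between 1 and total."""
    if total_sentences <= 0:
        return 0
    if num_sentences is not None:
        wanted = num_sentences  # assumed to take precedence over ratio
    elif ratio is not None:
        wanted = math.ceil(total_sentences * ratio)
    else:
        wanted = math.ceil(total_sentences * 0.2)  # 0.2 is the documented default ratio
    return max(1, min(wanted, total_sentences))


def trim_to_max_words(ranked_sentences, max_words):
    """Walk best-first and keep each sentence that still fits in the word budget."""
    kept, used = [], 0
    for sentence in ranked_sentences:
        n = len(sentence.split())
        if used + n > max_words:
            continue  # too long for the remaining budget; a shorter one may still fit
        kept.append(sentence)
        used += n
    return kept
```

Clamping the count to at least one sentence avoids returning an empty summary for very short inputs.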
Key Points Extraction
```python
# Get bullet points
points = summarizer.extract_key_points(text, num_points=5)
for point in points:
    print(f"• {point}")
```
Batch Processing
```python
# Summarize multiple texts
texts = [text1, text2, text3]
summaries = summarizer.summarize_batch(texts, ratio=0.2)

# Summarize files in directory
summaries = summarizer.summarize_directory("./articles/", ratio=0.3)
```
Options
```python
# Preserve original sentence order
summary = summarizer.summarize(text, preserve_order=True)

# Include title/first sentence
summary = summarizer.summarize(text, include_first=True)

# Minimum sentence length filter
summarizer.min_sentence_length = 10
```
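The effect of `preserve_order` and `include_first` is easiest to see on the selection step itself. The following is a rough sketch under the assumption that the summarizer first scores every sentence and then picks the top `k`. The `select_sentences` helper is hypothetical and only mirrors the option names above.

```python
def select_sentences(sentences, scores, k, preserve_order=True, include_first=False):
    """Pick k sentences by score, optionally forcing the first one in."""
    by_score = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = by_score[:k]
    if include_first and sentences and k > 0 and 0 not in chosen:
        # Make room for the opening sentence by dropping the weakest pick.
        chosen = [0] + chosen[: k - 1]
    if preserve_order:
        chosen = sorted(chosen)  # document order instead of score order
    return [sentences[i] for i in chosen]


sentences = ["A.", "B.", "C.", "D."]
scores = [0.7, 0.1, 0.9, 0.3]
print(select_sentences(sentences, scores, k=2, preserve_order=True))
print(select_sentences(sentences, scores, k=2, preserve_order=False))
```

With `preserve_order=False` the result is ordered by importance, which suits bullet-style key points. With `preserve_order=True` the summary reads in the same sequence as the source.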
CLI Usage
```bash
# Summarize text file
python text_summarizer.py --input article.txt --ratio 0.2

# Specific sentence count
python text_summarizer.py --input article.txt --sentences 5

# Extract key points
python text_summarizer.py --input article.txt --points 5

# Batch process
python text_summarizer.py --input-dir ./docs --output-dir ./summaries --ratio 0.3

# Output to file
python text_summarizer.py --input article.txt --output summary.txt --ratio 0.2
```
CLI Arguments
| Argument | Description | Default |
|---|---|---|
| `--input` | Input file path | Required |
| `--output` | Output file path | stdout |
| `--input-dir` | Directory of files | - |
| `--output-dir` | Output directory | - |
| `--ratio` | Summary ratio (0.0-1.0) | 0.2 |
| `--sentences` | Number of sentences | - |
| | Maximum words | - |
| `--points` | Extract N key points | - |
| | Algorithm to use | textrank |
| | Keep sentence order | False |
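For readers who want to see how such a command line might be wired up, here is a minimal `argparse` sketch. It covers only the flags that appear in the usage examples above. The `build_parser` function is hypothetical, and treating `--input` and `--input-dir` as a required either/or pair is an assumption made to reconcile the "Required" default with the batch example.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the CLI usage examples."""
    parser = argparse.ArgumentParser(
        prog="text_summarizer.py",
        description="Extractive text summarizer",
    )
    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="Input file path")
    source.add_argument("--input-dir", help="Directory of files")

    parser.add_argument("--output", help="Output file path (default: stdout)")
    parser.add_argument("--output-dir", help="Output directory")
    parser.add_argument("--ratio", type=float, default=0.2, help="Summary ratio (0.0-1.0)")
    parser.add_argument("--sentences", type=int, help="Number of sentences")
    parser.add_argument("--points", type=int, help="Extract N key points")
    return parser


args = build_parser().parse_args(["--input", "article.txt", "--sentences", "5"])
print(args.input, args.sentences, args.ratio)
```

The options for maximum words, algorithm, and sentence order are left out because their flag names are not shown in this document.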
Examples
News Article Summary
```python
from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer()

article = """
[Long news article text...]
"""

# Get a 3-sentence summary
summary = summarizer.summarize(article, num_sentences=3)
print("Summary:")
print(summary)

# Get key points
points = summarizer.extract_key_points(article, num_points=5)
print("\nKey Points:")
for i, point in enumerate(points, 1):
    print(f"{i}. {point}")
```
Research Paper Abstract
```python
from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer(method="lsa")

# Read the paper and close the file handle afterwards
with open("research_paper.txt", encoding="utf-8") as f:
    paper = f.read()

# Create abstract-length summary
abstract = summarizer.summarize(paper, max_words=250)
print(abstract)
```
Meeting Notes Summary
```python
from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer()

notes = """
Meeting started at 2pm. John presented Q3 results showing 15% growth.
Sarah raised concerns about supply chain delays affecting Q4 projections.
The team discussed mitigation strategies including dual-sourcing.
Budget allocation for marketing was approved at $50k.
Next steps include vendor outreach by Friday.
Follow-up meeting scheduled for next Tuesday.
"""

summary = summarizer.summarize(notes, num_sentences=3)
points = summarizer.extract_key_points(notes, num_points=4)

print("Summary:", summary)
print("\nAction Items:")
for point in points:
    print(f"• {point}")
```
Batch Document Summarization
```python
import os

from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer()

# Make sure the output directory exists before writing summaries
os.makedirs("./summaries", exist_ok=True)

for filename in os.listdir("./documents"):
    if filename.endswith(".txt"):
        with open(f"./documents/{filename}", encoding="utf-8") as f:
            text = f.read()
        summary = summarizer.summarize(text, ratio=0.2)
        with open(f"./summaries/{filename}", "w", encoding="utf-8") as f:
            f.write(summary)
        print(f"Summarized: {filename}")
```
Algorithm Comparison
| Algorithm | Speed | Quality | Best For |
|---|---|---|---|
| TextRank | Medium | High | General text |
| LSA | Fast | Good | Technical docs |
| Frequency | Fast | Medium | Quick summaries |
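As a rough illustration of why TextRank sits at "Medium" speed, the sketch below builds a full sentence-to-sentence similarity graph and then iterates a PageRank-style update over it. This is a pure-Python sketch under stated assumptions, not the skill's implementation: the function names, the word-overlap similarity, and the fixed iteration count are all illustrative choices.

```python
import math
import re


def tokenize(sentence):
    """Lowercased set of words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared-word count normalised by the log lengths of both sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0 would make the denominator vanish
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]
    # O(n^2) pairwise comparison: the main cost compared with plain frequency counts.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

A sentence that shares vocabulary with many others accumulates score from its neighbors, while an unrelated sentence stays at the baseline value of `1 - damping`.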
Dependencies
```
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.2.0
```
Limitations
- Extractive only (doesn't paraphrase or generate new text)
- Works best with well-structured text (paragraphs, clear sentences)
- Very short texts may not summarize well
- Doesn't understand context deeply (may miss nuance)