ai-summarizing
Build an AI Summarizer
Guide the user through building AI that condenses long content into useful summaries. Uses DSPy to produce consistent, faithful summaries with controllable length and detail.
Step 1: Understand the task
Ask the user:
- What are you summarizing? (meeting transcripts, articles, support threads, documents, emails?)
- What format should the summary be? (bullet points, narrative paragraph, executive brief, action items?)
- How long should summaries be? (one sentence, a paragraph, 3-5 bullets, custom word limit?)
- Who reads the summaries? (executives, team members, customers, developers?)
Step 2: Build a basic summarizer
Simple text-to-summary
python
import dspy

class Summarize(dspy.Signature):
    """Summarize the text concisely while preserving key information."""
    text: str = dspy.InputField(desc="The text to summarize")
    summary: str = dspy.OutputField(desc="A concise summary of the text")

# Assumes a language model has already been configured with dspy.configure(lm=...)
summarizer = dspy.ChainOfThought(Summarize)
result = summarizer(text="...")
print(result.summary)
Audience-aware summary
Adapt the signature for specific audiences:
python
class SummarizeForAudience(dspy.Signature):
    """Summarize the text for the target audience."""
    text: str = dspy.InputField(desc="The text to summarize")
    audience: str = dspy.InputField(desc="Who will read this summary")
    summary: str = dspy.OutputField(desc="A summary tailored to the audience")
Step 3: Structured summaries
Extract multiple aspects from the same content at once:
Meeting transcript processor
python
from pydantic import BaseModel, Field

class MeetingSummary(BaseModel):
    tldr: str = Field(description="One-sentence overview of the meeting")
    decisions: list[str] = Field(description="Decisions that were made")
    action_items: list[str] = Field(description="Tasks assigned with owners if mentioned")
    key_points: list[str] = Field(description="Important facts or updates discussed")

class SummarizeMeeting(dspy.Signature):
    """Extract a structured summary from a meeting transcript."""
    transcript: str = dspy.InputField(desc="Meeting transcript")
    summary: MeetingSummary = dspy.OutputField()

summarizer = dspy.ChainOfThought(SummarizeMeeting)
Parallel multi-aspect extraction
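Give each aspect its own signature and predictor so that every prompt stays focused on a single task. The extractions are independent of one another; this example runs them one after another: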
python
class ExtractDecisions(dspy.Signature):
    """Extract decisions made in this meeting."""
    transcript: str = dspy.InputField()
    decisions: list[str] = dspy.OutputField(desc="Decisions that were made")

class ExtractActionItems(dspy.Signature):
    """Extract action items with assigned owners."""
    transcript: str = dspy.InputField()
    action_items: list[str] = dspy.OutputField(desc="Tasks with owners")

class ExtractKeyFacts(dspy.Signature):
    """Extract key facts and updates discussed."""
    transcript: str = dspy.InputField()
    key_facts: list[str] = dspy.OutputField(desc="Important facts and updates")

class MeetingSummarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.tldr = dspy.ChainOfThought("transcript -> tldr")
        self.decisions = dspy.ChainOfThought(ExtractDecisions)
        self.actions = dspy.ChainOfThought(ExtractActionItems)
        self.facts = dspy.ChainOfThought(ExtractKeyFacts)

    def forward(self, transcript):
        # Each predictor sees the full transcript and focuses on one aspect
        return dspy.Prediction(
            tldr=self.tldr(transcript=transcript).tldr,
            decisions=self.decisions(transcript=transcript).decisions,
            action_items=self.actions(transcript=transcript).action_items,
            key_facts=self.facts(transcript=transcript).key_facts,
        )
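Because the extractors do not depend on each other, they can also run concurrently. The sketch below shows one way to do that with the standard library's thread pool. It is written against plain callables so that it runs without an LM; `run_extractors` and the stand-in lambdas are hypothetical names, and it assumes your predictors are safe to call from several threads, which is worth verifying for the DSPy version you use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_extractors(
    extractors: Dict[str, Callable[[str], object]], transcript: str
) -> Dict[str, object]:
    """Run every extractor on the same transcript concurrently."""
    if not extractors:
        return {}
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = {name: pool.submit(fn, transcript) for name, fn in extractors.items()}
        # result() re-raises any exception raised inside an extractor
        return {name: future.result() for name, future in futures.items()}


# Stand-in extractors (hypothetical); in practice each one would wrap a predictor call
demo = run_extractors(
    {
        "decisions": lambda t: [line for line in t.splitlines() if line.startswith("DECISION:")],
        "line_count": lambda t: len(t.splitlines()),
    },
    "DECISION: ship Friday\nAlice will update the docs",
)
print(demo)
```

Threads suit this workload because each call mostly waits on network I/O; the results come back keyed by aspect name, so they can be passed straight into a `dspy.Prediction`.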
Step 4: Control length and detail
Word limit enforcement
python
class SummarizeWithLimit(dspy.Signature):
    """Summarize the text within the word limit."""
    text: str = dspy.InputField()
    max_words: int = dspy.InputField(desc="Maximum number of words for the summary")
    summary: str = dspy.OutputField(desc="A concise summary within the word limit")

class LengthControlledSummarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.ChainOfThought(SummarizeWithLimit)

    def forward(self, text, max_words=100):
        result = self.summarize(text=text, max_words=max_words)
        word_count = len(result.summary.split())
        # Hard constraint: a failed check triggers a retry with this message as feedback.
        # dspy.Assert belongs to DSPy's assertions API; confirm your installed version provides it.
        dspy.Assert(
            word_count <= max_words,
            f"Summary is {word_count} words but must be at most {max_words}. "
            "Make it more concise.",
        )
        return result
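The length check itself is plain Python, so it can be pulled out and unit-tested without calling an LM. A minimal sketch, assuming a hypothetical helper named `check_word_limit` that returns the same feedback message the assertion would use:

```python
from typing import Tuple


def check_word_limit(summary: str, max_words: int) -> Tuple[bool, str]:
    """Return (ok, feedback) for a whitespace-delimited word count check."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    word_count = len(summary.split())
    if word_count <= max_words:
        return True, ""
    return False, (
        f"Summary is {word_count} words but must be at most {max_words}. "
        "Make it more concise."
    )


print(check_word_limit("Revenue grew 12% year over year.", max_words=10))
print(check_word_limit("one two three four", max_words=3))
```

Note that `str.split()` counts whitespace-separated tokens, so this check only approximates length for languages that do not separate words with spaces.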
Detail level control
Use a detail parameter to control how much information to keep:
python
from typing import Literal

class SummarizeWithDetail(dspy.Signature):
    """Summarize the text at the specified detail level."""
    text: str = dspy.InputField()
    detail_level: Literal["brief", "standard", "detailed"] = dspy.InputField(
        desc="brief = 1-2 sentences, standard = short paragraph, detailed = comprehensive"
    )
    summary: str = dspy.OutputField()

class MultiDetailSummarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.ChainOfThought(SummarizeWithDetail)

    def forward(self, text, detail_level="standard"):
        result = self.summarize(text=text, detail_level=detail_level)
        # Enforce approximate length expectations
        word_count = len(result.summary.split())
        limits = {"brief": 50, "standard": 150, "detailed": 400}
        max_words = limits[detail_level]
        # Soft constraint: Suggest nudges the model toward the target instead of failing hard
        dspy.Suggest(
            word_count <= max_words,
            f"Summary is {word_count} words for '{detail_level}' level, "
            f"aim for under {max_words}.",
        )
        return result
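In the module above, an unknown `detail_level` only fails at the `limits[detail_level]` lookup, after the LM call has already been made. A small sketch of validating the level up front, assuming hypothetical names `DETAIL_LIMITS` and `max_words_for` and the same word budgets as the example:

```python
DETAIL_LIMITS = {"brief": 50, "standard": 150, "detailed": 400}


def max_words_for(detail_level: str) -> int:
    """Look up the word budget for a detail level, failing fast on typos."""
    try:
        return DETAIL_LIMITS[detail_level]
    except KeyError:
        valid = ", ".join(sorted(DETAIL_LIMITS))
        raise ValueError(
            f"Unknown detail_level {detail_level!r}; expected one of: {valid}"
        ) from None


print(max_words_for("brief"))
```

Calling `max_words_for(detail_level)` at the top of `forward` would reject a bad value before any tokens are spent.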
Step 5: Handle long documents
When the input is too long for a single LM call, use chunked summarization.
Map-reduce pattern
Split → summarize each chunk → combine:
python
class SummarizeChunk(dspy.Signature):
    """Summarize this section of a larger document."""
    chunk: str = dspy.InputField(desc="A section of a larger document")
    chunk_summary: str = dspy.OutputField(desc="Key points from this section")

class CombineSummaries(dspy.Signature):
    """Combine section summaries into one coherent summary."""
    section_summaries: list[str] = dspy.InputField(desc="Summaries of each section")
    original_length: int = dspy.InputField(desc="Word count of the original document")
    summary: str = dspy.OutputField(desc="A unified summary of the full document")

class LongDocSummarizer(dspy.Module):
    def __init__(self, chunk_size=2000):
        super().__init__()
        self.chunk_size = chunk_size  # measured in words, not tokens
        self.map_step = dspy.ChainOfThought(SummarizeChunk)
        self.reduce_step = dspy.ChainOfThought(CombineSummaries)

    def forward(self, text):
        chunks = self._split(text)
        # Map: summarize each chunk
        chunk_summaries = []
        for chunk in chunks:
            result = self.map_step(chunk=chunk)
            chunk_summaries.append(result.chunk_summary)
        # Reduce: combine into final summary
        return self.reduce_step(
            section_summaries=chunk_summaries,
            original_length=len(text.split()),
        )

    def _split(self, text):
        # Fixed-size word windows; chunk boundaries may fall mid-sentence
        words = text.split()
        chunks = []
        for i in range(0, len(words), self.chunk_size):
            chunks.append(" ".join(words[i:i + self.chunk_size]))
        return chunks
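The fixed-size split above can cut a sentence or an argument in half at a chunk boundary. One common mitigation, which is an addition here rather than part of the original recipe, is to let consecutive chunks overlap by a few words. A minimal sketch, with `split_with_overlap` and its `overlap` parameter as hypothetical names:

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into word chunks; each chunk repeats the last `overlap` words of the previous one."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(split_with_overlap(sample, chunk_size=4, overlap=1))
```

With `overlap=0` this produces the same chunks as `_split`. Overlap costs extra tokens per chunk, so it is a trade-off between context continuity and LM spend.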
Hierarchical summarization
For very long documents, summarize chunks, then summarize the summaries:
python
# Reuses SummarizeChunk and CombineSummaries from the map-reduce example
class HierarchicalSummarizer(dspy.Module):
    def __init__(self, chunk_size=2000, max_chunks_per_level=10):
        super().__init__()
        if max_chunks_per_level < 2:
            # Groups of one never shrink the list, so the loop below would not terminate
            raise ValueError("max_chunks_per_level must be at least 2")
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks_per_level
        self.summarize_chunk = dspy.ChainOfThought(SummarizeChunk)
        self.combine = dspy.ChainOfThought(CombineSummaries)

    def forward(self, text):
        original_length = len(text.split())
        chunks = self._split(text)
        summaries = [self.summarize_chunk(chunk=c).chunk_summary for c in chunks]
        # If there are still too many summaries, combine them in groups and repeat
        while len(summaries) > self.max_chunks:
            grouped = [summaries[i:i + self.max_chunks]
                       for i in range(0, len(summaries), self.max_chunks)]
            summaries = [
                self.combine(
                    section_summaries=group,
                    original_length=original_length,
                ).summary
                for group in grouped
            ]
        return self.combine(
            section_summaries=summaries,
            original_length=original_length,
        )

    def _split(self, text):
        words = text.split()
        return [" ".join(words[i:i + self.chunk_size])
                for i in range(0, len(words), self.chunk_size)]
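Each extra level adds LM calls, so it helps to estimate the cost before running a very long document. The sketch below mirrors the grouping loop of the hierarchical summarizer in plain arithmetic; `plan_levels` is a hypothetical helper, not part of DSPy.

```python
import math
from typing import List


def plan_levels(num_chunks: int, max_chunks_per_level: int = 10) -> List[int]:
    """Return the number of combine calls at each reduction level, ending with the final call."""
    if max_chunks_per_level < 2:
        raise ValueError("max_chunks_per_level must be at least 2 or the list never shrinks")
    if num_chunks < 0:
        raise ValueError("num_chunks must not be negative")
    calls_per_level = []
    remaining = num_chunks
    while remaining > max_chunks_per_level:
        # One combine call per group of up to max_chunks_per_level summaries
        remaining = math.ceil(remaining / max_chunks_per_level)
        calls_per_level.append(remaining)
    calls_per_level.append(1)  # the final combine over the remaining summaries
    return calls_per_level


# 250 chunks are reduced to 25 group summaries, then 3, then the final summary
levels = plan_levels(250)
print(levels, "combine calls;", 250 + sum(levels), "LM calls in total")
```

The total includes one chunk-summary call per chunk plus every combine call, which is why a smaller `chunk_size` raises cost even though each call is cheaper.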
Step 6: Multi-format output
Generate different summary formats from the same input:
python
class BulletSummary(dspy.Signature):
    """Summarize as a bulleted list of key points."""
    text: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="Bulleted list of key points")

class NarrativeSummary(dspy.Signature):
    """Summarize as a flowing narrative paragraph."""
    text: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="A narrative paragraph summary")

class ExecutiveBrief(dspy.Signature):
    """Create a brief executive summary with context, key findings, and recommendation."""
    text: str = dspy.InputField()
    context: str = dspy.OutputField(desc="One sentence of context")
    key_findings: list[str] = dspy.OutputField(desc="3-5 most important findings")
    recommendation: str = dspy.OutputField(desc="Suggested next step")

class FlexibleSummarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.bullets = dspy.ChainOfThought(BulletSummary)
        self.narrative = dspy.ChainOfThought(NarrativeSummary)
        self.executive = dspy.ChainOfThought(ExecutiveBrief)

    def forward(self, text, format="bullets"):
        if format == "bullets":
            return self.bullets(text=text)
        elif format == "narrative":
            return self.narrative(text=text)
        elif format == "executive":
            return self.executive(text=text)
        # Fail loudly instead of silently returning None for an unknown format
        raise ValueError(f"Unknown format: {format!r}")
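The bullet and narrative signatures return a single `summary` field, while `ExecutiveBrief` returns `context`, `key_findings`, and `recommendation`. Callers that want one display string need to handle both shapes. A minimal sketch, assuming a hypothetical `render_summary` helper and using `SimpleNamespace` objects as stand-ins for predictions:

```python
from types import SimpleNamespace


def render_summary(prediction) -> str:
    """Return one display string regardless of which format produced the prediction."""
    summary = getattr(prediction, "summary", None)
    if summary is not None:
        return summary
    # Executive briefs have no `summary` field, so assemble one from their parts
    findings = "\n".join(f"- {item}" for item in prediction.key_findings)
    return (
        f"Context: {prediction.context}\n"
        f"Key findings:\n{findings}\n"
        f"Recommendation: {prediction.recommendation}"
    )


brief = SimpleNamespace(
    context="Q3 churn rose.",
    key_findings=["Churn up 2 points", "Onboarding is the main driver"],
    recommendation="Fund the onboarding redesign.",
)
print(render_summary(brief))
```

A rendered string like this is also what a metric reading `prediction.summary` would need if the executive format is evaluated alongside the other two.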
Step 7: Test and optimize
Faithfulness metric
Does the summary accurately reflect the source, without fabricated claims?
python
class JudgeFaithfulness(dspy.Signature):
    """Judge whether the summary is faithful to the source text."""
    source_text: str = dspy.InputField()
    summary: str = dspy.InputField()
    is_faithful: bool = dspy.OutputField(desc="Does the summary only contain info from the source?")
    hallucinated_claims: list[str] = dspy.OutputField(desc="Claims not in the source, if any")

def faithfulness_metric(example, prediction, trace=None):
    judge = dspy.Predict(JudgeFaithfulness)
    result = judge(source_text=example.text, summary=prediction.summary)
    return result.is_faithful
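An LM judge costs a call per example. As a cheap complement (a heuristic added here, not part of the judge above), you can flag numbers that appear in the summary but nowhere in the source, since invented figures are a common kind of fabricated claim. The sketch assumes a hypothetical helper `unsupported_numbers` and a simple regular expression that will miss spelled-out numbers and reformatted values.

```python
import re
from typing import List

# Digits with optional decimal or thousands separators, e.g. 12, 4.5, 1,200
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source_text: str, summary: str) -> List[str]:
    """List numbers that appear in the summary but nowhere in the source."""
    source_numbers = set(_NUMBER.findall(source_text))
    flagged = []
    for number in _NUMBER.findall(summary):
        if number not in source_numbers and number not in flagged:
            flagged.append(number)
    return flagged


source = "Revenue rose 12% to $4.5 million in 2023."
print(unsupported_numbers(source, "Revenue rose 12% in 2023."))
print(unsupported_numbers(source, "Revenue rose 15% to $4.5 million."))
```

An empty list does not prove the summary is faithful; it only rules out one class of error before the judge runs.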
Key-point coverage metric
Does the summary capture the important points?
python
class JudgeCoverage(dspy.Signature):
    """Judge whether the summary covers the key points."""
    source_text: str = dspy.InputField()
    summary: str = dspy.InputField()
    reference_summary: str = dspy.InputField(desc="Gold-standard summary for comparison")
    coverage_score: float = dspy.OutputField(desc="0.0-1.0 how well key points are covered")

def coverage_metric(example, prediction, trace=None):
    judge = dspy.Predict(JudgeCoverage)
    result = judge(
        source_text=example.text,
        summary=prediction.summary,
        reference_summary=example.summary,
    )
    return result.coverage_score
Combined metric
python
def summary_metric(example, prediction, trace=None):
    faithful = faithfulness_metric(example, prediction, trace)
    coverage = coverage_metric(example, prediction, trace)
    # Concise means the summary is under 30% of the source length, in words
    concise = len(prediction.summary.split()) < len(example.text.split()) * 0.3
    # Weighted score: 40% faithfulness, 40% coverage, 20% conciseness
    return (faithful * 0.4) + (coverage * 0.4) + (concise * 0.2)
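To see how the weights behave, the scoring arithmetic can be run on its own with fixed inputs. This sketch uses a hypothetical `combine_scores` function with the same weights; clamping the coverage value to the 0.0-1.0 range is an added assumption, in case a judge returns a score outside it.

```python
def combine_scores(faithful: bool, coverage: float, concise: bool) -> float:
    """Weighted score: 40% faithfulness, 40% coverage, 20% conciseness."""
    coverage = min(1.0, max(0.0, coverage))  # assumption: clamp out-of-range judge scores
    return (faithful * 0.4) + (coverage * 0.4) + (concise * 0.2)


# A faithful, concise summary that covers half of the key points
print(round(combine_scores(True, 0.5, True), 2))
# An unfaithful summary with perfect coverage and good length
print(round(combine_scores(False, 1.0, True), 2))
```

With these weights an unfaithful summary can still reach 0.6, so if fabricated claims should disqualify a summary outright, consider returning 0 whenever the faithfulness check fails.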
Optimize
python
# trainset: a list of dspy.Example objects with `text` and `summary` fields
optimizer = dspy.BootstrapFewShot(metric=summary_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(summarizer, trainset=trainset)
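The optimizer call needs a `trainset`, and it is worth holding some examples back to check that the optimized program actually improved. A minimal sketch of preparing the raw data, assuming records are dictionaries with `text` and `summary` keys (matching the fields the metrics read); `split_records` is a hypothetical helper.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_records(
    records: List[Record], dev_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Shuffle reproducibly and split records into (train, dev)."""
    for record in records:
        missing = {"text", "summary"} - record.keys()
        if missing:
            raise ValueError(f"Record is missing fields: {sorted(missing)}")
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    # Keep at least one dev example whenever there is more than one record
    dev_size = max(1, int(len(shuffled) * dev_fraction)) if len(shuffled) > 1 else 0
    return shuffled[dev_size:], shuffled[:dev_size]


records = [{"text": f"document {i}", "summary": f"summary {i}"} for i in range(10)]
train, dev = split_records(records)
print(len(train), len(dev))
```

Each training record would then be wrapped as `dspy.Example(text=..., summary=...).with_inputs("text")` before being passed to `compile`, and the dev split scored with `summary_metric` before and after optimization.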
Key patterns
- ChainOfThought for summaries — reasoning helps the model decide what's important to keep
- Pydantic models for structured summaries — extract action items, decisions, key facts in one pass
- Assert for length limits — enforce word counts; DSPy retries with feedback
- Map-reduce for long docs — chunk, summarize each piece, combine results
- Faithfulness metrics — always check that summaries don't fabricate claims
- Detail levels — give users control over summary depth with a simple parameter
Additional resources
- For worked examples (meetings, support threads, long docs), see examples.md
- Need to extract structured fields instead of summaries? Use /ai-parsing-data
- Need to answer questions about docs? Use /ai-searching-docs
- Next: /ai-improving-accuracy to measure and improve your summarizer