context-compression


Context Compression Strategies


When agent sessions generate millions of tokens of conversation history, compression becomes mandatory. The naive approach is aggressive compression to minimize tokens per request. The correct optimization target is tokens per task: total tokens consumed to complete a task, including re-fetching costs when compression loses critical information.

When to Activate


Activate this skill when:
  • Agent sessions exceed context window limits
  • Codebases exceed context windows (5M+ token systems)
  • Designing conversation summarization strategies
  • Debugging cases where agents "forget" what files they modified
  • Building evaluation frameworks for compression quality

Core Concepts


Context compression trades token savings against information loss. Three production-ready approaches exist:
  1. Anchored Iterative Summarization: Maintain structured, persistent summaries with explicit sections for session intent, file modifications, decisions, and next steps. When compression triggers, summarize only the newly-truncated span and merge with the existing summary. Structure forces preservation by dedicating sections to specific information types.
  2. Opaque Compression: Produce compressed representations optimized for reconstruction fidelity. Achieves highest compression ratios (99%+) but sacrifices interpretability. Cannot verify what was preserved.
  3. Regenerative Full Summary: Generate detailed structured summaries on each compression. Produces readable output but may lose details across repeated compression cycles due to full regeneration rather than incremental merging.
The critical insight: structure forces preservation. Dedicated sections act as checklists that the summarizer must populate, preventing silent information drift.

Detailed Topics


Why Tokens-Per-Task Matters


Traditional compression metrics target tokens-per-request. This is the wrong optimization. When compression loses critical details like file paths or error messages, the agent must re-fetch information, re-explore approaches, and waste tokens recovering context.
The right metric is tokens-per-task: total tokens consumed from task start to completion. A compression strategy saving 0.5% more tokens but causing 20% more re-fetching costs more overall.
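The difference is easy to make concrete. A minimal sketch with hypothetical token counts (all numbers here are illustrative, not measurements):

```python
# Illustrative sketch with made-up token counts. `refetch` models the
# recovery cost paid when compression drops a critical detail.
def tokens_per_task(request_tokens: list[int], refetch: int = 0) -> int:
    """Total tokens consumed from task start to completion."""
    return sum(request_tokens) + refetch

# Structured summary: slightly larger requests, nothing lost.
structured = tokens_per_task([8_000] * 10)                   # 80,000 tokens
# Aggressive compression: 0.5% smaller requests, but a lost file
# path forces the agent to re-explore.
aggressive = tokens_per_task([7_960] * 10, refetch=16_000)   # 95,600 tokens

assert aggressive > structured  # the per-request metric picked the wrong winner
```

The per-request view sees only the 40-token saving on each call; the per-task view sees the 16,000-token recovery bill.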

The Artifact Trail Problem


Artifact trail integrity is the weakest dimension across all compression methods, scoring 2.2-2.5 out of 5.0 in evaluations. Even structured summarization with explicit file sections struggles to maintain complete file tracking across long sessions.
Coding agents need to know:
  • Which files were created
  • Which files were modified and what changed
  • Which files were read but not changed
  • Function names, variable names, error messages
This problem likely requires specialized handling beyond general summarization: a separate artifact index or explicit file-state tracking in agent scaffolding.
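One way to sketch such an artifact index (a hypothetical design, not any particular framework's API): a small per-file state record kept outside the summarizer, so file state never passes through lossy compression.

```python
from dataclasses import dataclass, field
from enum import Enum

class FileState(Enum):
    READ = "read"
    CREATED = "created"
    MODIFIED = "modified"

@dataclass
class ArtifactIndex:
    """Deterministic file-state tracking, maintained outside the summarizer."""
    files: dict = field(default_factory=dict)

    def record(self, path: str, state: FileState, note: str = "") -> None:
        # A later read must never downgrade CREATED/MODIFIED back to READ.
        prev = self.files.get(path)
        if prev and prev[0] != FileState.READ and state == FileState.READ:
            return
        self.files[path] = (state, note)

    def render(self) -> str:
        """Emit a stable listing to inject alongside the summary."""
        return "\n".join(f"- {p}: {s.value} {n}".rstrip()
                         for p, (s, n) in sorted(self.files.items()))
```

Because the index is updated deterministically on each tool call rather than summarized, its accuracy is independent of compression quality.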

Structured Summary Sections


Effective structured summaries include explicit sections:

```markdown
## Session Intent
[What the user is trying to accomplish]

## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config

## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures

## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests

## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
```

This structure prevents silent loss of file paths or decisions because each section must be explicitly addressed.

Compression Trigger Strategies


When to trigger compression matters as much as how to compress:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable context size |
| Importance-based | Compress low-relevance sections first | Complex but preserves signal |
| Task-boundary | Compress at logical task completions | Clean summaries but unpredictable timing |
The sliding window approach with structured summaries provides the best balance of predictability and quality for most coding agent use cases.
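The first two strategies reduce to a few lines. A minimal sketch (the message shape is an assumption; adapt it to your agent's transcript format):

```python
def should_compress(used_tokens: int, context_window: int,
                    threshold: float = 0.75) -> bool:
    """Fixed threshold: trigger at ~70-80% context utilization."""
    return used_tokens >= threshold * context_window

def sliding_window(messages: list[dict], summary: str,
                   keep_last: int = 20) -> list[dict]:
    """Sliding window: running summary plus the most recent N turns."""
    return [{"role": "system", "content": summary}] + messages[-keep_last:]
```

Combining the two, the fixed threshold decides *when* to compress and the sliding window decides *what* survives verbatim.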

Probe-Based Evaluation


Traditional metrics like ROUGE or embedding similarity fail to capture functional compression quality. A summary may score high on lexical overlap while missing the one file path the agent needs.
Probe-based evaluation directly measures functional quality by asking questions after compression:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
If compression preserved the right information, the agent answers correctly. If not, it guesses or hallucinates.
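A probe harness can be as small as a dictionary of questions plus a check. A sketch under stated assumptions: `ask` stands in for however your stack queries the agent over the compressed context, the expected strings are illustrative, and the substring check is a stand-in for an LLM judge:

```python
PROBES = {
    "recall":       "What was the original error message?",
    "artifact":     "Which files have we modified?",
    "continuation": "What should we do next?",
    "decision":     "What did we decide about the Redis issue?",
}

def run_probes(ask, expected: dict[str, str]) -> dict[str, bool]:
    """Pass/fail per probe: did the expected detail survive compression?"""
    return {name: expected[name].lower() in ask(question).lower()
            for name, question in PROBES.items()}
```

In production the pass/fail judgment is usually an LLM judge rather than a substring match; the substring version is enough to smoke-test a compression pipeline.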

Evaluation Dimensions

评估维度

Six dimensions capture compression quality for coding agents:
  1. Accuracy: Are technical details correct? File paths, function names, error codes.
  2. Context Awareness: Does the response reflect current conversation state?
  3. Artifact Trail: Does the agent know which files were read or modified?
  4. Completeness: Does the response address all parts of the question?
  5. Continuity: Can work continue without re-fetching information?
  6. Instruction Following: Does the response respect stated constraints?
Accuracy shows the largest variation between compression methods (0.6 point gap). Artifact trail is universally weak (2.2-2.5 range).
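A per-response scoring record for the six dimensions might look like the sketch below (the field names mirror the dimensions above; the example values are illustrative, and the 1.0-5.0 scale follows the evaluations cited in this document):

```python
from dataclasses import dataclass, asdict

@dataclass
class CompressionScore:
    """Rubric scores on a 1.0-5.0 scale, one field per dimension."""
    accuracy: float
    context_awareness: float
    artifact_trail: float
    completeness: float
    continuity: float
    instruction_following: float

    def overall(self) -> float:
        values = list(asdict(self).values())
        return sum(values) / len(values)

# Illustrative values only: artifact_trail is typically the weak dimension.
score = CompressionScore(accuracy=4.1, context_awareness=3.9,
                         artifact_trail=2.4, completeness=3.8,
                         continuity=3.7, instruction_following=4.0)
```

Keeping the dimensions as separate fields, rather than a single aggregate, is what lets you see that artifact trail drags the average down.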

Practical Guidance


Three-Phase Compression Workflow


For large codebases or agent systems exceeding context windows, apply compression through three phases:
  1. Research Phase: Produce a research document from architecture diagrams, documentation, and key interfaces. Compress exploration into a structured analysis of components and dependencies. Output: single research document.
  2. Planning Phase: Convert research into implementation specification with function signatures, type definitions, and data flow. A 5M token codebase compresses to approximately 2,000 words of specification.
  3. Implementation Phase: Execute against the specification. Context remains focused on the spec rather than raw codebase exploration.

Using Example Artifacts as Seeds


When provided with a manual migration example or reference PR, use it as a template to understand the target pattern. The example reveals constraints that static analysis cannot surface: which invariants must hold, which services break on changes, and what a clean migration looks like.
This is particularly important when the agent cannot distinguish essential complexity (business requirements) from accidental complexity (legacy workarounds). The example artifact encodes that distinction.

Implementing Anchored Iterative Summarization


  1. Define explicit summary sections matching your agent's needs
  2. On first compression trigger, summarize truncated history into sections
  3. On subsequent compressions, summarize only new truncated content
  4. Merge new summary into existing sections rather than regenerating
  5. Track which information came from which compression cycle for debugging
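Steps 3-5 reduce to a small merge function. A minimal sketch (section names and the cycle tag are illustrative choices, not a fixed format):

```python
def merge_summary(anchored: dict[str, list[str]],
                  new_items: dict[str, list[str]],
                  cycle: int) -> dict[str, list[str]]:
    """Merge a summary of the newly truncated span into anchored sections.

    Existing entries are never regenerated; new entries are tagged with
    the compression cycle they came from, for debugging (step 5).
    """
    for section, items in new_items.items():
        bucket = anchored.setdefault(section, [])
        for item in items:
            tagged = f"{item} [cycle {cycle}]"
            if tagged not in bucket:
                bucket.append(tagged)
    return anchored
```

Because merging only appends, information captured in cycle 1 cannot be silently dropped by a weaker summary in cycle 5, which is the failure mode of full regeneration.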

When to Use Each Approach


Use anchored iterative summarization when:
  • Sessions are long-running (100+ messages)
  • File tracking matters (coding, debugging)
  • You need to verify what was preserved
Use opaque compression when:
  • Maximum token savings required
  • Sessions are relatively short
  • Re-fetching costs are low
Use regenerative summaries when:
  • Summary interpretability is critical
  • Sessions have clear phase boundaries
  • Full context review is acceptable on each compression

Compression Ratio Considerations

压缩比考量

| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Good quality, moderate compression |
| Opaque | 99.3% | 3.35 | Best compression, quality loss |
The 0.7% additional tokens retained by structured summarization buys 0.35 quality points. For any task where re-fetching costs matter, this trade-off favors structured approaches.

Examples


Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
  • 401 error on /api/auth/login endpoint
  • Traced through auth controller, middleware, session store
  • Found stale Redis connection
  • Fixed connection pooling, added retry logic
  • 14 tests passing, 2 failing
Structured summary after compression:

```markdown
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup

## Test Status
14 passing, 2 failing (mock setup issues)

## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
```

Example 2: Probe Response Quality

After compression, asking "What was the original error?":

Good response (structured summarization):
> "The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."

Poor response (aggressive compression):
> "We were debugging an authentication issue. The login was failing. We fixed some configuration problems."

The structured response preserves endpoint, error code, and root cause. The aggressive response loses all technical detail.

Guidelines


  1. Optimize for tokens-per-task, not tokens-per-request
  2. Use structured summaries with explicit sections for file tracking
  3. Trigger compression at 70-80% context utilization
  4. Implement incremental merging rather than full regeneration
  5. Test compression quality with probe-based evaluation
  6. Track artifact trail separately if file tracking is critical
  7. Accept slightly lower compression ratios for better quality retention
  8. Monitor re-fetching frequency as a compression quality signal

Integration


This skill connects to several others in the collection:
  • context-degradation - Compression is a mitigation strategy for degradation
  • context-optimization - Compression is one optimization technique among many
  • evaluation - Probe-based evaluation applies to compression testing
  • memory-systems - Compression relates to scratchpad and summary memory patterns

References


Internal reference:
  • Evaluation Framework Reference - Detailed probe types and scoring rubrics
Related skills in this collection:
  • context-degradation - Understanding what compression prevents
  • context-optimization - Broader optimization strategies
  • evaluation - Building evaluation frameworks
External resources:
  • Factory Research: Evaluating Context Compression for AI Agents (December 2025)
  • Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
  • Netflix Engineering: "The Infinite Software Crisis" - Three-phase workflow and context compression at scale (AI Summit 2025)


Skill Metadata


Created: 2025-12-22
Last Updated: 2025-12-26
Author: Agent Skills for Context Engineering Contributors
Version: 1.1.0