context-compression


Context Compression Strategies


When agent sessions generate millions of tokens of conversation history, compression becomes mandatory. The naive approach is aggressive compression to minimize tokens per request. The correct optimization target is tokens per task: total tokens consumed to complete a task, including re-fetching costs when compression loses critical information.

When to Activate


Activate this skill when:
  • Agent sessions exceed context window limits
  • Codebases exceed context windows (5M+ token systems)
  • Designing conversation summarization strategies
  • Debugging cases where agents "forget" what files they modified
  • Building evaluation frameworks for compression quality

Core Concepts


Context compression trades token savings against information loss. Three production-ready approaches exist:
  1. Anchored Iterative Summarization: Maintain structured, persistent summaries with explicit sections for session intent, file modifications, decisions, and next steps. When compression triggers, summarize only the newly-truncated span and merge with the existing summary. Structure forces preservation by dedicating sections to specific information types.
  2. Opaque Compression: Produce compressed representations optimized for reconstruction fidelity. Achieves highest compression ratios (99%+) but sacrifices interpretability. Cannot verify what was preserved.
  3. Regenerative Full Summary: Generate detailed structured summaries on each compression. Produces readable output but may lose details across repeated compression cycles due to full regeneration rather than incremental merging.
The critical insight: structure forces preservation. Dedicated sections act as checklists that the summarizer must populate, preventing silent information drift.

Detailed Topics


Why Tokens-Per-Task Matters


Traditional compression metrics target tokens-per-request. This is the wrong optimization. When compression loses critical details like file paths or error messages, the agent must re-fetch information, re-explore approaches, and waste tokens recovering context.
The right metric is tokens-per-task: total tokens consumed from task start to completion. A compression strategy saving 0.5% more tokens but causing 20% more re-fetching costs more overall.
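The difference is easy to make concrete. A minimal sketch with hypothetical token counts (all numbers here are illustrative, not measurements):

```python
# Illustrative sketch with made-up token counts. `refetch` models the
# recovery cost paid when compression drops a critical detail.
def tokens_per_task(request_tokens: list[int], refetch: int = 0) -> int:
    """Total tokens consumed from task start to completion."""
    return sum(request_tokens) + refetch

# Structured summary: slightly larger requests, nothing lost.
structured = tokens_per_task([8_000] * 10)                   # 80,000 tokens
# Aggressive compression: 0.5% smaller requests, but a lost file
# path forces the agent to re-explore.
aggressive = tokens_per_task([7_960] * 10, refetch=16_000)   # 95,600 tokens

assert aggressive > structured  # the per-request metric picked the wrong winner
```

The per-request view sees only the 40-token saving on each call; the per-task view sees the 16,000-token recovery bill.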

The Artifact Trail Problem


Artifact trail integrity is the weakest dimension across all compression methods, scoring 2.2-2.5 out of 5.0 in evaluations. Even structured summarization with explicit file sections struggles to maintain complete file tracking across long sessions.
Coding agents need to know:
  • Which files were created
  • Which files were modified and what changed
  • Which files were read but not changed
  • Function names, variable names, error messages
This problem likely requires specialized handling beyond general summarization: a separate artifact index or explicit file-state tracking in agent scaffolding.
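One way to sketch such an artifact index (a hypothetical design, not any particular framework's API): a small per-file state record kept outside the summarizer, so file state never passes through lossy compression.

```python
from dataclasses import dataclass, field
from enum import Enum

class FileState(Enum):
    READ = "read"
    CREATED = "created"
    MODIFIED = "modified"

@dataclass
class ArtifactIndex:
    """Deterministic file-state tracking, maintained outside the summarizer."""
    files: dict = field(default_factory=dict)

    def record(self, path: str, state: FileState, note: str = "") -> None:
        # A later read must never downgrade CREATED/MODIFIED back to READ.
        prev = self.files.get(path)
        if prev and prev[0] != FileState.READ and state == FileState.READ:
            return
        self.files[path] = (state, note)

    def render(self) -> str:
        """Emit a stable listing to inject alongside the summary."""
        return "\n".join(f"- {p}: {s.value} {n}".rstrip()
                         for p, (s, n) in sorted(self.files.items()))
```

Because the index is updated deterministically on each tool call rather than summarized, its accuracy is independent of compression quality.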

Structured Summary Sections


Effective structured summaries include explicit sections:

```markdown
## Session Intent
[What the user is trying to accomplish]

## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config

## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures

## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests

## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
```

This structure prevents silent loss of file paths or decisions because each section must be explicitly addressed.

Compression Trigger Strategies


When to trigger compression matters as much as how to compress:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable context size |
| Importance-based | Compress low-relevance sections first | Complex but preserves signal |
| Task-boundary | Compress at logical task completions | Clean summaries but unpredictable timing |
The sliding window approach with structured summaries provides the best balance of predictability and quality for most coding agent use cases.
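The first two strategies reduce to a few lines. A minimal sketch (the message shape is an assumption; adapt it to your agent's transcript format):

```python
def should_compress(used_tokens: int, context_window: int,
                    threshold: float = 0.75) -> bool:
    """Fixed threshold: trigger at ~70-80% context utilization."""
    return used_tokens >= threshold * context_window

def sliding_window(messages: list[dict], summary: str,
                   keep_last: int = 20) -> list[dict]:
    """Sliding window: running summary plus the most recent N turns."""
    return [{"role": "system", "content": summary}] + messages[-keep_last:]
```

Combining the two, the fixed threshold decides *when* to compress and the sliding window decides *what* survives verbatim.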

Probe-Based Evaluation


Traditional metrics like ROUGE or embedding similarity fail to capture functional compression quality. A summary may score high on lexical overlap while missing the one file path the agent needs.
Probe-based evaluation directly measures functional quality by asking questions after compression:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
If compression preserved the right information, the agent answers correctly. If not, it guesses or hallucinates.
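A probe harness can be as small as a dictionary of questions plus a check. A sketch under stated assumptions: `ask` stands in for however your stack queries the agent over the compressed context, the expected strings are illustrative, and the substring check is a stand-in for an LLM judge:

```python
PROBES = {
    "recall":       "What was the original error message?",
    "artifact":     "Which files have we modified?",
    "continuation": "What should we do next?",
    "decision":     "What did we decide about the Redis issue?",
}

def run_probes(ask, expected: dict[str, str]) -> dict[str, bool]:
    """Pass/fail per probe: did the expected detail survive compression?"""
    return {name: expected[name].lower() in ask(question).lower()
            for name, question in PROBES.items()}
```

In production the pass/fail judgment is usually an LLM judge rather than a substring match; the substring version is enough to smoke-test a compression pipeline.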

Evaluation Dimensions

评估维度

Six dimensions capture compression quality for coding agents:
  1. Accuracy: Are technical details correct? File paths, function names, error codes.
  2. Context Awareness: Does the response reflect current conversation state?
  3. Artifact Trail: Does the agent know which files were read or modified?
  4. Completeness: Does the response address all parts of the question?
  5. Continuity: Can work continue without re-fetching information?
  6. Instruction Following: Does the response respect stated constraints?
Accuracy shows the largest variation between compression methods (0.6 point gap). Artifact trail is universally weak (2.2-2.5 range).
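A per-response scoring record for the six dimensions might look like the sketch below (the field names mirror the dimensions above; the example values are illustrative, and the 1.0-5.0 scale follows the evaluations cited in this document):

```python
from dataclasses import dataclass, asdict

@dataclass
class CompressionScore:
    """Rubric scores on a 1.0-5.0 scale, one field per dimension."""
    accuracy: float
    context_awareness: float
    artifact_trail: float
    completeness: float
    continuity: float
    instruction_following: float

    def overall(self) -> float:
        values = list(asdict(self).values())
        return sum(values) / len(values)

# Illustrative values only: artifact_trail is typically the weak dimension.
score = CompressionScore(accuracy=4.1, context_awareness=3.9,
                         artifact_trail=2.4, completeness=3.8,
                         continuity=3.7, instruction_following=4.0)
```

Keeping the dimensions as separate fields, rather than a single aggregate, is what lets you see that artifact trail drags the average down.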

Practical Guidance


Three-Phase Compression Workflow


For large codebases or agent systems exceeding context windows, apply compression through three phases:
  1. Research Phase: Produce a research document from architecture diagrams, documentation, and key interfaces. Compress exploration into a structured analysis of components and dependencies. Output: single research document.
  2. Planning Phase: Convert research into implementation specification with function signatures, type definitions, and data flow. A 5M token codebase compresses to approximately 2,000 words of specification.
  3. Implementation Phase: Execute against the specification. Context remains focused on the spec rather than raw codebase exploration.

Using Example Artifacts as Seeds


When provided with a manual migration example or reference PR, use it as a template to understand the target pattern. The example reveals constraints that static analysis cannot surface: which invariants must hold, which services break on changes, and what a clean migration looks like.
This is particularly important when the agent cannot distinguish essential complexity (business requirements) from accidental complexity (legacy workarounds). The example artifact encodes that distinction.

Implementing Anchored Iterative Summarization


  1. Define explicit summary sections matching your agent's needs
  2. On first compression trigger, summarize truncated history into sections
  3. On subsequent compressions, summarize only new truncated content
  4. Merge new summary into existing sections rather than regenerating
  5. Track which information came from which compression cycle for debugging
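Steps 3-5 reduce to a small merge function. A minimal sketch (section names and the cycle tag are illustrative choices, not a fixed format):

```python
def merge_summary(anchored: dict[str, list[str]],
                  new_items: dict[str, list[str]],
                  cycle: int) -> dict[str, list[str]]:
    """Merge a summary of the newly truncated span into anchored sections.

    Existing entries are never regenerated; new entries are tagged with
    the compression cycle they came from, for debugging (step 5).
    """
    for section, items in new_items.items():
        bucket = anchored.setdefault(section, [])
        for item in items:
            tagged = f"{item} [cycle {cycle}]"
            if tagged not in bucket:
                bucket.append(tagged)
    return anchored
```

Because merging only appends, information captured in cycle 1 cannot be silently dropped by a weaker summary in cycle 5, which is the failure mode of full regeneration.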

When to Use Each Approach


Use anchored iterative summarization when:
  • Sessions are long-running (100+ messages)
  • File tracking matters (coding, debugging)
  • You need to verify what was preserved
Use opaque compression when:
  • Maximum token savings required
  • Sessions are relatively short
  • Re-fetching costs are low
Use regenerative summaries when:
  • Summary interpretability is critical
  • Sessions have clear phase boundaries
  • Full context review is acceptable on each compression

Compression Ratio Considerations

压缩比考量

| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Good quality, moderate compression |
| Opaque | 99.3% | 3.35 | Best compression, quality loss |
The 0.7% additional tokens retained by structured summarization buys 0.35 quality points. For any task where re-fetching costs matter, this trade-off favors structured approaches.

Examples


Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
  • 401 error on /api/auth/login endpoint
  • Traced through auth controller, middleware, session store
  • Found stale Redis connection
  • Fixed connection pooling, added retry logic
  • 14 tests passing, 2 failing
Structured summary after compression:

```markdown
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup

## Test Status
14 passing, 2 failing (mock setup issues)

## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
```

Example 2: Probe Response Quality

After compression, asking "What was the original error?":

Good response (structured summarization):
> "The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."

Poor response (aggressive compression):
> "We were debugging an authentication issue. The login was failing. We fixed some configuration problems."

The structured response preserves endpoint, error code, and root cause. The aggressive response loses all technical detail.

Guidelines


  1. Optimize for tokens-per-task, not tokens-per-request
  2. Use structured summaries with explicit sections for file tracking
  3. Trigger compression at 70-80% context utilization
  4. Implement incremental merging rather than full regeneration
  5. Test compression quality with probe-based evaluation
  6. Track artifact trail separately if file tracking is critical
  7. Accept slightly lower compression ratios for better quality retention
  8. Monitor re-fetching frequency as a compression quality signal

Integration


This skill connects to several others in the collection:
  • context-degradation - Compression is a mitigation strategy for degradation
  • context-optimization - Compression is one optimization technique among many
  • evaluation - Probe-based evaluation applies to compression testing
  • memory-systems - Compression relates to scratchpad and summary memory patterns

References


Internal reference:
  • Evaluation Framework Reference - Detailed probe types and scoring rubrics
Related skills in this collection:
  • context-degradation - Understanding what compression prevents
  • context-optimization - Broader optimization strategies
  • evaluation - Building evaluation frameworks
External resources:
  • Factory Research: Evaluating Context Compression for AI Agents (December 2025)
  • Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
  • Netflix Engineering: "The Infinite Software Crisis" - Three-phase workflow and context compression at scale (AI Summit 2025)


Skill Metadata


Created: 2025-12-22
Last Updated: 2025-12-26
Author: Agent Skills for Context Engineering Contributors
Version: 1.1.0