# Context Compression Strategies
When agent sessions generate millions of tokens of conversation history, compression becomes mandatory. The naive approach is aggressive compression to minimize tokens per request. The correct optimization target is tokens per task: total tokens consumed to complete a task, including re-fetching costs when compression loses critical information.
## When to Activate
Activate this skill when:
- Agent sessions exceed context window limits
- Codebases exceed context windows (5M+ token systems)
- Designing conversation summarization strategies
- Debugging cases where agents "forget" what files they modified
- Building evaluation frameworks for compression quality
## Core Concepts
Context compression trades token savings against information loss. Three production-ready approaches exist:
- **Anchored Iterative Summarization**: Maintain structured, persistent summaries with explicit sections for session intent, file modifications, decisions, and next steps. When compression triggers, summarize only the newly-truncated span and merge it with the existing summary. Structure forces preservation by dedicating sections to specific information types.
- **Opaque Compression**: Produce compressed representations optimized for reconstruction fidelity. This achieves the highest compression ratios (99%+) but sacrifices interpretability: what was preserved cannot be verified.
- **Regenerative Full Summary**: Generate detailed structured summaries on each compression. This produces readable output but may lose details across repeated compression cycles, because summaries are fully regenerated rather than incrementally merged.
The critical insight: structure forces preservation. Dedicated sections act as checklists that the summarizer must populate, preventing silent information drift.
## Detailed Topics
### Why Tokens-Per-Task Matters
Traditional compression metrics target tokens-per-request. This is the wrong optimization. When compression loses critical details like file paths or error messages, the agent must re-fetch information, re-explore approaches, and waste tokens recovering context.
The right metric is tokens-per-task: total tokens consumed from task start to completion. A compression strategy saving 0.5% more tokens but causing 20% more re-fetching costs more overall.
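A back-of-the-envelope sketch of that comparison, with made-up numbers (the request sizes and re-fetch overheads below are illustrative, not measured):

```python
def tokens_per_task(avg_request_tokens: int, requests: int, refetch_tokens: int) -> int:
    """Total tokens consumed from task start to completion."""
    return avg_request_tokens * requests + refetch_tokens

# Aggressive compression: smaller requests, but lost details force re-fetching.
aggressive = tokens_per_task(avg_request_tokens=4_000, requests=30, refetch_tokens=45_000)
# Structured compression: slightly larger requests, almost no re-fetching.
structured = tokens_per_task(avg_request_tokens=4_200, requests=30, refetch_tokens=5_000)

print(aggressive, structured)  # 165000 131000
```

The strategy with the worse per-request numbers wins on the per-task metric once re-fetching is counted.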
### The Artifact Trail Problem
Artifact trail integrity is the weakest dimension across all compression methods, scoring 2.2-2.5 out of 5.0 in evaluations. Even structured summarization with explicit file sections struggles to maintain complete file tracking across long sessions.
Coding agents need to know:
- Which files were created
- Which files were modified and what changed
- Which files were read but not changed
- Function names, variable names, error messages
This problem likely requires specialized handling beyond general summarization: a separate artifact index or explicit file-state tracking in agent scaffolding.
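A minimal sketch of such an artifact index, kept in the scaffolding alongside the summary so compression can never silently drop file state (class and method names are illustrative, not an existing API):

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactIndex:
    """File-state tracking maintained outside the summarizer."""
    created: set[str] = field(default_factory=set)
    modified: dict[str, str] = field(default_factory=dict)  # path -> change note
    read_only: set[str] = field(default_factory=set)

    def record_read(self, path: str) -> None:
        # A file already created or modified stays in that stronger category.
        if path not in self.created and path not in self.modified:
            self.read_only.add(path)

    def record_write(self, path: str, note: str, new_file: bool = False) -> None:
        self.read_only.discard(path)
        if new_file:
            self.created.add(path)
        self.modified[path] = note

index = ArtifactIndex()
index.record_read("auth.controller.ts")
index.record_write("config/redis.ts", "Updated connection pooling")
```

Because the index is updated on every tool call rather than reconstructed by the summarizer, it survives compression cycles intact.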
### Structured Summary Sections
Effective structured summaries include explicit sections:
```markdown
## Session Intent
[What the user is trying to accomplish]

## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config

## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures

## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests

## Next Steps
- Fix remaining test failures
- Run full test suite
- Update documentation
```

This structure prevents silent loss of file paths or decisions because each section must be explicitly addressed.

### Compression Trigger Strategies
When to trigger compression matters as much as how to compress:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable context size |
| Importance-based | Compress low-relevance sections first | Complex but preserves signal |
| Task-boundary | Compress at logical task completions | Clean summaries but unpredictable timing |
The sliding window approach with structured summaries provides the best balance of predictability and quality for most coding agent use cases.
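A minimal sketch of that combination, a fixed-threshold trigger plus a sliding window; the 4-characters-per-token estimate and the `merge_into_summary` placeholder stand in for a real tokenizer and a real summarizer call:

```python
def estimate_tokens(messages: list[str]) -> int:
    return sum(len(m) // 4 for m in messages)  # rough 4-chars-per-token heuristic

def merge_into_summary(summary: list[str], truncated: list[str]) -> list[str]:
    # Placeholder: a real implementation would summarize `truncated` with an
    # LLM call and merge the result into the structured summary sections.
    return summary + [f"[{len(truncated)} turns summarized]"]

def maybe_compress(messages, summary, context_limit, window=20, threshold=0.8):
    """Trigger at 70-80% utilization; keep the last `window` turns verbatim."""
    if estimate_tokens(messages) < threshold * context_limit:
        return messages, summary  # under threshold: leave history untouched
    truncated, kept = messages[:-window], messages[-window:]
    return kept, merge_into_summary(summary, truncated)
```

The recent window preserves verbatim detail where the agent most needs it, while older turns are folded into the anchored summary.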
### Probe-Based Evaluation
Traditional metrics like ROUGE or embedding similarity fail to capture functional compression quality. A summary may score high on lexical overlap while missing the one file path the agent needs.
Probe-based evaluation directly measures functional quality by asking questions after compression:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
If compression preserved the right information, the agent answers correctly. If not, it guesses or hallucinates.
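A probe harness can be as small as the following sketch; `agent` is assumed to be any callable that answers a question against the compressed context, and the keyword-matching check is a crude stand-in for LLM-as-judge scoring:

```python
PROBES = {
    "recall":       "What was the original error message?",
    "artifact":     "Which files have we modified?",
    "continuation": "What should we do next?",
    "decision":     "What did we decide about the Redis issue?",
}

def run_probes(agent, compressed_context: str, expected_facts: dict) -> dict:
    """Score each probe by the fraction of expected key facts in the answer."""
    scores = {}
    for probe_type, question in PROBES.items():
        answer = agent(compressed_context, question)
        facts = expected_facts[probe_type]
        scores[probe_type] = sum(f in answer for f in facts) / len(facts)
    return scores
```

A probe score of 0.0 on a dimension is a direct signal that compression dropped the information that dimension depends on.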
### Evaluation Dimensions
Six dimensions capture compression quality for coding agents:
- Accuracy: Are technical details correct? File paths, function names, error codes.
- Context Awareness: Does the response reflect current conversation state?
- Artifact Trail: Does the agent know which files were read or modified?
- Completeness: Does the response address all parts of the question?
- Continuity: Can work continue without re-fetching information?
- Instruction Following: Does the response respect stated constraints?
Accuracy shows the largest variation between compression methods (0.6 point gap). Artifact trail is universally weak (2.2-2.5 range).
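A sketch of how per-dimension judge ratings (1-5) might be aggregated; the 3.0 floor is an illustrative threshold, not a value from the source evaluation:

```python
DIMENSIONS = ("accuracy", "context_awareness", "artifact_trail",
              "completeness", "continuity", "instruction_following")

def aggregate(scores: dict, floor: float = 3.0):
    """Mean of 1-5 ratings across all six dimensions, plus any weak spots."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    mean = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    weak = [d for d in DIMENSIONS if scores[d] < floor]  # usually artifact_trail
    return round(mean, 2), weak
```

Requiring every dimension to be scored mirrors the checklist idea behind structured summaries: no dimension can silently go unmeasured.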
## Practical Guidance

### Three-Phase Compression Workflow
For large codebases or agent systems exceeding context windows, apply compression through three phases:
1. **Research Phase**: Produce a research document from architecture diagrams, documentation, and key interfaces. Compress exploration into a structured analysis of components and dependencies. Output: a single research document.
2. **Planning Phase**: Convert the research into an implementation specification with function signatures, type definitions, and data flow. A 5M-token codebase compresses to approximately 2,000 words of specification.
3. **Implementation Phase**: Execute against the specification. Context remains focused on the spec rather than raw codebase exploration.
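The three phases can be sketched as a pipeline; this is purely illustrative, with `llm` standing in for any text-in/text-out model call and the prompts abbreviated:

```python
def research_phase(llm, architecture_docs: str) -> str:
    """Phase 1: compress exploration into one research document."""
    return llm("Analyze components and dependencies:\n" + architecture_docs)

def planning_phase(llm, research_doc: str) -> str:
    """Phase 2: research -> implementation spec (signatures, types, data flow)."""
    return llm("Write an implementation spec from this research:\n" + research_doc)

def implementation_phase(llm, spec: str, task: str) -> str:
    """Phase 3: execute against the ~2,000-word spec, not the raw codebase."""
    return llm(f"Task: {task}\nSpec:\n{spec}")

def run_workflow(llm, architecture_docs: str, task: str) -> str:
    research = research_phase(llm, architecture_docs)
    spec = planning_phase(llm, research)
    return implementation_phase(llm, spec, task)
```

Each phase consumes only the output of the previous one, so the raw codebase never re-enters the context after the research phase.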
### Using Example Artifacts as Seeds
When provided with a manual migration example or reference PR, use it as a template to understand the target pattern. The example reveals constraints that static analysis cannot surface: which invariants must hold, which services break on changes, and what a clean migration looks like.
This is particularly important when the agent cannot distinguish essential complexity (business requirements) from accidental complexity (legacy workarounds). The example artifact encodes that distinction.
### Implementing Anchored Iterative Summarization
- Define explicit summary sections matching your agent's needs
- On first compression trigger, summarize truncated history into sections
- On subsequent compressions, summarize only new truncated content
- Merge new summary into existing sections rather than regenerating
- Track which information came from which compression cycle for debugging
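A minimal sketch of the merge step (section names follow the summary template above; deduplication by exact string match is a simplification of what a real merger would do):

```python
SECTIONS = ("session_intent", "files_modified", "decisions",
            "current_state", "next_steps")
# Sections describing "now" rather than accumulated history: replace, don't append.
REPLACED = {"session_intent", "current_state", "next_steps"}

def merge_summaries(existing: dict, new: dict) -> dict:
    """Fold the summary of the newly-truncated span into the anchored summary."""
    merged = {}
    for section in SECTIONS:
        old_items = existing.get(section, [])
        new_items = new.get(section, [])
        if section in REPLACED:
            merged[section] = new_items or old_items
        else:
            # Accumulate history: keep old facts, append genuinely new ones.
            merged[section] = old_items + [i for i in new_items if i not in old_items]
    return merged
```

Because historical sections only ever grow, a file modification recorded in cycle one cannot be lost in cycle five, which is the failure mode of full regeneration.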
### When to Use Each Approach
Use anchored iterative summarization when:
- Sessions are long-running (100+ messages)
- File tracking matters (coding, debugging)
- You need to verify what was preserved
Use opaque compression when:
- Maximum token savings required
- Sessions are relatively short
- Re-fetching costs are low
Use regenerative summaries when:
- Summary interpretability is critical
- Sessions have clear phase boundaries
- Full context review is acceptable on each compression
### Compression Ratio Considerations
| Method | Compression Ratio | Quality Score (of 5.0) | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Good quality, moderate compression |
| Opaque | 99.3% | 3.35 | Best compression, quality loss |
The 0.7% additional tokens retained by structured summarization buys 0.35 quality points. For any task where re-fetching costs matter, this trade-off favors structured approaches.
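To make the trade-off concrete, a quick sketch of the retained-token arithmetic, using an 89,000-token session as in Example 1 below:

```python
def retained_tokens(original: int, compression_ratio: float) -> int:
    """Tokens surviving compression at the stated ratio."""
    return round(original * (1 - compression_ratio))

session = 89_000
anchored = retained_tokens(session, 0.986)  # 1246 tokens kept
opaque = retained_tokens(session, 0.993)    # 623 tokens kept
extra = anchored - opaque                   # ~623 extra tokens buy +0.35 quality
```

A single avoided re-fetch of a moderately sized file typically costs more than those 623 tokens, which is why the structured approach tends to win on tokens-per-task.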
## Examples

**Example 1: Debugging Session Compression**
Original context (89,000 tokens, 178 messages):
- 401 error on /api/auth/login endpoint
- Traced through auth controller, middleware, session store
- Found stale Redis connection
- Fixed connection pooling, added retry logic
- 14 tests passing, 2 failing
Structured summary after compression:
```markdown
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup

## Test Status
14 passing, 2 failing (mock setup issues)

## Next Steps
- Fix remaining test failures (mock session service)
- Run full test suite
- Deploy to staging
```
**Example 2: Probe Response Quality**
After compression, asking "What was the original error?":
Good response (structured summarization):
> "The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."
Poor response (aggressive compression):
> "We were debugging an authentication issue. The login was failing. We fixed some configuration problems."
The structured response preserves the endpoint, error code, and root cause. The aggressive response loses all technical detail.

## Guidelines
- Optimize for tokens-per-task, not tokens-per-request
- Use structured summaries with explicit sections for file tracking
- Trigger compression at 70-80% context utilization
- Implement incremental merging rather than full regeneration
- Test compression quality with probe-based evaluation
- Track artifact trail separately if file tracking is critical
- Accept slightly lower compression ratios for better quality retention
- Monitor re-fetching frequency as a compression quality signal
## Integration
This skill connects to several others in the collection:
- context-degradation - Compression is a mitigation strategy for degradation
- context-optimization - Compression is one optimization technique among many
- evaluation - Probe-based evaluation applies to compression testing
- memory-systems - Compression relates to scratchpad and summary memory patterns
## References
Internal reference:
- Evaluation Framework Reference - Detailed probe types and scoring rubrics
Related skills in this collection:
- context-degradation - Understanding what compression prevents
- context-optimization - Broader optimization strategies
- evaluation - Building evaluation frameworks
External resources:
- Factory Research: Evaluating Context Compression for AI Agents (December 2025)
- Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
- Netflix Engineering: "The Infinite Software Crisis" - Three-phase workflow and context compression at scale (AI Summit 2025)
## Skill Metadata
Created: 2025-12-22
Last Updated: 2025-12-26
Author: Agent Skills for Context Engineering Contributors
Version: 1.1.0