context-fundamentals
Context Engineering Fundamentals
Context is the complete state available to a language model at inference time. It includes everything the model can attend to when generating responses: system instructions, tool definitions, retrieved documents, message history, and tool outputs. Understanding context fundamentals is prerequisite to effective context engineering.
When to Activate
Activate this skill when:
- Designing new agent systems or modifying existing architectures
- Debugging unexpected agent behavior that may relate to context
- Optimizing context usage to reduce token costs or improve performance
- Onboarding new team members to context engineering concepts
- Reviewing context-related design decisions
Core Concepts
Context comprises several distinct components, each with different characteristics and constraints. The attention mechanism creates a finite budget that constrains effective context usage. Progressive disclosure manages this constraint by loading information only as needed. The engineering discipline is curating the smallest high-signal token set that achieves desired outcomes.
Detailed Topics
The Anatomy of Context
System Prompts
System prompts establish the agent's core identity, constraints, and behavioral guidelines. They are loaded once at session start and typically persist throughout the conversation. System prompts should be extremely clear and use simple, direct language at the right altitude for the agent.
The right altitude balances two failure modes. At one extreme, engineers hardcode complex brittle logic that creates fragility and maintenance burden. At the other extreme, engineers provide vague high-level guidance that fails to give concrete signals for desired outputs or falsely assumes shared context. The optimal altitude strikes a balance: specific enough to guide behavior effectively, yet flexible enough to provide strong heuristics.
Organize prompts into distinct sections using XML tagging or Markdown headers to delineate background information, instructions, tool guidance, and output description. The exact formatting matters less as models become more capable, but structural clarity remains valuable.
Tool Definitions
Tool definitions specify the actions an agent can take. Each tool includes a name, description, parameters, and return format. Once the context is serialized, tool definitions sit near its front, typically adjacent to the system prompt.
Tool descriptions collectively steer agent behavior. Poor descriptions force agents to guess; optimized descriptions include usage context, examples, and defaults. The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better.
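To make the contrast concrete, here is a minimal sketch of a tool definition that includes the elements above: a name, a description carrying usage context and defaults, a parameter schema, and the return behavior. The `search_tickets` tool, its fields, and the JSON-Schema-style layout are illustrative assumptions, not a specific library's API.

```python
# Hypothetical tool definition. The description tells the agent *when* to
# use the tool, what the alternative is, and what the default limit means,
# so the agent does not have to guess.
search_tickets = {
    "name": "search_tickets",
    "description": (
        "Search the support-ticket database by keyword. "
        "Use this when the user asks about a specific past issue; "
        "for broad reporting questions, prefer a reporting tool instead. "
        "Returns at most `limit` matches, newest first."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to match"},
            "limit": {"type": "integer", "description": "Max results (default 10)"},
        },
        "required": ["query"],
    },
}
```

A poor version of the same definition ("search_tickets: searches tickets") forces the agent to infer all of this from the name alone.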
Retrieved Documents
Retrieved documents provide domain-specific knowledge, reference materials, or task-relevant information. Agents use retrieval augmented generation to pull relevant documents into context at runtime rather than pre-loading all possible information.
The just-in-time approach maintains lightweight identifiers (file paths, stored queries, web links) and uses these references to load data into context dynamically. This mirrors human cognition: we generally do not memorize entire corpuses of information but rather use external organization and indexing systems to retrieve relevant information on demand.
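The reference-then-load pattern can be sketched as follows. Only the identifier and a one-line summary are resident in context; full content is read on demand. The `DocumentReference` class is a hypothetical illustration, not an existing API.

```python
from pathlib import Path

class DocumentReference:
    """Lightweight identifier kept in context; content loaded just in time."""

    def __init__(self, path: str, summary: str):
        self.path = Path(path)   # cheap to carry in context
        self.summary = summary   # one-line hint for relevance decisions

    def load(self) -> str:
        # Full content enters context only when the agent decides it is needed.
        return self.path.read_text()
```

The summary lets the agent decide *whether* to load without paying the token cost of the document itself.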
Message History
Message history contains the conversation between the user and agent, including previous queries, responses, and reasoning. For long-running tasks, message history can grow to dominate context usage.
Message history serves as scratchpad memory where agents track progress, maintain task state, and preserve reasoning across turns. Effective management of message history is critical for long-horizon task completion.
Tool Outputs
Tool outputs are the results of agent actions: file contents, search results, command execution output, API responses, and similar data. Tool outputs comprise the majority of tokens in typical agent trajectories, with research showing observations (tool outputs) can reach 83.9% of total context usage.
Tool outputs consume context whether they are relevant to current decisions or not. This creates pressure for strategies like observation masking, compaction, and selective tool result retention.
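One of the retention strategies mentioned above can be sketched like this: keep tool outputs in full only for the most recent turns, and replace older ones with a stub. The message shape (role/content dicts) and the stub text are assumptions for illustration.

```python
def mask_old_observations(messages, keep_last=3, stub="[output elided]"):
    """Replace all but the last `keep_last` tool outputs with a short stub."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_mask = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": stub} if i in to_mask else m
        for i, m in enumerate(messages)
    ]
```

Recent observations usually drive the next decision, so masking from the oldest end trades little signal for a large token saving.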
Context Windows and Attention Mechanics
The Attention Budget Constraint
Language models process tokens through attention mechanisms that create pairwise relationships between all tokens in context. For n tokens, this creates n² relationships that must be computed and stored. As context length increases, the model's ability to capture these relationships gets stretched thin.
Models develop attention patterns from training data distributions where shorter sequences predominate. This means models have less experience with and fewer specialized parameters for context-wide dependencies. The result is an "attention budget" that depletes as context grows.
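The quadratic growth described above is easy to make concrete: for n tokens, attention relates every token to every other, so doubling the context quadruples the pairwise relationships to compute and store.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise token relationships the attention mechanism forms."""
    return n_tokens * n_tokens

for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens -> {attention_pairs(n):>12,} pairwise relationships")
```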
Position Encoding and Context Extension
Position encoding interpolation allows models to handle sequences longer than those seen in training by mapping extended positions back into the range the model was originally trained on. However, this adaptation degrades the model's precision about token position. Models remain highly capable at longer contexts but show reduced accuracy for information retrieval and long-range reasoning compared to their performance on shorter contexts.
The Progressive Disclosure Principle
Progressive disclosure manages context efficiently by loading information only as needed. At startup, agents load only skill names and descriptions—sufficient to know when a skill might be relevant. Full content loads only when a skill is activated for specific tasks.
This approach keeps agents fast while giving them access to more context on demand. The principle applies at multiple levels: skill selection, document loading, and even tool result retrieval.
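The skill-selection level of this principle can be sketched as a lazy registry: names and descriptions are always resident, while full instructions load only on activation. The `Skill` class here is a hypothetical illustration of the pattern, not a specific framework's API.

```python
from pathlib import Path

class Skill:
    """Name and description always in context; full instructions load lazily."""

    def __init__(self, name: str, description: str, path: str):
        self.name = name                # resident at startup
        self.description = description  # resident at startup
        self._path = Path(path)         # full content stays on disk
        self._instructions = None

    def activate(self) -> str:
        if self._instructions is None:  # load once, on first activation
            self._instructions = self._path.read_text()
        return self._instructions

def startup_context(skills) -> str:
    """What the agent sees before any activation: names + descriptions only."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)
```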
Context Quality Versus Context Quantity
The assumption that larger context windows solve memory problems has been empirically debunked. Context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of desired outcomes.
Several factors create pressure for context efficiency. Processing cost grows disproportionately with context length: doubling the tokens more than doubles the cost, because the attention mechanism's pairwise relationships grow quadratically in time and compute. Model performance degrades beyond certain context lengths even when the window technically supports more tokens. Long inputs remain expensive even with prefix caching.
The guiding principle is informativity over exhaustiveness. Include what matters for the decision at hand, exclude what does not, and design systems that can access additional information on demand.
Context as Finite Resource
Context must be treated as a finite resource with diminishing marginal returns. Like humans with limited working memory, language models have an attention budget drawn on when parsing large volumes of context.
Every new token introduced depletes this budget by some amount. This creates the need for careful curation of available tokens. The engineering problem is optimizing utility against inherent constraints.
Context engineering is iterative and the curation phase happens each time you decide what to pass to the model. It is not a one-time prompt writing exercise but an ongoing discipline of context management.
Practical Guidance
File-System-Based Access
Agents with filesystem access can use progressive disclosure naturally. Store reference materials, documentation, and data externally. Load files only when needed using standard filesystem operations. This pattern avoids stuffing context with information that may not be relevant.
The file system itself provides structure that agents can navigate. File sizes suggest complexity; naming conventions hint at purpose; timestamps serve as proxies for relevance. Metadata of file references provides a mechanism to efficiently refine behavior.
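As one example of metadata-driven navigation, an agent can rank candidate files by recency before loading any content, using timestamps exactly as the proxy described above. The function name and `*.md` default are illustrative assumptions.

```python
from pathlib import Path

def rank_by_recency(directory: str, pattern: str = "*.md") -> list:
    """Order files newest-first using mtime as a relevance proxy,
    without reading any file content into context."""
    files = Path(directory).glob(pattern)
    return sorted(files, key=lambda f: f.stat().st_mtime, reverse=True)
```

The agent then loads only the top-ranked files, keeping the rest as paths it can still reach later.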
Hybrid Strategies
The most effective agents employ hybrid strategies. Pre-load some context for speed (like CLAUDE.md files or project rules), but enable autonomous exploration for additional context as needed. The decision boundary depends on task characteristics and context dynamics.
For contexts with less dynamic content, pre-loading more upfront makes sense. For rapidly changing or highly specific information, just-in-time loading avoids stale context.
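This decision boundary can be sketched as a simple loading policy: stable sources are read once at session start, while volatile sources are kept as loader callables and fetched just in time. The `volatile` flag and loader-callable shape are illustrative assumptions.

```python
def split_sources(sources):
    """Partition context sources into pre-loaded content and deferred loaders.

    `sources` maps name -> (volatile, loader); loader is a zero-arg callable
    that returns the source's text.
    """
    preloaded, deferred = {}, {}
    for name, (volatile, loader) in sources.items():
        if volatile:
            deferred[name] = loader      # fetch just-in-time, avoids staleness
        else:
            preloaded[name] = loader()   # pay the cost once, gain speed
    return preloaded, deferred
```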
Context Budgeting
Design with explicit context budgets in mind. Know the effective context limit for your model and task. Monitor context usage during development. Implement compaction triggers at appropriate thresholds. Design systems assuming context will degrade rather than hoping it will not.
Effective context budgeting requires understanding not just raw token counts but also attention distribution patterns. The middle of context receives less attention than the beginning and end. Place critical information at attention-favored positions.
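A minimal budget monitor with a compaction trigger might look like the sketch below. The 80% threshold and the characters-per-token heuristic (len / 4) are assumptions; production code would use the model's actual tokenizer and tuned thresholds.

```python
class ContextBudget:
    """Track approximate context usage and signal when to compact."""

    def __init__(self, limit_tokens: int, trigger: float = 0.8):
        self.limit = limit_tokens
        self.trigger = trigger   # compaction threshold as a fraction of limit
        self.used = 0

    def add(self, text: str) -> None:
        # Rough heuristic: ~4 characters per token. Replace with a real
        # tokenizer count in production.
        self.used += max(1, len(text) // 4)

    def needs_compaction(self) -> bool:
        return self.used >= self.trigger * self.limit
```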
Examples
Example 1: Organizing System Prompts
```markdown
<BACKGROUND_INFORMATION>
You are a Python expert helping a development team.
Current project: Data processing pipeline in Python 3.9+
</BACKGROUND_INFORMATION>

<INSTRUCTIONS>
- Write clean, idiomatic Python code
- Include type hints for function signatures
- Add docstrings for public functions
- Follow PEP 8 style guidelines
</INSTRUCTIONS>

<TOOL_GUIDANCE>
Use bash for shell operations, python for code tasks.
File operations should use pathlib for cross-platform compatibility.
</TOOL_GUIDANCE>

<OUTPUT_DESCRIPTION>
Provide code blocks with syntax highlighting.
Explain non-obvious decisions in comments.
</OUTPUT_DESCRIPTION>
```

Example 2: Progressive Document Loading
```markdown
Instead of loading all documentation at once:

Step 1: Load summary
docs/api_summary.md            # Lightweight overview

Step 2: Load specific section as needed
docs/api/endpoints.md          # Only when API calls needed
docs/api/authentication.md     # Only when auth context needed
```
Guidelines
- Treat context as a finite resource with diminishing returns
- Place critical information at attention-favored positions (beginning and end)
- Use progressive disclosure to defer loading until needed
- Organize system prompts with clear section boundaries
- Monitor context usage during development
- Implement compaction triggers at 70-80% utilization
- Design for context degradation rather than hoping to avoid it
- Prefer smaller high-signal context over larger low-signal context
Integration
This skill provides foundational context that all other skills build upon. It should be studied first before exploring:
- context-degradation - Understanding how context fails
- context-optimization - Techniques for extending context capacity
- multi-agent-patterns - How context isolation enables multi-agent systems
- tool-design - How tool definitions interact with context
References
Internal reference:
- Context Components Reference - Detailed technical reference
Related skills in this collection:
- context-degradation - Understanding context failure patterns
- context-optimization - Techniques for efficient context use
External resources:
- Research on transformer attention mechanisms
- Production engineering guides from leading AI labs
- Framework documentation on context window management
Skill Metadata
Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0