# Context Engineering
Context engineering is the discipline of curating and maintaining the optimal set of tokens during LLM inference. Unlike prompt engineering (crafting individual prompts), context engineering focuses on what information enters the context window and when.
## Core Principles

### Context as a Finite Resource
LLMs have limited "attention budgets." As context length increases, models experience context rot—decreased ability to accurately recall information. The goal is finding the smallest possible set of high-signal tokens that maximize desired outcomes.
Effective Context = Relevant Information / Total Tokens

Key insight: more context isn't better. The right context is better.
### The Context Pollution Problem
Every token added to context has costs:
- Increased latency and compute
- Diluted attention to important information
- Higher risk of hallucination from conflicting data
- Reduced model performance on retrieval tasks
## Context Management Strategies
### 1. Context Trimming
Drop older conversation turns, keeping only the last N turns.
| Aspect | Details |
|---|---|
| Mechanism | Sliding window over conversation history |
| Pros | Deterministic, zero latency, preserves recent context verbatim |
| Cons | Abrupt loss of long-range context, "amnesia" effect |
| Best for | Independent tasks, short interactions, predictable workflows |
```python
def trim_context(messages: list, keep_last_n: int = 10) -> list:
    """Keep system messages plus the last N non-system messages."""
    system_msgs = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]
    return system_msgs + other_msgs[-keep_last_n:]
```
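A quick sanity check of the trimming behavior (the helper is repeated so the snippet runs standalone):

```python
def trim_context(messages: list, keep_last_n: int = 10) -> list:
    """Keep system messages plus the last N non-system messages."""
    system_msgs = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]
    return system_msgs + other_msgs[-keep_last_n:]

history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(8):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_context(history, keep_last_n=4)
# The system message survives; only the 4 most recent messages follow it.
assert [m["role"] for m in trimmed] == ["system", "user", "assistant", "user", "assistant"]
assert trimmed[-1]["content"] == "answer 7"
```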
### 2. Context Summarization
Compress prior messages into structured summaries.
| Aspect | Details |
|---|---|
| Mechanism | LLM generates summary of older context |
| Pros | Retains long-range memory, smoother UX, scalable |
| Cons | Summarization bias risk, added latency, potential compounding errors |
| Best for | Complex multi-step tasks, long-horizon interactions |
```python
SUMMARIZATION_PROMPT = """Summarize the conversation so far, preserving:
1. Key decisions made
2. Important context established
3. Current task state and goals
4. Any constraints or preferences expressed
Be concise but complete. Output as structured markdown."""

async def summarize_context(messages: list, model) -> str:
    """Generate a summary of conversation history."""
    conversation_text = format_messages_for_summary(messages)
    response = await model.generate(
        system=SUMMARIZATION_PROMPT,
        user=conversation_text,
    )
    return response.content
```
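`format_messages_for_summary` is left undefined above; one plausible minimal implementation (an assumption, not part of the original) renders the history as a plain-text transcript:

```python
def format_messages_for_summary(messages: list) -> str:
    """Render messages as a plain-text transcript for the summarizer."""
    return "\n".join(
        f"{m['role'].upper()}: {m.get('content', '')}" for m in messages
    )

transcript = format_messages_for_summary([
    {"role": "user", "content": "Plan a data migration."},
    {"role": "assistant", "content": "Step 1: snapshot the source DB."},
])
print(transcript)
# USER: Plan a data migration.
# ASSISTANT: Step 1: snapshot the source DB.
```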
### 3. Hybrid Approach
Combine trimming and summarization for optimal balance.
```python
class HybridContextManager:
    def __init__(
        self,
        keep_recent: int = 5,         # Recent turns to keep verbatim
        summary_threshold: int = 20,  # When to trigger summarization
    ):
        self.keep_recent = keep_recent
        self.summary_threshold = summary_threshold
        self.running_summary = ""

    def process(self, messages: list) -> list:
        if len(messages) < self.summary_threshold:
            return messages
        # Summarize older messages
        old_messages = messages[:-self.keep_recent]
        self.running_summary = summarize(old_messages, self.running_summary)
        # Return summary + recent messages
        return [
            {"role": "system", "content": f"Previous context:\n{self.running_summary}"},
            *messages[-self.keep_recent:],
        ]
```
## System Prompt Design
### Principles for Context-Efficient Prompts
- Clear and direct language: Avoid ambiguity that requires clarification turns
- Structured sections: Organize by purpose (role, capabilities, constraints)
- Minimal yet comprehensive: Include only what affects behavior
- Self-contained instructions: Reduce need for context retrieval
### Example Structure

```markdown
# Role
You are [specific role] that [primary function].

# Capabilities
- [Capability 1 with scope]
- [Capability 2 with scope]

# Constraints
- [Hard constraint]
- [Preference]

# Output Format
[Specific format requirements]
```
## Tool Design for Context Efficiency

### Just-in-Time Context Loading

Instead of front-loading all possible context, load information dynamically as needed.

```python
# Anti-pattern: loading everything upfront
context = load_all_user_data()   # Large, mostly unused
context += load_all_documents()  # Even larger

# Better: just-in-time retrieval
tools = [
    Tool(
        name="get_user_preference",
        description="Get specific user preference by key",
        # Only fetches what's needed when asked
    ),
    Tool(
        name="search_documents",
        description="Search documents by query",
        # Returns relevant subset
    ),
]
```
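The `Tool(...)` entries above are schematic. A runnable sketch of the same idea, with a hypothetical in-memory preference store standing in for a real backend:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[[str], str]

# Hypothetical backing store; in practice a database or search index.
_PREFS = {"theme": "dark", "language": "en"}

def get_user_preference(key: str) -> str:
    """Fetch one preference on demand instead of loading all of them."""
    return _PREFS.get(key, f"No preference set for: {key}")

tools = [
    Tool("get_user_preference", "Get a specific user preference by key", get_user_preference),
]

# Only the requested value enters the context, not the whole store.
assert tools[0].fn("theme") == "dark"
```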
### Tool Design Principles
- Self-contained: Each tool returns complete, usable information
- Scoped: Tools do one thing well
- Descriptive: Names and descriptions guide LLM toward correct usage
- Error-robust: Return informative errors that don't pollute context
```python
# Well-designed tool
def search_codebase(query: str, max_results: int = 5) -> str:
    """Search the codebase for relevant code snippets.

    Args:
        query: Natural language description of what to find
        max_results: Maximum snippets to return (default 5)

    Returns:
        Formatted code snippets with file paths and line numbers,
        or 'No results found' if nothing matches.
    """
    results = perform_search(query, limit=max_results)
    if not results:
        return "No results found for query."
    return format_results(results)  # Concise, structured output
```

## Long-Horizon Task Patterns
### Pattern 1: Compaction
Periodically compress conversation history to reclaim context space.
```python
async def compaction_loop(agent, messages, task):
    while not task.complete:
        # Process next step
        response = await agent.run(messages)
        messages.append(response)
        # Compact when approaching limit
        if estimate_tokens(messages) > TOKEN_LIMIT * 0.8:
            summary = await summarize_context(messages[:-3])
            messages = [
                {"role": "system", "content": agent.system_prompt},
                {"role": "assistant", "content": f"Summary of progress:\n{summary}"},
                *messages[-3:],  # Keep recent context
            ]
    return messages
```
### Pattern 2: Structured Note-Taking
The agent maintains external notes, retrieving them as needed.
```python
class NoteTakingAgent:
    def __init__(self):
        self.notes = {}  # Key-value store outside context

    async def run(self, messages):
        tools = [
            Tool("save_note", self.save_note, "Save information for later"),
            Tool("get_note", self.get_note, "Retrieve saved information"),
            Tool("list_notes", self.list_notes, "List all saved note keys"),
        ]
        return await self.agent.run(messages, tools=tools)

    def save_note(self, key: str, content: str) -> str:
        self.notes[key] = content
        return f"Saved note: {key}"

    def get_note(self, key: str) -> str:
        return self.notes.get(key, f"No note found for key: {key}")
```
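The dictionary-backed note store is the part doing the context work. Stripped of the agent plumbing (a sketch, since `Tool` and the inner `self.agent` are not defined here), it behaves like this:

```python
class NoteStore:
    """Key-value notes kept outside the model's context window."""
    def __init__(self):
        self.notes: dict = {}

    def save_note(self, key: str, content: str) -> str:
        self.notes[key] = content
        return f"Saved note: {key}"

    def get_note(self, key: str) -> str:
        return self.notes.get(key, f"No note found for key: {key}")

store = NoteStore()
store.save_note("db_schema", "users(id, email), orders(id, user_id)")
# Later turns retrieve only what they need, keeping the context lean.
assert store.get_note("db_schema").startswith("users(")
assert store.get_note("missing") == "No note found for key: missing"
```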
### Pattern 3: Sub-Agent Architecture
Delegate focused tasks to specialized agents with clean context.
```python
class OrchestratorAgent:
    def __init__(self):
        self.sub_agents = {
            "researcher": ResearchAgent(),
            "coder": CodingAgent(),
            "reviewer": ReviewAgent(),
        }

    async def delegate(self, task: str, agent_type: str) -> str:
        """Delegate to sub-agent, receive condensed summary."""
        agent = self.sub_agents[agent_type]
        # Sub-agent works with fresh context
        result = await agent.run(task)
        # Return only essential findings to main context
        return result.summary  # Not the full conversation
```

Benefits:
- Each sub-agent has focused, clean context
- Main agent receives condensed results
- Parallelization opportunities
- Failure isolation
## Implementation Patterns
### Session Memory Manager
```python
class SessionMemory:
    def __init__(
        self,
        keep_last_n_turns: int = 5,
        context_limit: int = 100_000,  # tokens
        summarizer=None,
    ):
        self.keep_last_n_turns = keep_last_n_turns
        self.context_limit = context_limit
        self.summarizer = summarizer
        self.messages = []
        self.summary = ""

    async def add_message(self, message: dict):
        self.messages.append(message)
        await self._maybe_compact()

    async def _maybe_compact(self):
        current_tokens = estimate_tokens(self.messages)
        if current_tokens > self.context_limit * 0.8:
            # Summarize all but recent messages
            old_messages = self.messages[:-self.keep_last_n_turns]
            new_summary = await self.summarizer.summarize(
                old_messages,
                previous_summary=self.summary,
            )
            self.summary = new_summary
            self.messages = self.messages[-self.keep_last_n_turns:]

    def get_context(self) -> list:
        context = []
        if self.summary:
            context.append({
                "role": "system",
                "content": f"Conversation summary:\n{self.summary}",
            })
        context.extend(self.messages)
        return context
```
### Token Estimation
```python
def estimate_tokens(messages: list) -> int:
    """Rough token estimation (4 chars ≈ 1 token for English)."""
    total_chars = sum(
        len(m.get("content", ""))
        for m in messages
    )
    return total_chars // 4

def estimate_tokens_accurate(messages: list, model: str) -> int:
    """Accurate token count using tiktoken."""
    import tiktoken
    encoding = tiktoken.encoding_for_model(model)
    return sum(
        len(encoding.encode(m.get("content", "")))
        for m in messages
    )
```
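A quick check of the rough estimator (repeated so the snippet runs standalone); the 4-characters-per-token rule is only a heuristic for English text:

```python
def estimate_tokens(messages: list) -> int:
    """Rough token estimation (4 chars ≈ 1 token for English)."""
    total_chars = sum(len(m.get("content", "")) for m in messages)
    return total_chars // 4

messages = [
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 200},
]
# 600 characters // 4 ≈ 150 tokens
assert estimate_tokens(messages) == 150
```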
## Best Practices
- Treat context as precious: Every token has a cost. Include only information that improves task performance.
- Use progressive disclosure: Start minimal, expand context only when needed via tools.
- Design for recoverability: Agents should be able to reconstruct critical context from external sources.
- Monitor context health: Track token usage, retrieval accuracy, and task completion rates.
- Prefer structured over raw data: JSON, markdown tables, and clear formatting improve information density.
- Implement graceful degradation: When context limits approach, prioritize recent and high-signal information.
- Test with long conversations: Validate agent behavior after many turns, not just initial interactions.
- Separate concerns: Use different context regions for system instructions, user history, and tool outputs.
- Version your summaries: When compacting, maintain enough structure to debug summarization issues.
- Measure and iterate: Context engineering is empirical; test what information actually improves outcomes.
## References
- reference/evaluation-strategies.md - Testing context management effectiveness
- reference/summarization-patterns.md - Detailed summarization implementations