memory-systems


Memory System Design


Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.

When to Activate


Activate this skill when:
  • Building agents that must persist across sessions
  • Needing to maintain entity consistency across conversations
  • Implementing reasoning over accumulated knowledge
  • Designing systems that learn from past interactions
  • Creating knowledge bases that grow over time
  • Building temporal-aware systems that track state changes

Core Concepts


Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context.
Simple vector stores lack relationship and temporal structure. Knowledge graphs preserve relationships for reasoning. Temporal knowledge graphs add validity periods for time-aware queries. Implementation choices depend on query complexity, infrastructure constraints, and accuracy requirements.

Detailed Topics


Memory Architecture Fundamentals


The Context-Memory Spectrum
Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context. Effective architectures use multiple layers along this spectrum.
The spectrum includes working memory (context window, zero latency, volatile), short-term memory (session-persistent, searchable, volatile), long-term memory (cross-session persistent, structured, semi-permanent), and permanent memory (archival, queryable, permanent). Each layer has different latency, capacity, and persistence characteristics.
Why Simple Vector Stores Fall Short
Vector RAG provides semantic retrieval by embedding queries and documents in a shared embedding space. Similarity search retrieves the most semantically similar documents. This works well for document retrieval but lacks structure for agent memory.
Vector stores lose relationship information. If an agent learns that "Customer X purchased Product Y on Date Z," a vector store can retrieve this fact if asked directly. But it cannot answer "What products did customers who purchased Product Y also buy?" because relationship structure is not preserved.
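A toy illustration of the missing traversal: with facts kept as explicit triples, the "also bought" question becomes a two-hop walk over relationships. The `PURCHASED` relation and all customer/product names here are hypothetical; this is a minimal sketch, not a graph-database API.

```python
from collections import defaultdict

# Hypothetical purchase facts stored as (subject, relation, object) triples.
triples = [
    ("customer_a", "PURCHASED", "product_y"),
    ("customer_a", "PURCHASED", "product_z"),
    ("customer_b", "PURCHASED", "product_y"),
    ("customer_b", "PURCHASED", "product_w"),
    ("customer_c", "PURCHASED", "product_w"),
]

def also_purchased(product):
    """Two-hop traversal: who bought `product`, and what else did they buy?"""
    buyers = {s for s, r, o in triples if r == "PURCHASED" and o == product}
    others = defaultdict(set)
    for s, r, o in triples:
        if r == "PURCHASED" and s in buyers and o != product:
            others[o].add(s)
    return dict(others)

# Customers who bought product_y also bought product_z and product_w.
result = also_purchased("product_y")
```

A vector store holding the same facts as isolated text chunks could retrieve any single purchase, but has no structure to follow from buyers of one product to their other purchases.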
Vector stores also struggle with temporal validity. Facts change over time, but vector stores provide no mechanism to distinguish "current fact" from "outdated fact" except through explicit metadata and filtering.
The Move to Graph-Based Memory
Knowledge graphs preserve relationships between entities. Instead of isolated document chunks, graphs encode that Entity A has Relationship R to Entity B. This enables queries that traverse relationships rather than just similarity.
Temporal knowledge graphs add validity periods to facts. Each fact has a "valid from" and optionally "valid until" timestamp. This enables time-travel queries that reconstruct knowledge at specific points in time.
Benchmark Performance Comparison
The Deep Memory Retrieval (DMR) benchmark provides concrete performance data across memory architectures:
| Memory System | DMR Accuracy | Retrieval Latency | Notes |
|---|---|---|---|
| Zep (Temporal KG) | 94.8% | 2.58s | Best accuracy, fast retrieval |
| MemGPT | 93.4% | Variable | Good general performance |
| GraphRAG | ~75-85% | Variable | 20-35% gains over baseline RAG |
| Vector RAG | ~60-70% | Fast | Loses relationship structure |
| Recursive Summarization | 35.3% | Low | Severe information loss |
Zep demonstrated a 90% reduction in retrieval latency compared to the full-context baseline (2.58s vs 28.9s). This efficiency comes from retrieving only relevant subgraphs rather than the entire context history.
GraphRAG achieves approximately 20-35% accuracy gains over baseline RAG in complex reasoning tasks and reduces hallucination by up to 30% through community-based summarization.

Memory Layer Architecture


Layer 1: Working Memory
Working memory is the context window itself. It provides immediate access to information currently being processed but has limited capacity and vanishes when sessions end.
Working memory usage patterns include scratchpad calculations where agents track intermediate results, conversation history that preserves dialogue for current task, current task state that tracks progress on active objectives, and active retrieved documents that hold information currently being used.
Optimize working memory by keeping only active information, summarizing completed work before it falls out of attention, and using attention-favored positions for critical information.
Layer 2: Short-Term Memory
Short-term memory persists across the current session but not across sessions. It provides search and retrieval capabilities without the latency of permanent storage.
Common implementations include session-scoped databases that persist until session end, file-system storage in designated session directories, and in-memory caches keyed by session ID.
Short-term memory use cases include tracking conversation state across turns without stuffing context, storing intermediate results from tool calls that may be needed later, maintaining task checklists and progress tracking, and caching retrieved information within sessions.
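A minimal sketch of the third implementation above, an in-memory cache keyed by session ID; the `SessionMemory` class and its method names are illustrative, not a specific library.

```python
from collections import defaultdict

class SessionMemory:
    """Session-scoped store: persists across turns, discarded at session end."""

    def __init__(self):
        self._store = defaultdict(dict)  # session_id -> {key: value}

    def put(self, session_id, key, value):
        self._store[session_id][key] = value

    def get(self, session_id, key, default=None):
        return self._store[session_id].get(key, default)

    def end_session(self, session_id):
        # Volatile by design: all state for this session is dropped here.
        self._store.pop(session_id, None)

memory = SessionMemory()
memory.put("s1", "checklist", ["parse input", "call tool", "summarize"])
memory.put("s1", "tool_result", {"rows": 42})  # cached for later turns
memory.end_session("s1")  # nothing survives past the session
```

The same interface works over a session-scoped database or a session directory on disk; only the backing store changes.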
Layer 3: Long-Term Memory
Long-term memory persists across sessions indefinitely. It enables agents to learn from past interactions and build knowledge over time.
Long-term memory implementations range from simple key-value stores to sophisticated graph databases. The choice depends on complexity of relationships to model, query patterns required, and acceptable infrastructure complexity.
Long-term memory use cases include learning user preferences across sessions, building domain knowledge bases that grow over time, maintaining entity registries with relationship history, and storing successful patterns that can be reused.
Layer 4: Entity Memory
Entity memory specifically tracks information about entities (people, places, concepts, objects) to maintain consistency. This creates a rudimentary knowledge graph where entities are recognized across multiple interactions.
Entity memory maintains entity identity by tracking that "John Doe" mentioned in one conversation is the same person in another. It maintains entity properties by storing facts discovered about entities over time. It maintains entity relationships by tracking relationships between entities as they are discovered.
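These three responsibilities can be sketched with a hypothetical `EntityMemory` class; identifiers like `john_doe` and `acme_corp` are invented for illustration.

```python
class EntityMemory:
    """Track entities so facts discovered over time accumulate under one identity."""

    def __init__(self):
        self.entities = {}   # entity_id -> merged properties
        self.relations = []  # (subject_id, relation, object_id) tuples

    def observe(self, entity_id, **properties):
        # Merge new facts into the existing record instead of overwriting it.
        self.entities.setdefault(entity_id, {}).update(properties)

    def relate(self, subject_id, relation, object_id):
        self.relations.append((subject_id, relation, object_id))

mem = EntityMemory()
mem.observe("john_doe", role="customer")  # first conversation
mem.observe("john_doe", city="Berlin")    # later conversation, same identity
mem.relate("john_doe", "WORKS_AT", "acme_corp")
```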
Layer 5: Temporal Knowledge Graphs
Temporal knowledge graphs extend entity memory with explicit validity periods. Facts are not just true or false but true during specific time ranges.
This enables queries like "What was the user's address on Date X?" by retrieving facts valid during that date range. It prevents context clash when outdated information contradicts new data. It enables temporal reasoning about how entities changed over time.

Memory Implementation Patterns


Pattern 1: File-System-as-Memory
The file system itself can serve as a memory layer. This pattern is simple, requires no additional infrastructure, and enables the same just-in-time loading that makes file-system-based context effective.
Implementation uses the file system hierarchy for organization. Use naming conventions that convey meaning. Store facts in structured formats (JSON, YAML). Use timestamps in filenames or metadata for temporal tracking.
Advantages: Simplicity, transparency, portability. Disadvantages: No semantic search, no relationship tracking, manual organization required.
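A minimal sketch of this layout, assuming a hypothetical `<root>/<entity_type>/<entity_id>.json` hierarchy with a stored-at timestamp for temporal tracking.

```python
import json
import tempfile
import time
from pathlib import Path

def store_fact(root, entity_type, entity_id, fact):
    """Write a fact under <root>/<entity_type>/<entity_id>.json."""
    path = Path(root) / entity_type / f"{entity_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Timestamp in the record metadata supports temporal tracking later.
    record = {"fact": fact, "stored_at": time.time()}
    path.write_text(json.dumps(record))
    return path

def load_fact(root, entity_type, entity_id):
    path = Path(root) / entity_type / f"{entity_id}.json"
    return json.loads(path.read_text()) if path.exists() else None

root = tempfile.mkdtemp()  # stand-in for the agent's memory directory
store_fact(root, "users", "john_doe", {"city": "Berlin"})
```

Everything is inspectable with ordinary file tools, which is the transparency advantage; search and relationships remain manual, which is the cost.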
Pattern 2: Vector RAG with Metadata
Vector stores enhanced with rich metadata provide semantic search with filtering capabilities.
Implementation embeds facts or documents and stores with metadata including entity tags, temporal validity, source attribution, and confidence scores. Query includes metadata filters alongside semantic search.
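A toy version of metadata-filtered semantic search, using 2-D vectors and exact-match filters in place of a real vector store; the records and `meta` fields are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Toy store: each memory carries an embedding plus filterable metadata.
store = [
    {"text": "user moved to Berlin", "vec": [0.9, 0.1],
     "meta": {"entity": "john_doe", "valid": True, "confidence": 0.9}},
    {"text": "user lived in Paris", "vec": [0.8, 0.2],
     "meta": {"entity": "john_doe", "valid": False, "confidence": 0.9}},
]

def search(query_vec, meta_filter, top_k=3):
    """Apply exact-match metadata filters, then rank survivors by similarity."""
    candidates = [m for m in store
                  if all(m["meta"].get(k) == v for k, v in meta_filter.items())]
    return sorted(candidates, key=lambda m: cosine(query_vec, m["vec"]),
                  reverse=True)[:top_k]

# The temporal-validity flag keeps the outdated Paris fact out of results.
hits = search([1.0, 0.0], {"entity": "john_doe", "valid": True})
```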
Pattern 3: Knowledge Graph
Knowledge graphs explicitly model entities and relationships. Implementation defines entity types and relationship types, uses graph database or property graph storage, and maintains indexes for common query patterns.
Pattern 4: Temporal Knowledge Graph
Temporal knowledge graphs add validity periods to facts, enabling time-travel queries and preventing context clash from outdated information.
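A minimal sketch of the validity-period mechanics, assuming integer timestamps: asserting a new value closes the open window of the fact it supersedes, and queries filter on the window.

```python
facts = []  # each fact: subject, relation, object, valid_from, valid_until

def assert_fact(subject, relation, obj, at):
    """Record a fact, closing the validity window of any fact it supersedes."""
    for f in facts:
        if (f["subject"], f["relation"]) == (subject, relation) \
                and f["valid_until"] is None:
            f["valid_until"] = at
    facts.append({"subject": subject, "relation": relation, "object": obj,
                  "valid_from": at, "valid_until": None})

def fact_at(subject, relation, when):
    """Time-travel query: return the value that was valid at `when`."""
    for f in facts:
        if (f["subject"], f["relation"]) == (subject, relation) \
                and f["valid_from"] <= when \
                and (f["valid_until"] is None or f["valid_until"] > when):
            return f["object"]
    return None

assert_fact("john_doe", "LIVES_AT", "Paris", at=100)
assert_fact("john_doe", "LIVES_AT", "Berlin", at=200)  # closes the Paris fact
```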

Memory Retrieval Patterns


Semantic Retrieval
Retrieve memories semantically similar to the current query using embedding similarity search.
Entity-Based Retrieval
Retrieve all memories related to specific entities by traversing graph relationships.
Temporal Retrieval
Retrieve memories valid at a specific time or within a time range using validity-period filters.
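The three patterns can be sketched against one toy record shape; tag matching stands in for embedding similarity, and the records are invented.

```python
# Toy records carrying all three retrieval keys discussed above.
memories = [
    {"text": "order #1 shipped", "entities": {"order_1"}, "t": 10, "tags": {"shipping"}},
    {"text": "user prefers email", "entities": {"john_doe"}, "t": 15, "tags": {"preference"}},
    {"text": "order #1 delivered", "entities": {"order_1"}, "t": 20, "tags": {"shipping"}},
]

def by_semantic(tag):
    """Stand-in for embedding similarity: match on topic tags."""
    return [m for m in memories if tag in m["tags"]]

def by_entity(entity_id):
    """Follow the entity link rather than textual similarity."""
    return [m for m in memories if entity_id in m["entities"]]

def by_time(start, end):
    """Validity filter over a half-open [start, end) time range."""
    return [m for m in memories if start <= m["t"] < end]
```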

Memory Consolidation


Memories accumulate over time and require consolidation to prevent unbounded growth and remove outdated information.
Consolidation Triggers
Trigger consolidation after significant memory accumulation, when retrieval returns too many outdated results, periodically on a schedule, or when explicit consolidation is requested.
Consolidation Process
Identify outdated facts, merge related facts, update validity periods, archive or delete obsolete facts, and rebuild indexes.
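A minimal sketch of one consolidation step, archiving facts whose validity window has closed; the record shape is hypothetical.

```python
def consolidate(facts, now, archive):
    """Keep open facts; move facts whose validity window has closed to the archive."""
    live = []
    for fact in facts:
        closed = fact.get("valid_until") is not None and fact["valid_until"] <= now
        (archive if closed else live).append(fact)
    return live

facts = [
    {"object": "Paris", "valid_from": 100, "valid_until": 200},    # superseded
    {"object": "Berlin", "valid_from": 200, "valid_until": None},  # still open
]
archive = []
facts = consolidate(facts, now=300, archive=archive)
```

Merging related facts and rebuilding indexes would follow the same pattern: a pass over the live set after each trigger fires.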

Practical Guidance


Integration with Context


Memories must integrate with context systems to be useful. Use just-in-time memory loading to retrieve relevant memories when needed. Use strategic injection to place memories in attention-favored positions.
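A sketch of both ideas together, assuming a hypothetical `retrieve` callable: memories are loaded only when the prompt is built, and injected near the end where recency-favored attention tends to land.

```python
def build_prompt(system, task, retrieve):
    """Load memories just-in-time and inject them late in the prompt,
    where recency-favored attention tends to land."""
    memories = retrieve(task)  # only what this task needs enters context
    memory_block = "\n".join(f"- {m}" for m in memories)
    return f"{system}\n\nRelevant memories:\n{memory_block}\n\nTask: {task}"

def retrieve(task):
    """Hypothetical retriever; in practice this queries the memory store."""
    return ["user prefers concise answers"] if "summarize" in task else []

prompt = build_prompt("You are a helpful agent.", "summarize the report", retrieve)
```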

Memory System Selection


Choose memory architecture based on requirements:
  • Simple persistence needs: File-system memory
  • Semantic search needs: Vector RAG with metadata
  • Relationship reasoning needs: Knowledge graph
  • Temporal validity needs: Temporal knowledge graph
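The selection rules above can be sketched as a small precedence function; the flag names are invented.

```python
def choose_memory_architecture(needs):
    """Map requirements (a set of flags) to the simplest sufficient architecture.

    Checks run strongest-first: temporal validity implies a graph,
    a graph implies structured storage, and so on down to files.
    """
    if "temporal_validity" in needs:
        return "temporal knowledge graph"
    if "relationship_reasoning" in needs:
        return "knowledge graph"
    if "semantic_search" in needs:
        return "vector RAG with metadata"
    return "file-system memory"
```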

Examples


Example 1: Entity Tracking

```python
# Track entity across conversations
def remember_entity(entity_id, properties):
    memory.store({
        "type": "entity",
        "id": entity_id,
        "properties": properties,
        "last_updated": now(),
    })

def get_entity(entity_id):
    return memory.retrieve_entity(entity_id)
```

Example 2: Temporal Query

```python
# What was the user's address on January 15, 2024?
def query_address_at_time(user_id, query_time):
    return temporal_graph.query("""
        MATCH (user)-[r:LIVES_AT]->(address)
        WHERE user.id = $user_id
          AND r.valid_from <= $query_time
          AND (r.valid_until IS NULL OR r.valid_until > $query_time)
        RETURN address
    """, {"user_id": user_id, "query_time": query_time})
```

Guidelines


  1. Match memory architecture to query requirements
  2. Implement progressive disclosure for memory access
  3. Use temporal validity to prevent outdated information conflicts
  4. Consolidate memories periodically to prevent unbounded growth
  5. Handle memory retrieval failures gracefully
  6. Consider privacy implications of persistent memory
  7. Implement backup and recovery for critical memories
  8. Monitor memory growth and performance over time

Integration


This skill builds on context-fundamentals. It connects to:
  • multi-agent-patterns - Shared memory across agents
  • context-optimization - Memory-based context loading
  • evaluation - Evaluating memory quality

References


Internal reference:
  • Implementation Reference - Detailed implementation patterns
Related skills in this collection:
  • context-fundamentals - Context basics
  • multi-agent-patterns - Cross-agent memory
External resources:
  • Graph database documentation (Neo4j, etc.)
  • Vector store documentation (Pinecone, Weaviate, etc.)
  • Research on knowledge graphs and reasoning


Skill Metadata


Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0