state-management-patterns

State Management Patterns Skill

Standardized state management and persistence patterns for the autonomous-dev plugin ecosystem. Ensures reliable, crash-resistant state persistence across Claude restarts and system failures.

When This Skill Activates

Implementing state persistence
Managing crash recovery
Handling concurrent state access
Versioning state schemas
Tracking batch operations
Managing user preferences
Keywords: "state", "persistence", "JSON", "atomic", "crash recovery", "checkpoint"

Core Patterns

1. JSON Persistence with Atomic Writes

Definition: Store state in JSON files with atomic writes to prevent corruption on crash.

Pattern:

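```python
import json
import os
import stat
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_state_atomic(state: Dict[str, Any], state_file: Path) -> None:
    """Save state with an atomic write to prevent corruption.

    Args:
        state: State dictionary to persist
        state_file: Target state file path

    Guarantees:
        - Atomic write: readers see the old file or the new one, never a partial write
        - Temp file: written in the same directory, then renamed over the target
        - Durability: data is fsync'd to disk before the rename
        - Permissions: the existing file's mode is copied onto the replacement
    """
    state_file.parent.mkdir(parents=True, exist_ok=True)

    # Temp file lives in the target directory so os.replace() stays on one filesystem
    temp_fd, temp_path = tempfile.mkstemp(
        dir=state_file.parent,
        prefix=f".{state_file.name}.",
        suffix=".tmp",
    )

    try:
        # Write JSON to the temp file and force it to disk before renaming
        with os.fdopen(temp_fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())

        # mkstemp() creates files with mode 0o600; keep the original mode if the target exists
        if state_file.exists():
            os.chmod(temp_path, stat.S_IMODE(state_file.stat().st_mode))

        # Atomic rename (overwrites the target in one step)
        os.replace(temp_path, state_file)

    except BaseException:
        # Clean up the temp file on any failure, then re-raise
        Path(temp_path).unlink(missing_ok=True)
        raise
```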

See:

docs/json-persistence.md

examples/batch-state-example.py
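
The Hard Rules require that state be recoverable from corruption. A minimal read-side counterpart to save_state_atomic() might look like the sketch below; load_state() and DEFAULT_STATE are hypothetical names, and the default schema is illustrative only. Returning a deep copy keeps callers from mutating the shared defaults.

```python
import copy
import json
import logging
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Hypothetical default schema, for illustration only
DEFAULT_STATE: Dict[str, Any] = {"version": "1.0.0", "items": []}


def load_state(state_file: Path) -> Dict[str, Any]:
    """Load state, falling back to defaults if the file is missing or corrupt."""
    if not state_file.exists():
        # First run: start from a fresh copy of the defaults
        return copy.deepcopy(DEFAULT_STATE)
    try:
        with state_file.open(encoding="utf-8") as f:
            state = json.load(f)
    except (OSError, ValueError) as e:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError
        logger.warning("Unreadable state file %s (%s); using defaults", state_file, e)
        return copy.deepcopy(DEFAULT_STATE)
    if not isinstance(state, dict):
        logger.warning("State file %s is not a JSON object; using defaults", state_file)
        return copy.deepcopy(DEFAULT_STATE)
    return state
```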

2. File Locking for Concurrent Access

Definition: Use file locks to prevent concurrent modification of state files.

Pattern:

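```python
import fcntl  # POSIX-only (Linux/macOS); not available on Windows
import json
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator


@contextmanager
def file_lock(filepath: Path) -> Iterator[IO[str]]:
    """Acquire an exclusive lock on an existing state file.

    Args:
        filepath: Path to the file to lock (must already exist; opened in 'r+' mode)

    Yields:
        Open file handle holding an exclusive lock

    Example (serialized, but not atomic: a crash mid-rewrite can still truncate the file):
        >>> with file_lock(state_file) as f:
        ...     state = json.load(f)
        ...     state['count'] += 1
        ...     f.seek(0)
        ...     f.truncate()
        ...     json.dump(state, f)
    """
    with filepath.open('r+') as f:
        # Blocks until no other process holds the lock
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            yield f
        finally:
            # Flush buffered writes before unlocking so the next lock holder sees them
            f.flush()
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```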

See:

docs/file-locking.md

templates/file-lock-template.py
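
flock() locks an open file, which belongs to a single inode. Once an atomic write os.replace()s the state file, processes still waiting on the old file acquire a lock on a file that is no longer at that path and may read stale data. One common way to combine locking with atomic writes is to lock a separate sidecar file that is never replaced. This is a sketch assuming POSIX fcntl; locked(), update_state(), and the ".lock" naming are hypothetical, not part of the plugin.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Callable, Dict, Iterator


@contextmanager
def locked(lock_path: Path) -> Iterator[None]:
    # The lock file is never replaced, so every process locks the same inode
    with open(lock_path, "a") as lock_f:
        fcntl.flock(lock_f.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_f.fileno(), fcntl.LOCK_UN)


def update_state(state_file: Path, mutate: Callable[[Dict[str, Any]], None]) -> Dict[str, Any]:
    """Hypothetical helper: locked read-modify-write ending in an atomic replace."""
    lock_path = state_file.with_name(state_file.name + ".lock")
    with locked(lock_path):
        state = json.loads(state_file.read_text()) if state_file.exists() else {}
        mutate(state)
        fd, tmp = tempfile.mkstemp(
            dir=state_file.parent, prefix=f".{state_file.name}.", suffix=".tmp"
        )
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f, indent=2)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, state_file)  # swap in the new file while still holding the lock
        except BaseException:
            Path(tmp).unlink(missing_ok=True)
            raise
    return state
```

Because the read, the mutation, and the rename all happen under the sidecar lock, two concurrent updaters cannot interleave, and a crash at any point leaves either the old or the new state file on disk.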

3. Crash Recovery Pattern

Definition: Design state to enable recovery after crashes or interruptions.

Principles:

State includes enough context to resume operations
Progress tracking enables "resume from last checkpoint"
State validation detects corruption
Migration paths handle schema changes

Example:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class BatchState:
    """Batch processing state with crash recovery support.

    Attributes:
        batch_id: Unique batch identifier
        features: List of all features to process
        current_index: Index of current feature
        completed: List of completed feature names
        failed: List of failed feature names
        created_at: State creation timestamp (ISO 8601)
        last_updated: Last update timestamp (ISO 8601)
    """
    batch_id: str
    features: List[str]
    current_index: int = 0
    # default_factory gives each instance its own list (never share a mutable default)
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    created_at: Optional[str] = None
    last_updated: Optional[str] = None

    def __post_init__(self):
        now = datetime.now().isoformat()
        if self.created_at is None:
            self.created_at = now
        # Only stamp new states; keep the saved value when reloading from disk
        if self.last_updated is None:
            self.last_updated = now
```

See:

docs/crash-recovery.md

examples/crash-recovery-example.py
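
The principles above can be sketched over a plain JSON dict. pending_features(), record_result(), checkpoint(), and run_batch() are hypothetical helpers, not the BatchStateManager API. The sketch checkpoints after every feature, so a crash loses at most the feature in progress; it treats failed features as finished, though retrying them on resume is an equally reasonable policy.

```python
import json
import os
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Dict, List


def pending_features(state: Dict[str, Any]) -> List[str]:
    """Features with no recorded outcome yet, in their original order."""
    done = set(state["completed"]) | set(state["failed"])
    return [f for f in state["features"] if f not in done]


def record_result(state: Dict[str, Any], feature: str, ok: bool) -> None:
    """Record one feature's outcome and advance the checkpoint fields."""
    (state["completed"] if ok else state["failed"]).append(feature)
    state["current_index"] = state["features"].index(feature) + 1
    state["last_updated"] = datetime.now().isoformat()


def checkpoint(state: Dict[str, Any], state_file: Path) -> None:
    """Write-then-rename so a crash leaves the previous checkpoint intact (fsync omitted)."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, state_file)


def run_batch(state_file: Path, process: Callable[[str], Any]) -> Dict[str, Any]:
    """Process every pending feature, checkpointing after each one."""
    state = json.loads(state_file.read_text(encoding="utf-8"))
    for feature in pending_features(state):  # skips work finished before a crash
        try:
            process(feature)
            record_result(state, feature, ok=True)
        except Exception:
            record_result(state, feature, ok=False)
        checkpoint(state, state_file)
    return state
```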

4. State Versioning and Migration

Definition: Version state schemas to enable graceful upgrades.

Pattern:

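```python
from typing import Any, Dict

STATE_VERSION = "2.0.0"


def _migrate_1_0_to_1_1(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: apply the 1.0.0 → 1.1.0 schema changes here
    return state


def _migrate_1_1_to_2_0(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: apply the 1.1.0 → 2.0.0 schema changes here
    return state


def migrate_state(state: Dict[str, Any]) -> Dict[str, Any]:
    """Migrate state from an old version to the current one.

    Args:
        state: State dictionary (any supported version)

    Returns:
        Migrated state (current version)

    Raises:
        ValueError: If the version is unknown or newer than STATE_VERSION
    """
    version = state.get("version", "1.0.0")

    if version == "1.0.0":
        # Migrate 1.0.0 → 1.1.0
        state = _migrate_1_0_to_1_1(state)
        version = "1.1.0"

    if version == "1.1.0":
        # Migrate 1.1.0 → 2.0.0
        state = _migrate_1_1_to_2_0(state)
        version = "2.0.0"

    if version != STATE_VERSION:
        # Refuse instead of silently re-stamping an unrecognized schema
        raise ValueError(f"Unsupported state version: {version}")

    state["version"] = STATE_VERSION
    return state
```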

See:

docs/state-versioning.md

templates/state-manager-template.py
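
As the number of versions grows, the if-chain above can be replaced with a table of single-step migrations applied in a loop. In this sketch MIGRATIONS, migrate_with_registry(), and both schema changes (renaming "done" to "completed", adding "failed") are hypothetical examples, not the plugin's actual schema history.

```python
from typing import Any, Callable, Dict, Tuple

STATE_VERSION = "2.0.0"

Migration = Callable[[Dict[str, Any]], Dict[str, Any]]


def _rename_done_to_completed(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical 1.0.0 → 1.1.0 change: the "done" list was renamed to "completed"
    state["completed"] = state.pop("done", [])
    return state


def _add_failed_list(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical 1.1.0 → 2.0.0 change: a "failed" list was introduced
    state.setdefault("failed", [])
    return state


# Maps each old version to (next version, step that produces it)
MIGRATIONS: Dict[str, Tuple[str, Migration]] = {
    "1.0.0": ("1.1.0", _rename_done_to_completed),
    "1.1.0": ("2.0.0", _add_failed_list),
}


def migrate_with_registry(state: Dict[str, Any]) -> Dict[str, Any]:
    """Apply registered steps one at a time until the state reaches STATE_VERSION."""
    state = dict(state)  # shallow copy: top-level changes don't touch the caller's dict
    version = state.get("version", "1.0.0")
    while version != STATE_VERSION:
        if version not in MIGRATIONS:
            # Unknown or newer-than-supported version: fail loudly instead of guessing
            raise ValueError(f"No migration path from state version {version!r}")
        version, step = MIGRATIONS[version]
        state = step(state)
        state["version"] = version  # record progress after each step
    return state
```

Adding a version then means writing one step function and one table entry, and every older state still walks the full chain.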

Real-World Examples

BatchStateManager Pattern

From plugins/autonomous-dev/lib/batch_state_manager.py

Features:

JSON persistence with atomic writes
Crash recovery via --resume flag
Progress tracking (completed/failed features)
Automatic context management via Claude Code (200K token budget)
State versioning for schema upgrades

Note (Issue #218): The deprecated context-clearing functions should_clear_context(), pause_batch_for_clear(), and get_clear_notification_message() have been removed, because Claude Code v2.0+ manages context automatically within its 200K-token budget.

Usage:

```python
# Create batch state
state = create_batch_state(features=["feat1", "feat2", "feat3"])
state.batch_id  # "batch-20251116-123456"

# Process features
for feature in state.features:
    try:
        # Process feature
        result = process_feature(feature)
        # Feature implementation updates context automatically

    except Exception as e:
        # Track failures for audit trail
        mark_failed(state, feature, str(e))

    save_batch_state(state_file, state)  # Atomic write

# Resume after crash
state = load_batch_state(state_file)
next_feature = get_next_pending_feature(state)  # Skips completed
```


**Context Management**: Claude Code automatically manages the 200K token budget. No manual context clearing required.

Checkpoint Integration (Issue #79)

Agents save checkpoints using the portable pattern:

Portable Pattern (Works Anywhere)

```python
from pathlib import Path
import sys

# Portable path detection: walk up from the current directory to the nearest .git
project_root = Path.cwd()  # fallback if no .git directory is found
current = Path.cwd()
while current != current.parent:
    if (current / ".git").exists():
        project_root = current
        break
    current = current.parent

# Add lib to path
lib_path = project_root / "plugins/autonomous-dev/lib"
if lib_path.exists():
    sys.path.insert(0, str(lib_path))

try:
    from agent_tracker import AgentTracker
    success = AgentTracker.save_agent_checkpoint(
        agent_name='my-agent',
        message='Task completed - found 5 patterns',
        tools_used=['Read', 'Grep', 'WebSearch']
    )
    print(f"Checkpoint: {'saved' if success else 'skipped'}")
except ImportError:
    print("ℹ️ Checkpoint skipped (user project)")
```

Features

Portable: Works from any directory (user projects, subdirectories, fresh installs)
No hardcoded paths: Uses dynamic project root detection
Graceful degradation: Returns False, doesn't block workflow
Security validated: Path validation (CWE-22), no subprocess (CWE-78)

Design Patterns

Progressive Enhancement: Works with or without tracking infrastructure
Non-blocking: Never raises exceptions
Two-tier: Library imports instead of subprocess calls

See: LIBRARIES.md Section 24 (agent_tracker.py), DEVELOPMENT.md Scenario 2.5, docs/LIBRARIES.md for API

Usage Guidelines

For Library Authors

When implementing stateful features:

Use JSON persistence with atomic writes
Add file locking for concurrent access protection
Design for crash recovery with resumable state
Version your state for schema evolution
Validate on load to detect corruption
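
Validating on load can go beyond "is it parseable JSON". A sketch of a structural check follows, assuming a hypothetical schema modeled loosely on the BatchState fields shown earlier; REQUIRED_FIELDS and validate_state() are illustrative names. Returning a list of problems instead of raising lets the caller decide whether to migrate, fall back to defaults, or abort.

```python
from typing import Any, Dict, List

# Hypothetical schema for illustration: required field -> expected JSON type
REQUIRED_FIELDS: Dict[str, type] = {
    "version": str,
    "features": list,
    "current_index": int,
    "completed": list,
    "failed": list,
}


def validate_state(state: Any) -> List[str]:
    """Return a list of problems; an empty list means the state passed every check."""
    if not isinstance(state, dict):
        return [f"state must be a JSON object, got {type(state).__name__}"]

    errors: List[str] = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in state:
            errors.append(f"missing field: {key}")
            continue
        value = state[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(
                f"wrong type for {key}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Cross-field check, only meaningful once the basic types are right
    if not errors and not 0 <= state["current_index"] <= len(state["features"]):
        errors.append("current_index out of range")
    return errors
```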

For Claude

When creating or analyzing stateful libraries:

Load this skill when keywords match ("state", "persistence", etc.)
Follow persistence patterns for reliability
Implement crash recovery for long-running operations
Use atomic operations to prevent corruption
Reference templates in the templates/ directory

Token Savings

By centralizing state management patterns in this skill:

Before: ~50 tokens per library for inline state management docs
After: ~10 tokens for skill reference comment
Savings: ~40 tokens per library
Total: ~400 tokens across 10 libraries (4-5% reduction)

Progressive Disclosure

This skill uses the Claude Code 2.0+ progressive disclosure architecture:

Metadata (frontmatter): Always loaded (~180 tokens)
Full content: Loaded only when keywords match
Result: Efficient context usage, scales to 100+ skills

When you use terms like "state management", "persistence", "crash recovery", or "atomic writes", Claude Code automatically loads the full skill content.

Templates and Examples

Templates (reusable code structures)

templates/state-manager-template.py: Complete state manager class
templates/atomic-write-template.py: Atomic write implementation
templates/file-lock-template.py: File locking utilities

Examples (real implementations)

examples/batch-state-example.py: BatchStateManager pattern
examples/user-state-example.py: UserStateManager pattern
examples/crash-recovery-example.py: Crash recovery demonstration

Documentation (detailed guides)

docs/json-persistence.md: JSON storage patterns
docs/atomic-writes.md: Atomic write implementation
docs/file-locking.md: Concurrent access protection
docs/crash-recovery.md: Recovery strategies

Cross-References

This skill integrates with other autonomous-dev skills:

library-design-patterns: Two-tier design, progressive enhancement
error-handling-patterns: Exception handling and recovery
security-patterns: File permissions and path validation

See:

skills/library-design-patterns/

skills/error-handling-patterns/

Maintenance

This skill should be updated when:

New state management patterns emerge
State schema versioning needs change
Concurrency patterns evolve
Performance optimizations are discovered

Last Updated: 2025-11-16 (Phase 8.8 - Initial creation)
Version: 1.0.0

Hard Rules

FORBIDDEN:

Storing state without a defined schema or version field
Direct file writes that bypass atomic operations (use the write-then-rename pattern instead)
State files without backup/recovery mechanism
Unbounded state growth (MUST have cleanup/rotation strategy)

REQUIRED:

All state files MUST include a schema version for migration support
State mutations MUST be atomic (no partial writes on failure)
State MUST be recoverable from corruption (fallback to defaults)
All state access MUST go through a single module (no scattered file reads)
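
Two of the forbidden items above, a missing backup/recovery mechanism and unbounded growth, can be handled together on the write path. In this sketch the ".bak" naming, the "history" field, and MAX_HISTORY = 100 are assumptions for illustration. The backup is refreshed only when the current file still parses, so a corrupt file never overwrites the last good copy.

```python
import json
import os
import shutil
from pathlib import Path
from typing import Any, Dict

MAX_HISTORY = 100  # hypothetical cap; size it to what the state is used for


def _backup_path(state_file: Path) -> Path:
    return state_file.with_name(state_file.name + ".bak")


def save_with_backup(state: Dict[str, Any], state_file: Path) -> None:
    """Cap history, keep the last good file as .bak, then replace atomically."""
    # Bounded growth: keep only the newest MAX_HISTORY entries of the assumed "history" list
    if len(state.get("history", [])) > MAX_HISTORY:
        state = {**state, "history": state["history"][-MAX_HISTORY:]}
    # Refresh the backup only if the current file is still valid JSON
    try:
        json.loads(state_file.read_text(encoding="utf-8"))
        shutil.copy2(state_file, _backup_path(state_file))
    except (OSError, ValueError):
        pass  # missing or corrupt: keep whatever backup already exists
    tmp = state_file.with_name(state_file.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, state_file)


def load_with_backup(state_file: Path, default: Dict[str, Any]) -> Dict[str, Any]:
    """Try the main file, then the .bak copy, then fall back to defaults."""
    for candidate in (state_file, _backup_path(state_file)):
        try:
            state = json.loads(candidate.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # missing or corrupt: try the next candidate
        if isinstance(state, dict):
            return state
    return dict(default)
```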
