moai-formats-data

Data Format Specialist

Quick Reference

Advanced Data Format Management - Comprehensive data handling covering TOON encoding, JSON/YAML optimization, serialization patterns, and data validation for performance-critical applications.
Core Capabilities:
  • TOON Encoding: 40-60% token reduction vs JSON for LLM communication
  • JSON/YAML Optimization: Efficient serialization and parsing patterns
  • Data Validation: Schema validation, type checking, error handling
  • Format Conversion: Seamless transformation between data formats
  • Performance: Optimized data structures and caching strategies
  • Schema Management: Dynamic schema generation and evolution
When to Use:
  • Optimizing data transmission to LLMs within token budgets
  • High-performance serialization/deserialization
  • Schema validation and data integrity
  • Format conversion and data transformation
  • Large dataset processing and optimization
Quick Start:
Create a TOONEncoder instance and call encode with a dictionary, for example one containing user and age fields, to compress the data. The encoded result typically uses 40-60% fewer tokens than the equivalent JSON. Call decode to restore the original data structure.
Create a JSONOptimizer instance and call serialize_fast with a large dataset for high-speed JSON serialization.
Create a DataValidator instance and call create_schema with a dictionary that defines name as a required string. Call validate with the data and the schema to check validity.

Implementation Guide

Core Concepts

TOON (Token-Optimized Object Notation):
  • Custom binary-compatible format optimized for LLM token usage
  • Type markers: # for numbers, ! for booleans, @ for timestamps, ~ for null
  • 40-60% size reduction vs JSON for typical data structures
  • Lossless round-trip encoding/decoding
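The type markers listed above can be illustrated with a minimal sketch that covers scalar values only. The function names `encode_scalar` and `decode_scalar` and the exact text layout (`!1`/`!0` for booleans, ISO 8601 text after `@`) are assumptions made for illustration; see modules/toon-encoding.md for the actual implementation.

```python
from datetime import datetime


def encode_scalar(value):
    """Prefix a scalar with an assumed TOON-style type marker."""
    if value is None:
        return "~"
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "!1" if value else "!0"
    if isinstance(value, (int, float)):
        return "#" + repr(value)
    if isinstance(value, datetime):
        return "@" + value.isoformat()
    raise TypeError(f"unsupported scalar type: {type(value).__name__}")


def decode_scalar(token: str):
    """Invert encode_scalar by dispatching on the leading marker."""
    marker, body = token[:1], token[1:]
    if token == "~":
        return None
    if marker == "!":
        return body == "1"
    if marker == "#":
        try:
            return int(body)       # "42" stays an int
        except ValueError:
            return float(body)     # "-3.5" becomes a float
    if marker == "@":
        return datetime.fromisoformat(body)
    raise ValueError(f"unknown type marker in {token!r}")


print(encode_scalar(42))    # #42
print(encode_scalar(True))  # !1
print(encode_scalar(None))  # ~
```

Because each marker identifies the type unambiguously, decoding returns a value of the same type that was encoded, which is what the lossless round-trip property requires.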
Performance Optimization:
  • Ultra-fast JSON processing with orjson, 2-5x faster than the standard json module
  • Streaming processing for large datasets using ijson
  • Intelligent caching with LRU eviction and memory management
  • Schema compression and validation optimization
Data Validation:
  • Type-safe validation with custom rules and patterns
  • Schema evolution and migration support
  • Cross-field validation and dependency checking
  • Performance-optimized batch validation

Basic Implementation

TOON Encoding for LLM Optimization:
Create a TOONEncoder instance. Define data containing a user object (with id, name, an active boolean, and a created datetime) plus a permissions array. Call encode to compress the data and decode to restore it. Compare the encoded size with the JSON size to verify the reduction.
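The steps above can be sketched as follows. The `TOONEncoder` class here is an illustrative stand-in written for this example, not the skill's actual implementation: the bare-key syntax, the `!1`/`!0` boolean form, and the use of JSON-quoted strings are all assumptions. It expects well-formed input and string keys, and the size reduction it achieves on this sample is modest rather than the 40-60% quoted for the real encoder.

```python
import json
import re
from datetime import datetime

_BARE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_END_OF_SCALAR = re.compile(r"[,\]}]|$")


class TOONEncoder:
    """Illustrative stand-in, not the skill's actual TOONEncoder."""

    def encode(self, value) -> str:
        if value is None:
            return "~"
        if isinstance(value, bool):  # must precede the int check
            return "!1" if value else "!0"
        if isinstance(value, (int, float)):
            return "#" + repr(value)
        if isinstance(value, datetime):
            return "@" + value.isoformat()
        if isinstance(value, str):
            return json.dumps(value, ensure_ascii=False)
        if isinstance(value, list):
            return "[" + ",".join(self.encode(v) for v in value) + "]"
        if isinstance(value, dict):
            pairs = (self._key(k) + ":" + self.encode(v) for k, v in value.items())
            return "{" + ",".join(pairs) + "}"
        raise TypeError(f"unsupported type: {type(value).__name__}")

    @staticmethod
    def _key(key: str) -> str:
        # Bare keys save two quote characters each; others fall back to JSON.
        return key if _BARE_KEY.fullmatch(key) else json.dumps(key, ensure_ascii=False)

    def decode(self, text: str):
        value, pos = self._parse(text, 0)
        if pos != len(text):
            raise ValueError(f"unexpected trailing data at offset {pos}")
        return value

    def _parse(self, s: str, i: int):
        c = s[i]
        if c == "~":
            return None, i + 1
        if c == "!":
            return s[i + 1] == "1", i + 2
        if c in "#@":
            end = _END_OF_SCALAR.search(s, i).start()
            body = s[i + 1:end]
            if c == "@":
                return datetime.fromisoformat(body), end
            try:
                return int(body), end
            except ValueError:
                return float(body), end
        if c == '"':
            return json.JSONDecoder().raw_decode(s, i)
        if c == "[":
            return self._parse_items(s, i + 1, "]", keyed=False)
        if c == "{":
            return self._parse_items(s, i + 1, "}", keyed=True)
        raise ValueError(f"unexpected character {c!r} at offset {i}")

    def _parse_items(self, s: str, i: int, closer: str, keyed: bool):
        result = {} if keyed else []
        while s[i] != closer:
            if keyed:
                if s[i] == '"':
                    key, i = json.JSONDecoder().raw_decode(s, i)
                else:
                    key_end = _BARE_KEY.match(s, i).end()
                    key, i = s[i:key_end], key_end
                value, i = self._parse(s, i + 1)  # i + 1 skips the ":"
                result[key] = value
            else:
                value, i = self._parse(s, i)
                result.append(value)
            if s[i] == ",":
                i += 1
        return result, i + 1


encoder = TOONEncoder()
data = {
    "user": {"id": 123, "name": "Alice", "active": True,
             "created": datetime(2024, 1, 15, 10, 30)},
    "permissions": ["read", "write"],
}
encoded = encoder.encode(data)
as_json = json.dumps(data, default=str, separators=(",", ":"))
print(encoded)
print(len(encoded), "characters vs", len(as_json), "for compact JSON")
```

Character counts are only a proxy for token counts; measuring the real saving requires the tokenizer of the target model.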
Fast JSON Processing:
Create a JSONOptimizer instance. Call serialize_fast to get bytes and deserialize_fast to parse them back. Call compress_schema with a schema of type object and its properties definition to optimize repeated validation.
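A minimal sketch of the serialize and deserialize pair might look like this. The `JSONOptimizer` class below is a hypothetical stand-in that uses `orjson` when it is installed and falls back to the standard `json` module otherwise; `compress_schema` is left out because its behavior is not described here in enough detail to sketch.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None


class JSONOptimizer:
    """Illustrative stand-in, not the skill's actual JSONOptimizer."""

    def serialize_fast(self, obj) -> bytes:
        if orjson is not None:
            return orjson.dumps(obj)  # orjson returns bytes directly
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

    def deserialize_fast(self, payload: bytes):
        if orjson is not None:
            return orjson.loads(payload)
        return json.loads(payload)  # json.loads also accepts bytes


optimizer = JSONOptimizer()
payload = optimizer.serialize_fast({"items": [1, 2, 3], "ok": True})
print(type(payload).__name__)
print(optimizer.deserialize_fast(payload))
```

Returning bytes from both code paths keeps callers independent of which backend is active.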
Data Validation:
Create a DataValidator instance. Define user_schema with three fields: username as a required string with a minimum length of 3, email as an email-type field, and age as an optional integer with a minimum value of 13. Call validate with user_data and the schema, then check the result for its valid status, sanitized_data, or errors list.
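The validation flow described above could be sketched as follows. The `DataValidator` class is a stand-in written for this example, and the rule keys (`type`, `required`, `min_length`, `min_value`) as well as the dictionary shape of the result are assumptions; the skill's real schema format is covered in modules/data-validation.md.

```python
import re

_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately loose check


class DataValidator:
    """Illustrative stand-in; the rule keys are assumed, not the skill's format."""

    def validate(self, data: dict, schema: dict) -> dict:
        errors, sanitized = [], {}
        for field, rule in schema.items():
            value = data.get(field)
            if value is None:
                if rule.get("required", False):
                    errors.append(f"{field}: required field is missing")
                continue
            problem = self._check(value, rule)
            if problem:
                errors.append(f"{field}: {problem}")
            else:
                # Sanitizing here only means trimming surrounding whitespace.
                sanitized[field] = value.strip() if isinstance(value, str) else value
        return {
            "valid": not errors,
            "sanitized_data": sanitized if not errors else None,
            "errors": errors,
        }

    @staticmethod
    def _check(value, rule: dict):
        kind = rule.get("type")
        if kind in ("string", "email"):
            if not isinstance(value, str):
                return "expected a string"
            text = value.strip()
            if len(text) < rule.get("min_length", 0):
                return f"must be at least {rule['min_length']} characters"
            if kind == "email" and not _EMAIL.fullmatch(text):
                return "not a valid email address"
        elif kind == "integer":
            if isinstance(value, bool) or not isinstance(value, int):
                return "expected an integer"
            if "min_value" in rule and value < rule["min_value"]:
                return f"must be at least {rule['min_value']}"
        return None


user_schema = {
    "username": {"type": "string", "required": True, "min_length": 3},
    "email": {"type": "email", "required": True},
    "age": {"type": "integer", "required": False, "min_value": 13},
}
user_data = {"username": " alice ", "email": "alice@example.com", "age": 30}

result = DataValidator().validate(user_data, user_schema)
if result["valid"]:
    print(result["sanitized_data"])
else:
    print(result["errors"])
```

Collecting every error before returning, rather than stopping at the first one, lets a caller report all problems with a payload in a single pass.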

Common Use Cases

API Response Optimization:
Create a function that optimizes API responses for LLM consumption by encoding the data with TOONEncoder. Create a corresponding function that parses optimized responses by decoding the TOON data back into a dictionary.
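One way to structure this pair of functions is to accept any object that offers `encode` and `decode`, so the skill's TOONEncoder can be passed in directly. The function names, the `Codec` protocol, and the `CompactJSONCodec` stand-in (used only so the sketch runs without the skill installed) are all hypothetical.

```python
import json
from typing import Any, Protocol


class Codec(Protocol):
    """Anything with encode/decode, such as the skill's TOONEncoder."""

    def encode(self, data: Any) -> str: ...
    def decode(self, text: str) -> Any: ...


class CompactJSONCodec:
    """Stand-in codec so this sketch runs without TOONEncoder."""

    def encode(self, data: Any) -> str:
        return json.dumps(data, separators=(",", ":"))

    def decode(self, text: str) -> Any:
        return json.loads(text)


def optimize_for_llm(response: dict, codec: Codec) -> str:
    """Encode an API response into the codec's compact text form."""
    return codec.encode(response)


def parse_optimized(text: str, codec: Codec) -> dict:
    """Decode compact text back into a dictionary."""
    decoded = codec.decode(text)
    if not isinstance(decoded, dict):
        raise ValueError("expected the decoded payload to be a dictionary")
    return decoded


codec = CompactJSONCodec()  # swap in TOONEncoder() where it is available
compact = optimize_for_llm({"status": "ok", "items": [1, 2]}, codec)
print(compact)
print(parse_optimized(compact, codec))
```

Passing the codec as a parameter keeps the two functions testable and lets the encoding be changed without touching call sites.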
Configuration Management:
Create a YAMLOptimizer instance and call load_fast with a config file path. Call merge_configs with base_config, env_config, and user_config for multi-file merging.
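The sketch below shows one plausible shape for these two methods. The `YAMLOptimizer` class is a stand-in, and the merge semantics are an assumption: later configs override earlier ones, nested mappings merge recursively, and lists are replaced rather than concatenated. `load_fast` relies on PyYAML and prefers its C-based `CSafeLoader` when the library was built with libyaml.

```python
from copy import deepcopy


class YAMLOptimizer:
    """Illustrative stand-in, not the skill's actual YAMLOptimizer."""

    def load_fast(self, path: str) -> dict:
        import yaml  # PyYAML; imported lazily so merge_configs works without it

        # CSafeLoader exists only when PyYAML was built against libyaml.
        loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
        with open(path, "r", encoding="utf-8") as fh:
            return yaml.load(fh, Loader=loader) or {}

    def merge_configs(self, *configs: dict) -> dict:
        """Merge left to right; later configs take precedence."""
        merged: dict = {}
        for config in configs:
            self._merge_into(merged, config)
        return merged

    @staticmethod
    def _merge_into(target: dict, source: dict) -> None:
        for key, value in source.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                YAMLOptimizer._merge_into(target[key], value)
            else:
                target[key] = deepcopy(value)  # never alias the caller's data


base_config = {"db": {"host": "localhost", "port": 5432}, "debug": False}
env_config = {"db": {"host": "db.internal"}}
user_config = {"debug": True}

merged = YAMLOptimizer().merge_configs(base_config, env_config, user_config)
print(merged)
```

Copying values during the merge leaves the input dictionaries untouched, so the same base config can be reused for several environments.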
Large Dataset Processing:
Create a StreamProcessor with a chunk_size of 8192. Define a process_item function that handles each item. Call process_json_stream with the file path and the callback to process large JSON files without loading them fully into memory.
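To make the streaming idea concrete, here is a standard-library sketch that reads a top-level JSON array in fixed-size chunks and hands each element to a callback. It is a stand-in for illustration: the skill itself lists ijson for streaming, whereas this version uses `json.JSONDecoder.raw_decode` on a rolling buffer so that it runs without extra dependencies. The return value (a count of processed items) is an assumption.

```python
import json


class StreamProcessor:
    """Illustrative stand-in; handles a file holding one top-level JSON array."""

    def __init__(self, chunk_size: int = 8192):
        self.chunk_size = chunk_size

    def process_json_stream(self, file_path: str, callback) -> int:
        decoder = json.JSONDecoder()
        count = 0
        with open(file_path, "r", encoding="utf-8") as fh:
            buf = fh.read(self.chunk_size).lstrip()
            if not buf.startswith("["):
                raise ValueError("expected a top-level JSON array")
            buf = buf[1:]
            while True:
                buf = buf.lstrip(" \t\r\n,")  # skip separators between items
                if buf.startswith("]"):
                    return count
                complete = False
                try:
                    item, end = decoder.raw_decode(buf)
                    # Accept an item only once its trailing delimiter is visible,
                    # so a number split across two chunks is never cut short.
                    complete = buf[end:].lstrip()[:1] in (",", "]")
                except json.JSONDecodeError:
                    pass  # the item is still incomplete; read more below
                if complete:
                    callback(item)
                    count += 1
                    buf = buf[end:]
                    continue
                chunk = fh.read(self.chunk_size)
                if not chunk:
                    raise ValueError("unexpected end of JSON array")
                buf += chunk


def process_item(item):
    print(item["id"])


# Usage, assuming a file named events.json that contains a JSON array:
# StreamProcessor(chunk_size=8192).process_json_stream("events.json", process_item)
```

Only the current buffer is held in memory, so peak usage is bounded by the chunk size plus the largest single item. Retrying the parse after each chunk is simple but repeats work for very large items, which is one reason an incremental parser such as ijson is preferable in production.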

Advanced Features Overview

Advanced TOON Features

See modules/toon-encoding.md for custom type handlers (UUID, Decimal), streaming TOON processing, batch TOON encoding, and performance characteristics with benchmarks.

Advanced Validation Patterns

See modules/data-validation.md for cross-field validation, schema evolution and migration, custom validation rules, and batch validation optimization.

Performance Optimization

See modules/caching-performance.md for intelligent caching strategies, cache warming and invalidation, memory management, and performance monitoring.

JSON/YAML Advanced Features

See modules/json-optimization.md for streaming JSON processing, memory-efficient parsing, schema compression, and format conversion utilities.

Works Well With

  • moai-domain-backend - Backend data serialization and API responses
  • moai-domain-database - Database data format optimization
  • moai-foundation-core - MCP data serialization and transmission patterns
  • moai-workflow-docs - Documentation data formatting
  • moai-foundation-context - Context optimization for token budgets

Module References

Core Implementation Modules:
  • modules/toon-encoding.md - TOON encoding implementation
  • modules/json-optimization.md - High-performance JSON/YAML
  • modules/data-validation.md - Advanced validation and schemas
  • modules/caching-performance.md - Caching strategies
Supporting Files:
  • modules/INDEX.md - Module overview and integration patterns
  • reference.md - Extended reference documentation
  • examples.md - Complete working examples

Technology Stack

Core Libraries:
  • orjson: Ultra-fast JSON parsing and serialization
  • PyYAML: YAML processing with C-based loaders
  • ijson: Streaming JSON parser for large files
  • python-dateutil: Advanced datetime parsing
  • regex: Advanced regular expression support
Performance Tools:
  • lru_cache: Built-in memoization
  • pickle: Object serialization
  • hashlib: Hash generation for caching
  • functools: Function decorators and utilities
Validation Libraries:
  • jsonschema: JSON Schema validation
  • cerberus: Lightweight data validation
  • marshmallow: Object serialization/deserialization
  • pydantic: Data validation using Python type hints

Resources

For working code examples, see examples.md.
Status: Production Ready
Last Updated: 2026-01-11
Maintained by: MoAI-ADK Data Team