data-schema-knowledge-modeling

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Data Schema & Knowledge Modeling

数据Schema与知识建模

Table of Contents

目录

Purpose

目标

Create rigorous, validated models of entities, relationships, and constraints that enable correct system implementation, knowledge representation, and semantic reasoning.
创建严谨、经过验证的实体、关系和约束模型,以支持正确的系统实现、知识表示和语义推理。

When to Use

适用场景

Invoke this skill when you need to:
  • Design database schema (SQL, NoSQL, graph) for new application
  • Model complex domain with many entities and relationships
  • Build knowledge graph or ontology for semantic search/reasoning
  • Define API data models and contracts
  • Create taxonomies or classification hierarchies
  • Establish data governance and canonical models
  • Migrate legacy schemas to modern architectures
  • Resolve ambiguity in domain concepts and relationships
  • Enable data integration across systems
  • Document system invariants and business rules
Common trigger phrases:
  • "Design a schema for..."
  • "Model the entities and relationships"
  • "Create a knowledge graph"
  • "What's the data model?"
  • "Define the ontology"
  • "How should we structure this data?"
  • "Map relationships between..."
  • "Design the API data model"
在以下场景中调用此技能:
  • 为新应用设计数据库schema(SQL、NoSQL、图数据库)
  • 建模包含大量实体与关系的复杂领域
  • 构建用于语义搜索/推理的知识图谱或本体
  • 定义API数据模型与契约
  • 创建分类法或分类层级
  • 建立数据治理与标准模型
  • 将遗留schema迁移至现代架构
  • 解决领域概念与关系中的歧义
  • 实现跨系统的数据集成
  • 记录系统不变量与业务规则
常见触发语:
  • "为...设计一个schema"
  • "建模实体与关系"
  • "创建知识图谱"
  • "数据模型是什么?"
  • "定义本体"
  • "我们应该如何结构化这些数据?"
  • "映射...之间的关系"
  • "设计API数据模型"

What Is It

概念解析

Data schema & knowledge modeling is the process of formally defining:
  1. Entities - Things that exist (User, Product, Order, Organization)
  2. Attributes - Properties of entities (name, price, status, createdAt)
  3. Relationships - Connections between entities (User owns Order, Product belongsTo Category)
  4. Constraints - Rules and invariants (unique email, price > 0, one primary address)
  5. Cardinality - How many of each (one-to-many, many-to-many)
Quick example: E-commerce schema:
  • Entities: User, Product, Order, Cart, Payment
  • Relationships: User has many Orders, Order contains many Products (via OrderItems), User has one Cart
  • Constraints: Email must be unique, Order total matches sum of OrderItems, Payment amount equals Order total
  • Result: Unambiguous model that prevents data inconsistencies
数据Schema与知识建模是指正式定义以下内容的过程:
  1. Entities(实体) - 存在的事物(用户、产品、订单、组织)
  2. Attributes(属性) - 实体的特性(名称、价格、状态、创建时间)
  3. Relationships(关系) - 实体之间的关联(用户拥有订单、产品属于分类)
  4. Constraints(约束) - 规则与不变量(唯一邮箱、价格>0、仅一个主地址)
  5. Cardinality(基数) - 关联数量(一对多、多对多)
简单示例:电商schema
  • 实体:用户、产品、订单、购物车、支付
  • 关系:用户拥有多个订单,订单包含多个产品(通过订单项),用户拥有一个购物车
  • 约束:邮箱必须唯一,订单总额等于订单项金额之和,支付金额等于订单总额
  • 结果:无歧义的模型,可防止数据不一致

Workflow

工作流程

Copy this checklist and track your progress:
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate and document the model
Step 1: Gather domain requirements and scope
Ask user for domain description, core use cases (what queries/operations will this support), existing data (if migration/integration), performance/scale requirements, and technology constraints (SQL vs NoSQL vs graph database). Understanding use cases shapes the model - OLTP vs OLAP vs graph traversal require different designs. See Schema Types for guidance.
Step 2: Identify entities and attributes
Extract nouns from requirements (those are candidate entities). For each entity, list attributes with types and nullability. Use resources/template.md for systematic entity identification. Verify each entity represents a distinct concept with independent lifecycle. Document entity purpose and examples.
Step 3: Define relationships and cardinality
Map connections between entities (one-to-one, one-to-many, many-to-many). For many-to-many, identify junction tables/entities. Specify relationship directionality and optionality (can X exist without Y?). Use resources/methodology.md for complex relationship patterns like hierarchies, polymorphic associations, and temporal relationships.
Step 4: Specify constraints and invariants
Define uniqueness constraints, foreign key relationships, check constraints, and business rules. Document domain invariants (rules that must ALWAYS be true). Identify derived/computed attributes vs stored. Use resources/methodology.md for advanced constraint patterns and validation strategies.
Step 5: Validate and document the model
Create
data-schema-knowledge-modeling.md
file with complete schema definition. Validate against use cases - can the schema support required queries/operations? Check for normalization (eliminate redundancy) or denormalization (optimize for specific queries). Self-assess using resources/evaluators/rubric_data_schema_knowledge_modeling.json. Minimum standard: Average score ≥ 3.5.
复制以下清单并跟踪进度:
数据Schema与知识建模进度:
- [ ] 步骤1:收集领域需求与范围
- [ ] 步骤2:识别实体与属性
- [ ] 步骤3:定义关系与基数
- [ ] 步骤4:指定约束与不变量
- [ ] 步骤5:验证并记录模型
步骤1:收集领域需求与范围
向用户询问领域描述、核心用例(该模型需支持哪些查询/操作)、现有数据(如果涉及迁移/集成)、性能/规模要求,以及技术约束(SQL vs NoSQL vs 图数据库)。用例决定模型设计——OLTP、OLAP与图遍历需要不同的设计。可参考Schema类型获取指导。
步骤2:识别实体与属性
从需求中提取名词(这些是候选实体)。为每个实体列出带有类型与可空性的属性。使用resources/template.md进行系统化的实体识别。验证每个实体是否代表具有独立生命周期的独特概念。记录实体的用途与示例。
步骤3:定义关系与基数
映射实体之间的关联(一对一、一对多、多对多)。对于多对多关系,识别关联表/实体。指定关系的方向性与可选性(X可以不依赖Y存在吗?)。使用resources/methodology.md了解复杂关系模式,如层级、多态关联与时间关系。
步骤4:指定约束与不变量
定义唯一性约束、外键关系、检查约束与业务规则。记录领域不变量(必须始终成立的规则)。区分派生/计算属性与存储属性。使用resources/methodology.md了解高级约束模式与验证策略。
步骤5:验证并记录模型
创建
data-schema-knowledge-modeling.md
文件,包含完整的schema定义。根据用例进行验证——该schema是否支持所需的查询/操作?检查规范化(消除冗余)或反规范化(针对特定查询优化)情况。使用resources/evaluators/rubric_data_schema_knowledge_modeling.json进行自我评估。最低标准:平均分≥3.5。

Schema Types

Schema类型

Choose based on use case and technology:
Relational (SQL) Schema
  • Best for: Transactional systems (OLTP), strong consistency, complex queries with joins
  • Pattern: Normalized tables, foreign keys, ACID transactions
  • Example use cases: E-commerce orders, banking transactions, HR systems
  • Key decision: Normalization level (3NF for consistency vs denormalized for read performance)
Document/NoSQL Schema
  • Best for: Flexible/evolving structure, high write throughput, denormalized reads
  • Pattern: Nested documents, embedded relationships, no joins
  • Example use cases: Content management, user profiles, event logs
  • Key decision: Embed vs reference (embed for 1-to-few, reference for 1-to-many)
Graph Schema (Ontology)
  • Best for: Complex relationships, traversal queries, semantic reasoning, knowledge graphs
  • Pattern: Nodes (entities), edges (relationships), properties on both
  • Example use cases: Social networks, fraud detection, recommendation engines, scientific research
  • Key decision: Property graph vs RDF triples
Event/Time-Series Schema
  • Best for: Audit logs, metrics, IoT data, append-only data
  • Pattern: Immutable events, time-based partitioning, aggregation tables
  • Example use cases: User activity tracking, monitoring, financial transactions
  • Key decision: Raw events vs pre-aggregated summaries
Dimensional (Data Warehouse) Schema
  • Best for: Analytics (OLAP), aggregations, historical reporting
  • Pattern: Fact tables + dimension tables (star/snowflake schema)
  • Example use cases: Business intelligence, sales analytics, customer 360
  • Key decision: Star schema (denormalized) vs snowflake (normalized dimensions)
根据用例与技术选择合适的类型:
关系型(SQL)Schema
  • 最佳适用场景:事务系统(OLTP)、强一致性、需多表关联的复杂查询
  • 模式:规范化表、外键、ACID事务
  • 示例用例:电商订单、银行交易、HR系统
  • 关键决策:规范化级别(3NF保证一致性 vs 反规范化提升读取性能)
文档型/NoSQL Schema
  • 最佳适用场景:灵活/演进的结构、高写入吞吐量、反规范化读取
  • 模式:嵌套文档、嵌入式关系、无关联查询
  • 示例用例:内容管理、用户档案、事件日志
  • 关键决策:嵌入 vs 引用(1对少量关系用嵌入,1对多关系用引用)
图Schema(本体)
  • 最佳适用场景:复杂关系、遍历查询、语义推理、知识图谱
  • 模式:节点(实体)、边(关系)、两者均带属性
  • 示例用例:社交网络、欺诈检测、推荐引擎、科学研究
  • 关键决策:属性图 vs RDF三元组
事件/时间序列Schema
  • 最佳适用场景:审计日志、指标数据、IoT数据、仅追加数据
  • 模式:不可变事件、基于时间的分区、聚合表
  • 示例用例:用户行为跟踪、监控、金融交易
  • 关键决策:原始事件 vs 预聚合摘要
维度型(数据仓库)Schema
  • 最佳适用场景:分析(OLAP)、聚合、历史报告
  • 模式:事实表 + 维度表(星型/雪花型schema)
  • 示例用例:商业智能、销售分析、客户360视图
  • 关键决策:星型schema(反规范化) vs 雪花型schema(规范化维度)

Common Patterns

常见模式

Pattern: Entity Lifecycle Modeling Track entity state changes explicitly. Example: Order (draft → pending → confirmed → shipped → delivered → completed/cancelled). Include status field, timestamps for each state, and transitions table if history needed.
Pattern: Soft Deletes Never physically delete records - add
deletedAt
timestamp. Allows data recovery, audit compliance, and referential integrity. Filter
WHERE deletedAt IS NULL
in queries.
Pattern: Polymorphic Associations Entity relates to multiple types. Example: Comment can be on Post or Photo. Options: (1) separate foreign keys (commentableType + commentableId), (2) junction tables per type, (3) single table inheritance.
Pattern: Temporal/Historical Data Track changes over time. Options: (1) Effective/expiry dates per record, (2) separate history table, (3) event sourcing (store all changes as events). Choose based on query patterns.
Pattern: Multi-tenancy Isolate data per customer. Options: (1) Separate databases (strong isolation), (2) Shared schema with tenantId column (efficient), (3) Separate schemas in same DB (balance). Add tenantId to all queries if shared.
Pattern: Hierarchies Model trees/nested structures. Options: (1) Adjacency list (parentId), (2) Nested sets (left/right values), (3) Path enumeration (materialized path), (4) Closure table (all ancestor-descendant pairs). Trade-offs between read/write performance.
模式:实体生命周期建模 显式跟踪实体状态变化。示例:订单(草稿→待处理→已确认→已发货→已送达→已完成/已取消)。包含状态字段、各状态的时间戳,如需记录历史可添加状态变迁表。
模式:软删除 绝不物理删除记录——添加
deletedAt
时间戳。支持数据恢复、审计合规与引用完整性。查询时通过
WHERE deletedAt IS NULL
过滤。
模式:多态关联 实体与多种类型关联。示例:评论可关联帖子或图片。可选方案:(1) 单独的外键(commentableType + commentableId),(2) 每种类型对应一个关联表,(3) 单表继承。
模式:时间/历史数据 跟踪数据随时间的变化。可选方案:(1) 每条记录包含生效/过期日期,(2) 单独的历史表,(3) 事件溯源(将所有变更存储为事件)。根据查询模式选择合适方案。
模式:多租户 按客户隔离数据。可选方案:(1) 独立数据库(强隔离),(2) 共享schema并添加tenantId列(高效),(3) 同一数据库中的独立schema(平衡方案)。若使用共享模式,需在所有查询中添加tenantId过滤。
模式:层级结构 建模树状/嵌套结构。可选方案:(1) 邻接表(parentId),(2) 嵌套集(左/右值),(3) 路径枚举(物化路径),(4) 闭包表(所有祖先-后代对)。需权衡读写性能。

Guardrails

注意事项

✓ Do:
  • Start with use cases - schema serves queries/operations
  • Normalize first, then denormalize for specific performance needs
  • Document all constraints and invariants explicitly
  • Use meaningful, consistent naming conventions
  • Consider future evolution - design for extensibility
  • Validate model against ALL required use cases
  • Model the real world accurately (don't force fit to technology)
✗ Don't:
  • Design schema in isolation from use cases
  • Premature optimization (denormalize before measuring)
  • Skip constraint definitions (leads to data corruption)
  • Use generic names (data, value, thing) - be specific
  • Ignore cardinality and nullability
  • Model implementation details in domain entities
  • Forget about data migration path from existing systems
  • Create circular dependencies between entities
✓ 建议:
  • 从用例入手——schema为查询/操作服务
  • 先规范化,再针对特定性能需求进行反规范化
  • 明确记录所有约束与不变量
  • 使用有意义、一致的命名规范
  • 考虑未来演进——设计时兼顾可扩展性
  • 根据所有必要用例验证模型
  • 准确建模现实世界(不要强行适配技术)
✗ 避免:
  • 脱离用例孤立设计schema
  • 过早优化(在衡量性能前不要反规范化)
  • 跳过约束定义(会导致数据损坏)
  • 使用通用名称(如data、value、thing)——需具体
  • 忽略基数与可空性
  • 在领域实体中建模实现细节
  • 忘记从现有系统迁移数据的路径
  • 在实体之间创建循环依赖

Quick Reference

快速参考

Resources:
  • resources/template.md
    - Structured process for entity identification, relationship mapping, and constraint definition
  • resources/methodology.md
    - Advanced patterns: temporal modeling, graph ontologies, schema evolution, normalization strategies
  • resources/examples/
    - Worked examples showing complete schema designs with validation
  • resources/evaluators/rubric_data_schema_knowledge_modeling.json
    - Quality assessment before delivery
When to choose which resource:
  • Simple domain (< 10 entities) → Start with template
  • Complex domain or graph/ontology → Study methodology for advanced patterns
  • Need to see examples → Review examples folder
  • Before delivering to user → Always validate with rubric
Expected deliverable:
data-schema-knowledge-modeling.md
file containing: domain description, complete entity definitions with attributes and types, relationship mappings with cardinality, constraint specifications, diagram (ERD/graph visualization), validation against use cases, and implementation notes.
Common schema notations:
  • ERD (Entity-Relationship Diagram): Visual representation of entities and relationships
  • UML Class Diagram: Object-oriented view with inheritance and associations
  • Graph Diagram: Nodes and edges for graph databases
  • JSON Schema: API/document structure with validation rules
  • SQL DDL: Executable CREATE TABLE statements
  • Ontology (OWL/RDF): Semantic web knowledge representation
资源:
  • resources/template.md
    - 实体识别、关系映射与约束定义的结构化流程
  • resources/methodology.md
    - 高级模式:时间建模、图本体、schema演进、规范化策略
  • resources/examples/
    - 完整schema设计示例及验证过程
  • resources/evaluators/rubric_data_schema_knowledge_modeling.json
    - 交付前的质量评估标准
资源选择指南:
  • 简单领域(<10个实体)→ 从模板开始
  • 复杂领域或图/本体→ 学习方法论中的高级模式
  • 需要参考示例→ 查看examples文件夹
  • 交付给用户前→ 始终使用评估标准进行验证
预期交付物:
data-schema-knowledge-modeling.md
文件,包含:领域描述、带属性与类型的完整实体定义、带基数的关系映射、约束规范、图表(ERD/图可视化)、用例验证结果与实现说明。
常见schema表示法:
  • ERD(实体关系图):实体与关系的可视化表示
  • UML类图:面向对象视图,包含继承与关联
  • 图图表:图数据库的节点与边
  • JSON Schema:带验证规则的API/文档结构
  • SQL DDL:可执行的CREATE TABLE语句
  • 本体(OWL/RDF):语义网知识表示法