oceanbase-schema-design

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

OceanBase Schema Design Skill

OceanBase Schema 设计技能

Expert in database schema design for OceanBase distributed database (MySQL & Oracle modes).

Provides best practices for designing efficient table structures, partitioning strategies, table groups, indexes, and schema optimization in OceanBase. This skill covers both MySQL mode and Oracle mode.

精通OceanBase分布式数据库（MySQL & Oracle模式）的数据库Schema设计。

提供OceanBase中高效表结构、分区策略、表组、索引设计以及Schema优化的最佳实践，本技能同时覆盖MySQL模式和Oracle模式。

Mode-specific syntax guide

不同模式的语法指南

While OceanBase uses the same underlying storage and partitioning mechanisms for both MySQL and Oracle modes, there are important syntax differences:

Aspect	MySQL Mode	Oracle Mode
Data Types	`BIGINT` , `VARCHAR` , `DATETIME`	`NUMBER` , `VARCHAR2` , `DATE` , `TIMESTAMP`
Schema Access	`database.table`	`schema.table`
Default Index	LOCAL (for partitioned tables)	GLOBAL (for partitioned tables)
Time Functions	`NOW()` , `CURRENT_TIMESTAMP`	`SYSDATE` , `SYSTIMESTAMP`
String Functions	`CONCAT()`	`\|\|` operator

Design Principles:

✅ Same: Partitioning types, table group concepts, index types
✅ Same: Design best practices, normalization principles
❌ Different: Data type names, default index behavior, function syntax

Examples in this skill are marked with mode indicators: MySQL Mode: or Oracle Mode:

虽然OceanBase对MySQL和Oracle模式使用相同的底层存储和分区机制，但二者存在重要的语法差异：

对比项	MySQL 模式	Oracle 模式
数据类型	`BIGINT` , `VARCHAR` , `DATETIME`	`NUMBER` , `VARCHAR2` , `DATE` , `TIMESTAMP`
Schema访问方式	`database.table`	`schema.table`
默认索引类型	LOCAL（分区表默认）	GLOBAL（分区表默认）
时间函数	`NOW()` , `CURRENT_TIMESTAMP`	`SYSDATE` , `SYSTIMESTAMP`
字符串函数	`CONCAT()`	`\|\|` 运算符

设计原则：

✅ 相同点：分区类型、表组概念、索引类型
✅ 相同点：设计最佳实践、范式化原则
❌ 不同点：数据类型名称、默认索引行为、函数语法

本技能中的示例会标注模式标识：MySQL模式： 或 Oracle模式：

Table design fundamentals

表设计基础

Database normalization

数据库范式

Three Normal Forms:

1NF: All field values must be atomic (indivisible)
2NF: Every column must be fully dependent on the primary key
3NF: No transitive dependencies (columns must be directly dependent on primary key)

Important: Schema design should prioritize business performance over strict normalization. Appropriate data redundancy is acceptable to reduce table joins and improve performance. Redundancy fields should not be frequently modified and not be very long VARCHAR fields.

三大范式：

1NF：所有字段值必须是原子性的（不可拆分）
2NF：每一列都必须完全依赖于主键
3NF：不存在传递依赖（列必须直接依赖于主键）

重要提示： Schema设计应优先考虑业务性能而非严格遵循范式。为了减少表关联、提升性能，可接受适当的数据冗余。冗余字段不应频繁修改，也不应该是超长的VARCHAR字段。

Table structure design standards

表结构设计规范

1. Primary key design:

sql

-- ✅ GOOD: Use business fields as primary key
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT PRIMARY KEY,
                     customer_id BIGINT NOT NULL,
                     gmt_create TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- MySQL: CURRENT_TIMESTAMP, Oracle: SYSDATE
                     gmt_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
                     ) COMMENT='Order table';

-- ✅ GOOD: Composite primary key
obclient [SALES_DB]> CREATE TABLE order_item_table (
                     order_id BIGINT,
                     item_id BIGINT,
                     PRIMARY KEY (order_id, item_id)
                     );

Key points:

OceanBase uses Index-Organized Table (IOT) model
If no primary key is specified, system automatically generates a hidden primary key
Recommend using business fields as primary key (avoid auto-increment)
Include required fields:
```
gmt_create
```
,
```
gmt_modified
```
All fields should have
```
COMMENT
```
attributes and be
```
NOT NULL
```
with appropriate
```
DEFAULT
```
values
Use
```
TINYINT UNSIGNED
```
for boolean fields (1=yes, 0=no)
Join fields must have consistent data types to avoid implicit conversion

1. 主键设计：

sql

-- ✅ 推荐：使用业务字段作为主键
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT PRIMARY KEY,
                     customer_id BIGINT NOT NULL,
                     gmt_create TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- MySQL: CURRENT_TIMESTAMP, Oracle: SYSDATE
                     gmt_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
                     ) COMMENT='Order table';

-- ✅ 推荐：联合主键
obclient [SALES_DB]> CREATE TABLE order_item_table (
                     order_id BIGINT,
                     item_id BIGINT,
                     PRIMARY KEY (order_id, item_id)
                     );

核心要点：

OceanBase采用索引组织表（Index-Organized Table, IOT）模型
如果未指定主键，系统会自动生成隐藏主键
推荐使用业务字段作为主键（避免自增主键）
必须包含的字段：
```
gmt_create
```
、
```
gmt_modified
```
所有字段都应该添加
```
COMMENT
```
属性，设置为
```
NOT NULL
```
并配置合适的
```
DEFAULT
```
值
布尔类型字段使用
```
TINYINT UNSIGNED
```
（1=是，0=否）
关联字段的数据类型必须一致，避免隐式转换

Partition design

分区设计

When to use partitioning

何时使用分区

Use partitioning when:

Data volume is large and access is concentrated
Table is a history table or transaction log table
Table has obvious access hotspots
Need to improve query performance through partition pruning

符合以下场景时推荐使用分区：

数据量较大且访问集中
表为历史表或事务日志表
表存在明显的访问热点
需要通过分区裁剪提升查询性能

Partition key selection

分区键选择

Critical Rule:

For partitioned tables, every primary key and unique key must include at least one field that is part of the partition key

Partition key selection principles:

Hash partition: Choose high cardinality field that appears most frequently in WHERE clauses (e.g., User ID, Order ID)
Range/List partition: Choose fields based on business rules (e.g., time-based for log tables), partition count should not be too small

sql

-- ✅ GOOD: Hash partition on high-cardinality field
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT PRIMARY KEY,
                     customer_id BIGINT,
                     order_date DATE
                     ) PARTITION BY HASH(order_id) PARTITIONS 8;

-- ✅ GOOD: Range partition for time-based data
obclient [SALES_DB]> CREATE TABLE order_log_table (
                     log_id BIGINT,
                     log_date DATE NOT NULL,
                     PRIMARY KEY (log_id, log_date)
                     ) PARTITION BY RANGE COLUMNS(log_date) (
                     PARTITION p202401 VALUES LESS THAN('2024-02-01'),
                     PARTITION p202402 VALUES LESS THAN('2024-03-01'),
                     PARTITION pMAX VALUES LESS THAN MAXVALUE
                     );

Limitations: Hash partition is not suitable for range queries on partition key. Range partition is better for time-based queries.

核心规则：

对于分区表，每个主键和唯一键都必须至少包含一个分区键中的字段

分区键选择原则：

Hash分区：选择WHERE子句中出现频率最高的高基数字段（例如用户ID、订单ID）
Range/List分区：根据业务规则选择字段（例如日志表按时间分区），分区数量不宜过少

sql

-- ✅ 推荐：基于高基数字段做Hash分区
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT PRIMARY KEY,
                     customer_id BIGINT,
                     order_date DATE
                     ) PARTITION BY HASH(order_id) PARTITIONS 8;

-- ✅ 推荐：时间类数据使用Range分区
obclient [SALES_DB]> CREATE TABLE order_log_table (
                     log_id BIGINT,
                     log_date DATE NOT NULL,
                     PRIMARY KEY (log_id, log_date)
                     ) PARTITION BY RANGE COLUMNS(log_date) (
                     PARTITION p202401 VALUES LESS THAN('2024-02-01'),
                     PARTITION p202402 VALUES LESS THAN('2024-03-01'),
                     PARTITION pMAX VALUES LESS THAN MAXVALUE
                     );

限制： Hash分区不适合针对分区键的范围查询，Range分区更适合时间类查询。

Partition types

分区类型

Supported partition types:

Hash: Even data distribution, point queries (high cardinality field)
Range: Time-based data, historical data (date or numeric with clear ranges)
List: Discrete value sets (field with limited distinct values)
Key: Similar to Hash, uses MySQL's internal hashing function
Composite: Combines two partition types (e.g., Hash + Range)

sql

-- Hash partition
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT PRIMARY KEY,
                     customer_id BIGINT
                     ) PARTITION BY HASH(order_id) PARTITIONS 8;

-- Range partition
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT,
                     order_date DATE NOT NULL,
                     PRIMARY KEY (order_id, order_date)
                     ) PARTITION BY RANGE COLUMNS(order_date) (
                     PARTITION p2024Q1 VALUES LESS THAN('2024-04-01'),
                     PARTITION p2024Q2 VALUES LESS THAN('2024-07-01'),
                     PARTITION pMAX VALUES LESS THAN MAXVALUE
                     );

-- Composite partition (Hash + Range)
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT,
                     customer_id BIGINT,
                     order_date DATE NOT NULL,
                     PRIMARY KEY (order_id, customer_id, order_date)
                     ) PARTITION BY HASH(customer_id) PARTITIONS 8
                     SUBPARTITION BY RANGE COLUMNS(order_date) (
                     PARTITION p0 (
                         SUBPARTITION sp0_q1 VALUES LESS THAN('2024-04-01'),
                         SUBPARTITION sp0_q2 VALUES LESS THAN('2024-07-01')
                     )
                     -- ... more partitions
                     );

支持的分区类型：

Hash：数据分布均匀，适合点查询（高基数字段）
Range：适合时间类数据、历史数据（日期或有明确范围的数值）
List：适合离散值集合（不同取值有限的字段）
Key：类似Hash，使用MySQL内部哈希函数
复合分区：组合两种分区类型（例如Hash + Range）

sql

-- Hash分区
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT PRIMARY KEY,
                     customer_id BIGINT
                     ) PARTITION BY HASH(order_id) PARTITIONS 8;

-- Range分区
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT,
                     order_date DATE NOT NULL,
                     PRIMARY KEY (order_id, order_date)
                     ) PARTITION BY RANGE COLUMNS(order_date) (
                     PARTITION p2024Q1 VALUES LESS THAN('2024-04-01'),
                     PARTITION p2024Q2 VALUES LESS THAN('2024-07-01'),
                     PARTITION pMAX VALUES LESS THAN MAXVALUE
                     );

-- 复合分区（Hash + Range）
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT,
                     customer_id BIGINT,
                     order_date DATE NOT NULL,
                     PRIMARY KEY (order_id, customer_id, order_date)
                     ) PARTITION BY HASH(customer_id) PARTITIONS 8
                     SUBPARTITION BY RANGE COLUMNS(order_date) (
                     PARTITION p0 (
                         SUBPARTITION sp0_q1 VALUES LESS THAN('2024-04-01'),
                         SUBPARTITION sp0_q2 VALUES LESS THAN('2024-07-01')
                     )
                     -- ... 更多分区
                     );

Partition pruning

分区裁剪

Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency.

Rules:

Hash/List: Partition key must be in WHERE clause with equality or IN conditions
Range: Use range conditions that match partition boundaries, avoid functions on partition keys

sql

-- ✅ GOOD: Partition pruning works
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- Execution plan shows: partitions(p1) - only one partition accessed

-- ❌ BAD: No partition pruning (missing partition key or function on partition key)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- Execution plan shows: all partitions scanned

分区裁剪可以避免访问无关分区，显著提升SQL执行效率。

规则：

Hash/List分区：分区键必须出现在WHERE子句中，且使用等值或IN条件
Range分区：使用匹配分区边界的范围条件，避免对分区键使用函数

sql

-- ✅ 推荐：分区裁剪生效
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- 执行计划显示：partitions(p1) - 仅访问1个分区

-- ❌ 不推荐：分区裁剪不生效（缺少分区键或分区键上使用了函数）
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- 执行计划显示：扫描全部分区

Table group design

表组设计

Table group overview

表组概述

Table Group is a logical concept representing a collection of tables. It controls the physical storage proximity of related tables to enable Partition Wise Join.

表组（Table Group） 是代表表集合的逻辑概念，用于控制相关表的物理存储邻近性，以实现Partition Wise Join。

SHARDING attributes

SHARDING属性

1. SHARDING = 'NONE': All partitions co-located on same machine, no partition restrictions

2. SHARDING = 'PARTITION': Tables sharded by first-level partitions, all tables must have identical first-level partition definitions

3. SHARDING = 'ADAPTIVE' (Default): Tables sharded adaptively, enables Partition Wise Join for both first-level and second-level partitions

Use table groups when: Multiple tables are frequently joined together, tables have related partition keys, need to avoid cross-partition joins

sql

-- Create table group and add tables
obclient [SALES_DB]> CREATE TABLEGROUP order_tg SHARDING = 'PARTITION'
                     PARTITION BY HASH(customer_id) PARTITIONS 8;

obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT,
                     customer_id BIGINT,
                     PRIMARY KEY (order_id, customer_id)
                     ) PARTITION BY HASH(customer_id) PARTITIONS 8;

obclient [SALES_DB]> ALTER TABLE order_table SET TABLEGROUP order_tg;
-- Now JOIN queries can use Partition Wise Join

1. SHARDING = 'NONE': 所有分区都存放在同一机器上，无分区限制

2. SHARDING = 'PARTITION': 表按一级分区分片，所有表的一级分区定义必须完全一致

3. SHARDING = 'ADAPTIVE'（默认值）: 表自适应分片，支持一级和二级分区的Partition Wise Join

符合以下场景时推荐使用表组： 多表频繁关联、表的分区键相关、需要避免跨分区关联

sql

-- 创建表组并添加表
obclient [SALES_DB]> CREATE TABLEGROUP order_tg SHARDING = 'PARTITION'
                     PARTITION BY HASH(customer_id) PARTITIONS 8;

obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id BIGINT,
                     customer_id BIGINT,
                     PRIMARY KEY (order_id, customer_id)
                     ) PARTITION BY HASH(customer_id) PARTITIONS 8;

obclient [SALES_DB]> ALTER TABLE order_table SET TABLEGROUP order_tg;
-- 现在JOIN查询可以使用Partition Wise Join

Index design

索引设计

Index types

索引类型

1. Local Index (LOCAL):

Index data stored with table partition data
Default for MySQL mode partitioned tables
Better for DML performance, preferred choice

2. Global Index (GLOBAL):

Index separately partitioned
Default for Oracle mode partitioned tables
May cause distributed transactions, use only when necessary

sql

-- MySQL Mode: Default is LOCAL
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- Creates LOCAL index by default

-- Create GLOBAL index explicitly
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) GLOBAL;

1. 本地索引（LOCAL）:

索引数据和表分区数据存储在一起
MySQL模式分区表的默认索引类型
DML性能更优，是首选方案

2. 全局索引（GLOBAL）:

索引单独分区
Oracle模式分区表的默认索引类型
可能会产生分布式事务，仅在必要时使用

sql

-- MySQL模式：默认创建LOCAL索引
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- 默认创建LOCAL索引

-- 显式创建GLOBAL索引
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) GLOBAL;

Index design principles

索引设计原则

1. Leftmost prefix principle: Index must match query pattern from leftmost column

sql

-- ✅ GOOD: Index matches query
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE customer_id = 1001 AND order_date >= '2024-01-01';

-- ❌ BAD: Missing leftmost column prevents index usage
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';
-- Execution plan shows: TABLE SCAN (full table scan)

2. Composite index column order: Place high cardinality columns first

sql

-- ✅ GOOD: High cardinality column first
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);

3. Covering index: Include all query columns to avoid table lookup

sql

obclient [SALES_DB]> CREATE INDEX idx_customer_date_amount ON order_table(customer_id, order_date) 
                     STORING (total_amount, status);

4. ORDER BY optimization: Index can support ORDER BY if WHERE conditions match leftmost columns

5. Join field indexes: Always index join columns for efficient joins

1. 最左前缀原则： 索引必须从最左列开始匹配查询模式

sql

-- ✅ 推荐：索引匹配查询条件
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE customer_id = 1001 AND order_date >= '2024-01-01';

-- ❌ 不推荐：缺少最左列导致索引无法生效
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';
-- 执行计划显示：TABLE SCAN（全表扫描）

2. 联合索引列顺序： 高基数列放在前面

sql

-- ✅ 推荐：高基数列放在最前
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);

3. 覆盖索引： 包含所有查询列，避免回表

sql

obclient [SALES_DB]> CREATE INDEX idx_customer_date_amount ON order_table(customer_id, order_date) 
                     STORING (total_amount, status);

4. ORDER BY优化： 如果WHERE条件匹配最左列，索引可以支持ORDER BY

5. 关联字段索引： 关联列必须创建索引，以提升关联效率

Index design best practices

索引设计最佳实践

General principles:

Indexed fields should be
```
NOT NULL
```
with appropriate
```
DEFAULT
```
values
Business-unique fields should be primary keys
Join fields must have consistent data types
Place high cardinality columns first in composite indexes
Use single composite index instead of multiple single-column indexes
Add ORDER BY/GROUP BY columns to index for covering index

Partitioned table index:

Prefer LOCAL index → Global partitioned index → Global index
Reduce unnecessary global indexes (high maintenance cost)
Primary key must include partition key

Index maintenance:

Confirm new index is effective before deploying SQL
Index modification: Create new index → Verify → Delete old index
Delete unused indexes to avoid index bloat
Never mix
```
ALTER TABLE ADD INDEX
```
and
```
DROP INDEX
```
in one DDL

Common mistakes to avoid:

❌ Creating one index per query
❌ Believing indexes severely slow down updates
❌ Using "check then insert" instead of unique indexes
❌ Creating redundant indexes (e.g., idx_abc already covers idx_a and idx_ab)

通用原则：

索引字段应该设置为
```
NOT NULL
```
并配置合适的
```
DEFAULT
```
值
业务唯一的字段应该设为主键
关联字段的数据类型必须一致
联合索引中高基数列放在前面
优先使用单个联合索引，而非多个单列索引
将ORDER BY/GROUP BY列加入索引，形成覆盖索引

分区表索引：

优先级：本地索引 > 全局分区索引 > 全局索引
减少不必要的全局索引（维护成本高）
主键必须包含分区键

索引维护：

部署SQL前确认新索引生效
索引修改流程：创建新索引 → 验证生效 → 删除旧索引
删除未使用的索引，避免索引膨胀
不要在同一个DDL中同时执行
```
ALTER TABLE ADD INDEX
```
和
```
DROP INDEX
```
操作

需要避免的常见错误：

❌ 为每个查询单独创建一个索引
❌ 认为索引会严重降低更新性能
❌ 使用“先查后插”代替唯一索引
❌ 创建冗余索引（例如idx_abc已经覆盖了idx_a和idx_ab的功能）

Schema design checklist

Schema设计检查清单

Quick reference

快速参考

Partition types: Hash (even distribution, point queries) | Range (time-based data) | List (discrete value sets) | Key (similar to Hash) | Composite (two partition types)

Table group SHARDING: NONE (no restriction) | PARTITION (first-level partitions must match) | ADAPTIVE (all first-level or all second-level must match)

Index types: LOCAL (default MySQL, better DML performance) | GLOBAL (default Oracle, cross-partition queries)

分区类型： Hash（分布均匀，适合点查询） | Range（适合时间类数据） | List（适合离散值集合） | Key（类似Hash） | 复合分区（组合两种分区类型）

表组SHARDING属性： NONE（无限制） | PARTITION（一级分区必须匹配） | ADAPTIVE（所有一级或所有二级分区必须匹配）

索引类型： LOCAL（MySQL默认，DML性能更优） | GLOBAL（Oracle默认，适合跨分区查询）