oceanbase-schema-design
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseOceanBase Schema Design Skill
OceanBase Schema 设计技能
Expert in database schema design for OceanBase distributed database (MySQL & Oracle modes).
Provides best practices for designing efficient table structures, partitioning strategies, table groups, indexes, and schema optimization in OceanBase. This skill covers both MySQL mode and Oracle mode.
精通OceanBase分布式数据库(MySQL & Oracle模式)的数据库Schema设计。
提供OceanBase中高效表结构、分区策略、表组、索引设计以及Schema优化的最佳实践,本技能同时覆盖MySQL模式和Oracle模式。
Mode-specific syntax guide
不同模式的语法指南
While OceanBase uses the same underlying storage and partitioning mechanisms for both MySQL and Oracle modes, there are important syntax differences:
| Aspect | MySQL Mode | Oracle Mode |
|---|---|---|
| Data Types | | |
| Schema Access | | |
| Default Index | LOCAL (for partitioned tables) | GLOBAL (for partitioned tables) |
| Time Functions | | |
| String Functions | | |
Design Principles:
- ✅ Same: Partitioning types, table group concepts, index types
- ✅ Same: Design best practices, normalization principles
- ❌ Different: Data type names, default index behavior, function syntax
Examples in this skill are marked with mode indicators: MySQL Mode: or Oracle Mode:
虽然OceanBase对MySQL和Oracle模式使用相同的底层存储和分区机制,但二者存在重要的语法差异:
| 对比项 | MySQL 模式 | Oracle 模式 |
|---|---|---|
| 数据类型 | | |
| Schema访问方式 | | |
| 默认索引类型 | LOCAL(分区表默认) | GLOBAL(分区表默认) |
| 时间函数 | | |
| 字符串函数 | | |
设计原则:
- ✅ 相同点:分区类型、表组概念、索引类型
- ✅ 相同点:设计最佳实践、范式化原则
- ❌ 不同点:数据类型名称、默认索引行为、函数语法
本技能中的示例会标注模式标识:MySQL模式: 或 Oracle模式:
Table design fundamentals
表设计基础
Database normalization
数据库范式
Three Normal Forms:
- 1NF: All field values must be atomic (indivisible)
- 2NF: Every column must be fully dependent on the primary key
- 3NF: No transitive dependencies (columns must be directly dependent on primary key)
Important: Schema design should prioritize business performance over strict normalization. Appropriate data redundancy is acceptable to reduce table joins and improve performance. Redundancy fields should not be frequently modified and not be very long VARCHAR fields.
三大范式:
- 1NF:所有字段值必须是原子性的(不可拆分)
- 2NF:每一列都必须完全依赖于主键
- 3NF:不存在传递依赖(列必须直接依赖于主键)
重要提示: Schema设计应优先考虑业务性能而非严格遵循范式。为了减少表关联、提升性能,可接受适当的数据冗余。冗余字段不应频繁修改,也不应该是超长的VARCHAR字段。
Table structure design standards
表结构设计规范
1. Primary key design:
sql
-- ✅ GOOD: Use business fields as primary key
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
gmt_create TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- MySQL: CURRENT_TIMESTAMP, Oracle: SYSDATE
gmt_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) COMMENT='Order table';
-- ✅ GOOD: Composite primary key
obclient [SALES_DB]> CREATE TABLE order_item_table (
order_id BIGINT,
item_id BIGINT,
PRIMARY KEY (order_id, item_id)
);Key points:
- OceanBase uses Index-Organized Table (IOT) model
- If no primary key is specified, system automatically generates a hidden primary key
- Recommend using business fields as primary key (avoid auto-increment)
- Include required fields: ,
gmt_creategmt_modified - All fields should have attributes and be
COMMENTwith appropriateNOT NULLvaluesDEFAULT - Use for boolean fields (1=yes, 0=no)
TINYINT UNSIGNED - Join fields must have consistent data types to avoid implicit conversion
1. 主键设计:
sql
-- ✅ 推荐:使用业务字段作为主键
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
gmt_create TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- MySQL: CURRENT_TIMESTAMP, Oracle: SYSDATE
gmt_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) COMMENT='Order table';
-- ✅ 推荐:联合主键
obclient [SALES_DB]> CREATE TABLE order_item_table (
order_id BIGINT,
item_id BIGINT,
PRIMARY KEY (order_id, item_id)
);核心要点:
- OceanBase采用索引组织表(Index-Organized Table, IOT)模型
- 如果未指定主键,系统会自动生成隐藏主键
- 推荐使用业务字段作为主键(避免自增主键)
- 必须包含的字段:、
gmt_creategmt_modified - 所有字段都应该添加属性,设置为
COMMENT并配置合适的NOT NULL值DEFAULT - 布尔类型字段使用(1=是,0=否)
TINYINT UNSIGNED - 关联字段的数据类型必须一致,避免隐式转换
Partition design
分区设计
When to use partitioning
何时使用分区
Use partitioning when:
- Data volume is large and access is concentrated
- Table is a history table or transaction log table
- Table has obvious access hotspots
- Need to improve query performance through partition pruning
符合以下场景时推荐使用分区:
- 数据量较大且访问集中
- 表为历史表或事务日志表
- 表存在明显的访问热点
- 需要通过分区裁剪提升查询性能
Partition key selection
分区键选择
Critical Rule:
- For partitioned tables, every primary key and unique key must include at least one field that is part of the partition key
Partition key selection principles:
- Hash partition: Choose high cardinality field that appears most frequently in WHERE clauses (e.g., User ID, Order ID)
- Range/List partition: Choose fields based on business rules (e.g., time-based for log tables), partition count should not be too small
sql
-- ✅ GOOD: Hash partition on high-cardinality field
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT,
order_date DATE
) PARTITION BY HASH(order_id) PARTITIONS 8;
-- ✅ GOOD: Range partition for time-based data
obclient [SALES_DB]> CREATE TABLE order_log_table (
log_id BIGINT,
log_date DATE NOT NULL,
PRIMARY KEY (log_id, log_date)
) PARTITION BY RANGE COLUMNS(log_date) (
PARTITION p202401 VALUES LESS THAN('2024-02-01'),
PARTITION p202402 VALUES LESS THAN('2024-03-01'),
PARTITION pMAX VALUES LESS THAN MAXVALUE
);Limitations: Hash partition is not suitable for range queries on partition key. Range partition is better for time-based queries.
核心规则:
- 对于分区表,每个主键和唯一键都必须至少包含一个分区键中的字段
分区键选择原则:
- Hash分区:选择WHERE子句中出现频率最高的高基数字段(例如用户ID、订单ID)
- Range/List分区:根据业务规则选择字段(例如日志表按时间分区),分区数量不宜过少
sql
-- ✅ 推荐:基于高基数字段做Hash分区
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT,
order_date DATE
) PARTITION BY HASH(order_id) PARTITIONS 8;
-- ✅ 推荐:时间类数据使用Range分区
obclient [SALES_DB]> CREATE TABLE order_log_table (
log_id BIGINT,
log_date DATE NOT NULL,
PRIMARY KEY (log_id, log_date)
) PARTITION BY RANGE COLUMNS(log_date) (
PARTITION p202401 VALUES LESS THAN('2024-02-01'),
PARTITION p202402 VALUES LESS THAN('2024-03-01'),
PARTITION pMAX VALUES LESS THAN MAXVALUE
);限制: Hash分区不适合针对分区键的范围查询,Range分区更适合时间类查询。
Partition types
分区类型
Supported partition types:
- Hash: Even data distribution, point queries (high cardinality field)
- Range: Time-based data, historical data (date or numeric with clear ranges)
- List: Discrete value sets (field with limited distinct values)
- Key: Similar to Hash, uses MySQL's internal hashing function
- Composite: Combines two partition types (e.g., Hash + Range)
sql
-- Hash partition
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT
) PARTITION BY HASH(order_id) PARTITIONS 8;
-- Range partition
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT,
order_date DATE NOT NULL,
PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE COLUMNS(order_date) (
PARTITION p2024Q1 VALUES LESS THAN('2024-04-01'),
PARTITION p2024Q2 VALUES LESS THAN('2024-07-01'),
PARTITION pMAX VALUES LESS THAN MAXVALUE
);
-- Composite partition (Hash + Range)
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT,
customer_id BIGINT,
order_date DATE NOT NULL,
PRIMARY KEY (order_id, customer_id, order_date)
) PARTITION BY HASH(customer_id) PARTITIONS 8
SUBPARTITION BY RANGE COLUMNS(order_date) (
PARTITION p0 (
SUBPARTITION sp0_q1 VALUES LESS THAN('2024-04-01'),
SUBPARTITION sp0_q2 VALUES LESS THAN('2024-07-01')
)
-- ... more partitions
);支持的分区类型:
- Hash:数据分布均匀,适合点查询(高基数字段)
- Range:适合时间类数据、历史数据(日期或有明确范围的数值)
- List:适合离散值集合(不同取值有限的字段)
- Key:类似Hash,使用MySQL内部哈希函数
- 复合分区:组合两种分区类型(例如Hash + Range)
sql
-- Hash分区
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT
) PARTITION BY HASH(order_id) PARTITIONS 8;
-- Range分区
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT,
order_date DATE NOT NULL,
PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE COLUMNS(order_date) (
PARTITION p2024Q1 VALUES LESS THAN('2024-04-01'),
PARTITION p2024Q2 VALUES LESS THAN('2024-07-01'),
PARTITION pMAX VALUES LESS THAN MAXVALUE
);
-- 复合分区(Hash + Range)
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT,
customer_id BIGINT,
order_date DATE NOT NULL,
PRIMARY KEY (order_id, customer_id, order_date)
) PARTITION BY HASH(customer_id) PARTITIONS 8
SUBPARTITION BY RANGE COLUMNS(order_date) (
PARTITION p0 (
SUBPARTITION sp0_q1 VALUES LESS THAN('2024-04-01'),
SUBPARTITION sp0_q2 VALUES LESS THAN('2024-07-01')
)
-- ... 更多分区
);Partition pruning
分区裁剪
Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency.
Rules:
- Hash/List: Partition key must be in WHERE clause with equality or IN conditions
- Range: Use range conditions that match partition boundaries, avoid functions on partition keys
sql
-- ✅ GOOD: Partition pruning works
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- Execution plan shows: partitions(p1) - only one partition accessed
-- ❌ BAD: No partition pruning (missing partition key or function on partition key)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- Execution plan shows: all partitions scanned分区裁剪可以避免访问无关分区,显著提升SQL执行效率。
规则:
- Hash/List分区:分区键必须出现在WHERE子句中,且使用等值或IN条件
- Range分区:使用匹配分区边界的范围条件,避免对分区键使用函数
sql
-- ✅ 推荐:分区裁剪生效
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- 执行计划显示:partitions(p1) - 仅访问1个分区
-- ❌ 不推荐:分区裁剪不生效(缺少分区键或分区键上使用了函数)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- 执行计划显示:扫描全部分区Table group design
表组设计
Table group overview
表组概述
Table Group is a logical concept representing a collection of tables. It controls the physical storage proximity of related tables to enable Partition Wise Join.
表组(Table Group) 是代表表集合的逻辑概念,用于控制相关表的物理存储邻近性,以实现Partition Wise Join。
SHARDING attributes
SHARDING属性
1. SHARDING = 'NONE': All partitions co-located on same machine, no partition restrictions
2. SHARDING = 'PARTITION': Tables sharded by first-level partitions, all tables must have identical first-level partition definitions
3. SHARDING = 'ADAPTIVE' (Default): Tables sharded adaptively, enables Partition Wise Join for both first-level and second-level partitions
Use table groups when: Multiple tables are frequently joined together, tables have related partition keys, need to avoid cross-partition joins
sql
-- Create table group and add tables
obclient [SALES_DB]> CREATE TABLEGROUP order_tg SHARDING = 'PARTITION'
PARTITION BY HASH(customer_id) PARTITIONS 8;
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT,
customer_id BIGINT,
PRIMARY KEY (order_id, customer_id)
) PARTITION BY HASH(customer_id) PARTITIONS 8;
obclient [SALES_DB]> ALTER TABLE order_table SET TABLEGROUP order_tg;
-- Now JOIN queries can use Partition Wise Join1. SHARDING = 'NONE': 所有分区都存放在同一机器上,无分区限制
2. SHARDING = 'PARTITION': 表按一级分区分片,所有表的一级分区定义必须完全一致
3. SHARDING = 'ADAPTIVE'(默认值): 表自适应分片,支持一级和二级分区的Partition Wise Join
符合以下场景时推荐使用表组: 多表频繁关联、表的分区键相关、需要避免跨分区关联
sql
-- 创建表组并添加表
obclient [SALES_DB]> CREATE TABLEGROUP order_tg SHARDING = 'PARTITION'
PARTITION BY HASH(customer_id) PARTITIONS 8;
obclient [SALES_DB]> CREATE TABLE order_table (
order_id BIGINT,
customer_id BIGINT,
PRIMARY KEY (order_id, customer_id)
) PARTITION BY HASH(customer_id) PARTITIONS 8;
obclient [SALES_DB]> ALTER TABLE order_table SET TABLEGROUP order_tg;
-- 现在JOIN查询可以使用Partition Wise JoinIndex design
索引设计
Index types
索引类型
1. Local Index (LOCAL):
- Index data stored with table partition data
- Default for MySQL mode partitioned tables
- Better for DML performance, preferred choice
2. Global Index (GLOBAL):
- Index separately partitioned
- Default for Oracle mode partitioned tables
- May cause distributed transactions, use only when necessary
sql
-- MySQL Mode: Default is LOCAL
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- Creates LOCAL index by default
-- Create GLOBAL index explicitly
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) GLOBAL;1. 本地索引(LOCAL):
- 索引数据和表分区数据存储在一起
- MySQL模式分区表的默认索引类型
- DML性能更优,是首选方案
2. 全局索引(GLOBAL):
- 索引单独分区
- Oracle模式分区表的默认索引类型
- 可能会产生分布式事务,仅在必要时使用
sql
-- MySQL模式:默认创建LOCAL索引
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- 默认创建LOCAL索引
-- 显式创建GLOBAL索引
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) GLOBAL;Index design principles
索引设计原则
1. Leftmost prefix principle: Index must match query pattern from leftmost column
sql
-- ✅ GOOD: Index matches query
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
obclient [SALES_DB]> SELECT * FROM order_table
WHERE customer_id = 1001 AND order_date >= '2024-01-01';
-- ❌ BAD: Missing leftmost column prevents index usage
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';
-- Execution plan shows: TABLE SCAN (full table scan)2. Composite index column order: Place high cardinality columns first
sql
-- ✅ GOOD: High cardinality column first
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);3. Covering index: Include all query columns to avoid table lookup
sql
obclient [SALES_DB]> CREATE INDEX idx_customer_date_amount ON order_table(customer_id, order_date)
STORING (total_amount, status);4. ORDER BY optimization: Index can support ORDER BY if WHERE conditions match leftmost columns
5. Join field indexes: Always index join columns for efficient joins
1. 最左前缀原则: 索引必须从最左列开始匹配查询模式
sql
-- ✅ 推荐:索引匹配查询条件
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
obclient [SALES_DB]> SELECT * FROM order_table
WHERE customer_id = 1001 AND order_date >= '2024-01-01';
-- ❌ 不推荐:缺少最左列导致索引无法生效
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';
-- 执行计划显示:TABLE SCAN(全表扫描)2. 联合索引列顺序: 高基数列放在前面
sql
-- ✅ 推荐:高基数列放在最前
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);3. 覆盖索引: 包含所有查询列,避免回表
sql
obclient [SALES_DB]> CREATE INDEX idx_customer_date_amount ON order_table(customer_id, order_date)
STORING (total_amount, status);4. ORDER BY优化: 如果WHERE条件匹配最左列,索引可以支持ORDER BY
5. 关联字段索引: 关联列必须创建索引,以提升关联效率
Index design best practices
索引设计最佳实践
General principles:
- Indexed fields should be with appropriate
NOT NULLvaluesDEFAULT - Business-unique fields should be primary keys
- Join fields must have consistent data types
- Place high cardinality columns first in composite indexes
- Use single composite index instead of multiple single-column indexes
- Add ORDER BY/GROUP BY columns to index for covering index
Partitioned table index:
- Prefer LOCAL index → Global partitioned index → Global index
- Reduce unnecessary global indexes (high maintenance cost)
- Primary key must include partition key
Index maintenance:
- Confirm new index is effective before deploying SQL
- Index modification: Create new index → Verify → Delete old index
- Delete unused indexes to avoid index bloat
- Never mix and
ALTER TABLE ADD INDEXin one DDLDROP INDEX
Common mistakes to avoid:
- ❌ Creating one index per query
- ❌ Believing indexes severely slow down updates
- ❌ Using "check then insert" instead of unique indexes
- ❌ Creating redundant indexes (e.g., idx_abc already covers idx_a and idx_ab)
通用原则:
- 索引字段应该设置为并配置合适的
NOT NULL值DEFAULT - 业务唯一的字段应该设为主键
- 关联字段的数据类型必须一致
- 联合索引中高基数列放在前面
- 优先使用单个联合索引,而非多个单列索引
- 将ORDER BY/GROUP BY列加入索引,形成覆盖索引
分区表索引:
- 优先级:本地索引 > 全局分区索引 > 全局索引
- 减少不必要的全局索引(维护成本高)
- 主键必须包含分区键
索引维护:
- 部署SQL前确认新索引生效
- 索引修改流程:创建新索引 → 验证生效 → 删除旧索引
- 删除未使用的索引,避免索引膨胀
- 不要在同一个DDL中同时执行和
ALTER TABLE ADD INDEX操作DROP INDEX
需要避免的常见错误:
- ❌ 为每个查询单独创建一个索引
- ❌ 认为索引会严重降低更新性能
- ❌ 使用“先查后插”代替唯一索引
- ❌ 创建冗余索引(例如idx_abc已经覆盖了idx_a和idx_ab的功能)
Schema design checklist
Schema设计检查清单
Before deploying a table schema to production:
- Primary key defined (preferably business fields)
- Required fields included (gmt_create, gmt_modified)
- All fields have COMMENT attributes and are NOT NULL with DEFAULT values
- Join fields have consistent data types
- Partition key selected appropriately and included in primary key/unique key
- Table group created if tables are frequently joined
- Indexes designed following leftmost prefix principle, on join columns
- LOCAL index preferred for partitioned tables, no redundant indexes
- Schema tested with production-like data volume
将表Schema部署到生产环境前,请确认:
- 已定义主键(优先使用业务字段)
- 已包含必填字段(gmt_create、gmt_modified)
- 所有字段都添加了COMMENT属性,设置为NOT NULL并配置了DEFAULT值
- 关联字段的数据类型一致
- 分区键选择合理,且已包含在主键/唯一键中
- 如果表频繁关联,已创建对应表组
- 索引设计遵循最左前缀原则,关联列已创建索引
- 分区表优先使用LOCAL索引,无冗余索引
- Schema已通过接近生产数据量的测试
Quick reference
快速参考
Partition types: Hash (even distribution, point queries) | Range (time-based data) | List (discrete value sets) | Key (similar to Hash) | Composite (two partition types)
Table group SHARDING: NONE (no restriction) | PARTITION (first-level partitions must match) | ADAPTIVE (all first-level or all second-level must match)
Index types: LOCAL (default MySQL, better DML performance) | GLOBAL (default Oracle, cross-partition queries)
分区类型: Hash(分布均匀,适合点查询) | Range(适合时间类数据) | List(适合离散值集合) | Key(类似Hash) | 复合分区(组合两种分区类型)
表组SHARDING属性: NONE(无限制) | PARTITION(一级分区必须匹配) | ADAPTIVE(所有一级或所有二级分区必须匹配)
索引类型: LOCAL(MySQL默认,DML性能更优) | GLOBAL(Oracle默认,适合跨分区查询)