oceanbase-sql-optimization
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseOceanBase SQL Optimization Skill
OceanBase SQL优化指南
Expert in SQL query optimization for OceanBase distributed database (MySQL & Oracle modes).
Provides best practices for writing efficient SQL queries, designing effective indexes, analyzing execution plans, and tuning slow queries in OceanBase. This skill covers both MySQL mode and Oracle mode.
OceanBase分布式数据库(MySQL与Oracle模式)的SQL查询优化专家。
提供在OceanBase中编写高效SQL查询、设计有效索引、分析执行计划以及调优慢查询的最佳实践。本指南同时覆盖MySQL模式与Oracle模式。
Mode-Specific Syntax Guide
模式专属语法指南
While OceanBase uses the same underlying optimizer for both MySQL and Oracle modes, there are important syntax differences:
| Aspect | MySQL Mode | Oracle Mode |
|---|---|---|
| EXPLAIN Syntax | | |
| System Views | | |
| Data Types | | |
| Time Functions | | |
| Schema Access | | |
| Limit Syntax | | |
| String Concatenation | | |
Optimization Principles:
- ✅ Same: Execution plan operators, join algorithms, partition pruning logic
- ✅ Same: Index design principles, query rewriting strategies
- ❌ Different: Syntax, system view access, data type names, function names
Examples in this skill are marked with mode indicators: MySQL Mode: or Oracle Mode:
虽然OceanBase在MySQL与Oracle模式下使用相同的底层优化器,但二者存在重要的语法差异:
| 维度 | MySQL模式 | Oracle模式 |
|---|---|---|
| EXPLAIN语法 | | |
| 系统视图 | | |
| 数据类型 | | |
| 时间函数 | | |
| Schema访问 | | |
| 限制语法 | | |
| 字符串拼接 | | ` |
优化原则:
- ✅ 相同点:执行计划算子、连接算法、分区裁剪逻辑
- ✅ 相同点:索引设计原则、查询重写策略
- ❌ 不同点:语法、系统视图访问方式、数据类型名称、函数名称
本指南中的示例会标注模式类型:MySQL Mode: 或 Oracle Mode:
Core Optimization Principles
核心优化原则
1. Understand Execution Plans
1. 理解执行计划
Always analyze execution plans before optimization:
MySQL Mode:
sql
obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01';Oracle Mode:
sql
obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01';查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME |EST. ROWS|COST | |
| ------------------------------------------- |
| |0 |TABLE SCAN |order_table|1000 |1000 | |
| |1 | TABLE GET |order_table|1000 |1000 | |
+=========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter([order_table.order_date >= '2024-01-01']), |
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p0) |
+------------------------------------------------------------------------------------------------+Key indicators to check:
- EST. ROWS: Estimated rows processed (lower is better)
- EST.TIME(us): Estimated execution time in microseconds (lower is better)
- OPERATOR: Look for (full table scan, bad) vs
TABLE SCAN(index lookup, good)TABLE GET - Partitions: Check if partition pruning is working (e.g., means only one partition accessed)
partitions(p1) - is_index_back: means no index back,
falsemeans index back requiredtrue
优化前务必分析执行计划:
MySQL模式:
sql
obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01';Oracle模式:
sql
obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01';查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME |EST. ROWS|COST | |
| ------------------------------------------- |
| |0 |TABLE SCAN |order_table|1000 |1000 | |
| |1 | TABLE GET |order_table|1000 |1000 | |
+=========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter([order_table.order_date >= '2024-01-01']), |
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p0) |
+------------------------------------------------------------------------------------------------+需要检查的关键指标:
- EST. ROWS:预估处理行数(越少越好)
- EST.TIME(us):预估执行时间(微秒,越少越好)
- OPERATOR:注意(全表扫描,性能差)与
TABLE SCAN(索引查找,性能好)的区别TABLE GET - Partitions:检查分区裁剪是否生效(例如表示仅访问一个分区)
partitions(p1) - is_index_back:表示无需回表,
false表示需要回表true
2. Index Design Best Practices
2. 索引设计最佳实践
Create indexes on frequently queried columns (syntax is the same for both modes):
sql
-- ✅ GOOD: Index on WHERE clause column
obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date);
-- ✅ GOOD: Composite index for multi-column queries (column order matters)
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- ✅ GOOD: Unique index for unique constraints
obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number);
-- ✅ GOOD: Index with STORING clause for covering index
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date)
STORING (total_amount, status);
-- ❌ BAD: Index on rarely queried columns
obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes);OceanBase Index Types:
- UNIQUE Index: Ensures uniqueness, fast lookups
- LOCAL Index: Partition-local index (default for partitioned tables)
- GLOBAL Index: Cross-partition index
- Function Index: Index on expressions (e.g., )
CREATE INDEX idx_expr ON t1((c1 + c2))
Index selection rules:
- Index columns used in WHERE clauses
- Index columns used in JOIN conditions
- Consider composite indexes for multi-column filters (column order matters - most selective first)
- Use STORING clause to create covering indexes (avoids index back)
- Avoid over-indexing (each index adds write overhead)
- For partitioned tables, prefer LOCAL indexes unless cross-partition queries are common
为频繁查询的列创建索引(两种模式语法相同):
sql
-- ✅ 推荐:为WHERE子句中的列创建索引
obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date);
-- ✅ 推荐:为多列查询创建复合索引(列顺序很重要)
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);
-- ✅ 推荐:为唯一约束创建唯一索引
obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number);
-- ✅ 推荐:使用STORING子句创建覆盖索引
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date)
STORING (total_amount, status);
-- ❌ 不推荐:为极少查询的列创建索引
obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes);OceanBase索引类型:
- UNIQUE Index:保证唯一性,查询速度快
- LOCAL Index:分区本地索引(分区表默认类型)
- GLOBAL Index:跨分区索引
- Function Index:基于表达式的索引(例如)
CREATE INDEX idx_expr ON t1((c1 + c2))
索引选择规则:
- 为WHERE子句中的列创建索引
- 为JOIN条件中的列创建索引
- 多列过滤查询考虑复合索引(列顺序优先选择选择性高的列)
- 使用STORING子句创建覆盖索引(避免回表)
- 避免过度索引(每个索引都会增加写入开销)
- 对于分区表,优先使用LOCAL索引,除非存在大量跨分区查询
3. Query Writing Best Practices
3. 查询编写最佳实践
Use specific column lists instead of SELECT *:
sql
-- ✅ GOOD: Select only needed columns
obclient [SALES_DB]> SELECT order_id, customer_id, total_amount
FROM order_table
WHERE order_date >= '2024-01-01';
-- ❌ BAD: Select all columns
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';Use appropriate WHERE conditions:
sql
-- ✅ GOOD: Use indexed column in WHERE
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- ✅ GOOD: Use range queries on indexed columns
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- ❌ BAD: Function on indexed column prevents index usage
obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024;
-- ✅ GOOD: Rewrite to use index
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';使用指定列列表而非SELECT *:
sql
-- ✅ 推荐:仅选择需要的列
obclient [SALES_DB]> SELECT order_id, customer_id, total_amount
FROM order_table
WHERE order_date >= '2024-01-01';
-- ❌ 不推荐:选择所有列
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';使用合适的WHERE条件:
sql
-- ✅ 推荐:WHERE子句使用索引列
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;
-- ✅ 推荐:对索引列使用范围查询
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- ❌ 不推荐:对索引列使用函数会导致索引失效
obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024;
-- ✅ 推荐:重写查询以使用索引
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';Partition Pruning Optimization
分区裁剪优化
Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency. Always check the field in the execution plan to verify partition pruning is working.
partitions分区裁剪避免访问无关分区,可显著提升SQL执行效率。务必检查执行计划中的字段,确认分区裁剪是否生效。
partitionsHash Partitioning
哈希分区
Ensure partition key is in WHERE clause:
sql
-- ✅ GOOD: Partition key in WHERE clause enables partition pruning
obclient [SALES_DB]> CREATE TABLE order_table (
order_id INT,
customer_id INT,
order_date DATE
) PARTITION BY HASH(order_id) PARTITIONS 5;
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |TABLE SCAN|order_table|990 |383 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter(nil), |
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p1) -- Only one partition accessed! |
+------------------------------------------------------------------------------------------------+sql
-- ❌ BAD: Missing partition key prevents pruning (scans all partitions)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- Check execution plan: partitions(p0, p1, p2, p3, p4) - all partitions scanned确保WHERE子句中包含分区键:
sql
-- ✅ 推荐:WHERE子句包含分区键,启用分区裁剪
obclient [SALES_DB]> CREATE TABLE order_table (
order_id INT,
customer_id INT,
order_date DATE
) PARTITION BY HASH(order_id) PARTITIONS 5;
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |TABLE SCAN|order_table|990 |383 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter(nil), |
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p1) -- 仅访问一个分区! |
+------------------------------------------------------------------------------------------------+sql
-- ❌ 不推荐:缺少分区键导致无法裁剪(扫描所有分区)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- 查看执行计划:partitions(p0, p1, p2, p3, p4) - 所有分区被扫描Range Partitioning
范围分区
Use range conditions that match partition boundaries:
sql
-- ✅ GOOD: Range query matches partition boundaries
obclient [SALES_DB]> CREATE TABLE order_table (
order_id INT,
customer_id INT,
order_date DATE
) PARTITION BY RANGE(order_date) (
PARTITION p0 VALUES LESS THAN('2024-01-01'),
PARTITION p1 VALUES LESS THAN('2024-02-01'),
PARTITION p2 VALUES LESS THAN('2024-03-01')
);
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |TABLE SCAN|order_table|1 |46 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']),
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p1) -- Only partition p1 accessed! |
+------------------------------------------------------------------------------------------------+sql
-- ❌ BAD: Function prevents partition pruning
obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1;
-- Check execution plan: partitions(p0, p1, p2) - all partitions scanned
-- ✅ GOOD: Rewrite to use partition pruning
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';Partition Pruning Rules:
- For Hash/List partitions: Partition key must be in WHERE clause with equality or IN conditions
- For Range partitions: Use range conditions that match partition boundaries
- Avoid functions on partition keys in WHERE clauses
- For expressions as partition keys, the expression must appear as a whole in WHERE clause
使用与分区边界匹配的范围条件:
sql
-- ✅ 推荐:范围查询匹配分区边界
obclient [SALES_DB]> CREATE TABLE order_table (
order_id INT,
customer_id INT,
order_date DATE
) PARTITION BY RANGE(order_date) (
PARTITION p0 VALUES LESS THAN('2024-01-01'),
PARTITION p1 VALUES LESS THAN('2024-02-01'),
PARTITION p2 VALUES LESS THAN('2024-03-01')
);
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |TABLE SCAN|order_table|1 |46 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']),
| access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), |
| partitions(p1) -- 仅访问分区p1! |
+------------------------------------------------------------------------------------------------+sql
-- ❌ 不推荐:使用函数导致分区裁剪失效
obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1;
-- 查看执行计划:partitions(p0, p1, p2) - 所有分区被扫描
-- ✅ 推荐:重写查询以启用分区裁剪
obclient [SALES_DB]> SELECT * FROM order_table
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';分区裁剪规则:
- 哈希/列表分区:WHERE子句中必须包含分区键,且使用等值或IN条件
- 范围分区:使用与分区边界匹配的范围条件
- 避免在WHERE子句中对分区键使用函数
- 若分区键为表达式,WHERE子句中必须完整使用该表达式
Join Optimization
连接优化
OceanBase supports three join algorithms: Nested Loop Join (NLJ), Hash Join (HJ), and Merge Join (MJ). The optimizer automatically selects the best algorithm based on statistics and cost estimation.
OceanBase支持三种连接算法:Nested Loop Join (NLJ)、Hash Join (HJ) 和 Merge Join (MJ)。优化器会根据统计信息和成本估算自动选择最佳算法。
Join Algorithms
连接算法
1. Nested Loop Join (NLJ)
- Best for: Small outer table, indexed inner table
- Use when: Join condition has index on inner table
- Hint:
/*+ USE_NL(table1, table2) */
sql
-- ✅ GOOD: NLJ with indexed join column
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name
FROM customer_table c
INNER JOIN order_table o ON c.customer_id = o.customer_id
WHERE c.customer_type = 'VIP';查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------- |
| |0 |NESTED-LOOP JOIN| |990 |37346 | |
| |1 | TABLE SCAN |c |999 |669 | |
| |2 | TABLE GET |o |1 |36 | -- Index lookup! |
| =========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([o.order_id], [c.customer_name]), filter(nil), |
| conds(nil), nl_params_([c.customer_id]) |
| 1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']), |
| access([c.customer_id], [c.customer_name]), partitions(p0) |
| 2 - output([o.order_id]), filter(nil), |
| access([o.order_id]), partitions(p0), |
| range_key([o.customer_id]), range([? = o.customer_id]) -- Index used! |
+------------------------------------------------------------------------------------------------+2. Hash Join (HJ)
- Best for: Large datasets, equal join conditions
- Use when: Both tables are large and join condition is equality
- Hint:
/*+ USE_HASH(table1, table2) */
sql
-- ✅ GOOD: Hash Join for large table joins
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id;查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ======================================= |
| |ID|OPERATOR |NAME|EST. ROWS|EST.TIME(us)| |
| --------------------------------------- |
| |0 |HASH JOIN | |98010000 |66774608 | |
| |1 | TABLE SCAN|o |100000 |68478 | |
| |2 | TABLE SCAN|c |100000 |68478 | |
| ======================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([o.order_id], [c.customer_name]), filter(nil), |
| equal_conds([o.customer_id = c.customer_id]), other_conds(nil) |
+------------------------------------------------------------------------------------------------+3. Merge Join (MJ)
- Best for: Pre-sorted data, ordered results needed
- Use when: Both inputs are already sorted or can be sorted efficiently
- Hint:
/*+ USE_MERGE(table1, table2) */
sql
-- ✅ GOOD: Merge Join when data is sorted
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id
ORDER BY o.customer_id;1. Nested Loop Join (NLJ)
- 适用场景:外表数据量小,内表有索引
- 使用时机:连接条件中的内表列有索引
- 提示:
/*+ USE_NL(table1, table2) */
sql
-- ✅ 推荐:NLJ搭配索引连接列
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name
FROM customer_table c
INNER JOIN order_table o ON c.customer_id = o.customer_id
WHERE c.customer_type = 'VIP';查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| =========================================== |
| |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| |
| ------------------------------------------- |
| |0 |NESTED-LOOP JOIN| |990 |37346 | |
| |1 | TABLE SCAN |c |999 |669 | |
| |2 | TABLE GET |o |1 |36 | -- 索引查找! |
| =========================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([o.order_id], [c.customer_name]), filter(nil), |
| conds(nil), nl_params_([c.customer_id]) |
| 1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']), |
| access([c.customer_id], [c.customer_name]), partitions(p0) |
| 2 - output([o.order_id]), filter(nil), |
| access([o.order_id]), partitions(p0), |
| range_key([o.customer_id]), range([? = o.customer_id]) -- 使用索引! |
+------------------------------------------------------------------------------------------------+2. Hash Join (HJ)
- 适用场景:大数据集,等值连接条件
- 使用时机:两张表数据量都较大,且连接条件为等值
- 提示:
/*+ USE_HASH(table1, table2) */
sql
-- ✅ 推荐:Hash Join用于大表连接
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id;查询结果如下:
+------------------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------------------+
| ======================================= |
| |ID|OPERATOR |NAME|EST. ROWS|EST.TIME(us)| |
| --------------------------------------- |
| |0 |HASH JOIN | |98010000 |66774608 | |
| |1 | TABLE SCAN|o |100000 |68478 | |
| |2 | TABLE SCAN|c |100000 |68478 | |
| ======================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([o.order_id], [c.customer_name]), filter(nil), |
| equal_conds([o.customer_id = c.customer_id]), other_conds(nil) |
+------------------------------------------------------------------------------------------------+3. Merge Join (MJ)
- 适用场景:数据已排序,需要有序结果
- 使用时机:两个输入数据集已排序或可高效排序
- 提示:
/*+ USE_MERGE(table1, table2) */
sql
-- ✅ 推荐:数据已排序时使用Merge Join
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id
ORDER BY o.customer_id;Join Order
连接顺序
Use LEADING hint to control join order:
sql
-- ✅ GOOD: Control join order explicitly
obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name
FROM customer_table c
INNER JOIN order_table o ON c.customer_id = o.customer_id
WHERE c.customer_type = 'VIP'
AND o.order_date >= '2024-01-01';
-- ❌ BAD: Let optimizer choose suboptimal order
obclient [SALES_DB]> SELECT o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id
WHERE c.customer_type = 'VIP';使用LEADING提示控制连接顺序:
sql
-- ✅ 推荐:显式控制连接顺序
obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name
FROM customer_table c
INNER JOIN order_table o ON c.customer_id = o.customer_id
WHERE c.customer_type = 'VIP'
AND o.order_date >= '2024-01-01';
-- ❌ 不推荐:让优化器选择非最优顺序
obclient [SALES_DB]> SELECT o.order_id, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id
WHERE c.customer_type = 'VIP';Index on Join Columns
连接列索引
Always index foreign key columns:
sql
-- ✅ GOOD: Index on join column enables efficient joins
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);
-- Then join queries can use index (NLJ with TABLE GET)
obclient [SALES_DB]> SELECT o.*, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id;务必为外键列创建索引:
sql
-- ✅ 推荐:为连接列创建索引,实现高效连接
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);
-- 之后的连接查询可使用索引(NLJ搭配TABLE GET)
obclient [SALES_DB]> SELECT o.*, c.customer_name
FROM order_table o
INNER JOIN customer_table c ON o.customer_id = c.customer_id;Aggregation Optimization
聚合优化
Use Appropriate Aggregation Functions
使用合适的聚合函数
sql
-- ✅ GOOD: COUNT(*) is optimized
obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01';
-- ✅ GOOD: COUNT(column) when you need non-NULL count
obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table;
-- ❌ BAD: COUNT(DISTINCT) on large datasets can be slow
obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table;
-- ✅ GOOD: Use GROUP BY with LIMIT (MySQL) or FETCH FIRST (Oracle) for top N
-- MySQL Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
FROM order_table
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
-- Oracle Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
FROM order_table
GROUP BY customer_id
ORDER BY total DESC
FETCH FIRST 10 ROWS ONLY;sql
-- ✅ 推荐:COUNT(*)已优化
obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01';
-- ✅ 推荐:需要统计非NULL值时使用COUNT(column)
obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table;
-- ❌ 不推荐:大数据集上使用COUNT(DISTINCT)会较慢
obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table;
-- ✅ 推荐:结合GROUP BY与LIMIT(MySQL)或FETCH FIRST(Oracle)获取Top N
-- MySQL模式:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
FROM order_table
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
-- Oracle模式:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
FROM order_table
GROUP BY customer_id
ORDER BY total DESC
FETCH FIRST 10 ROWS ONLY;Subquery Optimization
子查询优化
Convert Correlated Subqueries to JOINs
将关联子查询转换为JOIN
sql
-- ❌ BAD: Correlated subquery (executes for each row)
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
FROM order_table o
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM order_table
WHERE customer_id = o.customer_id
);
-- ✅ GOOD: Convert to JOIN
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
FROM order_table o
INNER JOIN (
SELECT customer_id, AVG(total_amount) as avg_amount
FROM order_table
GROUP BY customer_id
) avg_orders ON o.customer_id = avg_orders.customer_id
WHERE o.total_amount > avg_orders.avg_amount;sql
-- ❌ 不推荐:关联子查询(逐行执行)
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
FROM order_table o
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM order_table
WHERE customer_id = o.customer_id
);
-- ✅ 推荐:转换为JOIN
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
FROM order_table o
INNER JOIN (
SELECT customer_id, AVG(total_amount) as avg_amount
FROM order_table
GROUP BY customer_id
) avg_orders ON o.customer_id = avg_orders.customer_id
WHERE o.total_amount > avg_orders.avg_amount;Use EXISTS instead of IN for large datasets
大数据集下使用EXISTS替代IN
Syntax is the same for both modes:
sql
-- ✅ GOOD: EXISTS stops at first match
obclient [SALES_DB]> SELECT * FROM order_table o
WHERE EXISTS (
SELECT 1 FROM customer_table c
WHERE c.customer_id = o.customer_id
AND c.customer_type = 'VIP'
);
-- ⚠️ OK: IN works but may be slower for large lists
obclient [SALES_DB]> SELECT * FROM order_table
WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP');两种模式语法相同:
sql
-- ✅ 推荐:EXISTS找到第一个匹配项即停止
obclient [SALES_DB]> SELECT * FROM order_table o
WHERE EXISTS (
SELECT 1 FROM customer_table c
WHERE c.customer_id = o.customer_id
AND c.customer_type = 'VIP'
);
-- ⚠️ 可用:IN可正常工作,但大数据集下可能较慢
obclient [SALES_DB]> SELECT * FROM order_table
WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP');Slow Query Tuning
慢查询调优
Enable SQL Audit
启用SQL审计
Configure SQL Audit settings:
sql
-- Enable SQL Audit
obclient> ALTER SYSTEM SET enable_sql_audit = true;
-- Set SQL Audit memory percentage (default: 3%, range: [0,80])
obclient> SET GLOBAL ob_sql_audit_percentage = 3;配置SQL审计设置:
sql
-- 启用SQL审计
obclient> ALTER SYSTEM SET enable_sql_audit = true;
-- 设置SQL审计内存占比(默认:3%,范围:[0,80])
obclient> SET GLOBAL ob_sql_audit_percentage = 3;Identify Slow Queries
识别慢查询
Query TOP SQL by CPU usage:
MySQL Mode: Use with
Oracle Mode: Use with
oceanbase.GV$OB_SQL_AUDITTIME_TO_USEC(NOW())Oracle Mode: Use
GV$OB_SQL_AUDITSYSDATE - INTERVAL '30' MINUTEsql
-- Find TOP SQL by CPU usage in last 30 minutes
obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time,
AVG(execute_time) AS avg_cpu_time,
SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT -- MySQL: add 'oceanbase.' prefix
WHERE tenant_id = 'mysql001' -- MySQL: string, Oracle: number
AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) -- MySQL
-- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE
AND is_executor_rpc = 0
GROUP BY sql_id
HAVING COUNT(*) > 1
ORDER BY cpu_cnt DESC
LIMIT 10; -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLYQuery slow queries by execution time:
MySQL Mode: Use function
Oracle Mode: Use function
usec_to_time()Oracle Mode: Use
TO_CHAR()sql
-- Find queries with execution time > 100ms
obclient> SELECT request_id,
usec_to_time(request_time) AS request_time, -- MySQL
-- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time, -- Oracle
elapsed_time, queue_time, execute_time, flt_trace_id, query_sql
FROM oceanbase.v$ob_sql_audit -- MySQL: add 'oceanbase.' prefix
WHERE elapsed_time > 100000 -- > 100ms in microseconds
ORDER BY elapsed_time DESC
LIMIT 10; -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY按CPU使用率查询TOP SQL:
MySQL模式: 使用搭配
Oracle模式: 使用搭配
oceanbase.GV$OB_SQL_AUDITTIME_TO_USEC(NOW())Oracle模式: 使用
GV$OB_SQL_AUDITSYSDATE - INTERVAL '30' MINUTEsql
-- 查询过去30分钟内CPU使用率最高的TOP SQL
obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time,
AVG(execute_time) AS avg_cpu_time,
SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql
FROM oceanbase.GV$OB_SQL_AUDIT -- MySQL: 添加'oceanbase.'前缀
WHERE tenant_id = 'mysql001' -- MySQL: 字符串类型,Oracle: 数字类型
AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) -- MySQL
-- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE
AND is_executor_rpc = 0
GROUP BY sql_id
HAVING COUNT(*) > 1
ORDER BY cpu_cnt DESC
LIMIT 10; -- MySQL: 使用LIMIT,Oracle: 使用FETCH FIRST 10 ROWS ONLY按执行时间查询慢查询:
MySQL模式: 使用函数
Oracle模式: 使用函数
usec_to_time()Oracle模式: 使用
TO_CHAR()sql
-- 查询执行时间>100ms的查询
obclient> SELECT request_id,
usec_to_time(request_time) AS request_time, -- MySQL
-- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time, -- Oracle
elapsed_time, queue_time, execute_time, flt_trace_id, query_sql
FROM oceanbase.v$ob_sql_audit -- MySQL: 添加'oceanbase.'前缀
WHERE elapsed_time > 100000 -- > 100ms(微秒)
ORDER BY elapsed_time DESC
LIMIT 10; -- MySQL: 使用LIMIT,Oracle: 使用FETCH FIRST 10 ROWS ONLYAnalyze Query Performance
分析查询性能
Get detailed execution statistics:
MySQL Mode: Use
Oracle Mode: Use (no schema prefix)
oceanbase.gv$ob_sql_auditOracle Mode: Use
GV$OB_SQL_AUDITsql
-- Get detailed statistics for a specific SQL
obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time,
queue_time, return_rows, affected_rows, partition_cnt,
table_scan, is_hit_plan, plan_id
FROM oceanbase.gv$ob_sql_audit -- MySQL: add 'oceanbase.' prefix
WHERE sql_id = 'your_sql_id'
ORDER BY elapsed_time DESC;Get execution plan for slow query:
MySQL Mode: Use
Oracle Mode: Use (no schema prefix)
oceanbase.GV$OB_PLAN_CACHE_PLAN_STATOracle Mode: Use
GV$OB_PLAN_CACHE_PLAN_STATsql
-- Get execution plan from plan cache
obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
last_active_time, first_load_time, outline_data
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT -- MySQL: add 'oceanbase.' prefix
WHERE tenant_id = 1002 AND sql_id = 'your_sql_id'
AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;
-- Get plan details
obclient> SELECT operator, name, rows, cost
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN -- MySQL: add 'oceanbase.' prefix
WHERE tenant_id = 1002 AND plan_id = 741
AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;获取详细执行统计信息:
MySQL模式: 使用
Oracle模式: 使用(无需schema前缀)
oceanbase.gv$ob_sql_auditOracle模式: 使用
GV$OB_SQL_AUDITsql
-- 获取特定SQL的详细统计信息
obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time,
queue_time, return_rows, affected_rows, partition_cnt,
table_scan, is_hit_plan, plan_id
FROM oceanbase.gv$ob_sql_audit -- MySQL: 添加'oceanbase.'前缀
WHERE sql_id = 'your_sql_id'
ORDER BY elapsed_time DESC;获取慢查询的执行计划:
MySQL模式: 使用
Oracle模式: 使用(无需schema前缀)
oceanbase.GV$OB_PLAN_CACHE_PLAN_STATOracle模式: 使用
GV$OB_PLAN_CACHE_PLAN_STATsql
-- 从计划缓存中获取执行计划
obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
last_active_time, first_load_time, outline_data
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT -- MySQL: 添加'oceanbase.'前缀
WHERE tenant_id = 1002 AND sql_id = 'your_sql_id'
AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;
-- 获取计划详情
obclient> SELECT operator, name, rows, cost
FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN -- MySQL: 添加'oceanbase.'前缀
WHERE tenant_id = 1002 AND plan_id = 741
AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;Find Currently Running Slow Queries
查找当前运行的慢查询
MySQL Mode:
sql
obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info,
svr_ip, svr_port, trace_id
FROM oceanbase.GV$OB_PROCESSLIST
WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10;Oracle Mode:
sql
obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info,
svr_ip, svr_port, trace_id
FROM GV$OB_PROCESSLIST
WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY;MySQL模式:
sql
obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info,
svr_ip, svr_port, trace_id
FROM oceanbase.GV$OB_PROCESSLIST
WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10;Oracle模式:
sql
obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info,
svr_ip, svr_port, trace_id
FROM GV$OB_PROCESSLIST
WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY;Real-World Optimization Case
实际优化案例
Case: Type Conversion Preventing Index Usage
案例:类型转换导致索引失效
Problem: SQL execution time was 2 seconds, CPU usage exceeded 70% of server resources.
Root Cause Analysis:
-
Identify TOP SQL:sql
obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id = 'mysql001' AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) GROUP BY sql_id ORDER BY SUM(execute_time) DESC LIMIT 1; -
Analyze Execution Plan:sql
obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476);Execution plan showed:- High cost on index scan
TBL3(IDX_TBL3_COL170) - - index not used for matching
range(MIN,MIN ; MAX,MAX)always true - - type conversion happening
filter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)])
- High cost on
-
Root Cause:
- View had
V_TT01conversion onTO_NUMBERCOL001 - Table was
TBL3.COL170but compared withVARCHAR2(20)NUMBER - Type conversion prevented index usage
- View
Optimization Steps:
-
Remove unnecessary TO_NUMBER conversion:sql
-- Before: View had TO_NUMBER conversion CREATE VIEW V_TT01_OLD AS SELECT To_number("tt01"."col001") AS "COL001", ... -- After: Remove TO_NUMBER (COL001 is already NUMBER) CREATE VIEW V_TT01 AS SELECT "tt01"."col001" AS "COL001", ...Result: SQL RT reduced from 2s to 150ms -
Unify data types:sql
-- Before: TBL3.COL170 was VARCHAR2(20) CREATE TABLE TBL3 ( COL170 VARCHAR2(20) NOT NULL, ... ); -- After: Change to NUMBER(20) to match join condition ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL;Result: SQL RT further reduced to 20ms -
Partition and table group optimization:sql
-- Add hash partitioning on join keys ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8; ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8; ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8; -- Create table group to avoid cross-partition joins CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL1 SET TABLEGROUP order_tg; ALTER TABLE TBL2 SET TABLEGROUP order_tg; ALTER TABLE TBL3 SET TABLEGROUP order_tg; ALTER TABLE TT01 SET TABLEGROUP order_tg;Result: SQL RT further reduced to 4ms
Final Execution Plan: Shows (partition pruning working) and with index (index used efficiently)
PX PARTITION ITERATORTABLE SCANINDEX_TBL3_COL170Key Learnings:
- Always check execution plan for type conversions in filters
- Ensure join column data types match
- Use partition pruning and table groups for distributed queries
- Remove unnecessary type conversions in views
问题: SQL执行时间为2秒,CPU使用率超过服务器资源的70%。
根因分析:
-
识别TOP SQL:sql
obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id = 'mysql001' AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) GROUP BY sql_id ORDER BY SUM(execute_time) DESC LIMIT 1; -
分析执行计划:sql
obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476);执行计划显示:- 索引扫描成本高
TBL3(IDX_TBL3_COL170) - - 索引未用于匹配
range(MIN,MIN ; MAX,MAX)always true - - 发生类型转换
filter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)])
-
根本原因:
- 视图对
V_TT01使用了COL001转换TO_NUMBER - 表为
TBL3.COL170类型,但与VARCHAR2(20)类型进行比较NUMBER - 类型转换导致索引失效
- 视图
优化步骤:
-
移除不必要的TO_NUMBER转换:sql
-- 优化前:视图包含TO_NUMBER转换 CREATE VIEW V_TT01_OLD AS SELECT To_number("tt01"."col001") AS "COL001", ... -- 优化后:移除TO_NUMBER(COL001本身已是NUMBER类型) CREATE VIEW V_TT01 AS SELECT "tt01"."col001" AS "COL001", ...结果:SQL响应时间从2秒降至150毫秒 -
统一数据类型:sql
-- 优化前:TBL3.COL170为VARCHAR2(20) CREATE TABLE TBL3 ( COL170 VARCHAR2(20) NOT NULL, ... ); -- 优化后:修改为NUMBER(20),匹配连接条件 ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL;结果:SQL响应时间进一步降至20毫秒 -
分区与表组优化:sql
-- 为连接键添加哈希分区 ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8; ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8; ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8; -- 创建表组,避免跨分区连接 CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL1 SET TABLEGROUP order_tg; ALTER TABLE TBL2 SET TABLEGROUP order_tg; ALTER TABLE TBL3 SET TABLEGROUP order_tg; ALTER TABLE TT01 SET TABLEGROUP order_tg;结果:SQL响应时间进一步降至4毫秒
最终执行计划: 显示(分区裁剪生效)和使用索引的(索引高效利用)
PX PARTITION ITERATORINDEX_TBL3_COL170TABLE SCAN关键经验:
- 务必检查执行计划中的过滤条件是否存在类型转换
- 确保连接列的数据类型匹配
- 对分布式查询使用分区裁剪和表组
- 移除视图中不必要的类型转换
Common Anti-patterns
常见反模式
❌ Avoid These Patterns
❌ 避免以下模式
1. N+1 Query Problem:
sql
-- ❌ BAD: Multiple queries in loop
-- Instead, use JOIN or batch queries2. SELECT * in Production:
sql
-- ❌ BAD: SELECT * returns unnecessary data
SELECT * FROM order_table;
-- ✅ GOOD: Select only needed columns
SELECT order_id, customer_id, total_amount FROM order_table;3. Functions on Indexed Columns:
sql
-- ❌ BAD: Prevents index usage
WHERE UPPER(customer_name) = 'JOHN'
-- ✅ GOOD: Store normalized data or use function-based index
WHERE customer_name = 'JOHN'4. Implicit Type Conversions:
sql
-- ❌ BAD: String to number conversion
WHERE order_id = '12345'
-- ✅ GOOD: Match data types
WHERE order_id = 123451. N+1查询问题:
sql
-- ❌ 不推荐:循环中执行多个查询
-- 替代方案:使用JOIN或批量查询*2. 生产环境中使用SELECT :
sql
-- ❌ 不推荐:SELECT *返回不必要的数据
SELECT * FROM order_table;
-- ✅ 推荐:仅选择需要的列
SELECT order_id, customer_id, total_amount FROM order_table;3. 对索引列使用函数:
sql
-- ❌ 不推荐:导致索引失效
WHERE UPPER(customer_name) = 'JOHN'
-- ✅ 推荐:存储标准化数据或使用函数索引
WHERE customer_name = 'JOHN'4. 隐式类型转换:
sql
-- ❌ 不推荐:字符串转数字
WHERE order_id = '12345'
-- ✅ 推荐:匹配数据类型
WHERE order_id = 12345Performance Monitoring
性能监控
Key Metrics to Monitor
需要监控的关键指标
sql
-- Check table statistics
obclient [SALES_DB]> ANALYZE TABLE order_table;
-- View table size and row count
obclient [SALES_DB]> SELECT
table_name,
table_rows,
data_length,
index_length
FROM information_schema.tables
WHERE table_schema = 'SALES_DB';
-- Check index usage
obclient [SALES_DB]> SHOW INDEX FROM order_table;sql
-- 检查表统计信息
obclient [SALES_DB]> ANALYZE TABLE order_table;
-- 查看表大小和行数
obclient [SALES_DB]> SELECT
table_name,
table_rows,
data_length,
index_length
FROM information_schema.tables
WHERE table_schema = 'SALES_DB';
-- 检查索引使用情况
obclient [SALES_DB]> SHOW INDEX FROM order_table;Optimization Checklist
优化检查清单
Before deploying SQL queries to production:
- Execution plan reviewed (EXPLAIN)
- Appropriate indexes created
- Partition pruning enabled (if partitioned)
- SELECT columns limited to needed fields
- WHERE conditions use indexed columns
- JOIN conditions indexed
- Subqueries optimized (JOINs preferred)
- Aggregations use efficient functions
- No functions on indexed columns in WHERE
- Data types match in comparisons
- Query tested with production-like data volume
将SQL查询部署到生产环境前,请确认:
- 已查看执行计划(EXPLAIN)
- 已创建合适的索引
- 已启用分区裁剪(若使用分区表)
- SELECT语句仅包含需要的列
- WHERE条件使用索引列
- 连接条件已创建索引
- 子查询已优化(优先使用JOIN)
- 聚合操作使用高效函数
- WHERE子句中未对索引列使用函数
- 比较条件中数据类型匹配
- 已使用生产级数据量测试查询
Quick Reference
快速参考
Execution Plan Operators
执行计划算子
| Operator | Meaning | Optimization |
|---|---|---|
| TABLE SCAN / TABLE FULL SCAN | Full table scan | Add index or use partition key |
| TABLE RANGE SCAN | Range scan on table | ✅ Good, but consider index |
| TABLE GET | Direct row access via primary key | ✅ Best |
| HASH JOIN | Hash join algorithm | ✅ Good for large equal joins |
| NESTED-LOOP JOIN | Nested loop join | ✅ Good for small outer table with indexed inner |
| MERGE JOIN | Merge join algorithm | ✅ Good for pre-sorted data |
| SORT | Sorting operation | Add ORDER BY index or use sorted index |
| AGGREGATE | Aggregation | Consider materialized views |
| PX PARTITION ITERATOR | Parallel partition iterator | ✅ Good, partition pruning working |
| EXCHANGE OUT DISTR | Distributed exchange | Indicates distributed execution |
| 算子 | 含义 | 优化方向 |
|---|---|---|
| TABLE SCAN / TABLE FULL SCAN | 全表扫描 | 添加索引或使用分区键 |
| TABLE RANGE SCAN | 表范围扫描 | ✅ 性能较好,但可考虑添加索引 |
| TABLE GET | 通过主键直接访问行 | ✅ 最佳性能 |
| HASH JOIN | 哈希连接算法 | ✅ 适用于大数据集等值连接 |
| NESTED-LOOP JOIN | 嵌套循环连接 | ✅ 适用于小表搭配索引连接 |
| MERGE JOIN | 合并连接 | ✅ 适用于已排序数据 |
| SORT | 排序操作 | 添加ORDER BY索引或使用已排序索引 |
| AGGREGATE | 聚合操作 | 考虑使用物化视图 |
| PX PARTITION ITERATOR | 并行分区迭代器 | ✅ 性能好,分区裁剪生效 |
| EXCHANGE OUT DISTR | 分布式交换 | 表示分布式执行 |
Index Types
索引类型
- Primary Key: Automatically indexed, unique
- Unique Index: Ensures uniqueness, fast lookups
- Composite Index: Multiple columns, order matters
- Covering Index: Contains all query columns
- Primary Key:自动创建索引,唯一
- Unique Index:保证唯一性,查询速度快
- Composite Index:多列索引,列顺序重要
- Covering Index:包含查询所需的所有列