oceanbase-sql-optimization

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

OceanBase SQL Optimization Skill

OceanBase SQL优化指南

Expert in SQL query optimization for OceanBase distributed database (MySQL & Oracle modes).

Provides best practices for writing efficient SQL queries, designing effective indexes, analyzing execution plans, and tuning slow queries in OceanBase. This skill covers both MySQL mode and Oracle mode.

OceanBase分布式数据库（MySQL与Oracle模式）的SQL查询优化专家。

提供在OceanBase中编写高效SQL查询、设计有效索引、分析执行计划以及调优慢查询的最佳实践。本指南同时覆盖MySQL模式与Oracle模式。

Mode-Specific Syntax Guide

模式专属语法指南

While OceanBase uses the same underlying optimizer for both MySQL and Oracle modes, there are important syntax differences:

Aspect	MySQL Mode	Oracle Mode
EXPLAIN Syntax	`EXPLAIN [INTO table_name] [SET statement_id = string]`	`EXPLAIN [INTO table_name] [SET STATEMENT_ID = 'string']`
System Views	`oceanbase.GV$OB_SQL_AUDIT`	`GV$OB_SQL_AUDIT` (same name, different schema access)
Data Types	`BIGINT` , `VARCHAR`	`NUMBER` , `VARCHAR2`
Time Functions	`TIME_TO_USEC()` , `usec_to_time()`	`TO_DATE()` , `TO_CHAR()`
Schema Access	`database.table`	`schema.table`
Limit Syntax	`LIMIT n`	`FETCH FIRST n ROWS ONLY`
String Concatenation	`CONCAT()`	`\|\|` operator

Optimization Principles:

✅ Same: Execution plan operators, join algorithms, partition pruning logic
✅ Same: Index design principles, query rewriting strategies
❌ Different: Syntax, system view access, data type names, function names

Examples in this skill are marked with mode indicators: MySQL Mode: or Oracle Mode:

虽然OceanBase在MySQL与Oracle模式下使用相同的底层优化器，但二者存在重要的语法差异：

维度	MySQL模式	Oracle模式
EXPLAIN语法	`EXPLAIN [INTO table_name] [SET statement_id = string]`	`EXPLAIN [INTO table_name] [SET STATEMENT_ID = 'string']`
系统视图	`oceanbase.GV$OB_SQL_AUDIT`	`GV$OB_SQL_AUDIT` （名称相同， schema访问方式不同）
数据类型	`BIGINT` , `VARCHAR`	`NUMBER` , `VARCHAR2`
时间函数	`TIME_TO_USEC()` , `usec_to_time()`	`TO_DATE()` , `TO_CHAR()`
Schema访问	`database.table`	`schema.table`
限制语法	`LIMIT n`	`FETCH FIRST n ROWS ONLY`
字符串拼接	`CONCAT()`	`

优化原则：

✅ 相同点：执行计划算子、连接算法、分区裁剪逻辑
✅ 相同点：索引设计原则、查询重写策略
❌ 不同点：语法、系统视图访问方式、数据类型名称、函数名称

本指南中的示例会标注模式类型：MySQL Mode: 或 Oracle Mode:

Core Optimization Principles

核心优化原则

1. Understand Execution Plans

1. 理解执行计划

Always analyze execution plans before optimization:

MySQL Mode:

sql

obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01';

Oracle Mode:

sql

obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| ===========================================                                                    |
| |ID|OPERATOR              |NAME      |EST. ROWS|COST |                                        |
| -------------------------------------------                                                    |
| |0 |TABLE SCAN            |order_table|1000    |1000 |                                        |
| |1 | TABLE GET            |order_table|1000    |1000 |                                        |
+===========================================                                                      |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       filter([order_table.order_date >= '2024-01-01']),                                        |
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),    |
|       partitions(p0)                                                                            |
+------------------------------------------------------------------------------------------------+

Key indicators to check:

EST. ROWS: Estimated rows processed (lower is better)
EST.TIME(us): Estimated execution time in microseconds (lower is better)
OPERATOR: Look for
```
TABLE SCAN
```
(full table scan, bad) vs
```
TABLE GET
```
(index lookup, good)
Partitions: Check if partition pruning is working (e.g.,
```
partitions(p1)
```
means only one partition accessed)
is_index_back:
```
false
```
means no index back,
```
true
```
means index back required

优化前务必分析执行计划：

MySQL模式：

sql

obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01';

Oracle模式：

sql

obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| ===========================================                                                    |
| |ID|OPERATOR              |NAME      |EST. ROWS|COST |                                        |
| -------------------------------------------                                                    |
| |0 |TABLE SCAN            |order_table|1000    |1000 |                                        |
| |1 | TABLE GET            |order_table|1000    |1000 |                                        |
+===========================================                                                      |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       filter([order_table.order_date >= '2024-01-01']),                                        |
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),    |
|       partitions(p0)                                                                            |
+------------------------------------------------------------------------------------------------+

需要检查的关键指标：

EST. ROWS：预估处理行数（越少越好）
EST.TIME(us)：预估执行时间（微秒，越少越好）
OPERATOR：注意
```
TABLE SCAN
```
（全表扫描，性能差）与
```
TABLE GET
```
（索引查找，性能好）的区别
Partitions：检查分区裁剪是否生效（例如
```
partitions(p1)
```
表示仅访问一个分区）
is_index_back：
```
false
```
表示无需回表，
```
true
```
表示需要回表

2. Index Design Best Practices

2. 索引设计最佳实践

Create indexes on frequently queried columns (syntax is the same for both modes):

sql

-- ✅ GOOD: Index on WHERE clause column
obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date);

-- ✅ GOOD: Composite index for multi-column queries (column order matters)
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);

-- ✅ GOOD: Unique index for unique constraints
obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number);

-- ✅ GOOD: Index with STORING clause for covering index
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) 
                     STORING (total_amount, status);

-- ❌ BAD: Index on rarely queried columns
obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes);

OceanBase Index Types:

UNIQUE Index: Ensures uniqueness, fast lookups
LOCAL Index: Partition-local index (default for partitioned tables)
GLOBAL Index: Cross-partition index
Function Index: Index on expressions (e.g.,
```
CREATE INDEX idx_expr ON t1((c1 + c2))
```
)

Index selection rules:

Index columns used in WHERE clauses
Index columns used in JOIN conditions
Consider composite indexes for multi-column filters (column order matters - most selective first)
Use STORING clause to create covering indexes (avoids index back)
Avoid over-indexing (each index adds write overhead)
For partitioned tables, prefer LOCAL indexes unless cross-partition queries are common

为频繁查询的列创建索引（两种模式语法相同）：

sql

-- ✅ 推荐：为WHERE子句中的列创建索引
obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date);

-- ✅ 推荐：为多列查询创建复合索引（列顺序很重要）
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date);

-- ✅ 推荐：为唯一约束创建唯一索引
obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number);

-- ✅ 推荐：使用STORING子句创建覆盖索引
obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) 
                     STORING (total_amount, status);

-- ❌ 不推荐：为极少查询的列创建索引
obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes);

OceanBase索引类型：

UNIQUE Index：保证唯一性，查询速度快
LOCAL Index：分区本地索引（分区表默认类型）
GLOBAL Index：跨分区索引
Function Index：基于表达式的索引（例如
```
CREATE INDEX idx_expr ON t1((c1 + c2))
```
）

索引选择规则：

为WHERE子句中的列创建索引
为JOIN条件中的列创建索引
多列过滤查询考虑复合索引（列顺序优先选择选择性高的列）
使用STORING子句创建覆盖索引（避免回表）
避免过度索引（每个索引都会增加写入开销）
对于分区表，优先使用LOCAL索引，除非存在大量跨分区查询

3. Query Writing Best Practices

3. 查询编写最佳实践

Use specific column lists instead of SELECT *:

sql

-- ✅ GOOD: Select only needed columns
obclient [SALES_DB]> SELECT order_id, customer_id, total_amount 
                     FROM order_table 
                     WHERE order_date >= '2024-01-01';

-- ❌ BAD: Select all columns
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';

Use appropriate WHERE conditions:

sql

-- ✅ GOOD: Use indexed column in WHERE
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;

-- ✅ GOOD: Use range queries on indexed columns
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- ❌ BAD: Function on indexed column prevents index usage
obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024;

-- ✅ GOOD: Rewrite to use index
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

使用指定列列表而非SELECT *：

sql

-- ✅ 推荐：仅选择需要的列
obclient [SALES_DB]> SELECT order_id, customer_id, total_amount 
                     FROM order_table 
                     WHERE order_date >= '2024-01-01';

-- ❌ 不推荐：选择所有列
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01';

使用合适的WHERE条件：

sql

-- ✅ 推荐：WHERE子句使用索引列
obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;

-- ✅ 推荐：对索引列使用范围查询
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- ❌ 不推荐：对索引列使用函数会导致索引失效
obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024;

-- ✅ 推荐：重写查询以使用索引
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

Partition Pruning Optimization

分区裁剪优化

Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency. Always check the

partitions

field in the execution plan to verify partition pruning is working.

分区裁剪避免访问无关分区，可显著提升SQL执行效率。务必检查执行计划中的

partitions

字段，确认分区裁剪是否生效。

Hash Partitioning

哈希分区

Ensure partition key is in WHERE clause:

sql

-- ✅ GOOD: Partition key in WHERE clause enables partition pruning
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id INT,
                     customer_id INT,
                     order_date DATE
                     ) PARTITION BY HASH(order_id) PARTITIONS 5;

obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =================================================                                              |
| |ID|OPERATOR  |NAME       |EST. ROWS|EST.TIME(us)|                                            |
| -------------------------------------------------                                              |
| |0 |TABLE SCAN|order_table|990      |383         |                                            |
| =================================================                                              |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),  |
|       filter(nil),                                                                             |
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       partitions(p1)  -- Only one partition accessed!                                         |
+------------------------------------------------------------------------------------------------+

sql

-- ❌ BAD: Missing partition key prevents pruning (scans all partitions)
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- Check execution plan: partitions(p0, p1, p2, p3, p4) - all partitions scanned

确保WHERE子句中包含分区键：

sql

-- ✅ 推荐：WHERE子句包含分区键，启用分区裁剪
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id INT,
                     customer_id INT,
                     order_date DATE
                     ) PARTITION BY HASH(order_id) PARTITIONS 5;

obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345;

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =================================================                                              |
| |ID|OPERATOR  |NAME       |EST. ROWS|EST.TIME(us)|                                            |
| -------------------------------------------------                                              |
| |0 |TABLE SCAN|order_table|990      |383         |                                            |
| =================================================                                              |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),  |
|       filter(nil),                                                                             |
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       partitions(p1)  -- 仅访问一个分区!                                         |
+------------------------------------------------------------------------------------------------+

sql

-- ❌ 不推荐：缺少分区键导致无法裁剪（扫描所有分区）
obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001;
-- 查看执行计划：partitions(p0, p1, p2, p3, p4) - 所有分区被扫描

Range Partitioning

范围分区

Use range conditions that match partition boundaries:

sql

-- ✅ GOOD: Range query matches partition boundaries
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id INT,
                     customer_id INT,
                     order_date DATE
                     ) PARTITION BY RANGE(order_date) (
                     PARTITION p0 VALUES LESS THAN('2024-01-01'),
                     PARTITION p1 VALUES LESS THAN('2024-02-01'),
                     PARTITION p2 VALUES LESS THAN('2024-03-01')
                     );

obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =================================================                                              |
| |ID|OPERATOR  |NAME       |EST. ROWS|EST.TIME(us)|                                            |
| -------------------------------------------------                                              |
| |0 |TABLE SCAN|order_table|1        |46          |                                            |
| =================================================                                              |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),  |
|       filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']),
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       partitions(p1)  -- Only partition p1 accessed!                                        |
+------------------------------------------------------------------------------------------------+

sql

-- ❌ BAD: Function prevents partition pruning
obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1;
-- Check execution plan: partitions(p0, p1, p2) - all partitions scanned

-- ✅ GOOD: Rewrite to use partition pruning
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

Partition Pruning Rules:

For Hash/List partitions: Partition key must be in WHERE clause with equality or IN conditions
For Range partitions: Use range conditions that match partition boundaries
Avoid functions on partition keys in WHERE clauses
For expressions as partition keys, the expression must appear as a whole in WHERE clause

使用与分区边界匹配的范围条件：

sql

-- ✅ 推荐：范围查询匹配分区边界
obclient [SALES_DB]> CREATE TABLE order_table (
                     order_id INT,
                     customer_id INT,
                     order_date DATE
                     ) PARTITION BY RANGE(order_date) (
                     PARTITION p0 VALUES LESS THAN('2024-01-01'),
                     PARTITION p1 VALUES LESS THAN('2024-02-01'),
                     PARTITION p2 VALUES LESS THAN('2024-03-01')
                     );

obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =================================================                                              |
| |ID|OPERATOR  |NAME       |EST. ROWS|EST.TIME(us)|                                            |
| -------------------------------------------------                                              |
| |0 |TABLE SCAN|order_table|1        |46          |                                            |
| =================================================                                              |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]),  |
|       filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']),
|       access([order_table.order_id], [order_table.customer_id], [order_table.order_date]),   |
|       partitions(p1)  -- 仅访问分区p1!                                        |
+------------------------------------------------------------------------------------------------+

sql

-- ❌ 不推荐：使用函数导致分区裁剪失效
obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1;
-- 查看执行计划：partitions(p0, p1, p2) - 所有分区被扫描

-- ✅ 推荐：重写查询以启用分区裁剪
obclient [SALES_DB]> SELECT * FROM order_table 
                     WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

分区裁剪规则：

哈希/列表分区：WHERE子句中必须包含分区键，且使用等值或IN条件
范围分区：使用与分区边界匹配的范围条件
避免在WHERE子句中对分区键使用函数
若分区键为表达式，WHERE子句中必须完整使用该表达式

Join Optimization

连接优化

OceanBase supports three join algorithms: Nested Loop Join (NLJ), Hash Join (HJ), and Merge Join (MJ). The optimizer automatically selects the best algorithm based on statistics and cost estimation.

OceanBase支持三种连接算法：Nested Loop Join (NLJ)、Hash Join (HJ) 和 Merge Join (MJ)。优化器会根据统计信息和成本估算自动选择最佳算法。

Join Algorithms

连接算法

1. Nested Loop Join (NLJ)

Best for: Small outer table, indexed inner table
Use when: Join condition has index on inner table
Hint:
```
/*+ USE_NL(table1, table2) */
```

sql

-- ✅ GOOD: NLJ with indexed join column
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);

obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name 
                     FROM customer_table c
                     INNER JOIN order_table o ON c.customer_id = o.customer_id
                     WHERE c.customer_type = 'VIP';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| ===========================================                                                    |
| |ID|OPERATOR        |NAME  |EST. ROWS|EST.TIME(us)|                                           |
| -------------------------------------------                                                    |
| |0 |NESTED-LOOP JOIN|      |990      |37346       |                                           |
| |1 | TABLE SCAN     |c     |999      |669         |                                           |
| |2 | TABLE GET      |o     |1        |36          |  -- Index lookup!                         |
| ===========================================                                                    |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([o.order_id], [c.customer_name]), filter(nil),                                   |
|       conds(nil), nl_params_([c.customer_id])                                                 |
|   1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']),         |
|       access([c.customer_id], [c.customer_name]), partitions(p0)                             |
|   2 - output([o.order_id]), filter(nil),                                                      |
|       access([o.order_id]), partitions(p0),                                                    |
|       range_key([o.customer_id]), range([? = o.customer_id])  -- Index used!                  |
+------------------------------------------------------------------------------------------------+

2. Hash Join (HJ)

Best for: Large datasets, equal join conditions
Use when: Both tables are large and join condition is equality
Hint:
```
/*+ USE_HASH(table1, table2) */
```

sql

-- ✅ GOOD: Hash Join for large table joins
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id;

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =======================================                                                        |
| |ID|OPERATOR   |NAME|EST. ROWS|EST.TIME(us)|                                                   |
| ---------------------------------------                                                        |
| |0 |HASH JOIN  |    |98010000 |66774608    |                                                   |
| |1 | TABLE SCAN|o   |100000   |68478       |                                                   |
| |2 | TABLE SCAN|c   |100000   |68478       |                                                   |
| =======================================                                                        |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([o.order_id], [c.customer_name]), filter(nil),                                    |
|       equal_conds([o.customer_id = c.customer_id]), other_conds(nil)                           |
+------------------------------------------------------------------------------------------------+

3. Merge Join (MJ)

Best for: Pre-sorted data, ordered results needed
Use when: Both inputs are already sorted or can be sorted efficiently
Hint:
```
/*+ USE_MERGE(table1, table2) */
```

sql

-- ✅ GOOD: Merge Join when data is sorted
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id
                     ORDER BY o.customer_id;

1. Nested Loop Join (NLJ)

适用场景：外表数据量小，内表有索引
使用时机：连接条件中的内表列有索引
提示：
```
/*+ USE_NL(table1, table2) */
```

sql

-- ✅ 推荐：NLJ搭配索引连接列
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);

obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name 
                     FROM customer_table c
                     INNER JOIN order_table o ON c.customer_id = o.customer_id
                     WHERE c.customer_type = 'VIP';

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| ===========================================                                                    |
| |ID|OPERATOR        |NAME  |EST. ROWS|EST.TIME(us)|                                           |
| -------------------------------------------                                                    |
| |0 |NESTED-LOOP JOIN|      |990      |37346       |                                           |
| |1 | TABLE SCAN     |c     |999      |669         |                                           |
| |2 | TABLE GET      |o     |1        |36          |  -- 索引查找!                         |
| ===========================================                                                    |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([o.order_id], [c.customer_name]), filter(nil),                                   |
|       conds(nil), nl_params_([c.customer_id])                                                 |
|   1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']),         |
|       access([c.customer_id], [c.customer_name]), partitions(p0)                             |
|   2 - output([o.order_id]), filter(nil),                                                      |
|       access([o.order_id]), partitions(p0),                                                    |
|       range_key([o.customer_id]), range([? = o.customer_id])  -- 使用索引!                  |
+------------------------------------------------------------------------------------------------+

2. Hash Join (HJ)

适用场景：大数据集，等值连接条件
使用时机：两张表数据量都较大，且连接条件为等值
提示：
```
/*+ USE_HASH(table1, table2) */
```

sql

-- ✅ 推荐：Hash Join用于大表连接
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id;

查询结果如下：

+------------------------------------------------------------------------------------------------+
| Query Plan                                                                                     |
+------------------------------------------------------------------------------------------------+
| =======================================                                                        |
| |ID|OPERATOR   |NAME|EST. ROWS|EST.TIME(us)|                                                   |
| ---------------------------------------                                                        |
| |0 |HASH JOIN  |    |98010000 |66774608    |                                                   |
| |1 | TABLE SCAN|o   |100000   |68478       |                                                   |
| |2 | TABLE SCAN|c   |100000   |68478       |                                                   |
| =======================================                                                        |
| Outputs & filters:                                                                             |
| -------------------------------------                                                          |
|   0 - output([o.order_id], [c.customer_name]), filter(nil),                                    |
|       equal_conds([o.customer_id = c.customer_id]), other_conds(nil)                           |
+------------------------------------------------------------------------------------------------+

3. Merge Join (MJ)

适用场景：数据已排序，需要有序结果
使用时机：两个输入数据集已排序或可高效排序
提示：
```
/*+ USE_MERGE(table1, table2) */
```

sql

-- ✅ 推荐：数据已排序时使用Merge Join
obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id
                     ORDER BY o.customer_id;

Join Order

连接顺序

Use LEADING hint to control join order:

sql

-- ✅ GOOD: Control join order explicitly
obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name 
                     FROM customer_table c
                     INNER JOIN order_table o ON c.customer_id = o.customer_id
                     WHERE c.customer_type = 'VIP'
                     AND o.order_date >= '2024-01-01';

-- ❌ BAD: Let optimizer choose suboptimal order
obclient [SALES_DB]> SELECT o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id
                     WHERE c.customer_type = 'VIP';

使用LEADING提示控制连接顺序：

sql

-- ✅ 推荐：显式控制连接顺序
obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name 
                     FROM customer_table c
                     INNER JOIN order_table o ON c.customer_id = o.customer_id
                     WHERE c.customer_type = 'VIP'
                     AND o.order_date >= '2024-01-01';

-- ❌ 不推荐：让优化器选择非最优顺序
obclient [SALES_DB]> SELECT o.order_id, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id
                     WHERE c.customer_type = 'VIP';

Index on Join Columns

连接列索引

Always index foreign key columns:

sql

-- ✅ GOOD: Index on join column enables efficient joins
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);

-- Then join queries can use index (NLJ with TABLE GET)
obclient [SALES_DB]> SELECT o.*, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id;

务必为外键列创建索引：

sql

-- ✅ 推荐：为连接列创建索引，实现高效连接
obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id);

-- 之后的连接查询可使用索引（NLJ搭配TABLE GET）
obclient [SALES_DB]> SELECT o.*, c.customer_name 
                     FROM order_table o
                     INNER JOIN customer_table c ON o.customer_id = c.customer_id;

Aggregation Optimization

聚合优化

Use Appropriate Aggregation Functions

使用合适的聚合函数

sql

-- ✅ GOOD: COUNT(*) is optimized
obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01';

-- ✅ GOOD: COUNT(column) when you need non-NULL count
obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table;

-- ❌ BAD: COUNT(DISTINCT) on large datasets can be slow
obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table;

-- ✅ GOOD: Use GROUP BY with LIMIT (MySQL) or FETCH FIRST (Oracle) for top N
-- MySQL Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
                     FROM order_table
                     GROUP BY customer_id
                     ORDER BY total DESC
                     LIMIT 10;

-- Oracle Mode:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
                     FROM order_table
                     GROUP BY customer_id
                     ORDER BY total DESC
                     FETCH FIRST 10 ROWS ONLY;

sql

-- ✅ 推荐：COUNT(*)已优化
obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01';

-- ✅ 推荐：需要统计非NULL值时使用COUNT(column)
obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table;

-- ❌ 不推荐：大数据集上使用COUNT(DISTINCT)会较慢
obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table;

-- ✅ 推荐：结合GROUP BY与LIMIT（MySQL）或FETCH FIRST（Oracle）获取Top N
-- MySQL模式:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
                     FROM order_table
                     GROUP BY customer_id
                     ORDER BY total DESC
                     LIMIT 10;

-- Oracle模式:
obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total
                     FROM order_table
                     GROUP BY customer_id
                     ORDER BY total DESC
                     FETCH FIRST 10 ROWS ONLY;

Subquery Optimization

子查询优化

Convert Correlated Subqueries to JOINs

将关联子查询转换为JOIN

sql

-- ❌ BAD: Correlated subquery (executes for each row)
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
                     FROM order_table o
                     WHERE o.total_amount > (
                         SELECT AVG(total_amount) 
                         FROM order_table 
                         WHERE customer_id = o.customer_id
                     );

-- ✅ GOOD: Convert to JOIN
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
                     FROM order_table o
                     INNER JOIN (
                         SELECT customer_id, AVG(total_amount) as avg_amount
                         FROM order_table
                         GROUP BY customer_id
                     ) avg_orders ON o.customer_id = avg_orders.customer_id
                     WHERE o.total_amount > avg_orders.avg_amount;

sql

-- ❌ 不推荐：关联子查询（逐行执行）
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
                     FROM order_table o
                     WHERE o.total_amount > (
                         SELECT AVG(total_amount) 
                         FROM order_table 
                         WHERE customer_id = o.customer_id
                     );

-- ✅ 推荐：转换为JOIN
obclient [SALES_DB]> SELECT o.order_id, o.total_amount
                     FROM order_table o
                     INNER JOIN (
                         SELECT customer_id, AVG(total_amount) as avg_amount
                         FROM order_table
                         GROUP BY customer_id
                     ) avg_orders ON o.customer_id = avg_orders.customer_id
                     WHERE o.total_amount > avg_orders.avg_amount;

Use EXISTS instead of IN for large datasets

大数据集下使用EXISTS替代IN

Syntax is the same for both modes:

sql

-- ✅ GOOD: EXISTS stops at first match
obclient [SALES_DB]> SELECT * FROM order_table o
                     WHERE EXISTS (
                         SELECT 1 FROM customer_table c
                         WHERE c.customer_id = o.customer_id
                         AND c.customer_type = 'VIP'
                     );

-- ⚠️ OK: IN works but may be slower for large lists
obclient [SALES_DB]> SELECT * FROM order_table
                     WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP');

两种模式语法相同：

sql

-- ✅ 推荐：EXISTS找到第一个匹配项即停止
obclient [SALES_DB]> SELECT * FROM order_table o
                     WHERE EXISTS (
                         SELECT 1 FROM customer_table c
                         WHERE c.customer_id = o.customer_id
                         AND c.customer_type = 'VIP'
                     );

-- ⚠️ 可用：IN可正常工作，但大数据集下可能较慢
obclient [SALES_DB]> SELECT * FROM order_table
                     WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP');

Slow Query Tuning

慢查询调优

Enable SQL Audit

启用SQL审计

Configure SQL Audit settings:

sql

-- Enable SQL Audit
obclient> ALTER SYSTEM SET enable_sql_audit = true;

-- Set SQL Audit memory percentage (default: 3%, range: [0,80])
obclient> SET GLOBAL ob_sql_audit_percentage = 3;

配置SQL审计设置：

sql

-- 启用SQL审计
obclient> ALTER SYSTEM SET enable_sql_audit = true;

-- 设置SQL审计内存占比（默认：3%，范围：[0,80]）
obclient> SET GLOBAL ob_sql_audit_percentage = 3;

Identify Slow Queries

识别慢查询

Query TOP SQL by CPU usage:

MySQL Mode: Use

oceanbase.GV$OB_SQL_AUDIT

with

TIME_TO_USEC(NOW())

Oracle Mode: Use

GV$OB_SQL_AUDIT

with

SYSDATE - INTERVAL '30' MINUTE

sql

-- Find TOP SQL by CPU usage in last 30 minutes
obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time,
                AVG(execute_time) AS avg_cpu_time,
                SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql
         FROM oceanbase.GV$OB_SQL_AUDIT  -- MySQL: add 'oceanbase.' prefix
         WHERE tenant_id = 'mysql001'  -- MySQL: string, Oracle: number
           AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW())  -- MySQL
           -- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE
           AND is_executor_rpc = 0
         GROUP BY sql_id
         HAVING COUNT(*) > 1
         ORDER BY cpu_cnt DESC
         LIMIT 10;  -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY

Query slow queries by execution time:

MySQL Mode: Use

usec_to_time()

function
Oracle Mode: Use

TO_CHAR()

function

sql

-- Find queries with execution time > 100ms
obclient> SELECT request_id,
                usec_to_time(request_time) AS request_time,  -- MySQL
                -- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time,  -- Oracle
                elapsed_time, queue_time, execute_time, flt_trace_id, query_sql
         FROM oceanbase.v$ob_sql_audit  -- MySQL: add 'oceanbase.' prefix
         WHERE elapsed_time > 100000  -- > 100ms in microseconds
         ORDER BY elapsed_time DESC
         LIMIT 10;  -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY

按CPU使用率查询TOP SQL：

MySQL模式： 使用

oceanbase.GV$OB_SQL_AUDIT

搭配

TIME_TO_USEC(NOW())

Oracle模式： 使用

GV$OB_SQL_AUDIT

搭配

SYSDATE - INTERVAL '30' MINUTE

sql

-- 查询过去30分钟内CPU使用率最高的TOP SQL
obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time,
                AVG(execute_time) AS avg_cpu_time,
                SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql
         FROM oceanbase.GV$OB_SQL_AUDIT  -- MySQL: 添加'oceanbase.'前缀
         WHERE tenant_id = 'mysql001'  -- MySQL: 字符串类型，Oracle: 数字类型
           AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW())  -- MySQL
           -- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE
           AND is_executor_rpc = 0
         GROUP BY sql_id
         HAVING COUNT(*) > 1
         ORDER BY cpu_cnt DESC
         LIMIT 10;  -- MySQL: 使用LIMIT，Oracle: 使用FETCH FIRST 10 ROWS ONLY

按执行时间查询慢查询：

MySQL模式： 使用

usec_to_time()

函数
Oracle模式： 使用

TO_CHAR()

函数

sql

-- 查询执行时间>100ms的查询
obclient> SELECT request_id,
                usec_to_time(request_time) AS request_time,  -- MySQL
                -- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time,  -- Oracle
                elapsed_time, queue_time, execute_time, flt_trace_id, query_sql
         FROM oceanbase.v$ob_sql_audit  -- MySQL: 添加'oceanbase.'前缀
         WHERE elapsed_time > 100000  -- > 100ms（微秒）
         ORDER BY elapsed_time DESC
         LIMIT 10;  -- MySQL: 使用LIMIT，Oracle: 使用FETCH FIRST 10 ROWS ONLY

Analyze Query Performance

分析查询性能

Get detailed execution statistics:

MySQL Mode: Use

oceanbase.gv$ob_sql_audit

Oracle Mode: Use

GV$OB_SQL_AUDIT

(no schema prefix)

sql

-- Get detailed statistics for a specific SQL
obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time,
                queue_time, return_rows, affected_rows, partition_cnt,
                table_scan, is_hit_plan, plan_id
         FROM oceanbase.gv$ob_sql_audit  -- MySQL: add 'oceanbase.' prefix
         WHERE sql_id = 'your_sql_id'
         ORDER BY elapsed_time DESC;

Get execution plan for slow query:

MySQL Mode: Use

oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT

Oracle Mode: Use

GV$OB_PLAN_CACHE_PLAN_STAT

(no schema prefix)

sql

-- Get execution plan from plan cache
obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
                last_active_time, first_load_time, outline_data
         FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT  -- MySQL: add 'oceanbase.' prefix
         WHERE tenant_id = 1002 AND sql_id = 'your_sql_id'
           AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;

-- Get plan details
obclient> SELECT operator, name, rows, cost
         FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN  -- MySQL: add 'oceanbase.' prefix
         WHERE tenant_id = 1002 AND plan_id = 741
           AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;

获取详细执行统计信息：

MySQL模式： 使用

oceanbase.gv$ob_sql_audit

Oracle模式： 使用

GV$OB_SQL_AUDIT

（无需schema前缀）

sql

-- 获取特定SQL的详细统计信息
obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time,
                queue_time, return_rows, affected_rows, partition_cnt,
                table_scan, is_hit_plan, plan_id
         FROM oceanbase.gv$ob_sql_audit  -- MySQL: 添加'oceanbase.'前缀
         WHERE sql_id = 'your_sql_id'
         ORDER BY elapsed_time DESC;

获取慢查询的执行计划：

MySQL模式： 使用

oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT

Oracle模式： 使用

GV$OB_PLAN_CACHE_PLAN_STAT

（无需schema前缀）

sql

-- 从计划缓存中获取执行计划
obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id,
                last_active_time, first_load_time, outline_data
         FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT  -- MySQL: 添加'oceanbase.'前缀
         WHERE tenant_id = 1002 AND sql_id = 'your_sql_id'
           AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;

-- 获取计划详情
obclient> SELECT operator, name, rows, cost
         FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN  -- MySQL: 添加'oceanbase.'前缀
         WHERE tenant_id = 1002 AND plan_id = 741
           AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046;

Find Currently Running Slow Queries

查找当前运行的慢查询

MySQL Mode:

sql

obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info,
                svr_ip, svr_port, trace_id
         FROM oceanbase.GV$OB_PROCESSLIST
         WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10;

Oracle Mode:

sql

obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info,
                svr_ip, svr_port, trace_id
         FROM GV$OB_PROCESSLIST
         WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY;

MySQL模式：

sql

obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info,
                svr_ip, svr_port, trace_id
         FROM oceanbase.GV$OB_PROCESSLIST
         WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10;

Oracle模式：

sql

obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info,
                svr_ip, svr_port, trace_id
         FROM GV$OB_PROCESSLIST
         WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY;

Real-World Optimization Case

实际优化案例

Case: Type Conversion Preventing Index Usage

案例：类型转换导致索引失效

Problem: SQL execution time was 2 seconds, CPU usage exceeded 70% of server resources.

Root Cause Analysis:

Identify TOP SQL:

sql

obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time
         FROM oceanbase.GV$OB_SQL_AUDIT
         WHERE tenant_id = 'mysql001'
           AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW())
         GROUP BY sql_id
         ORDER BY SUM(execute_time) DESC
         LIMIT 1;

Analyze Execution Plan:

sql

obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476);

Execution plan showed:

High cost on
```
TBL3(IDX_TBL3_COL170)
```
index scan
```
range(MIN,MIN ; MAX,MAX)always true
```
- index not used for matching

filter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)])

- type conversion happening

Root Cause:
- View
```
V_TT01
```
  had
```
TO_NUMBER
```
  conversion on
```
COL001
```
- Table
```
TBL3.COL170
```
  was
```
VARCHAR2(20)
```
  but compared with
```
NUMBER
```
- Type conversion prevented index usage

Optimization Steps:

Remove unnecessary TO_NUMBER conversion:

sql

-- Before: View had TO_NUMBER conversion
CREATE VIEW V_TT01_OLD AS
SELECT To_number("tt01"."col001") AS "COL001", ...

-- After: Remove TO_NUMBER (COL001 is already NUMBER)
CREATE VIEW V_TT01 AS
SELECT "tt01"."col001" AS "COL001", ...

Result: SQL RT reduced from 2s to 150ms

Unify data types:

sql

-- Before: TBL3.COL170 was VARCHAR2(20)
CREATE TABLE TBL3 (
  COL170 VARCHAR2(20) NOT NULL,
  ...
);

-- After: Change to NUMBER(20) to match join condition
ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL;

Result: SQL RT further reduced to 20ms

Partition and table group optimization:

sql

-- Add hash partitioning on join keys
ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8;
ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8;
ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8;
ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8;

-- Create table group to avoid cross-partition joins
CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8;
ALTER TABLE TBL1 SET TABLEGROUP order_tg;
ALTER TABLE TBL2 SET TABLEGROUP order_tg;
ALTER TABLE TBL3 SET TABLEGROUP order_tg;
ALTER TABLE TT01 SET TABLEGROUP order_tg;

Result: SQL RT further reduced to 4ms

Final Execution Plan: Shows

PX PARTITION ITERATOR

(partition pruning working) and

TABLE SCAN

with index

INDEX_TBL3_COL170

(index used efficiently)

Key Learnings:

Always check execution plan for type conversions in filters
Ensure join column data types match
Use partition pruning and table groups for distributed queries
Remove unnecessary type conversions in views

问题： SQL执行时间为2秒，CPU使用率超过服务器资源的70%。

根因分析：

识别TOP SQL：

sql

obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time
         FROM oceanbase.GV$OB_SQL_AUDIT
         WHERE tenant_id = 'mysql001'
           AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW())
         GROUP BY sql_id
         ORDER BY SUM(execute_time) DESC
         LIMIT 1;

分析执行计划：

sql

obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476);

执行计划显示：

```
TBL3(IDX_TBL3_COL170)
```
索引扫描成本高
```
range(MIN,MIN ; MAX,MAX)always true
```
- 索引未用于匹配

filter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)])

- 发生类型转换

根本原因：
- 视图
```
V_TT01
```
  对
```
COL001
```
  使用了
```
TO_NUMBER
```
  转换
- 表
```
TBL3.COL170
```
  为
```
VARCHAR2(20)
```
  类型，但与
```
NUMBER
```
  类型进行比较
- 类型转换导致索引失效

优化步骤：

移除不必要的TO_NUMBER转换：

sql

-- 优化前：视图包含TO_NUMBER转换
CREATE VIEW V_TT01_OLD AS
SELECT To_number("tt01"."col001") AS "COL001", ...

-- 优化后：移除TO_NUMBER（COL001本身已是NUMBER类型）
CREATE VIEW V_TT01 AS
SELECT "tt01"."col001" AS "COL001", ...

结果：SQL响应时间从2秒降至150毫秒

统一数据类型：

sql

-- 优化前：TBL3.COL170为VARCHAR2(20)
CREATE TABLE TBL3 (
  COL170 VARCHAR2(20) NOT NULL,
  ...
);

-- 优化后：修改为NUMBER(20)，匹配连接条件
ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL;

结果：SQL响应时间进一步降至20毫秒

分区与表组优化：

sql

-- 为连接键添加哈希分区
ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8;
ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8;
ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8;
ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8;

-- 创建表组，避免跨分区连接
CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8;
ALTER TABLE TBL1 SET TABLEGROUP order_tg;
ALTER TABLE TBL2 SET TABLEGROUP order_tg;
ALTER TABLE TBL3 SET TABLEGROUP order_tg;
ALTER TABLE TT01 SET TABLEGROUP order_tg;

结果：SQL响应时间进一步降至4毫秒

最终执行计划： 显示

PX PARTITION ITERATOR

（分区裁剪生效）和使用索引

INDEX_TBL3_COL170

的

TABLE SCAN

（索引高效利用）

关键经验：

务必检查执行计划中的过滤条件是否存在类型转换
确保连接列的数据类型匹配
对分布式查询使用分区裁剪和表组
移除视图中不必要的类型转换

Common Anti-patterns

常见反模式

❌ Avoid These Patterns

❌ 避免以下模式

1. N+1 Query Problem:

sql

-- ❌ BAD: Multiple queries in loop
-- Instead, use JOIN or batch queries

2. SELECT * in Production:

sql

-- ❌ BAD: SELECT * returns unnecessary data
SELECT * FROM order_table;

-- ✅ GOOD: Select only needed columns
SELECT order_id, customer_id, total_amount FROM order_table;

3. Functions on Indexed Columns:

sql

-- ❌ BAD: Prevents index usage
WHERE UPPER(customer_name) = 'JOHN'

-- ✅ GOOD: Store normalized data or use function-based index
WHERE customer_name = 'JOHN'

4. Implicit Type Conversions:

sql

-- ❌ BAD: String to number conversion
WHERE order_id = '12345'

-- ✅ GOOD: Match data types
WHERE order_id = 12345

1. N+1查询问题：

sql

-- ❌ 不推荐：循环中执行多个查询
-- 替代方案：使用JOIN或批量查询

*2. 生产环境中使用SELECT ：

sql

-- ❌ 不推荐：SELECT *返回不必要的数据
SELECT * FROM order_table;

-- ✅ 推荐：仅选择需要的列
SELECT order_id, customer_id, total_amount FROM order_table;

3. 对索引列使用函数：

sql

-- ❌ 不推荐：导致索引失效
WHERE UPPER(customer_name) = 'JOHN'

-- ✅ 推荐：存储标准化数据或使用函数索引
WHERE customer_name = 'JOHN'

4. 隐式类型转换：

sql

-- ❌ 不推荐：字符串转数字
WHERE order_id = '12345'

-- ✅ 推荐：匹配数据类型
WHERE order_id = 12345

Performance Monitoring

性能监控

Key Metrics to Monitor

需要监控的关键指标

sql

-- Check table statistics
obclient [SALES_DB]> ANALYZE TABLE order_table;

-- View table size and row count
obclient [SALES_DB]> SELECT 
                         table_name,
                         table_rows,
                         data_length,
                         index_length
                     FROM information_schema.tables
                     WHERE table_schema = 'SALES_DB';

-- Check index usage
obclient [SALES_DB]> SHOW INDEX FROM order_table;

sql

-- 检查表统计信息
obclient [SALES_DB]> ANALYZE TABLE order_table;

-- 查看表大小和行数
obclient [SALES_DB]> SELECT 
                         table_name,
                         table_rows,
                         data_length,
                         index_length
                     FROM information_schema.tables
                     WHERE table_schema = 'SALES_DB';

-- 检查索引使用情况
obclient [SALES_DB]> SHOW INDEX FROM order_table;

Optimization Checklist

优化检查清单

Quick Reference

快速参考

Execution Plan Operators

执行计划算子

Operator	Meaning	Optimization
TABLE SCAN / TABLE FULL SCAN	Full table scan	Add index or use partition key
TABLE RANGE SCAN	Range scan on table	✅ Good, but consider index
TABLE GET	Direct row access via primary key	✅ Best
HASH JOIN	Hash join algorithm	✅ Good for large equal joins
NESTED-LOOP JOIN	Nested loop join	✅ Good for small outer table with indexed inner
MERGE JOIN	Merge join algorithm	✅ Good for pre-sorted data
SORT	Sorting operation	Add ORDER BY index or use sorted index
AGGREGATE	Aggregation	Consider materialized views
PX PARTITION ITERATOR	Parallel partition iterator	✅ Good, partition pruning working
EXCHANGE OUT DISTR	Distributed exchange	Indicates distributed execution

算子	含义	优化方向
TABLE SCAN / TABLE FULL SCAN	全表扫描	添加索引或使用分区键
TABLE RANGE SCAN	表范围扫描	✅ 性能较好，但可考虑添加索引
TABLE GET	通过主键直接访问行	✅ 最佳性能
HASH JOIN	哈希连接算法	✅ 适用于大数据集等值连接
NESTED-LOOP JOIN	嵌套循环连接	✅ 适用于小表搭配索引连接
MERGE JOIN	合并连接	✅ 适用于已排序数据
SORT	排序操作	添加ORDER BY索引或使用已排序索引
AGGREGATE	聚合操作	考虑使用物化视图
PX PARTITION ITERATOR	并行分区迭代器	✅ 性能好，分区裁剪生效
EXCHANGE OUT DISTR	分布式交换	表示分布式执行

Index Types

索引类型

Primary Key: Automatically indexed, unique
Unique Index: Ensures uniqueness, fast lookups
Composite Index: Multiple columns, order matters
Covering Index: Contains all query columns

Primary Key：自动创建索引，唯一
Unique Index：保证唯一性，查询速度快
Composite Index：多列索引，列顺序重要
Covering Index：包含查询所需的所有列