alibabacloud-polardbx-sql
PolarDB-X SQL (MySQL Compatibility Focus)
Write, review, and adapt SQL for PolarDB-X 2.0 Enterprise Edition (Distributed Edition) AUTO mode databases, avoiding the "runs on MySQL but fails on PolarDB-X" problem.
Architecture: PolarDB-X 2.0 Enterprise Edition (CN compute nodes + DN storage nodes + GMS metadata service + CDC log nodes) + AUTO mode database
Scope:
- PolarDB-X 2.0 Enterprise Edition (also known as Distributed Edition) + AUTO mode database
Not applicable to:
- PolarDB-X 1.0 (DRDS 1.0)
- PolarDB-X 2.0 Standard Edition
- PolarDB-X 2.0 Enterprise Edition DRDS mode databases
Key difference between AUTO mode and DRDS mode: AUTO mode uses MySQL-compatible `PARTITION BY` syntax to define partitions, while DRDS mode uses the legacy `dbpartition`/`tbpartition` syntax. Verify the database mode with:

```sql
SHOW CREATE DATABASE db_name;
-- Output containing MODE = 'auto' indicates AUTO mode
```
Installation
Connect to a PolarDB-X instance via a MySQL-compatible client:

```bash
mysql -h <host> -P <port> -u <user> -p<password> -D <database>
```

Supported clients: MySQL CLI, MySQL Workbench, DBeaver, Navicat, or any MySQL-compatible client.
Parameter Confirmation
IMPORTANT: Parameter Confirmation — Before executing any command or API call, ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks, passwords, domain names, resource specifications, etc.) MUST be confirmed with the user. Do NOT assume or use default values without explicit user approval.
Configurable parameters for this skill:
| Parameter Name | Required/Optional | Description | Default Value |
|---|---|---|---|
| host | Required | PolarDB-X instance connection address | None |
| port | Required | PolarDB-X instance port | 3306 |
| user | Required | Database username | None |
| password | Required | Database password | None |
| database | Required | Target database name | None |
Core Workflow (Follow each time)
- Confirm the target engine and version:
  - Run `SELECT VERSION();` to determine the instance type:
    - Result contains `TDDL` with version > 5.4.12 (e.g., `5.7.25-TDDL-5.4.19-20251031`) -> 2.0 Enterprise Edition (Distributed Edition); this skill applies. Parse the Enterprise Edition version number (e.g., 5.4.19).
    - Result contains `TDDL` with version <= 5.4.12 (e.g., `5.6.29-TDDL-5.4.12-16327949`) -> DRDS 1.0; this skill does not apply.
    - Result contains `X-Cluster` (e.g., `8.0.32-X-Cluster-8.4.20-20251017`) -> 2.0 Standard Edition; this skill does not apply, use the `polardbx-standard` skill instead.
  - After confirming 2.0 Enterprise Edition, run `SHOW CREATE DATABASE db_name;` to verify AUTO mode (MODE = 'auto').
  - The version number affects feature availability (e.g., NEW SEQUENCE requires 5.4.14+, CCI requires a newer version).
- Determine the table type:
  - Small or dictionary tables -> Broadcast table (`BROADCAST`, fully replicated to every DN).
  - Tables that don't need distribution -> Single table (`SINGLE`, stored on one DN only).
  - Otherwise -> Partitioned table (default); choose an appropriate partition key and strategy.
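The three table types can be sketched as AUTO-mode DDL. This is a minimal illustration — table and column names are hypothetical; see references/create-table.md for the full syntax:

```sql
-- Single table: stored on one DN only
CREATE TABLE t_app_config (
  id        BIGINT PRIMARY KEY,
  cfg_value VARCHAR(255)
) SINGLE;

-- Broadcast table: fully replicated to every DN
CREATE TABLE t_currency_dict (
  code  VARCHAR(32) PRIMARY KEY,
  label VARCHAR(255)
) BROADCAST;

-- Partitioned table: explicit partition key and partition count
CREATE TABLE t_order (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  buyer_id   BIGINT NOT NULL,
  gmt_create DATETIME NOT NULL
) PARTITION BY KEY(id) PARTITIONS 16;
```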
- Partition scheme design (for partitioned tables):
  - Collect SQL access pattern data: prefer SQL Insight (most accurate); when unavailable, use slow query logs + application code analysis, or have the business team provide SQL patterns as alternatives. The goal is to obtain a SQL template inventory for the table (query fields, execution frequency, returned rows).
  - Partition key selection: Prefer fields with a high equality-query ratio and high cardinality; primary keys/unique keys have a natural advantage (highest cardinality, no hotspots); exclude fields with hotspots (fields with few distinct values or extremely uneven distribution are unsuitable as partition keys).
  - GSI selection: Decide strategy based on write volume — tables with low write volume can freely create GSIs; create GSIs for high-frequency non-partition-key query fields; low-cardinality fields and time fields are unsuitable for GSI; fields that always appear combined with other fields and never appear alone don't need standalone GSIs. GSI types: regular GSI for few returned rows, Clustered GSI for one-to-many, UGSI for unique constraints. GSI syntax must include `PARTITION BY KEY(...) PARTITIONS N` — see gsi.md for the full syntax.
  - Partition algorithm: ~90% of workloads use single-level HASH/KEY; order-type multi-dimensional queries use CO_HASH; time-based data cleanup uses HASH+RANGE; multi-tenant uses LIST+HASH. For a single column, HASH and KEY are equivalent.
  - Partition count: 256 suits the vast majority of workloads; it should be several times the number of DN nodes; keep each partition under 100 million rows.
  - Migration workflow (three-step method for single table to partitioned table): (1) First convert to a partitioned table with 1 partition (preserving uniqueness) -> (2) Create the required GSI/UGSI -> (3) Change to the target partition count. See partition-design-best-practice.md for details.
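A hedged sketch of the three-step migration, assuming a hypothetical table `t_order` with a unique column `order_no` — exact syntax, locking behavior, and caveats are covered in partition-design-best-practice.md:

```sql
-- Step 1: convert the single table to a partitioned table with 1 partition
-- (uniqueness still holds because all rows live in one partition)
ALTER TABLE t_order PARTITION BY KEY(buyer_id) PARTITIONS 1;

-- Step 2: create the UGSI that will enforce global uniqueness on order_no
ALTER TABLE t_order
  ADD UNIQUE GLOBAL INDEX ug_i_order_no (order_no)
  PARTITION BY KEY(order_no) PARTITIONS 16;

-- Step 3: change to the target partition count
ALTER TABLE t_order PARTITION BY KEY(buyer_id) PARTITIONS 256;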
- Use PolarDB-X safe defaults when generating SQL:
  - Avoid unsupported MySQL features (stored procedures/triggers/EVENTs/SPATIAL, etc.).
  - Use `KEY` or `HASH` partitioning to avoid MySQL's AUTO_INCREMENT primary-key write hotspot.
  - When non-partition-key queries are needed, consider creating Global Secondary Indexes (GSI).
- If the user provides MySQL SQL, perform compatibility checks:
  - Replace unsupported features and provide PolarDB-X alternatives.
  - Clearly mark behavioral differences and version requirements.
- When SQL is slow or errors occur, use PolarDB-X diagnostic tools:
  - `EXPLAIN` to view the logical execution plan.
  - `EXPLAIN EXECUTE` to view the physical execution plan pushed down to the DN.
  - `EXPLAIN SHARDING` to view shard scan details and check for full-shard scans.
  - `EXPLAIN ANALYZE` to actually execute the statement and collect runtime statistics.
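Each variant is simply prefixed to the statement under diagnosis; for example, against a hypothetical `t_order` table:

```sql
EXPLAIN          SELECT * FROM t_order WHERE buyer_id = 1001;  -- logical plan
EXPLAIN EXECUTE  SELECT * FROM t_order WHERE buyer_id = 1001;  -- physical plan on DN
EXPLAIN SHARDING SELECT * FROM t_order WHERE buyer_id = 1001;  -- shard scan details
EXPLAIN ANALYZE  SELECT * FROM t_order WHERE buyer_id = 1001;  -- execute + runtime stats
```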
Key Differences Quick Reference
- Three table types: Single table (`SINGLE`), Broadcast table (`BROADCAST`), Partitioned table (default); choose based on data volume and access patterns.
- Partitioned tables: Support KEY/HASH/RANGE/LIST/RANGE COLUMNS/LIST COLUMNS/CO_HASH + secondary partitions (49 combinations).
- Primary keys and unique keys: Classified as Global (globally unique) or Local (unique within partition); single/broadcast/auto-partitioned tables are always Global; manual partitioned tables require primary/unique keys to include all partition columns for Global, otherwise Local (risk of data duplication and DDL failure).
- Global Secondary Index (GSI): Solves full-shard scan issues for non-partition-key queries; supports GSI / UGSI / Clustered GSI types. CRITICAL: a GSI must specify its own PARTITION BY clause — it is an independently partitioned table, not a regular MySQL index. Correct syntax:

```sql
-- ✅ Correct: GSI with PARTITION BY clause
GLOBAL INDEX g_i_seller(seller_id) PARTITION BY KEY(seller_id) PARTITIONS 16
CLUSTERED INDEX cg_i_buyer(buyer_id) PARTITION BY KEY(buyer_id) PARTITIONS 16

-- ❌ Wrong: missing PARTITION BY (this is NOT MySQL INDEX syntax)
GLOBAL INDEX gsi_seller(seller_id)
```

- Clustered Columnar Index (CCI): Row-column hybrid storage; accelerates OLAP analytical queries via `CLUSTERED COLUMNAR INDEX`.
- Sequence: Globally unique sequence; the default type is `NEW SEQUENCE` (5.4.14+), a distributed alternative to AUTO_INCREMENT.
- Distributed transactions: Based on TSO global clock + MVCC + 2PC, strongly consistent by default; single-shard transactions are automatically optimized to local transactions.
- Table groups: Tables with the same partition rules bound to the same table group, ensuring JOIN computation pushdown to avoid cross-shard data shuffling.
- TTL tables: Automatic expiration and cleanup of cold data based on time columns, can work with CCI for hot/cold data separation.
- Unsupported MySQL features: Stored procedures/triggers/EVENTs/SPATIAL/GEOMETRY/LOAD XML/HANDLER, etc.
- STRAIGHT_JOIN / NATURAL JOIN not supported: Use standard JOIN syntax instead.
- := assignment operator not supported: Move logic to the application layer.
- Subqueries not supported in HAVING/JOIN ON clauses: Rewrite subqueries as JOINs or CTEs.
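As an illustration of the last point, a subquery inside JOIN ON can usually be lifted into a CTE and joined normally. Table names here are hypothetical:

```sql
-- ❌ Not supported: subquery inside JOIN ON
SELECT o.id
FROM t_order o
JOIN t_user u
  ON u.id = o.buyer_id
 AND u.id IN (SELECT user_id FROM t_vip);

-- ✅ Rewrite: lift the subquery into a CTE and join it as a table
WITH vip AS (SELECT user_id FROM t_vip)
SELECT o.id
FROM t_order o
JOIN t_user u ON u.id = o.buyer_id
JOIN vip v    ON v.user_id = u.id;
```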
Best Practices
- Choose the right table type: Use broadcast tables for small/dictionary tables, single tables for non-distributed needs, partitioned tables for everything else.
- Select partition keys based on real SQL patterns: Prefer SQL Insight data; when unavailable, use slow query logs or code analysis as alternatives; prioritize fields with high equality query ratio, high cardinality, and no hotspots; primary keys/unique keys are naturally strong partition key candidates.
- Include partition columns in primary keys: Primary/unique keys of manual partitioned tables should include all partition columns to ensure global uniqueness.
- Create GSIs wisely: Decide GSI strategy based on write volume; use regular GSI for few returned rows, Clustered GSI for one-to-many, UGSI for unique constraints; don't create GSIs for low-ratio SQL; use `INSPECT INDEX` to periodically clean up redundant GSIs. Every GSI must have its own `PARTITION BY KEY(...) PARTITIONS N` clause; never write bare `GLOBAL INDEX idx(col)` without PARTITION BY.
- Use 256 partitions: 256 partitions suit the vast majority of workloads and should be several times the number of DN nodes.
- Use the three-step method for single table to partitioned table: First convert to 1 partition (preserving uniqueness) -> Create GSI/UGSI -> Change to target partition count, avoiding uniqueness constraint gaps.
- Don't force partition key hits for low-ratio SQL: Partition design is pragmatic work; low-QPS cross-shard queries have limited total cost, don't create GSIs for every query field.
- Use table groups to optimize JOINs: Bind frequently joined tables to the same table group using the same partition rules.
- Avoid unsupported MySQL syntax: Don't use stored procedures, triggers, EVENTs, SPATIAL, NATURAL JOIN, `:=`, etc.
- Avoid subqueries in HAVING/JOIN ON: Rewrite them as JOINs or CTEs.
- Use EXPLAIN commands for diagnosis: For SQL performance issues, prefer `EXPLAIN SHARDING` and `EXPLAIN ANALYZE`.
- Check long transactions before Online DDL: Check for long-running transactions before executing DDL to avoid MDL lock waits.
- Use TTL tables to manage cold data: For large tables with time attributes, use TTL tables to automatically clean up expired data.
- Use Keyset pagination for efficient paging: Avoid deep pagination with `LIMIT M, N` (cost O(M+N), even larger in distributed systems); record the sort value of the last row in each batch as the WHERE condition for the next batch; when sort columns may have duplicates, use `(sort_column, id)` tuple comparison; ensure appropriate composite indexes on the sort columns.
- Use auto-add partitions for Range partitioned tables: Leverage the TTL mechanism to automatically pre-create future partitions for time-type Range partitioned tables, preventing write failures due to insufficient partitions; set `TTL_CLEANUP = 'OFF'` for add-only mode; immediately run `CLEANUP EXPIRED DATA WITH TTL_CLEANUP = 'OFF'` after configuration to trigger the first pre-creation; requires version 5.4.20+.
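The Keyset pagination pattern sketched for a hypothetical `t_order` table sorted by `(gmt_create, id)` — the literal values in the second query stand in for the last row of the previous batch:

```sql
-- First batch: no OFFSET, just ORDER BY + LIMIT
SELECT id, gmt_create
FROM t_order
ORDER BY gmt_create, id
LIMIT 100;

-- Next batch: seek past the last row of the previous batch with a
-- tuple comparison, since gmt_create values may repeat
SELECT id, gmt_create
FROM t_order
WHERE (gmt_create, id) > ('2024-01-01 00:00:00', 12345)
ORDER BY gmt_create, id
LIMIT 100;
```

An index on `(gmt_create, id)` lets each batch start at the seek point instead of scanning the M skipped rows that `LIMIT M, N` would.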
Reference Links
| Reference | Description |
|---|---|
| references/create-table.md | CREATE TABLE syntax, table types (single/broadcast/partitioned), partition strategies, secondary partitions, partition management |
| references/partition-design-best-practice.md | Partition design best practices: partition key/GSI/algorithm/count selection, three-step migration, complete examples |
| references/primary-key-unique-key.md | Primary key and unique key Global/Local classification, rules, risks, and recommendations |
| references/gsi.md | Global Secondary Index GSI/UGSI/Clustered GSI creation, querying, and limitations |
| references/cci.md | Clustered Columnar Index CCI creation, usage, and applicable scenarios |
| references/sequence.md | Sequence types (NEW/GROUP/SIMPLE/TIME), creation and usage |
| references/transactions.md | Distributed transaction model, isolation levels, and considerations |
| references/mysql-compatibility-notes.md | MySQL vs PolarDB-X compatibility differences and development limitations |
| references/explain.md | EXPLAIN command variants and execution plan diagnostics |
| references/ttl-table.md | TTL table definition, cold data archiving, and cleanup scheduling |
| references/online-ddl.md | Online DDL assessment, lock-free execution strategy, long transaction checks, DMS lock-free changes |
| references/pagination-best-practice.md | Efficient pagination: Keyset pagination, per-shard traversal, index requirements, Java examples |
| references/auto-add-range-parts.md | Range partition auto-add: TTL-based partition pre-creation, first/second level configuration, management commands |
| references/cli-installation-guide.md | Alibaba Cloud CLI installation guide |