analyzing-schema-change-storage-risk
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAnalyzing Schema Change Storage Risk
分析Schema变更存储风险
Estimates the storage headroom needed to safely run online schema changes.
Mirrors the official guidance:
some operations may temporarily require up to 3× the size of the affected
table or index while the schema change is in flight.
For ongoing range-distribution monitoring, see
analyzing-range-distribution.
估算安全执行在线Schema变更所需的存储余量。与官方指南一致:部分操作在Schema变更执行期间,可能临时需要最多3倍于受影响表或索引的大小的存储空间。
如需持续监控范围分布,请查看analyzing-range-distribution。
When to Use This Skill
何时使用该技能
Run a quick estimate before issuing any of these operations on a table whose
indexes are large (multiple GB per index, or many ranges per index):
CREATE INDEX- with
ADD COLUMNorINDEXUNIQUE ALTER PRIMARY KEYCREATE MATERIALIZED VIEWCREATE TABLE ASREFRESH MATERIALIZED VIEW- (when the locality change rewrites data)
ALTER TABLE ... SET LOCALITY
Tables whose indexes are small (kilobytes to a few megabytes) carry trivial
storage risk; estimation is unnecessary.
当对索引较大的表(单索引占用数GB空间,或每个索引包含多个范围)执行以下任一操作前,先快速估算存储需求:
CREATE INDEX- 带或
INDEX的UNIQUEADD COLUMN ALTER PRIMARY KEYCREATE MATERIALIZED VIEWCREATE TABLE ASREFRESH MATERIALIZED VIEW- (当 locality 变更需要重写数据时)
ALTER TABLE ... SET LOCALITY
索引较小的表(从千字节到几兆字节)存储风险极低,无需估算。
Background
背景知识
How much temporary space does a backfill actually need?
回填实际需要多少临时空间?
The honest answer depends on the operation:
- /
CREATE INDEX: needs roughly 1× the size of the new index — the indexed columns plus the primary key columns, written into a fresh index span. This is typically a small fraction of the table. Worst-case headroom is bounded by the size of that one index.ADD COLUMN ... UNIQUE - : rewrites the primary index and any secondary indexes whose definitions depend on the old PK. Old data sticks around until GC, so peak on-disk usage during the change can approach the size of the table again.
ALTER PRIMARY KEY - All bulk-ingest backfills: extra MVCC versions and pre-compaction SSTs add overhead until Pebble compacts and GC runs.
The official docs round these up into a single conservative recommendation:
plan for up to 3× the size of the affected table or index as free space.
That figure is a safety bound, not a precise prediction. For most
operations the real cost is much smaller; for
on a large table it is the right ballpark.
CREATE INDEXALTER PRIMARY KEY准确答案取决于具体操作:
- /
CREATE INDEX:大约需要1倍于新索引的大小——即索引列加上主键列,写入新的索引区间。这通常仅占表的一小部分。最坏情况下所需的余量以该单个索引的大小为上限。ADD COLUMN ... UNIQUE - :会重写主索引以及所有依赖旧主键定义的二级索引。旧数据会保留直到垃圾回收(GC)完成,因此变更期间磁盘峰值使用量可能接近表的原始大小。
ALTER PRIMARY KEY - 所有批量导入回填操作:额外的MVCC版本和预压缩SST文件会增加开销,直到Pebble完成压缩并执行GC。
官方文档将这些情况汇总为一个保守的建议:预留最多3倍于受影响表或索引大小的可用空间。该数值是安全边界,而非精确预测。对于大多数操作,实际成本要小得多;而对于大型表的操作,该数值则是合理的估算范围。
CREATE INDEXALTER PRIMARY KEYWhat happens if the cluster runs out of disk mid-backfill?
如果集群在回填过程中磁盘耗尽会发生什么?
Backfills bulk-ingest data via , which checks the per-store
remaining capacity before each ingestion. If the remaining fraction falls
below (default ,
i.e. 5%), the ingest is rejected with . Both the
legacy and declarative schema changers translate that error into a job pause
request, so the schema change halts rather than wedging the cluster. To
resume, free space (e.g. drop unused indexes, expand storage) and resume the
paused job.
AddSSTablekv.bulk_io_write.min_capacity_remaining_fraction0.05InsufficientSpaceErrorThis is a reactive safety net, not a planning tool — by the time it fires,
foreground writes on the affected store may already be unhealthy.
回填通过批量导入数据,每次导入前会检查每个存储节点的剩余容量。如果剩余容量比例低于(默认值为,即5%),导入会被拒绝并返回。旧版和声明式schema变更器都会将该错误转换为作业暂停请求,因此schema变更会停止而非导致集群故障。如需恢复,需释放空间(例如删除未使用的索引、扩容存储)并恢复暂停的作业。
AddSSTablekv.bulk_io_write.min_capacity_remaining_fraction0.05InsufficientSpaceError这是一个被动的安全保障,而非规划工具——当触发该机制时,受影响存储节点上的前台写入可能已经出现异常。
Estimating Capacity
容量估算步骤
Step 1 — Check free space per store
步骤1 — 检查每个存储节点的可用空间
The minimum free space across stores is what bounds the schema change, not the
total cluster free space (replicas are spread across nodes).
sql
SELECT
node_id,
store_id,
ROUND((capacity - used) / 1073741824.0, 2) AS free_gb,
ROUND((used::FLOAT / capacity) * 100, 2) AS used_pct
FROM crdb_internal.kv_store_status
ORDER BY free_gb ASC;集群中所有存储节点的最小可用空间才是schema变更的限制因素,而非集群总可用空间(副本分布在不同节点上)。
sql
SELECT
node_id,
store_id,
ROUND((capacity - used) / 1073741824.0, 2) AS free_gb,
ROUND((used::FLOAT / capacity) * 100, 2) AS used_pct
FROM crdb_internal.kv_store_status
ORDER BY free_gb ASC;Step 2 — Estimate the affected table/index size
步骤2 — 估算受影响的表/索引大小
Use the docs-recommended form of :
SHOW RANGESsql
SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES;The output includes one row per range, with and .
Aggregate by index for the per-index totals that matter for capacity planning:
range_size_mbindex_namesql
WITH r AS (SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES)
SELECT
index_name,
COUNT(*) AS range_count,
ROUND(SUM(range_size_mb), 2) AS index_size_mb,
ROUND(SUM(range_size_mb) / 1024, 2) AS index_size_gb
FROM r
GROUP BY index_name
ORDER BY index_size_mb DESC;使用文档推荐的格式:
SHOW RANGESsql
SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES;输出包含每个范围的一行数据,其中包含和。按索引聚合得到每个索引的总大小,这对容量规划至关重要:
range_size_mbindex_namesql
WITH r AS (SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES)
SELECT
index_name,
COUNT(*) AS range_count,
ROUND(SUM(range_size_mb), 2) AS index_size_mb,
ROUND(SUM(range_size_mb) / 1024, 2) AS index_size_gb
FROM r
GROUP BY index_name
ORDER BY index_size_mb DESC;Step 3 — Compare against the operation
步骤3 — 根据操作类型对比
| Operation | Conservative free-space target (per store) |
|---|---|
| Up to 3× the size of the new index (its indexed + PK columns). |
| Up to 3× the size of the table (sum of the relevant indexes from step 2). |
| Up to 3× the expected size of the materialized result. |
The new index does not exist yet, so estimate it from a comparable existing
index (e.g. one on similarly typed columns) or from the source columns'
contribution to the primary index.
If the smallest free-space figure from step 1 is well above the target, the
operation is safe to run. If it is close, free space first (drop unused
indexes, expand storage) before issuing the DDL.
| 操作类型 | 保守可用空间目标(每个存储节点) |
|---|---|
| 最多为新索引大小的3倍(其索引列+主键列的大小)。 |
| 最多为表大小的3倍(步骤2中相关索引的总和)。 |
| 最多为物化结果预期大小的3倍。 |
由于新索引尚未存在,可从类似的现有索引(例如具有相似类型列的索引)或源列在主索引中的占比来估算其大小。
如果步骤1中得到的最小可用空间远高于目标值,则操作可以安全执行。如果接近目标值,请先释放空间(删除未使用的索引、扩容存储),再执行DDL语句。
Operational Notes
操作注意事项
- is expensive. It computes span statistics on demand. Always target a specific table, never run it cluster-wide, and prefer maintenance windows on tables with thousands of ranges.
SHOW RANGES ... WITH DETAILS - Watch the job, not just disk. If a backfill pauses with
, free disk on the affected store and resume the paused schema change job. Check with:
InsufficientSpaceErrorsqlSELECT job_id, status, error FROM crdb_internal.jobs WHERE job_type = 'SCHEMA CHANGE' AND status = 'paused'; - Drop unused indexes first. Often the cheapest way to free headroom
before a large backfill is to drop indexes that
shows are unused.
crdb_internal.index_usage_statistics - Statistics lag. is approximate and can lag actual disk usage; treat estimates as conservative ballparks, not exact figures.
range_size_mb
- 开销较大。它会按需计算区间统计信息。请始终针对特定表执行,切勿在集群范围内运行,对于包含数千个范围的表,建议在维护窗口执行。
SHOW RANGES ... WITH DETAILS - 不仅要监控磁盘,还要监控作业。如果回填因暂停,请在受影响的存储节点上释放磁盘空间,然后恢复暂停的schema变更作业。可通过以下查询检查:
InsufficientSpaceErrorsqlSELECT job_id, status, error FROM crdb_internal.jobs WHERE job_type = 'SCHEMA CHANGE' AND status = 'paused'; - 先删除未使用的索引。在大型回填前释放余量最经济的方法通常是删除显示未使用的索引。
crdb_internal.index_usage_statistics - 统计数据存在延迟。是近似值,可能滞后于实际磁盘使用情况;请将估算值视为保守的大致范围,而非精确数值。
range_size_mb
References
参考资料
Related Skills
相关技能
- analyzing-range-distribution — range count, leaseholder placement, fragmentation
- analyzing-range-distribution — 范围计数、租约持有者放置、碎片分析