analyzing-schema-change-storage-risk

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Analyzing Schema Change Storage Risk

分析Schema变更存储风险

Estimates the storage headroom needed to safely run online schema changes. Mirrors the official guidance: some operations may temporarily require up to 3× the size of the affected table or index while the schema change is in flight.

For ongoing range-distribution monitoring, see analyzing-range-distribution.

估算安全执行在线Schema变更所需的存储余量。与官方指南一致：部分操作在Schema变更执行期间，可能临时需要最多3倍于受影响表或索引的大小的存储空间。

如需持续监控范围分布，请查看analyzing-range-distribution。

When to Use This Skill

何时使用该技能

Run a quick estimate before issuing any of these operations on a table whose indexes are large (multiple GB per index, or many ranges per index):

```
CREATE INDEX
```
```
ADD COLUMN
```
with
```
INDEX
```
or
```
UNIQUE
```
```
ALTER PRIMARY KEY
```
```
CREATE MATERIALIZED VIEW
```
```
CREATE TABLE AS
```
```
REFRESH MATERIALIZED VIEW
```
```
ALTER TABLE ... SET LOCALITY
```
(when the locality change rewrites data)

Tables whose indexes are small (kilobytes to a few megabytes) carry trivial storage risk; estimation is unnecessary.

当对索引较大的表（单索引占用数GB空间，或每个索引包含多个范围）执行以下任一操作前，先快速估算存储需求：

```
CREATE INDEX
```
带
```
INDEX
```
或
```
UNIQUE
```
的
```
ADD COLUMN
```
```
ALTER PRIMARY KEY
```
```
CREATE MATERIALIZED VIEW
```
```
CREATE TABLE AS
```
```
REFRESH MATERIALIZED VIEW
```
```
ALTER TABLE ... SET LOCALITY
```
（当 locality 变更需要重写数据时）

索引较小的表（从千字节到几兆字节）存储风险极低，无需估算。

Background

背景知识

How much temporary space does a backfill actually need?

回填实际需要多少临时空间？

The honest answer depends on the operation:

CREATE INDEX
/
ADD COLUMN ... UNIQUE
: needs roughly 1× the size of the new index — the indexed columns plus the primary key columns, written into a fresh index span. This is typically a small fraction of the table. Worst-case headroom is bounded by the size of that one index.
ALTER PRIMARY KEY
: rewrites the primary index and any secondary indexes whose definitions depend on the old PK. Old data sticks around until GC, so peak on-disk usage during the change can approach the size of the table again.
All bulk-ingest backfills: extra MVCC versions and pre-compaction SSTs add overhead until Pebble compacts and GC runs.

The official docs round these up into a single conservative recommendation: plan for up to 3× the size of the affected table or index as free space. That figure is a safety bound, not a precise prediction. For most

CREATE INDEX

operations the real cost is much smaller; for

ALTER PRIMARY KEY

on a large table it is the right ballpark.

准确答案取决于具体操作：

CREATE INDEX
/
ADD COLUMN ... UNIQUE
：大约需要1倍于新索引的大小——即索引列加上主键列，写入新的索引区间。这通常仅占表的一小部分。最坏情况下所需的余量以该单个索引的大小为上限。
ALTER PRIMARY KEY
：会重写主索引以及所有依赖旧主键定义的二级索引。旧数据会保留直到垃圾回收（GC）完成，因此变更期间磁盘峰值使用量可能接近表的原始大小。
所有批量导入回填操作：额外的MVCC版本和预压缩SST文件会增加开销，直到Pebble完成压缩并执行GC。

官方文档将这些情况汇总为一个保守的建议：预留最多3倍于受影响表或索引大小的可用空间。该数值是安全边界，而非精确预测。对于大多数

CREATE INDEX

操作，实际成本要小得多；而对于大型表的

ALTER PRIMARY KEY

操作，该数值则是合理的估算范围。

What happens if the cluster runs out of disk mid-backfill?

如果集群在回填过程中磁盘耗尽会发生什么？

Backfills bulk-ingest data via

AddSSTable

, which checks the per-store remaining capacity before each ingestion. If the remaining fraction falls below

kv.bulk_io_write.min_capacity_remaining_fraction

(default

0.05

, i.e. 5%), the ingest is rejected with

InsufficientSpaceError

. Both the legacy and declarative schema changers translate that error into a job pause request, so the schema change halts rather than wedging the cluster. To resume, free space (e.g. drop unused indexes, expand storage) and resume the paused job.

This is a reactive safety net, not a planning tool — by the time it fires, foreground writes on the affected store may already be unhealthy.

回填通过

AddSSTable

批量导入数据，每次导入前会检查每个存储节点的剩余容量。如果剩余容量比例低于

kv.bulk_io_write.min_capacity_remaining_fraction

（默认值为

0.05

，即5%），导入会被拒绝并返回

InsufficientSpaceError

。旧版和声明式schema变更器都会将该错误转换为作业暂停请求，因此schema变更会停止而非导致集群故障。如需恢复，需释放空间（例如删除未使用的索引、扩容存储）并恢复暂停的作业。

这是一个被动的安全保障，而非规划工具——当触发该机制时，受影响存储节点上的前台写入可能已经出现异常。

Estimating Capacity

容量估算步骤

Step 1 — Check free space per store

步骤1 — 检查每个存储节点的可用空间

The minimum free space across stores is what bounds the schema change, not the total cluster free space (replicas are spread across nodes).

sql

SELECT
  node_id,
  store_id,
  ROUND((capacity - used) / 1073741824.0, 2) AS free_gb,
  ROUND((used::FLOAT / capacity) * 100, 2)   AS used_pct
FROM crdb_internal.kv_store_status
ORDER BY free_gb ASC;

集群中所有存储节点的最小可用空间才是schema变更的限制因素，而非集群总可用空间（副本分布在不同节点上）。

sql

SELECT
  node_id,
  store_id,
  ROUND((capacity - used) / 1073741824.0, 2) AS free_gb,
  ROUND((used::FLOAT / capacity) * 100, 2)   AS used_pct
FROM crdb_internal.kv_store_status
ORDER BY free_gb ASC;

Step 2 — Estimate the affected table/index size

步骤2 — 估算受影响的表/索引大小

Use the docs-recommended form of

SHOW RANGES

sql

SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES;

The output includes one row per range, with

range_size_mb

and

index_name

. Aggregate by index for the per-index totals that matter for capacity planning:

sql

WITH r AS (SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES)
SELECT
  index_name,
  COUNT(*)                              AS range_count,
  ROUND(SUM(range_size_mb), 2)          AS index_size_mb,
  ROUND(SUM(range_size_mb) / 1024, 2)   AS index_size_gb
FROM r
GROUP BY index_name
ORDER BY index_size_mb DESC;

使用文档推荐的

SHOW RANGES

格式：

sql

SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES;

输出包含每个范围的一行数据，其中包含

range_size_mb

和

index_name

。按索引聚合得到每个索引的总大小，这对容量规划至关重要：

sql

WITH r AS (SHOW RANGES FROM TABLE <table> WITH DETAILS, KEYS, INDEXES)
SELECT
  index_name,
  COUNT(*)                              AS range_count,
  ROUND(SUM(range_size_mb), 2)          AS index_size_mb,
  ROUND(SUM(range_size_mb) / 1024, 2)   AS index_size_gb
FROM r
GROUP BY index_name
ORDER BY index_size_mb DESC;

Step 3 — Compare against the operation

步骤3 — 根据操作类型对比

Operation	Conservative free-space target (per store)
`CREATE INDEX` / `ADD COLUMN ... UNIQUE`	Up to 3× the size of the new index (its indexed + PK columns).
`ALTER PRIMARY KEY`	Up to 3× the size of the table (sum of the relevant indexes from step 2).
`CREATE MATERIALIZED VIEW` / `CREATE TABLE AS`	Up to 3× the expected size of the materialized result.

The new index does not exist yet, so estimate it from a comparable existing index (e.g. one on similarly typed columns) or from the source columns' contribution to the primary index.

If the smallest free-space figure from step 1 is well above the target, the operation is safe to run. If it is close, free space first (drop unused indexes, expand storage) before issuing the DDL.

操作类型	保守可用空间目标（每个存储节点）
`CREATE INDEX` / `ADD COLUMN ... UNIQUE`	最多为新索引大小的3倍（其索引列+主键列的大小）。
`ALTER PRIMARY KEY`	最多为表大小的3倍（步骤2中相关索引的总和）。
`CREATE MATERIALIZED VIEW` / `CREATE TABLE AS`	最多为物化结果预期大小的3倍。

由于新索引尚未存在，可从类似的现有索引（例如具有相似类型列的索引）或源列在主索引中的占比来估算其大小。

如果步骤1中得到的最小可用空间远高于目标值，则操作可以安全执行。如果接近目标值，请先释放空间（删除未使用的索引、扩容存储），再执行DDL语句。

Operational Notes

操作注意事项

SHOW RANGES ... WITH DETAILS
is expensive. It computes span statistics on demand. Always target a specific table, never run it cluster-wide, and prefer maintenance windows on tables with thousands of ranges.

Watch the job, not just disk. If a backfill pauses with

InsufficientSpaceError

, free disk on the affected store and resume the paused schema change job. Check with:

sql

SELECT job_id, status, error
FROM crdb_internal.jobs
WHERE job_type = 'SCHEMA CHANGE' AND status = 'paused';

Drop unused indexes first. Often the cheapest way to free headroom before a large backfill is to drop indexes that
```
crdb_internal.index_usage_statistics
```
shows are unused.
Statistics lag.
```
range_size_mb
```
is approximate and can lag actual disk usage; treat estimates as conservative ballparks, not exact figures.

SHOW RANGES ... WITH DETAILS
开销较大。它会按需计算区间统计信息。请始终针对特定表执行，切勿在集群范围内运行，对于包含数千个范围的表，建议在维护窗口执行。
不仅要监控磁盘，还要监控作业。如果回填因
```
InsufficientSpaceError
```
暂停，请在受影响的存储节点上释放磁盘空间，然后恢复暂停的schema变更作业。可通过以下查询检查：
sql
```
SELECT job_id, status, error
FROM crdb_internal.jobs
WHERE job_type = 'SCHEMA CHANGE' AND status = 'paused';
```
先删除未使用的索引。在大型回填前释放余量最经济的方法通常是删除
```
crdb_internal.index_usage_statistics
```
显示未使用的索引。
统计数据存在延迟。
```
range_size_mb
```
是近似值，可能滞后于实际磁盘使用情况；请将估算值视为保守的大致范围，而非精确数值。