setup-timescaledb-hypertables

TimescaleDB Complete Setup

Instructions for insert-heavy data patterns where data is inserted but rarely changed:
  • Time-series data (sensors, metrics, system monitoring)
  • Event logs (user events, audit trails, application logs)
  • Transaction records (orders, payments, financial transactions)
  • Sequential data (records with auto-incrementing IDs and timestamps)
  • Append-only datasets (immutable records, historical data)

Step 1: Create Hypertable

sql
CREATE TABLE your_table_name (
    timestamp TIMESTAMPTZ NOT NULL,
    entity_id TEXT NOT NULL,          -- device_id, user_id, symbol, etc.
    category TEXT,                    -- sensor_type, event_type, asset_class, etc.
    value_1 DOUBLE PRECISION,         -- price, temperature, latency, etc.
    value_2 DOUBLE PRECISION,         -- volume, humidity, throughput, etc.
    value_3 INTEGER,                  -- count, status, level, etc.
    metadata JSONB                    -- flexible additional data
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='timestamp',
    tsdb.enable_columnstore=true,     -- Disable if table has vector columns
    tsdb.segmentby='entity_id',       -- See selection guide below
    tsdb.orderby='timestamp DESC',     -- See selection guide below
    tsdb.sparse_index='minmax(value_1),minmax(value_2),minmax(value_3)' -- see selection guide below
);
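A concrete instantiation of the template can make the options easier to read. The table and column names below (sensor_readings, device_id, temperature, humidity) are illustrative assumptions, not part of the original template:
sql
-- Hypothetical IoT example of the template above; names are illustrative
CREATE TABLE sensor_readings (
    timestamp TIMESTAMPTZ NOT NULL,
    device_id TEXT NOT NULL,
    sensor_type TEXT,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION,
    battery_level INTEGER,
    metadata JSONB
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='timestamp',
    tsdb.enable_columnstore=true,
    tsdb.segmentby='device_id',          -- devices are the primary filter
    tsdb.orderby='timestamp DESC',
    tsdb.sparse_index='minmax(temperature),minmax(humidity)'
);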

Compression Decision

  • Enable by default for insert-heavy patterns
  • Disable if table has vector type columns (pgvector) - indexes on vector columns incompatible with columnstore
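For the pgvector case, the columnstore is simply disabled at creation time. A sketch, assuming the pgvector extension is installed and using illustrative names:
sql
-- Sketch: a table with a pgvector column keeps the columnstore disabled
-- (assumes the pgvector extension is installed; names are illustrative)
CREATE TABLE document_embeddings (
    created_at TIMESTAMPTZ NOT NULL,
    doc_id TEXT NOT NULL,
    embedding VECTOR(768)
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='created_at',
    tsdb.enable_columnstore=false     -- vector indexes are incompatible with columnstore
);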

Partition Column Selection

Must be time-based (TIMESTAMP/TIMESTAMPTZ/DATE) or integer (INT/BIGINT) with good temporal/sequential distribution.
Common patterns:
  • TIME-SERIES: timestamp, event_time, measured_at
  • EVENT LOGS: event_time, created_at, logged_at
  • TRANSACTIONS: created_at, transaction_time, processed_at
  • SEQUENTIAL: id (auto-increment when no timestamp), sequence_number
  • APPEND-ONLY: created_at, inserted_at, id
Less ideal: ingested_at (when the data entered the system; use only if it's your primary query dimension)
Avoid: updated_at (breaks time ordering unless it's your primary query dimension)
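For the SEQUENTIAL case, an integer partition column works too; for integer columns the chunk interval is a span of IDs rather than a time span. A sketch with assumed names:
sql
-- Sketch: partitioning on an auto-incrementing BIGINT (names are illustrative)
CREATE TABLE event_stream (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    payload JSONB
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='id'
);
-- For integer partition columns the interval is a row-id span, not a time span
SELECT set_chunk_time_interval('event_stream', 1000000);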

Segment_By Column Selection

PREFER A SINGLE COLUMN - multi-column is rarely optimal. Multi-column works only for highly correlated columns (e.g., metric_name + metric_type) with sufficient row density.
Requirements:
  • Frequently used in WHERE clauses (most common filter)
  • Good row density (>100 rows per value per chunk)
  • Primary logical partition/grouping
Examples:
  • IoT: device_id
  • Finance: symbol
  • Metrics: service_name; or service_name, metric_type / metric_name, metric_type (if sufficient row density)
  • Analytics: user_id if sufficient row density, otherwise session_id
  • E-commerce: product_id if sufficient row density, otherwise category_id
Row density guidelines:
  • Target: >100 rows per segment_by value within each chunk
  • Poor: <10 rows per segment_by value per chunk → choose a less granular column
  • Low-density columns: prepend them to the order_by column list
Query pattern drives choice:
sql
SELECT * FROM table WHERE entity_id = 'X' AND timestamp > ...
-- ↳ segment_by: entity_id (if >100 rows per chunk)
Avoid: timestamps, unique IDs, low-density columns (<100 rows/value/chunk), columns rarely used in filtering
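One way to sanity-check row density before committing to a segment_by column is to count rows per candidate value over roughly one chunk's worth of time. The 7-day window below assumes the default chunk interval, and entity_id stands in for your candidate column:
sql
-- Approximate rows per segment_by value per chunk (assumes 7-day chunks)
SELECT entity_id, COUNT(*) AS rows_in_window
FROM your_table_name
WHERE timestamp >= now() - INTERVAL '7 days'
GROUP BY entity_id
ORDER BY rows_in_window ASC   -- sparsest values first; aim for >100 rows each
LIMIT 20;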

Order_By Column Selection

Creates a natural time-series progression when combined with segment_by for optimal compression.
Most common: timestamp DESC
Examples:
  • IoT/Finance/E-commerce: timestamp DESC
  • Metrics: metric_name, timestamp DESC (if metric_name has too low density for segment_by)
  • Analytics: user_id, timestamp DESC (if user_id has too low density for segment_by)
Alternative patterns:
  • sequence_id DESC for event streams with sequence numbers
  • timestamp DESC, event_order DESC for sub-ordering within the same timestamp
Low-density column handling: If a column has <100 rows per chunk (too low for segment_by), prepend it to order_by:
  • Example: metric_name has 20 rows/chunk → use segment_by='service_name', order_by='metric_name, timestamp DESC'
  • This groups similar values together (all temperature readings, then all pressure readings) for better compression
Good test: the ordering created by (segment_by_column, order_by_column) should form a natural time-series progression; values close to each other in the progression should be similar.
Avoid in order_by: random columns, columns with high variance between adjacent rows, columns unrelated to segment_by
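Applied to the metrics example above, the low-density handling looks like this (the metrics table and its columns are assumed names):
sql
-- Sketch: metric_name is too sparse for segmentby, so it leads the orderby
ALTER TABLE metrics SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'service_name',
    timescaledb.orderby = 'metric_name, timestamp DESC'
);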

Compression Sparse Index Selection

Sparse indexes enable query filtering on compressed data without decompression. They store metadata per batch (~1000 rows) to eliminate batches that don't match query predicates.
Types:
  • minmax: min/max values per batch - for range queries (>, <, BETWEEN) on numeric/temporal columns
Use minmax for:
  • Range filtering on columns like price, temperature, measurement, timestamp
  • Outlier detection (e.g., temperature > 90)
  • Fields highly correlated with the segmentby and orderby columns (e.g., if orderby includes created_at, minmax on updated_at is useful)
Avoid: rarely filtered columns.
IMPORTANT: NEVER index columns in segmentby or orderby. Orderby columns always get minmax indexes without any configuration.
Configuration: The format is a comma-separated list of type_of_index(column_name).
sql
ALTER TABLE table_name SET (
    timescaledb.sparse_index = 'minmax(value_1),minmax(value_2)'
);
Explicit configuration has been available since v2.22.0 (sparse minmax indexes were created automatically since v2.16.0).
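A query shaped like the one below is what minmax sparse indexes accelerate: compressed batches whose stored min/max for value_1 fall entirely below the threshold are skipped without decompression:
sql
-- Outlier scan over compressed data; minmax(value_1) lets most batches be skipped
SELECT timestamp, entity_id, value_1
FROM your_table_name
WHERE value_1 > 90
  AND timestamp > now() - INTERVAL '30 days';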

Chunk Time Interval (Optional)

Default: 7 days (use if volume unknown, or ask user). Adjust based on volume:
  • High frequency: 1 hour - 1 day
  • Medium: 1 day - 1 week
  • Low: 1 week - 1 month
sql
SELECT set_chunk_time_interval('your_table_name', INTERVAL '1 day');
Good test: recent chunk indexes should fit in less than 25% of RAM.
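To apply the 25%-of-RAM test, compare the index sizes of your most recent chunks against the server's memory. A sketch using the chunks_detailed_size() helper (you supply the RAM figure yourself):
sql
-- Index size per chunk; check the newest chunks against 25% of RAM
SELECT chunk_name,
       pg_size_pretty(index_bytes) AS index_size
FROM chunks_detailed_size('your_table_name');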

Indexes & Primary Keys

Common index patterns - composite indexes on an id and timestamp:
sql
CREATE INDEX idx_entity_timestamp ON your_table_name (entity_id, timestamp DESC);
Important: Only create indexes you'll actually use - each has maintenance overhead.
Primary key and unique constraints rules: Must include partition column.
Option 1: Composite PK with partition column
sql
ALTER TABLE your_table_name ADD PRIMARY KEY (entity_id, timestamp);
Option 2: Single-column PK (only if it's the partition column)
sql
CREATE TABLE ... (id BIGINT PRIMARY KEY, ...) WITH (tsdb.partition_column='id');
Option 3: No PK: strict uniqueness is often not required for insert-heavy patterns.

Step 2: Compression Policy (Optional)

IMPORTANT: If you used tsdb.enable_columnstore=true in Step 1, then starting with TimescaleDB 2.23 a columnstore policy is automatically created with after => INTERVAL '7 days'. You only need to call add_columnstore_policy() if you want to customize the after interval to something other than 7 days.
Set the after interval for when: data becomes mostly immutable (some updates/backfill OK) AND B-tree indexes aren't needed for queries (less common criterion).
sql
-- In TimescaleDB 2.23 and later only needed if you want to override the default 7-day policy created by tsdb.enable_columnstore=true
-- Remove the existing auto-created policy first:
-- CALL remove_columnstore_policy('your_table_name');
-- Then add custom policy:
-- CALL add_columnstore_policy('your_table_name', after => INTERVAL '1 day');

Step 3: Retention Policy

IMPORTANT: Don't guess - ask user or comment out if unknown.
sql
-- Example - replace with requirements or comment out
SELECT add_retention_policy('your_table_name', INTERVAL '365 days');

Step 4: Create Continuous Aggregates

Use different aggregation intervals for different uses.

Short-term (Minutes/Hours)

For up-to-the-minute dashboards on high-frequency data.
sql
CREATE MATERIALIZED VIEW your_table_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 hour', timestamp) AS bucket,
    entity_id,
    category,
    COUNT(*) as record_count,
    AVG(value_1) as avg_value_1,
    MIN(value_1) as min_value_1,
    MAX(value_1) as max_value_1,
    SUM(value_2) as sum_value_2
FROM your_table_name
GROUP BY bucket, entity_id, category;

Long-term (Days/Weeks/Months)

For long-term reporting and analytics.
sql
CREATE MATERIALIZED VIEW your_table_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 day', timestamp) AS bucket,
    entity_id,
    category,
    COUNT(*) as record_count,
    AVG(value_1) as avg_value_1,
    MIN(value_1) as min_value_1,
    MAX(value_1) as max_value_1,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value_1) as median_value_1,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value_1) as p95_value_1,
    SUM(value_2) as sum_value_2
FROM your_table_name
GROUP BY bucket, entity_id, category;

Step 5: Aggregate Refresh Policies

Set up refresh policies based on your data freshness requirements.
start_offset: Usually omit (refreshes all). Exception: If you don't care about refreshing data older than X (see below). With retention policy on raw data: match the retention policy.
end_offset: Set beyond active update window (e.g., 15 min if data usually arrives within 10 min). Data newer than end_offset won't appear in queries without real-time aggregation. If you don't know your update window, use the size of the time_bucket in the query, but not less than 5 minutes.
schedule_interval: Set to the same value as the end_offset but not more than 1 hour.
Hourly - frequent refresh for dashboards:
sql
SELECT add_continuous_aggregate_policy('your_table_hourly',
    start_offset => NULL,
    end_offset => INTERVAL '15 minutes',
    schedule_interval => INTERVAL '15 minutes');
Daily - less frequent for reports:
sql
SELECT add_continuous_aggregate_policy('your_table_daily',
    start_offset => NULL,
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
Use start_offset only if you don't care about refreshing old data. Use it for high-volume systems where query accuracy on older data doesn't matter:
sql
-- the following aggregate can be stale for data older than 7 days
-- SELECT add_continuous_aggregate_policy('aggregate_for_last_7_days',
--     start_offset => INTERVAL '7 days',    -- only refresh last 7 days (NULL = refresh all)
--     end_offset => INTERVAL '15 minutes',
--     schedule_interval => INTERVAL '15 minutes');
IMPORTANT: you MUST set start_offset to be less than or equal to the retention policy on raw data; by default, set it equal to the retention period. If the retention policy is commented out, comment out the start_offset as well, like this:
sql
SELECT add_continuous_aggregate_policy('your_table_daily',
    start_offset => NULL,    -- Use NULL to refresh all data, or set to retention period if enabled on raw data
--  start_offset => INTERVAL '<retention period here>',    -- uncomment if retention policy is enabled on the raw data table
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
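Policies cover only the scheduled path; after backfilling raw data you can refresh a specific window by hand. refresh_continuous_aggregate is a procedure, so it is invoked with CALL (the window below is an arbitrary example):
sql
-- One-off refresh of a specific window, e.g. after a backfill
CALL refresh_continuous_aggregate('your_table_daily', '2024-01-01', '2024-02-01');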

Step 6: Real-Time Aggregation (Optional)

Real-time combines materialized + recent raw data at query time. Provides up-to-date results at the cost of higher query latency.
More useful for fine-grained aggregates (e.g., minutely) than coarse ones (e.g., daily/monthly) since large buckets will be mostly incomplete with recent data anyway.
Disabled by default in v2.13+, before that it was enabled by default.
Use when: you need data newer than end_offset, want up-to-the-minute dashboards, and can tolerate higher query latency. Disable when: performance is critical, refresh policies are sufficient, query volume is high, or missing/stale recent data is acceptable.
Enable for current results (higher query cost):
sql
ALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = false);
Disable for performance (but with stale results):
sql
ALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = true);
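With materialized_only = false, an ordinary query on the view transparently combines materialized buckets with buckets computed on the fly from raw rows newer than the last refresh; no special syntax is needed:
sql
-- Returns fully materialized buckets plus a real-time tail for recent data
SELECT bucket, avg_value_1
FROM your_table_hourly
WHERE bucket > now() - INTERVAL '6 hours'
ORDER BY bucket DESC;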

Step 7: Compress Aggregates

Rule: segment_by = ALL GROUP BY columns except time_bucket, order_by = time_bucket DESC
sql
-- Hourly
ALTER MATERIALIZED VIEW your_table_hourly SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'entity_id, category',
    timescaledb.orderby = 'bucket DESC'
);
CALL add_columnstore_policy('your_table_hourly', after => INTERVAL '3 days');

-- Daily
ALTER MATERIALIZED VIEW your_table_daily SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'entity_id, category',
    timescaledb.orderby = 'bucket DESC'
);
CALL add_columnstore_policy('your_table_daily', after => INTERVAL '7 days');

Step 8: Aggregate Retention

Aggregates are typically kept longer than raw data. IMPORTANT: Don't guess - ask the user, or comment these out if unknown.
sql
-- Example - replace or comment out
SELECT add_retention_policy('your_table_hourly', INTERVAL '2 years');
SELECT add_retention_policy('your_table_daily', INTERVAL '5 years');

Step 9: Performance Indexes on Continuous Aggregates

Index strategy: Analyze WHERE clauses in common queries → Create indexes matching filter columns + time ordering
Pattern: (filter_column, bucket DESC) supports WHERE filter_column = X AND bucket >= Y ORDER BY bucket DESC
Examples:
sql
CREATE INDEX idx_hourly_entity_bucket ON your_table_hourly (entity_id, bucket DESC);
CREATE INDEX idx_hourly_category_bucket ON your_table_hourly (category, bucket DESC);
Multi-column filters: Create composite indexes for WHERE entity_id = X AND category = Y:
sql
CREATE INDEX idx_hourly_entity_category_bucket ON your_table_hourly (entity_id, category, bucket DESC);
Important: Only create indexes you'll actually use - each has maintenance overhead.

Step 10: Optional Enhancements

Space Partitioning (NOT RECOMMENDED)

Only for query patterns where you ALWAYS filter by the space-partition column with expert knowledge and extensive benchmarking. STRONGLY prefer time-only partitioning.

Step 11: Verify Configuration

sql
-- Check hypertable
SELECT * FROM timescaledb_information.hypertables
WHERE hypertable_name = 'your_table_name';

-- Check compression settings
SELECT * FROM hypertable_compression_stats('your_table_name');

-- Check aggregates
SELECT * FROM timescaledb_information.continuous_aggregates;

-- Check policies
SELECT * FROM timescaledb_information.jobs ORDER BY job_id;

-- Monitor chunk information
SELECT
    chunk_name,
    range_start,
    range_end,
    is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'your_table_name'
ORDER BY range_start DESC;

Performance Guidelines

  • Chunk size: Recent chunk indexes should fit in less than 25% of RAM
  • Compression: Expect 90%+ reduction (10x) with proper columnstore config
  • Query optimization: Use continuous aggregates for historical queries and dashboards
  • Memory: Run timescaledb-tune for self-hosting (auto-configured on cloud)

Schema Best Practices

Do's and Don'ts

  • ✅ Use TIMESTAMPTZ, NOT timestamp
  • ✅ Use >= and <, NOT BETWEEN, for timestamps
  • ✅ Use TEXT with constraints, NOT char(n)/varchar(n)
  • ✅ Use snake_case, NOT CamelCase
  • ✅ Use BIGINT GENERATED ALWAYS AS IDENTITY, NOT SERIAL
  • ✅ Use BIGINT for IDs by default over INTEGER or SMALLINT
  • ✅ Use DOUBLE PRECISION by default over REAL/FLOAT
  • ✅ Use NUMERIC, NOT MONEY
  • ✅ Use NOT EXISTS, NOT NOT IN
  • ✅ Use time_bucket() or date_trunc(), NOT timestamp(0), for truncation
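The timestamp rules combine naturally: a half-open range with >= and < avoids BETWEEN's inclusive upper bound (which would double-count boundary rows across adjacent windows), and time_bucket() handles truncation:
sql
-- Half-open range plus time_bucket(), per the rules above
SELECT time_bucket(INTERVAL '15 minutes', timestamp) AS bucket,
       AVG(value_1) AS avg_value_1
FROM your_table_name
WHERE timestamp >= '2024-01-01'
  AND timestamp <  '2024-02-01'
GROUP BY bucket
ORDER BY bucket;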

API Reference (Current vs Deprecated)

Deprecated Parameters → New Parameters:
  • timescaledb.compress → timescaledb.enable_columnstore
  • timescaledb.compress_segmentby → timescaledb.segmentby
  • timescaledb.compress_orderby → timescaledb.orderby
Deprecated Functions → New Functions:
  • add_compression_policy() → add_columnstore_policy()
  • remove_compression_policy() → remove_columnstore_policy()
  • compress_chunk() → convert_to_columnstore() (use with CALL, not SELECT)
  • decompress_chunk() → convert_to_rowstore() (use with CALL, not SELECT)
Compression Stats (use functions, not views):
  • Use the function hypertable_compression_stats('table_name')
  • Use the function chunk_compression_stats('_timescaledb_internal._hyper_X_Y_chunk')
  • Note: views like columnstore_settings may not be available in all versions; use the functions instead
Manual Compression Example:
sql
-- Compress a specific chunk
CALL convert_to_columnstore('_timescaledb_internal._hyper_7_1_chunk');

-- Check compression statistics
SELECT
    number_compressed_chunks,
    pg_size_pretty(before_compression_total_bytes) as before_compression,
    pg_size_pretty(after_compression_total_bytes) as after_compression,
    ROUND(100.0 * (1 - after_compression_total_bytes::numeric / NULLIF(before_compression_total_bytes, 0)), 1) as compression_pct
FROM hypertable_compression_stats('your_table_name');

Questions to Ask User

  1. What kind of data will you be storing?
  2. How do you expect to use the data?
  3. What queries will you run?
  4. How long to keep the data?
  5. Column types if unclear