find-hypertable-candidates

PostgreSQL Hypertable Candidate Analysis

Identify tables that would benefit from TimescaleDB hypertable conversion. After identification, use the companion "migrate-postgres-tables-to-hypertables" skill for configuration and migration.
TimescaleDB Benefits

Performance gains: 90%+ compression, fast time-based queries, improved insert performance, efficient aggregations, continuous aggregates for materialized rollups (dashboards, reports, analytics), and automatic data management (retention, compression).
Best for insert-heavy patterns:
  • Time-series data (sensors, metrics, monitoring)
  • Event logs (user events, audit trails, application logs)
  • Transaction records (orders, payments, financial)
  • Sequential data (auto-incrementing IDs with timestamps)
  • Append-only datasets (immutable records, historical)
Requirements: Large volumes (1M+ rows), time-based queries, infrequent updates

Step 1: Database Schema Analysis

Option A: From Database Connection

Table statistics and size

sql
-- Get all tables with row counts and insert/update patterns
WITH table_stats AS (
    SELECT
        schemaname, tablename,
        n_tup_ins as total_inserts,
        n_tup_upd as total_updates,
        n_tup_del as total_deletes,
        n_live_tup as live_rows,
        n_dead_tup as dead_rows
    FROM pg_stat_user_tables
),
table_sizes AS (
    SELECT
        schemaname, tablename,
        pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,
        pg_total_relation_size(schemaname||'.'||tablename) as total_size_bytes
    FROM pg_tables
    WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
)
SELECT
    ts.schemaname, ts.tablename, ts.live_rows,
    tsize.total_size, tsize.total_size_bytes,
    ts.total_inserts, ts.total_updates, ts.total_deletes,
    ROUND(CASE WHEN ts.live_rows > 0
          THEN (ts.total_inserts::float / ts.live_rows) * 100
          ELSE 0 END, 2) as insert_ratio_pct
FROM table_stats ts
JOIN table_sizes tsize ON ts.schemaname = tsize.schemaname AND ts.tablename = tsize.tablename
ORDER BY tsize.total_size_bytes DESC;
Look for:
  • insert-heavy patterns (few updates/deletes relative to inserts)
  • large tables (1M+ rows or 100MB+)
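The screening criteria above can be applied mechanically to the rows returned by the statistics query. A minimal sketch; the thresholds (1M rows, 100 MB, 90% insert share) mirror the guidance above but are tunable assumptions, not hard rules:

```python
# Sketch: flag candidate tables from pg_stat_user_tables-style statistics.
# Thresholds are illustrative assumptions, not fixed TimescaleDB requirements.

def is_candidate(live_rows, total_size_bytes, inserts, updates, deletes):
    """True if the table looks large enough and insert-heavy."""
    big_enough = live_rows >= 1_000_000 or total_size_bytes >= 100 * 1024 * 1024
    writes = inserts + updates + deletes
    insert_share = inserts / writes if writes else 0.0
    return big_enough and insert_share >= 0.9

# 5M-row events table, almost all inserts -> candidate
print(is_candidate(5_000_000, 2 * 1024**3, 5_000_000, 1_000, 0))  # True
# Small, update-heavy settings table -> not a candidate
print(is_candidate(10_000, 5 * 1024**2, 10_000, 50_000, 0))       # False
```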

Index patterns

sql
-- Identify common query dimensions
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
ORDER BY tablename, indexname;
Look for:
  • Multiple indexes with timestamp/created_at columns → time-based queries
  • Composite (entity_id, timestamp) indexes → good candidates
  • Time-only indexes → time range filtering common

Query patterns (if pg_stat_statements available)

sql
-- Check availability
SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements');

-- Analyze expensive queries for candidate tables
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%your_table_name%'
ORDER BY total_exec_time DESC LIMIT 20;
✅ Good patterns: time-based WHERE clauses, entity filtering combined with time-based qualifiers, GROUP BY time_bucket, range queries over time
❌ Poor patterns: non-time lookups with no time-based qualifier in the same query (e.g. WHERE email = ...)
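A first pass over queries pulled from pg_stat_statements can be automated. A rough heuristic sketch, not a SQL parser; the time-column names are assumptions about a typical schema:

```python
import re

# Heuristic: a query is "time-qualified" if its WHERE clause references a
# time column, or it buckets/truncates by time. Column names are assumptions.
TIME_COLS = r"(timestamp|event_time|created_at|price_time|metric_time)"

def looks_time_qualified(query):
    q = query.lower()
    return bool(
        re.search(rf"where[^;]*{TIME_COLS}", q)      # time-based WHERE
        or re.search(r"time_bucket|date_trunc", q)   # time aggregation
    )

print(looks_time_qualified(
    "SELECT * FROM metrics WHERE timestamp >= NOW() - INTERVAL '24 hours'"))  # True
print(looks_time_qualified("SELECT * FROM users WHERE email = 'a@b.c'"))      # False
```

Queries that consistently fail this check (pure entity lookups) count against a table's candidacy.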

Constraints

sql
-- Check migration compatibility
SELECT conname, contype, pg_get_constraintdef(oid) as definition
FROM pg_constraint
WHERE conrelid = 'your_table_name'::regclass;
Compatibility:
  • Primary keys (p): Must include the partition column; otherwise ask the user whether it can be modified
  • Foreign keys (f): Plain→Hypertable and Hypertable→Plain OK; Hypertable→Hypertable NOT supported
  • Unique constraints (u): Must include the partition column; otherwise ask the user whether it can be modified
  • Check constraints (c): Usually OK
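The "does the constraint include the partition column" check can be partially automated against the definitions returned above. A heuristic sketch, assuming pg_get_constraintdef output of the shape `PRIMARY KEY (col1, col2)` or `UNIQUE (col)`:

```python
import re

# Heuristic sketch: does a PRIMARY KEY / UNIQUE definition list the intended
# partition column? Handles only the simple "(col1, col2)" form.
def includes_partition_column(definition, partition_col):
    m = re.search(r"\(([^)]*)\)", definition)
    if not m:
        return False
    cols = [c.strip().strip('"') for c in m.group(1).split(",")]
    return partition_col in cols

print(includes_partition_column("PRIMARY KEY (id, event_time)", "event_time"))  # True
print(includes_partition_column("UNIQUE (email)", "event_time"))                # False
```

Constraints that fail this check need discussion with the user before migration.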

Option B: From Code Analysis

✅ GOOD Patterns

Append-only logging

INSERT INTO events (user_id, event_time, data) VALUES (...);

Time-series collection

INSERT INTO metrics (device_id, timestamp, value) VALUES (...);

Time-based queries

SELECT * FROM metrics WHERE timestamp >= NOW() - INTERVAL '24 hours';

Time aggregations

SELECT DATE_TRUNC('day', timestamp), COUNT(*) FROM metrics GROUP BY 1;

❌ POOR Patterns

Frequent updates to historical records

UPDATE users SET email = ..., updated_at = NOW() WHERE id = ...;

Non-time lookups

SELECT * FROM users WHERE email = ...;

Small reference tables

SELECT * FROM countries ORDER BY name;

Schema Indicators

✅ GOOD:
  • Has timestamp/timestamptz column
  • Multiple indexes with timestamp-based columns
  • Composite (entity_id, timestamp) indexes
❌ POOR:
  • Indexes mostly on non-time columns (email, name, status, etc.)
  • Columns expected to be updated over time (updated_at, updated_by, status, etc.)
  • Unique constraints on non-time fields
  • Frequent updated_at modifications
  • Small static tables

Special Case: ID-Based Tables

Sequential ID tables can be candidates if:
  • The pattern is insert-mostly; updates are infrequent or touch only recent records (e.g. an order status moving ordered -> processing -> delivered; once an order is delivered, it is unlikely to be updated again)
  • IDs correlate with time (as with serial/auto-incrementing IDs/GENERATED ALWAYS AS IDENTITY)
  • The ID is the primary query dimension
  • Recent data is accessed more often (frequently the case in e-commerce, finance, etc.)
  • Time-based reporting is common (e.g. monthly or daily summaries/analytics)
sql
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,           -- Can partition by ID
    user_id BIGINT,
    created_at TIMESTAMPTZ DEFAULT NOW() -- For sparse indexes
);
Note: For ID-based tables that also have a time column (created_at, ordered_at, etc.), you can partition by ID and use sparse indexes on the time column. See the migrate-postgres-tables-to-hypertables skill for details.

Step 2: Candidacy Scoring (8+ points = good candidate)

Time-Series Characteristics (5+ points needed)

  • Has timestamp/timestamptz column: 3 points
  • Data inserted chronologically: 2 points
  • Queries filter by time: 2 points
  • Time aggregations common: 2 points

Scale & Performance (3+ points recommended)

  • Large table (1M+ rows or 100MB+): 2 points
  • High insert volume: 1 point
  • Infrequent updates to historical: 1 point
  • Range queries common: 1 point
  • Aggregation queries: 2 points

Data Patterns (bonus)

  • Contains entity ID for segmentation (device_id, user_id, product_id, symbol, etc.): 1 point
  • Numeric measurements: 1 point
  • Log/event structure: 1 point
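The rubric above can be sketched as a small scoring helper. The criterion keys are shorthand labels for the bullets, not an established API:

```python
# Point values mirror the Step 2 rubric; keys are shorthand for the bullets.
POINTS = {
    # Time-series characteristics (5+ needed)
    "timestamp_column": 3, "chronological_inserts": 2,
    "time_filtered_queries": 2, "time_aggregations": 2,
    # Scale & performance (3+ recommended)
    "large_table": 2, "high_insert_volume": 1,
    "infrequent_historical_updates": 1, "range_queries": 1,
    "aggregation_queries": 2,
    # Data patterns (bonus)
    "entity_id": 1, "numeric_measurements": 1, "log_event_structure": 1,
}

TIME_SERIES = {"timestamp_column", "chronological_inserts",
               "time_filtered_queries", "time_aggregations"}

def score(criteria):
    return sum(POINTS[c] for c in criteria)

def is_strong_candidate(criteria):
    # 8+ points overall AND 5+ from the time-series group
    return score(criteria) >= 8 and score(criteria & TIME_SERIES) >= 5

# e.g. a timestamped, chronologically inserted, time-filtered, large table
print(is_strong_candidate({"timestamp_column", "chronological_inserts",
                           "time_filtered_queries", "large_table"}))  # True
```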

Common Patterns

✅ GOOD Candidates

✅ Event/Log Tables (user_events, audit_logs)
sql
CREATE TABLE user_events (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT,
    event_type TEXT,
    event_time TIMESTAMPTZ DEFAULT NOW(),
    metadata JSONB
);
-- Partition by id, segment by user_id, enable minmax sparse_index on event_time
✅ Sensor/IoT Data (sensor_readings, telemetry)
sql
CREATE TABLE sensor_readings (
    device_id TEXT,
    timestamp TIMESTAMPTZ,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION
);
-- Partition by timestamp, segment by device_id, minmax sparse indexes on temperature and humidity
✅ Financial/Trading (stock_prices, transactions)
sql
CREATE TABLE stock_prices (
    symbol VARCHAR(10),
    price_time TIMESTAMPTZ,
    open_price DECIMAL,
    close_price DECIMAL,
    volume BIGINT
);
-- Partition by price_time, segment by symbol, minmax sparse indexes on open_price and close_price and volume
✅ System Metrics (monitoring_data)
sql
CREATE TABLE system_metrics (
    hostname TEXT,
    metric_time TIMESTAMPTZ,
    cpu_usage DOUBLE PRECISION,
    memory_usage BIGINT
);
-- Partition by metric_time, segment by hostname, minmax sparse indexes on cpu_usage and memory_usage

❌ POOR Candidates

❌ Reference Tables (countries, categories)
sql
CREATE TABLE countries (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    code CHAR(2)
);
-- Static data, no time component
❌ User Profiles (users, accounts)
sql
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255),
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ
);
-- Accessed by ID, frequently updated, has timestamp but it's not the primary query dimension (the primary query dimension is id or email)
❌ Settings/Config (user_settings)
sql
CREATE TABLE user_settings (
    user_id BIGINT PRIMARY KEY,
    theme VARCHAR(20),       -- Changes: light -> dark -> auto
    language VARCHAR(10),    -- Changes: en -> es -> fr
    notifications JSONB,     -- Frequent preference updates
    updated_at TIMESTAMPTZ
);
-- Accessed by user_id, frequently updated, has timestamp but it's not the primary query dimension (the primary query dimension is user_id)

Analysis Output Requirements

For each candidate table provide:
  • Score: Based on criteria (8+ = strong candidate)
  • Pattern: Insert vs update ratio
  • Access: Time-based vs entity lookups
  • Size: Current size and growth rate
  • Queries: Time-range, aggregations, point lookups
Focus on insert-heavy patterns with time-based or sequential access. Tables scoring 8+ points are strong candidates for conversion.
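One possible shape for the per-table output described above, with illustrative field names (a sketch, not a required schema):

```python
from dataclasses import dataclass, field

@dataclass
class CandidateReport:
    table: str
    score: int               # from the Step 2 rubric; 8+ = strong candidate
    insert_ratio_pct: float  # insert vs update pattern
    access: str              # "time-based" or "entity-lookup"
    size: str                # current size (growth rate could be added)
    query_kinds: list = field(default_factory=list)  # time-range, aggregation, ...

    @property
    def strong(self):
        return self.score >= 8

report = CandidateReport("user_events", 10, 99.2, "time-based", "4 GB",
                         ["time-range", "aggregation"])
print(report.strong)  # True
```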