using-timeseries-databases


Time-Series Databases


Implement efficient storage and querying for time-stamped data (metrics, IoT sensors, financial ticks, logs).

Database Selection


Choose based on primary use case:
TimescaleDB - PostgreSQL extension
  • Use when: Already on PostgreSQL, need SQL + JOINs, hybrid workloads
  • Query: Standard SQL
  • Scale: 100K-1M inserts/sec
InfluxDB - Purpose-built TSDB
  • Use when: DevOps metrics, Prometheus integration, Telegraf ecosystem
  • Query: InfluxQL or Flux
  • Scale: 500K-1M points/sec
ClickHouse - Columnar analytics
  • Use when: Fastest aggregations needed, analytics dashboards, log analysis
  • Query: SQL
  • Scale: 1M-10M inserts/sec, 100M-1B rows/sec queries
QuestDB - High-throughput IoT
  • Use when: Highest write performance needed, financial tick data
  • Query: SQL + Line Protocol
  • Scale: 4M+ inserts/sec
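The selection guide above can be encoded as a small lookup. This is a purely illustrative sketch: the use-case keys and the TimescaleDB default are assumptions drawn from this guide, not an official decision tool.

```python
# Illustrative mapping of primary use case -> database, per the guide above.
RECOMMENDED = {
    "existing_postgres_sql_joins": "TimescaleDB",
    "devops_metrics": "InfluxDB",
    "analytics_dashboards": "ClickHouse",
    "financial_ticks": "QuestDB",
}

def recommend_tsdb(use_case):
    """Fall back to TimescaleDB for hybrid or unlisted workloads."""
    return RECOMMENDED.get(use_case, "TimescaleDB")
```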

Core Patterns


1. Hypertables (TimescaleDB)


Automatic time-based partitioning:
sql
CREATE TABLE sensor_data (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   INTEGER NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);

SELECT create_hypertable('sensor_data', 'time');
Benefits:
  • Efficient data expiration (drop old chunks)
  • Parallel query execution
  • Compression on older chunks (10-20x savings)
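Ingestion into the hypertable above is typically batched. A hedged sketch: the chunking logic below runs standalone; the commented psycopg2 call is shown only for context and assumes a reachable TimescaleDB instance.

```python
from datetime import datetime, timezone

def chunked(rows, size):
    """Split rows into insert batches; TimescaleDB ingests well at
    batch sizes of roughly 1,000-10,000 rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Synthetic readings matching the sensor_data schema above.
readings = [
    (datetime.now(timezone.utc), sensor_id, 20.0 + sensor_id, 55.0)
    for sensor_id in range(25)
]
batches = list(chunked(readings, 10))

# With a live connection (not opened here), each batch would go in one
# round trip, e.g. with psycopg2:
#   from psycopg2.extras import execute_values
#   execute_values(cur,
#       "INSERT INTO sensor_data (time, sensor_id, temperature, humidity)"
#       " VALUES %s", batch)
```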

2. Continuous Aggregates


Pre-computed rollups for fast dashboard queries:
sql
-- TimescaleDB: hourly rollup
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
       sensor_id,
       AVG(temperature) AS avg_temp,
       MAX(temperature) AS max_temp,
       MIN(temperature) AS min_temp
FROM sensor_data
GROUP BY hour, sensor_id;

-- Auto-refresh policy
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
Query strategy:
  • Short range (last hour): Raw data
  • Medium range (last day): 1-minute rollups
  • Long range (last month): 1-hour rollups
  • Very long (last year): Daily rollups
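The query strategy above maps directly to a table-selection function in the query layer. A minimal sketch: `sensor_data_hourly` matches the view defined earlier, while the minutely and daily table names are assumptions standing in for your own continuous aggregates.

```python
from datetime import timedelta

def rollup_for_range(span):
    """Pick the source table for a dashboard query by requested time span.
    Non-raw table names are illustrative; match them to your aggregates."""
    if span <= timedelta(hours=1):
        return "sensor_data"            # raw data
    if span <= timedelta(days=1):
        return "sensor_data_minutely"   # 1-minute rollups (assumed name)
    if span <= timedelta(days=31):
        return "sensor_data_hourly"     # 1-hour rollups
    return "sensor_data_daily"          # daily rollups (assumed name)
```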

3. Retention Policies


Automatic data expiration:
sql
-- TimescaleDB: delete data older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');
Common patterns:
  • Raw data: 7-90 days
  • Hourly rollups: 1-2 years
  • Daily rollups: Infinite retention
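The tiering above can be generated rather than hand-written, in the same spirit as the scripts referenced later in this doc (the actual script may differ; tier values here are examples).

```python
def retention_sql(table, days):
    """Render a TimescaleDB retention-policy statement for one table."""
    return f"SELECT add_retention_policy('{table}', INTERVAL '{days} days');"

# Example tiering per the pattern above: short-lived raw data, long-lived rollups.
TIERS = {"sensor_data": 90, "sensor_data_hourly": 730}
statements = [retention_sql(t, d) for t, d in TIERS.items()]
```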

4. Downsampling for Visualization


Use LTTB (Largest-Triangle-Three-Buckets) algorithm to reduce points for charts.
Problem: Browsers can't smoothly render 1M points.
Solution: Downsample to 500-1,000 points while preserving visual fidelity.
sql
-- TimescaleDB Toolkit LTTB (aggregate form, unnested back into rows)
SELECT time, value
FROM unnest((
  SELECT lttb(time, temperature, 1000)  -- 1000 = target number of points
  FROM sensor_data
  WHERE sensor_id = 1
));
Thresholds:
  • < 1,000 points: No downsampling
  • 1,000-10,000 points: LTTB to 1,000 points
  • 10,000+ points: LTTB to 500 points or use pre-aggregated data
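LTTB is compact enough to run in the query layer when the database lacks a built-in. A plain-Python reference sketch, for illustration only (not the Toolkit's implementation):

```python
def lttb(points, threshold):
    """Largest-Triangle-Three-Buckets downsampling.
    points: (x, y) tuples sorted by x; threshold: target point count."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)
    sampled = [points[0]]                  # always keep the first point
    bucket = (n - 2) / (threshold - 2)
    a = 0                                  # index of the last selected point
    for i in range(threshold - 2):
        # Average of the next bucket serves as the third triangle vertex.
        start = int((i + 1) * bucket) + 1
        end = min(int((i + 2) * bucket) + 1, n)
        avg_x = sum(p[0] for p in points[start:end]) / (end - start)
        avg_y = sum(p[1] for p in points[start:end]) / (end - start)
        # Keep the point in the current bucket with the largest triangle area.
        lo = int(i * bucket) + 1
        hi = int((i + 1) * bucket) + 1
        ax, ay = points[a]
        best, best_area = lo, -1.0
        for j in range(lo, hi):
            area = abs((ax - avg_x) * (points[j][1] - ay)
                       - (ax - points[j][0]) * (avg_y - ay)) / 2
            if area > best_area:
                best_area, best = area, j
        sampled.append(points[best])
        a = best
    sampled.append(points[-1])             # always keep the last point
    return sampled
```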

Dashboard Integration


Time-series databases are the primary data source for real-time dashboards.
Query patterns by component:
Component   | Query Pattern       | Example
KPI Card    | Latest value        | SELECT temperature FROM sensors ORDER BY time DESC LIMIT 1
Trend Chart | Time-bucketed avg   | SELECT time_bucket('5m', time), AVG(cpu) GROUP BY 1
Heatmap     | Multi-metric window | SELECT hour, AVG(cpu), AVG(memory) GROUP BY hour
Alert       | Threshold check     | SELECT COUNT(*) WHERE cpu > 80 AND time > NOW() - '5m'
Data flow:
  1. Ingest metrics (Prometheus, MQTT, application events)
  2. Store in time-series DB with continuous aggregates
  3. Apply retention policies (raw: 30d, rollups: 1y)
  4. Query layer downsamples to optimal points (LTTB)
  5. Frontend renders with Recharts/visx
Auto-refresh intervals:
  • Critical alerts: 1-5 seconds (WebSocket)
  • Operations dashboard: 10-30 seconds (polling)
  • Analytics dashboard: 1-5 minutes (cached)
  • Historical reports: On-demand only
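The refresh intervals above can live in one lookup in the frontend or query layer. A small sketch; the keys and the exact values chosen within each range are illustrative.

```python
def refresh_seconds(dashboard):
    """Suggested refresh interval in seconds; None means on-demand only.
    Keys and exact values are assumptions within the ranges above."""
    intervals = {
        "critical_alerts": 2,        # 1-5 s, pushed over WebSocket
        "operations": 15,            # 10-30 s polling
        "analytics": 120,            # 1-5 min, served from cache
        "historical_reports": None,  # on-demand only
    }
    return intervals[dashboard]
```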

Database-Specific Details


For implementation guides, see:
  • references/timescaledb.md
    - Setup, tuning, compression
  • references/influxdb.md
    - InfluxQL/Flux, retention policies
  • references/clickhouse.md
    - MergeTree engines, clustering
  • references/questdb.md
    - Line Protocol, SIMD optimization
For downsampling implementation:
  • references/downsampling-strategies.md
    - LTTB algorithm, aggregation methods
For examples:
  • examples/metrics-dashboard-backend/
    - TimescaleDB + FastAPI
  • examples/iot-data-pipeline/
    - InfluxDB + Go for IoT
For scripts:
  • scripts/setup_hypertable.py
    - Create TimescaleDB hypertables
  • scripts/generate_retention_policy.py
    - Generate retention policies

Performance Optimization


Write Optimization


Batch inserts:
Database    | Batch Size     | Expected Throughput
TimescaleDB | 1,000-10,000   | 100K-1M rows/sec
InfluxDB    | 5,000+         | 500K-1M points/sec
ClickHouse  | 10,000-100,000 | 1M-10M rows/sec
QuestDB     | 10,000+        | 4M+ rows/sec
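In practice, batching means buffering writes and flushing on size or age, whichever comes first. A generic sketch under stated assumptions: `sink` is any callable taking a list of rows (for example a wrapper around your driver's bulk insert), and the default sizes are starting points for tuning, not hard limits.

```python
import time

class InsertBuffer:
    """Buffer rows and flush in batches; the usual ingestion pattern
    behind the throughput figures above. Sizes are illustrative."""
    def __init__(self, sink, max_rows=5_000, max_age_s=1.0):
        self.sink, self.max_rows, self.max_age_s = sink, max_rows, max_age_s
        self.rows, self.started = [], time.monotonic()

    def add(self, row):
        if not self.rows:
            self.started = time.monotonic()   # age is measured per batch
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.started >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.sink(self.rows)
            self.rows = []

# Usage with a list standing in for the database sink:
flushed = []
buf = InsertBuffer(flushed.append, max_rows=3, max_age_s=60.0)
for row in range(7):
    buf.add(row)
buf.flush()  # flushed is now [[0, 1, 2], [3, 4, 5], [6]]
```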

Query Optimization


Rule 1: Always filter by time first (indexed)
sql
-- BAD: Full table scan
SELECT * FROM metrics WHERE metric_name = 'cpu';

-- GOOD: Time index used
SELECT * FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
  AND metric_name = 'cpu';
Rule 2: Use continuous aggregates for dashboard queries
sql
-- BAD: Aggregate 1B rows every dashboard load
SELECT time_bucket('1 hour', time), AVG(cpu)
FROM metrics
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY 1;

-- GOOD: Query pre-computed rollup
SELECT hour, avg_cpu
FROM metrics_hourly
WHERE hour > NOW() - INTERVAL '30 days';
Rule 3: Downsample for visualization
typescript
// Request optimal point count
const points = Math.min(1000, chartWidth);
const query = `/api/metrics?start=${start}&end=${end}&points=${points}`;

Use Cases


DevOps Monitoring → InfluxDB or TimescaleDB
  • Prometheus metrics, application traces, infrastructure
IoT Sensor Data → QuestDB or TimescaleDB
  • Millions of devices, high write throughput
Financial Tick Data → QuestDB or ClickHouse
  • Sub-millisecond queries, OHLC aggregates
User Analytics → ClickHouse
  • Event tracking, daily active users, funnel analysis
Real-time Dashboards → Any TSDB + Continuous Aggregates
  • Pre-computed rollups, WebSocket streaming, LTTB downsampling