using-timeseries-databases


Time-Series Databases


Implement efficient storage and querying for time-stamped data (metrics, IoT sensors, financial ticks, logs).

Database Selection


Choose based on primary use case:
TimescaleDB - PostgreSQL extension
  • Use when: Already on PostgreSQL, need SQL + JOINs, hybrid workloads
  • Query: Standard SQL
  • Scale: 100K-1M inserts/sec
InfluxDB - Purpose-built TSDB
  • Use when: DevOps metrics, Prometheus integration, Telegraf ecosystem
  • Query: InfluxQL or Flux
  • Scale: 500K-1M points/sec
ClickHouse - Columnar analytics
  • Use when: Fastest aggregations needed, analytics dashboards, log analysis
  • Query: SQL
  • Scale: 1M-10M inserts/sec, 100M-1B rows/sec queries
QuestDB - High-throughput IoT
  • Use when: Highest write performance needed, financial tick data
  • Query: SQL + Line Protocol
  • Scale: 4M+ inserts/sec
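The selection guide above can be encoded as a small lookup. This is a purely illustrative sketch: the use-case keys and the TimescaleDB default are assumptions drawn from this guide, not an official decision tool.

```python
# Illustrative mapping of primary use case -> database, per the guide above.
RECOMMENDED = {
    "existing_postgres_sql_joins": "TimescaleDB",
    "devops_metrics": "InfluxDB",
    "analytics_dashboards": "ClickHouse",
    "financial_ticks": "QuestDB",
}

def recommend_tsdb(use_case):
    """Fall back to TimescaleDB for hybrid or unlisted workloads."""
    return RECOMMENDED.get(use_case, "TimescaleDB")
```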

Core Patterns


1. Hypertables (TimescaleDB)


Automatic time-based partitioning:
sql
CREATE TABLE sensor_data (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   INTEGER NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);

SELECT create_hypertable('sensor_data', 'time');
Benefits:
  • Efficient data expiration (drop old chunks)
  • Parallel query execution
  • Compression on older chunks (10-20x savings)
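Ingestion into the hypertable above is typically batched. A hedged sketch: the chunking logic below runs standalone; the commented psycopg2 call is shown only for context and assumes a reachable TimescaleDB instance.

```python
from datetime import datetime, timezone

def chunked(rows, size):
    """Split rows into insert batches; TimescaleDB ingests well at
    batch sizes of roughly 1,000-10,000 rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Synthetic readings matching the sensor_data schema above.
readings = [
    (datetime.now(timezone.utc), sensor_id, 20.0 + sensor_id, 55.0)
    for sensor_id in range(25)
]
batches = list(chunked(readings, 10))

# With a live connection (not opened here), each batch would go in one
# round trip, e.g. with psycopg2:
#   from psycopg2.extras import execute_values
#   execute_values(cur,
#       "INSERT INTO sensor_data (time, sensor_id, temperature, humidity)"
#       " VALUES %s", batch)
```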

2. Continuous Aggregates


Pre-computed rollups for fast dashboard queries:
sql
-- TimescaleDB: hourly rollup
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
       sensor_id,
       AVG(temperature) AS avg_temp,
       MAX(temperature) AS max_temp,
       MIN(temperature) AS min_temp
FROM sensor_data
GROUP BY hour, sensor_id;

-- Auto-refresh policy
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
Query strategy:
  • Short range (last hour): Raw data
  • Medium range (last day): 1-minute rollups
  • Long range (last month): 1-hour rollups
  • Very long (last year): Daily rollups
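The query strategy above maps directly to a table-selection function in the query layer. A minimal sketch: `sensor_data_hourly` matches the view defined earlier, while the minutely and daily table names are assumptions standing in for your own continuous aggregates.

```python
from datetime import timedelta

def rollup_for_range(span):
    """Pick the source table for a dashboard query by requested time span.
    Non-raw table names are illustrative; match them to your aggregates."""
    if span <= timedelta(hours=1):
        return "sensor_data"            # raw data
    if span <= timedelta(days=1):
        return "sensor_data_minutely"   # 1-minute rollups (assumed name)
    if span <= timedelta(days=31):
        return "sensor_data_hourly"     # 1-hour rollups
    return "sensor_data_daily"          # daily rollups (assumed name)
```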

3. Retention Policies


Automatic data expiration:
sql
-- TimescaleDB: delete data older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');
Common patterns:
  • Raw data: 7-90 days
  • Hourly rollups: 1-2 years
  • Daily rollups: Infinite retention
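The tiering above can be generated rather than hand-written, in the same spirit as the scripts referenced later in this doc (the actual script may differ; tier values here are examples).

```python
def retention_sql(table, days):
    """Render a TimescaleDB retention-policy statement for one table."""
    return f"SELECT add_retention_policy('{table}', INTERVAL '{days} days');"

# Example tiering per the pattern above: short-lived raw data, long-lived rollups.
TIERS = {"sensor_data": 90, "sensor_data_hourly": 730}
statements = [retention_sql(t, d) for t, d in TIERS.items()]
```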

4. Downsampling for Visualization


Use LTTB (Largest-Triangle-Three-Buckets) algorithm to reduce points for charts.
Problem: Browsers can't smoothly render 1M points.
Solution: Downsample to 500-1,000 points while preserving visual fidelity.
sql
-- TimescaleDB Toolkit LTTB (aggregate form, unnested back into rows)
SELECT time, value
FROM unnest((
  SELECT lttb(time, temperature, 1000)  -- 1000 = target number of points
  FROM sensor_data
  WHERE sensor_id = 1
));
Thresholds:
  • < 1,000 points: No downsampling
  • 1,000-10,000 points: LTTB to 1,000 points
  • 10,000+ points: LTTB to 500 points or use pre-aggregated data
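LTTB is compact enough to run in the query layer when the database lacks a built-in. A plain-Python reference sketch, for illustration only (not the Toolkit's implementation):

```python
def lttb(points, threshold):
    """Largest-Triangle-Three-Buckets downsampling.
    points: (x, y) tuples sorted by x; threshold: target point count."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)
    sampled = [points[0]]                  # always keep the first point
    bucket = (n - 2) / (threshold - 2)
    a = 0                                  # index of the last selected point
    for i in range(threshold - 2):
        # Average of the next bucket serves as the third triangle vertex.
        start = int((i + 1) * bucket) + 1
        end = min(int((i + 2) * bucket) + 1, n)
        avg_x = sum(p[0] for p in points[start:end]) / (end - start)
        avg_y = sum(p[1] for p in points[start:end]) / (end - start)
        # Keep the point in the current bucket with the largest triangle area.
        lo = int(i * bucket) + 1
        hi = int((i + 1) * bucket) + 1
        ax, ay = points[a]
        best, best_area = lo, -1.0
        for j in range(lo, hi):
            area = abs((ax - avg_x) * (points[j][1] - ay)
                       - (ax - points[j][0]) * (avg_y - ay)) / 2
            if area > best_area:
                best_area, best = area, j
        sampled.append(points[best])
        a = best
    sampled.append(points[-1])             # always keep the last point
    return sampled
```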

Dashboard Integration


Time-series databases are the primary data source for real-time dashboards.
Query patterns by component:
Component   | Query Pattern       | Example
KPI Card    | Latest value        | SELECT temperature FROM sensors ORDER BY time DESC LIMIT 1
Trend Chart | Time-bucketed avg   | SELECT time_bucket('5m', time), AVG(cpu) GROUP BY 1
Heatmap     | Multi-metric window | SELECT hour, AVG(cpu), AVG(memory) GROUP BY hour
Alert       | Threshold check     | SELECT COUNT(*) WHERE cpu > 80 AND time > NOW() - '5m'
Data flow:
  1. Ingest metrics (Prometheus, MQTT, application events)
  2. Store in time-series DB with continuous aggregates
  3. Apply retention policies (raw: 30d, rollups: 1y)
  4. Query layer downsamples to optimal points (LTTB)
  5. Frontend renders with Recharts/visx
Auto-refresh intervals:
  • Critical alerts: 1-5 seconds (WebSocket)
  • Operations dashboard: 10-30 seconds (polling)
  • Analytics dashboard: 1-5 minutes (cached)
  • Historical reports: On-demand only
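The refresh intervals above can live in one lookup in the frontend or query layer. A small sketch; the keys and the exact values chosen within each range are illustrative.

```python
def refresh_seconds(dashboard):
    """Suggested refresh interval in seconds; None means on-demand only.
    Keys and exact values are assumptions within the ranges above."""
    intervals = {
        "critical_alerts": 2,        # 1-5 s, pushed over WebSocket
        "operations": 15,            # 10-30 s polling
        "analytics": 120,            # 1-5 min, served from cache
        "historical_reports": None,  # on-demand only
    }
    return intervals[dashboard]
```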

Database-Specific Details


For implementation guides, see:
  • references/timescaledb.md
    - Setup, tuning, compression
  • references/influxdb.md
    - InfluxQL/Flux, retention policies
  • references/clickhouse.md
    - MergeTree engines, clustering
  • references/questdb.md
    - Line Protocol, SIMD optimization
For downsampling implementation:
  • references/downsampling-strategies.md
    - LTTB algorithm, aggregation methods
For examples:
  • examples/metrics-dashboard-backend/
    - TimescaleDB + FastAPI
  • examples/iot-data-pipeline/
    - InfluxDB + Go for IoT
For scripts:
  • scripts/setup_hypertable.py
    - Create TimescaleDB hypertables
  • scripts/generate_retention_policy.py
    - Generate retention policies

Performance Optimization


Write Optimization


Batch inserts:
Database    | Batch Size     | Expected Throughput
TimescaleDB | 1,000-10,000   | 100K-1M rows/sec
InfluxDB    | 5,000+         | 500K-1M points/sec
ClickHouse  | 10,000-100,000 | 1M-10M rows/sec
QuestDB     | 10,000+        | 4M+ rows/sec
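In practice, batching means buffering writes and flushing on size or age, whichever comes first. A generic sketch under stated assumptions: `sink` is any callable taking a list of rows (for example a wrapper around your driver's bulk insert), and the default sizes are starting points for tuning, not hard limits.

```python
import time

class InsertBuffer:
    """Buffer rows and flush in batches; the usual ingestion pattern
    behind the throughput figures above. Sizes are illustrative."""
    def __init__(self, sink, max_rows=5_000, max_age_s=1.0):
        self.sink, self.max_rows, self.max_age_s = sink, max_rows, max_age_s
        self.rows, self.started = [], time.monotonic()

    def add(self, row):
        if not self.rows:
            self.started = time.monotonic()   # age is measured per batch
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.started >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.sink(self.rows)
            self.rows = []

# Usage with a list standing in for the database sink:
flushed = []
buf = InsertBuffer(flushed.append, max_rows=3, max_age_s=60.0)
for row in range(7):
    buf.add(row)
buf.flush()  # flushed is now [[0, 1, 2], [3, 4, 5], [6]]
```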

Query Optimization


Rule 1: Always filter by time first (indexed)
sql
-- BAD: Full table scan
SELECT * FROM metrics WHERE metric_name = 'cpu';

-- GOOD: Time index used
SELECT * FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
  AND metric_name = 'cpu';
Rule 2: Use continuous aggregates for dashboard queries
sql
-- BAD: Aggregate 1B rows every dashboard load
SELECT time_bucket('1 hour', time), AVG(cpu)
FROM metrics
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY 1;

-- GOOD: Query pre-computed rollup
SELECT hour, avg_cpu
FROM metrics_hourly
WHERE hour > NOW() - INTERVAL '30 days';
Rule 3: Downsample for visualization
typescript
// Request optimal point count
const points = Math.min(1000, chartWidth);
const query = `/api/metrics?start=${start}&end=${end}&points=${points}`;

Use Cases


DevOps Monitoring → InfluxDB or TimescaleDB
  • Prometheus metrics, application traces, infrastructure
IoT Sensor Data → QuestDB or TimescaleDB
  • Millions of devices, high write throughput
Financial Tick Data → QuestDB or ClickHouse
  • Sub-millisecond queries, OHLC aggregates
User Analytics → ClickHouse
  • Event tracking, daily active users, funnel analysis
Real-time Dashboards → Any TSDB + Continuous Aggregates
  • Pre-computed rollups, WebSocket streaming, LTTB downsampling