using-timeseries-databases
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseTime-Series Databases
时间序列数据库
Implement efficient storage and querying for time-stamped data (metrics, IoT sensors, financial ticks, logs).
为带时间戳的数据(指标、IoT传感器数据、金融交易tick数据、日志)实现高效的存储与查询。
Database Selection
数据库选型
Choose based on primary use case:
TimescaleDB - PostgreSQL extension
- Use when: Already on PostgreSQL, need SQL + JOINs, hybrid workloads
- Query: Standard SQL
- Scale: 100K-1M inserts/sec
InfluxDB - Purpose-built TSDB
- Use when: DevOps metrics, Prometheus integration, Telegraf ecosystem
- Query: InfluxQL or Flux
- Scale: 500K-1M points/sec
ClickHouse - Columnar analytics
- Use when: Fastest aggregations needed, analytics dashboards, log analysis
- Query: SQL
- Scale: 1M-10M inserts/sec, 100M-1B rows/sec queries
QuestDB - High-throughput IoT
- Use when: Highest write performance needed, financial tick data
- Query: SQL + Line Protocol
- Scale: 4M+ inserts/sec
根据核心使用场景选择合适的数据库:
TimescaleDB - PostgreSQL扩展
- 适用场景:已基于PostgreSQL构建系统,需要SQL查询与JOIN操作,混合工作负载
- 查询语言:标准SQL
- 扩展能力:10万-100万条/秒写入
InfluxDB - 专用时间序列数据库
- 适用场景:DevOps指标、Prometheus集成、Telegraf生态系统
- 查询语言:InfluxQL或Flux
- 扩展能力:50万-100万条/秒写入
ClickHouse - 列式分析数据库
- 适用场景:需要最快的聚合性能、分析仪表板、日志分析
- 查询语言:SQL
- 扩展能力:100万-1000万条/秒写入,1亿-10亿条/秒查询
QuestDB - 高吞吐量IoT数据库
- 适用场景:需要最高写入性能、金融tick数据存储
- 查询语言:SQL + Line Protocol
- 扩展能力:400万条/秒以上写入
Core Patterns
核心模式
1. Hypertables (TimescaleDB)
1. 超级表(TimescaleDB)
Automatic time-based partitioning:
sql
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');Benefits:
- Efficient data expiration (drop old chunks)
- Parallel query execution
- Compression on older chunks (10-20x savings)
基于时间的自动分区:
sql
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');优势:
- 高效的数据过期(自动删除旧分区)
- 并行查询执行
- 旧分区压缩(节省10-20倍存储空间)
2. Continuous Aggregates
2. 连续聚合
Pre-computed rollups for fast dashboard queries:
sql
-- TimescaleDB: hourly rollup
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
sensor_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
MIN(temperature) AS min_temp
FROM sensor_data
GROUP BY hour, sensor_id;
-- Auto-refresh policy
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');Query strategy:
- Short range (last hour): Raw data
- Medium range (last day): 1-minute rollups
- Long range (last month): 1-hour rollups
- Very long (last year): Daily rollups
预计算聚合结果,加速仪表板查询:
sql
-- TimescaleDB:小时级聚合
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
sensor_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
MIN(temperature) AS min_temp
FROM sensor_data
GROUP BY hour, sensor_id;
-- 自动刷新策略
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');查询策略:
- 短时间范围(最近1小时):查询原始数据
- 中等时间范围(最近1天):1分钟级聚合数据
- 长时间范围(最近1个月):1小时级聚合数据
- 超长时间范围(最近1年):天级聚合数据
3. Retention Policies
3. 数据保留策略
Automatic data expiration:
sql
-- TimescaleDB: delete data older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');Common patterns:
- Raw data: 7-90 days
- Hourly rollups: 1-2 years
- Daily rollups: Infinite retention
自动数据过期:
sql
-- TimescaleDB:删除90天以上的旧数据
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');常见模式:
- 原始数据:7-90天
- 小时级聚合数据:1-2年
- 天级聚合数据:永久保留
4. Downsampling for Visualization
4. 可视化下采样
Use LTTB (Largest-Triangle-Three-Buckets) algorithm to reduce points for charts.
Problem: Browsers can't smoothly render 1M points
Solution: Downsample to 500-1000 points preserving visual fidelity
sql
-- TimescaleDB toolkit LTTB
SELECT time, value
FROM lttb(
'SELECT time, temperature FROM sensor_data WHERE sensor_id = 1',
1000 -- target number of points
);Thresholds:
- < 1,000 points: No downsampling
- 1,000-10,000 points: LTTB to 1,000 points
- 10,000+ points: LTTB to 500 points or use pre-aggregated data
使用LTTB(Largest-Triangle-Three-Buckets,最大三角三分桶)算法减少图表展示的数据点数量。
问题:浏览器无法流畅渲染100万条数据点
解决方案:下采样至500-1000个数据点,同时保留视觉保真度
sql
-- TimescaleDB工具包中的LTTB函数
SELECT time, value
FROM lttb(
'SELECT time, temperature FROM sensor_data WHERE sensor_id = 1',
1000 -- 目标数据点数量
);阈值规则:
- 少于1000个点:无需下采样
- 1000-10000个点:LTTB下采样至1000个点
- 10000个点以上:LTTB下采样至500个点或使用预聚合数据
Dashboard Integration
仪表板集成
Time-series databases are the primary data source for real-time dashboards.
Query patterns by component:
| Component | Query Pattern | Example |
|---|---|---|
| KPI Card | Latest value | |
| Trend Chart | Time-bucketed avg | |
| Heatmap | Multi-metric window | |
| Alert | Threshold check | |
Data flow:
- Ingest metrics (Prometheus, MQTT, application events)
- Store in time-series DB with continuous aggregates
- Apply retention policies (raw: 30d, rollups: 1y)
- Query layer downsamples to optimal points (LTTB)
- Frontend renders with Recharts/visx
Auto-refresh intervals:
- Critical alerts: 1-5 seconds (WebSocket)
- Operations dashboard: 10-30 seconds (polling)
- Analytics dashboard: 1-5 minutes (cached)
- Historical reports: On-demand only
时间序列数据库是实时仪表板的核心数据源。
按组件划分的查询模式:
| 组件 | 查询模式 | 示例 |
|---|---|---|
| KPI卡片 | 最新值查询 | |
| 趋势图表 | 时间桶聚合平均值 | |
| 热力图 | 多指标窗口聚合 | |
| 告警 | 阈值检查 | |
数据流程:
- 采集指标数据(Prometheus、MQTT、应用事件)
- 存储至时间序列数据库并配置连续聚合
- 应用数据保留策略(原始数据保留30天,聚合数据保留1年)
- 查询层通过LTTB算法下采样至最优数据点数量
- 前端使用Recharts/visx渲染
自动刷新间隔:
- 关键告警:1-5秒(WebSocket)
- 运维仪表板:10-30秒(轮询)
- 分析仪表板:1-5分钟(缓存)
- 历史报表:仅按需查询
Database-Specific Details
数据库专属细节
For implementation guides, see:
- - Setup, tuning, compression
references/timescaledb.md - - InfluxQL/Flux, retention policies
references/influxdb.md - - MergeTree engines, clustering
references/clickhouse.md - - Line Protocol, SIMD optimization
references/questdb.md
For downsampling implementation:
- - LTTB algorithm, aggregation methods
references/downsampling-strategies.md
For examples:
- - TimescaleDB + FastAPI
examples/metrics-dashboard-backend/ - - InfluxDB + Go for IoT
examples/iot-data-pipeline/
For scripts:
- - Create TimescaleDB hypertables
scripts/setup_hypertable.py - - Generate retention policies
scripts/generate_retention_policy.py
如需实现指南,请参考:
- - 安装配置、性能调优、数据压缩
references/timescaledb.md - - InfluxQL/Flux语法、数据保留策略
references/influxdb.md - - MergeTree引擎、集群配置
references/clickhouse.md - - Line Protocol、SIMD优化
references/questdb.md
下采样实现参考:
- - LTTB算法、聚合方法
references/downsampling-strategies.md
示例项目:
- - TimescaleDB + FastAPI
examples/metrics-dashboard-backend/ - - InfluxDB + Go构建IoT数据管道
examples/iot-data-pipeline/
脚本工具:
- - 创建TimescaleDB超级表
scripts/setup_hypertable.py - - 生成数据保留策略
scripts/generate_retention_policy.py
Performance Optimization
性能优化
Write Optimization
写入优化
Batch inserts:
| Database | Batch Size | Expected Throughput |
|---|---|---|
| TimescaleDB | 1,000-10,000 | 100K-1M rows/sec |
| InfluxDB | 5,000+ | 500K-1M points/sec |
| ClickHouse | 10,000-100,000 | 1M-10M rows/sec |
| QuestDB | 10,000+ | 4M+ rows/sec |
批量插入建议:
| 数据库 | 批量大小 | 预期吞吐量 |
|---|---|---|
| TimescaleDB | 1000-10000 | 10万-100万条/秒 |
| InfluxDB | 5000+ | 50万-100万条/秒 |
| ClickHouse | 10000-100000 | 100万-1000万条/秒 |
| QuestDB | 10000+ | 400万条/秒以上 |
Query Optimization
查询优化
Rule 1: Always filter by time first (indexed)
sql
-- BAD: Full table scan
SELECT * FROM metrics WHERE metric_name = 'cpu';
-- GOOD: Time index used
SELECT * FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
AND metric_name = 'cpu';Rule 2: Use continuous aggregates for dashboard queries
sql
-- BAD: Aggregate 1B rows every dashboard load
SELECT time_bucket('1 hour', time), AVG(cpu)
FROM metrics
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY 1;
-- GOOD: Query pre-computed rollup
SELECT hour, avg_cpu
FROM metrics_hourly
WHERE hour > NOW() - INTERVAL '30 days';Rule 3: Downsample for visualization
typescript
// Request optimal point count
const points = Math.min(1000, chartWidth);
const query = `/api/metrics?start=${start}&end=${end}&points=${points}`;规则1:始终先按时间过滤(时间字段已索引)
sql
-- 不良实践:全表扫描
SELECT * FROM metrics WHERE metric_name = 'cpu';
-- 最佳实践:使用时间索引
SELECT * FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
AND metric_name = 'cpu';规则2:仪表板查询使用预计算的连续聚合数据
sql
-- 不良实践:每次加载仪表板时聚合10亿条数据
SELECT time_bucket('1 hour', time), AVG(cpu)
FROM metrics
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY 1;
-- 最佳实践:查询预计算的聚合结果
SELECT hour, avg_cpu
FROM metrics_hourly
WHERE hour > NOW() - INTERVAL '30 days';规则3:为可视化需求进行下采样
typescript
// 请求最优数量的数据点
const points = Math.min(1000, chartWidth);
const query = `/api/metrics?start=${start}&end=${end}&points=${points}`;Use Cases
适用场景
DevOps Monitoring → InfluxDB or TimescaleDB
- Prometheus metrics, application traces, infrastructure
IoT Sensor Data → QuestDB or TimescaleDB
- Millions of devices, high write throughput
Financial Tick Data → QuestDB or ClickHouse
- Sub-millisecond queries, OHLC aggregates
User Analytics → ClickHouse
- Event tracking, daily active users, funnel analysis
Real-time Dashboards → Any TSDB + Continuous Aggregates
- Pre-computed rollups, WebSocket streaming, LTTB downsampling
DevOps监控 → InfluxDB或TimescaleDB
- Prometheus指标、应用链路追踪、基础设施监控
IoT传感器数据 → QuestDB或TimescaleDB
- 数百万设备接入,高写入吞吐量
金融Tick数据 → QuestDB或ClickHouse
- 亚毫秒级查询、OHLC聚合
用户行为分析 → ClickHouse
- 事件追踪、日活用户、漏斗分析
实时仪表板 → 任意时间序列数据库 + 连续聚合
- 预计算聚合结果、WebSocket流传输、LTTB下采样