Google BigQuery

BigQuery is Google's serverless, highly scalable, and cost-effective multi-cloud data warehouse. It can scan terabytes of data in seconds.

When to Use

  • Serverless Analytics: No infrastructure to manage; just run SQL.
  • Real-time Analytics: High-throughput streaming ingestion makes new rows queryable within seconds.
  • ML Integration: CREATE MODEL lets you train ML models using standard SQL (BigQuery ML).
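
As a minimal sketch of the BigQuery ML workflow, the statements below train a linear regression model on the public `bigquery-public-data.ml_datasets.penguins` table (the one used in Google's BigQuery ML tutorials), then evaluate it and score a few rows. The dataset name `mydataset` is a hypothetical placeholder for a dataset you own, assumed to live in the US multi-region alongside the public data.

```sql
-- Train a linear regression model that predicts body mass.
-- `mydataset` is a hypothetical dataset you own; the model is stored there.
CREATE OR REPLACE MODEL `mydataset.penguins_model`
OPTIONS (
  model_type = 'linear_reg',          -- ordinary linear regression
  input_label_cols = ['body_mass_g']  -- the column to predict
) AS
SELECT *
FROM `bigquery-public-data.ml_datasets.penguins`
WHERE body_mass_g IS NOT NULL;        -- training rows need a label

-- Evaluation metrics such as mean_absolute_error and r2_score.
SELECT * FROM ML.EVALUATE(MODEL `mydataset.penguins_model`);

-- Score a few rows; the output adds a predicted_body_mass_g column.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.penguins_model`,
  (SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 5));
```

The model never leaves BigQuery: training, evaluation, and prediction are all ordinary query jobs over tables.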

Quick Start

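```sql
-- GoogleSQL (formerly "Standard SQL"), BigQuery's default dialect.
-- Top 10 names across all states and years: SUM(number) totals the
-- babies given each name, whereas COUNT(*) would only count rows.
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 10;
```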

Core Concepts

Slots and Reservations

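A "slot" is a unit of computational capacity used to execute queries. With on-demand pricing, BigQuery allocates slots automatically and bills by bytes scanned; alternatively, you can reserve slot capacity (through BigQuery editions, which replaced flat-rate pricing in 2023) and pay for capacity instead of bytes.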
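
To see how many slots your queries actually consume, you can query the jobs metadata in INFORMATION_SCHEMA. The sketch below assumes your jobs run in the US multi-region (`region-us`); dividing a job's slot-milliseconds by its wall-clock duration gives a rough average slot count for that query.

```sql
-- Heaviest query jobs in this project over the last 24 hours.
-- `region-us` is an assumption: use the region where your jobs run.
SELECT
  job_id,
  user_email,
  total_bytes_processed,
  total_slot_ms,
  -- Average slots in use = slot-milliseconds / elapsed milliseconds.
  SAFE_DIVIDE(total_slot_ms,
              TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_slot_ms DESC
LIMIT 20;
```

SAFE_DIVIDE returns NULL instead of failing when a job's duration is zero.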

Columnar Storage (Capacitor)

Capacitor stores each column separately and is optimized for aggregation queries. A query reads only the columns it references, so reading one column is much cheaper and faster than reading all of them (SELECT * is expensive).
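
The effect of columnar storage is easy to check with the public table from the Quick Start: dry-run each statement (or look at "bytes processed" after running it) and compare. The column names below are those of `usa_1910_2013` (state, gender, year, name, number).

```sql
-- Reads every column of the table.
SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10;

-- Reads only the `name` column, so far fewer bytes are processed.
-- LIMIT does not reduce bytes scanned on an unclustered table.
SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10;

-- When you need most columns, exclude the ones you don't.
SELECT * EXCEPT (state, gender)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
LIMIT 10;
```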

Partitioning & Clustering

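  • Partitioning: Splits a table into segments by a DATE/TIMESTAMP/DATETIME column, ingestion time, or an integer range (e.g., daily partitions). Queries that filter on the partitioning column skip irrelevant partitions, which can cut scanned bytes and cost dramatically.
  • Clustering: Sorts data within each partition by up to four columns, so filters on those columns read fewer blocks and run faster.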
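
A sketch of the DDL for a date-partitioned, clustered log table, plus a query that benefits from both. The table `mydataset.events` and its columns are hypothetical; PARTITION BY, CLUSTER BY, and the two table options are standard BigQuery DDL, but the 90-day expiration is only an example value.

```sql
-- Hypothetical event log: one partition per day of event_ts,
-- rows within each partition sorted by user_id, then event_type.
CREATE TABLE IF NOT EXISTS `mydataset.events`
(
  event_ts   TIMESTAMP NOT NULL,
  user_id    STRING,
  event_type STRING,
  payload    STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY user_id, event_type
OPTIONS (
  require_partition_filter = TRUE,  -- reject queries without a partition filter
  partition_expiration_days = 90    -- drop partitions older than 90 days
);

-- The date filter prunes the scan to 7 daily partitions;
-- the user_id filter then benefits from clustering.
SELECT event_type, COUNT(*) AS n
FROM `mydataset.events`
WHERE DATE(event_ts) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 6 DAY)
                         AND CURRENT_DATE()
  AND user_id = 'user_123'
GROUP BY event_type;
```

With require_partition_filter set, a query that omits the date predicate fails instead of silently scanning the whole table.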

Best Practices (2025)

Do:
  • Partition by Date: Almost mandatory for time-series logs.
  • Use BigQuery ML: Train models (e.g., linear regression, k-means) directly where the data lives.
  • Estimate Cost: Dry-run your query to see how many bytes it will scan before running it; dry runs are not billed.
Don't:
  • Don't run SELECT *: Under on-demand pricing you pay for the bytes in every column you read. Select only the columns you need.
  • Don't treat it like an OLTP database: Frequent single-row DML inserts are slow, since each statement runs as a separate job (use the streaming APIs for row-level ingestion). BigQuery is built for bulk analytics.
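
To illustrate bulk ingestion instead of row-at-a-time writes, here are two batch alternatives against a hypothetical `mydataset.events` table (columns event_ts, user_id, event_type): a multi-row INSERT, and a LOAD DATA statement reading CSV files from a placeholder Cloud Storage path. Continuous per-row ingestion is better served by the streaming APIs, which are not shown here.

```sql
-- Anti-pattern (commented out): one DML job per row.
-- INSERT INTO `mydataset.events` (event_ts, user_id, event_type)
-- VALUES (CURRENT_TIMESTAMP(), 'u1', 'click');

-- Better: batch many rows into a single statement...
INSERT INTO `mydataset.events` (event_ts, user_id, event_type)
VALUES
  (TIMESTAMP '2025-01-01 10:00:00 UTC', 'u1', 'click'),
  (TIMESTAMP '2025-01-01 10:00:05 UTC', 'u2', 'view'),
  (TIMESTAMP '2025-01-01 10:00:09 UTC', 'u1', 'purchase');

-- ...or bulk-load files in one job. The gs:// path is a placeholder.
LOAD DATA INTO `mydataset.events`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,             -- skip the header row
  uris = ['gs://my-bucket/events/*.csv']
);
```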