Google BigQuery
BigQuery is Google's serverless, highly scalable, and cost-effective multi-cloud data warehouse. It processes terabytes in seconds.
When to Use
- Serverless Analytics: No infrastructure to manage. Just run SQL.
- Real-time Analytics: High-speed streaming ingestion.
- ML Integration: `CREATE MODEL` lets you train ML models using standard SQL (BigQuery ML).
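As a rough illustration of the `CREATE MODEL` workflow, the sketch below trains a linear regression model and then evaluates it. The dataset `my_dataset` and the model name `penguin_weight_model` are hypothetical placeholders. The public `bigquery-public-data.ml_datasets.penguins` table is the one used in Google's BigQuery ML tutorials, and its column names are assumed here.

```sql
-- Train a linear regression model that predicts body mass.
-- `my_dataset` is a hypothetical dataset you would need to create first.
CREATE OR REPLACE MODEL `my_dataset.penguin_weight_model`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['body_mass_g']
) AS
SELECT
  species,
  island,
  culmen_length_mm,
  culmen_depth_mm,
  flipper_length_mm,
  sex,
  body_mass_g
FROM `bigquery-public-data.ml_datasets.penguins`
WHERE body_mass_g IS NOT NULL;

-- Inspect standard regression metrics for the trained model.
SELECT *
FROM ML.EVALUATE(MODEL `my_dataset.penguin_weight_model`);
```

The training data never leaves BigQuery, and the query selects only the columns the model needs.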
Quick Start
```sql
-- Standard SQL
-- Ten names that appear in the most rows of the public USA names table
SELECT name, COUNT(*) AS count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY count DESC
LIMIT 10;
```
Core Concepts
Slots and Reservations
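A "slot" is a unit of computational capacity (roughly a virtual CPU) that BigQuery uses to execute queries. With on-demand pricing, BigQuery allocates slots automatically and bills by bytes scanned. Alternatively, you can reserve slots under capacity-based pricing (BigQuery editions, which replaced the older flat-rate plans) for more predictable costs.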
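To get a feel for how many slots your queries consume, one option is to look at `total_slot_ms` in the jobs metadata view. The sketch below assumes the jobs ran in the `US` multi-region (hence `region-us`) and that you have permission to list jobs in the project. Adjust both to your setup.

```sql
-- List the ten most slot-hungry queries from the last day.
-- `region-us` is an assumption; use the region where your jobs run.
SELECT
  job_id,
  user_email,
  total_slot_ms,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY total_slot_ms DESC
LIMIT 10;
```

Dividing `total_slot_ms` by a query's elapsed time in milliseconds gives a rough average slot count for that query.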
Columnar Storage (Capacitor)
Optimized for aggregation queries. Reading one column is much cheaper and faster than reading all columns (`SELECT *` is expensive).
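The difference is easy to see with the public names table from the Quick Start. In the sketch below (the `state = 'TX'` filter is just an illustrative choice), the first query reads only the columns it references, while the second reads every column in the table.

```sql
-- Reads three columns (name, number, state): fewer bytes scanned.
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name
ORDER BY total DESC
LIMIT 10;

-- Reads all columns of every row: more bytes scanned.
-- LIMIT does not reduce the bytes read on a non-clustered table.
SELECT *
FROM `bigquery-public-data.usa_names.usa_1910_2013`
LIMIT 10;
```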
Partitioning & Clustering
- Partitioning: Splits a table by date or integer range (e.g., daily partitions). Queries that filter on the partition column scan only the matching partitions, which can cut costs substantially.
- Clustering: Sorts data within partitions for faster filtering.
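A minimal sketch of both features together is shown below. The dataset, table, and column names (`my_dataset.events`, `event_ts`, `user_id`, `event_name`) are hypothetical and only serve to illustrate the DDL.

```sql
-- Daily partitions on the event timestamp, clustered by user and event name.
-- All names here are hypothetical placeholders.
CREATE TABLE `my_dataset.events`
(
  event_ts   TIMESTAMP NOT NULL,
  user_id    STRING,
  event_name STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY user_id, event_name
OPTIONS (require_partition_filter = TRUE);

-- The filter on event_ts limits the scan to a single daily partition;
-- the filter on user_id benefits from clustering within that partition.
SELECT event_name, COUNT(*) AS events
FROM `my_dataset.events`
WHERE event_ts >= TIMESTAMP('2025-01-01')
  AND event_ts <  TIMESTAMP('2025-01-02')
  AND user_id = 'u_123'
GROUP BY event_name;
```

Setting `require_partition_filter` is optional. It makes BigQuery reject queries that would otherwise scan every partition by accident.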
Best Practices (2025)
Do:
- Partition by Date: Almost mandatory for time-series logs.
- Use BigQuery ML: Train models (Regression, K-Means) directly where data lives.
- Estimate Cost: Dry Run your query to see how many bytes it will scan before running it.
Don't:
- Don't run `SELECT *`: With on-demand pricing you pay for the bytes in every column you read. Select only what you need.
- Don't treat it like an OLTP database: Single-row inserts are slow (unless you use the Streaming API). BigQuery is built for bulk analytics.
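For getting data in, a batch load is usually a better fit than row-by-row `INSERT` statements. Below is a minimal sketch using the `LOAD DATA` statement, assuming a hypothetical table `my_dataset.events` and hypothetical CSV files under `gs://my-bucket/events/`.

```sql
-- Bulk-load CSV files from Cloud Storage in one job.
-- The table name and bucket path are hypothetical placeholders.
LOAD DATA INTO `my_dataset.events`
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/events/*.csv'],
  skip_leading_rows = 1
);
```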