bigquery

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

BigQuery

BigQuery

Query Mozilla telemetry data directly using the
bq
CLI.
使用
bq
CLI直接查询Mozilla遥测数据。

Prerequisites

前提条件

  • gcloud
    and
    bq
    CLI installed (
    brew install google-cloud-sdk
    )
  • Authenticated:
    gcloud auth login
    with a Mozilla account
  • Billing project set: queries run against a project you have
    bigquery.jobs.create
    on
  • (Optional but highly recommended) mozdata-claude-plugin — provides Glean Dictionary MCP for metric/ping discovery, making it much easier to find the right tables and columns
  • 已安装
    gcloud
    bq
    CLI(执行
    brew install google-cloud-sdk
    安装)
  • 已完成认证:使用Mozilla账号执行
    gcloud auth login
  • 已设置计费项目:查询需在你拥有
    bigquery.jobs.create
    权限的项目下运行
  • (可选但强烈推荐)mozdata-claude-plugin — 提供用于指标/ ping 发现的Glean Dictionary MCP,可大幅简化表和列的查找工作

Authentication

认证步骤

bash
undefined
bash
undefined

Check current account

检查当前账号

gcloud config get-value account
gcloud config get-value account

Re-authenticate if needed

如需重新认证

gcloud auth login
gcloud auth login

List available projects

列出可用项目

gcloud projects list --format="table(projectId,name)"
gcloud projects list --format="table(projectId,name)"

Set billing project (mozdata is the standard choice)

设置计费项目(mozdata是标准选择)

gcloud config set project mozdata

If queries fail with "Access Denied", the billing project likely lacks permissions. Try `--project_id=mozdata`.
gcloud config set project mozdata

如果查询时提示“Access Denied”,可能是计费项目缺少权限,尝试添加`--project_id=mozdata`参数。

Running Queries

运行查询

bash
undefined
bash
undefined

Basic query

基础查询

bq query --project_id=mozdata --use_legacy_sql=false --format=pretty "SELECT ..."
bq query --project_id=mozdata --use_legacy_sql=false --format=pretty "SELECT ..."

Dry run (check cost before executing)

试运行(执行前检查成本)

bq query --project_id=mozdata --use_legacy_sql=false --dry_run "SELECT ..."

Always use `--project_id=mozdata` and `--use_legacy_sql=false`.
bq query --project_id=mozdata --use_legacy_sql=false --dry_run "SELECT ..."

请始终使用`--project_id=mozdata`和`--use_legacy_sql=false`参数。

Table Selection

表选择

Choose the right table — this is the most important optimization:
Query TypeTableWhy
Windows version distribution
telemetry.windows_10_aggregate
Pre-aggregated, instant
DAU/MAU by standard dimensions
firefox_desktop_derived.active_users_aggregates_v3
Pre-computed, 100x faster
DAU with custom dimensions
firefox_desktop.baseline_clients_daily
One row per client per day
MAU/WAU/retention
firefox_desktop.baseline_clients_last_seen
Bit patterns, scan 1 day not 28
Event analysis
firefox_desktop.events_stream
Pre-unnested, clustered
Mobile search
search.mobile_search_clients_daily_v2
Pre-aggregated
Specific Glean metric
firefox_desktop.metrics
Raw metrics ping
All tables are in the
moz-fx-data-shared-prod
project. Fully qualify as
`moz-fx-data-shared-prod.{dataset}.{table}`
.
选择合适的表是最重要的优化手段:
查询类型原因
Windows版本分布
telemetry.windows_10_aggregate
预聚合,查询速度快
按标准维度统计DAU/MAU
firefox_desktop_derived.active_users_aggregates_v3
预计算,速度提升100倍
按自定义维度统计DAU
firefox_desktop.baseline_clients_daily
每个客户端每天对应一行数据
MAU/WAU/用户留存
firefox_desktop.baseline_clients_last_seen
采用位模式,仅需扫描1天数据而非28天
事件分析
firefox_desktop.events_stream
已预展开,支持聚类
移动端搜索
search.mobile_search_clients_daily_v2
预聚合
特定Glean指标
firefox_desktop.metrics
原始指标ping数据
所有表均位于
moz-fx-data-shared-prod
项目下,需使用完整路径
`moz-fx-data-shared-prod.{dataset}.{table}`
引用。

Critical Rules

重要规则

  • Always use aggregate tables first — raw tables are 10-100x more expensive
  • Always include partition filter
    submission_date
    or
    DATE(submission_timestamp)
  • Use
    sample_id = 0
    for development (1% sample) — remove for production
  • Say "clients" not "users" — BigQuery tracks
    client_id
    , not actual humans
  • Never join across products by client_id — each product has its own namespace
  • Use
    events_stream
    for events
    — never raw
    events_v1
    (requires UNNEST)
  • Use
    baseline_clients_last_seen
    for MAU
    — bit patterns, scan 1 day not 28
  • 优先使用聚合表 —— 原始表的查询成本是聚合表的10-100倍
  • 必须包含分区过滤条件 —— 使用
    submission_date
    DATE(submission_timestamp)
  • 开发环境使用
    sample_id = 0
    (1%样本数据)—— 生产环境移除该条件
  • 使用“clients”而非“users” —— BigQuery跟踪的是
    client_id
    ,而非真实用户
  • 禁止跨产品通过client_id关联 —— 每个产品有独立的命名空间
  • 事件分析使用
    events_stream
    —— 切勿使用原始
    events_v1
    (需手动UNNEST)
  • MAU统计使用
    baseline_clients_last_seen
    —— 采用位模式,仅需扫描1天数据而非28天

References

参考资料

  • references/tables.md
    — Detailed table schemas and common query patterns
  • references/os-versions.md
    — Windows, macOS, and Linux version distribution queries with build number, Darwin, and kernel version mappings
  • references/tables.md
    —— 详细的表结构和常见查询模式
  • references/os-versions.md
    —— 包含版本号、Darwin和内核版本映射的Windows、macOS及Linux版本分布查询

Related Skills

相关技能

  • redash — Web UI frontend to BigQuery with visualizations and sharing
  • mozdata:query-writing — Guided query writing with Glean Dictionary MCP
  • mozdata:probe-discovery — Find Glean metrics and telemetry probes
  • redash —— BigQuery的Web UI前端,支持可视化和分享功能
  • mozdata:query-writing —— 借助Glean Dictionary MCP进行引导式查询编写
  • mozdata:probe-discovery —— 查找Glean指标和遥测探针