# BigQuery CLI Skill

You are a BigQuery specialist using the `bigquery` CLI tool. This skill provides comprehensive guidance for working with Google BigQuery through a unified Rust-based CLI with query execution, template management, and server modes.
## Core Capabilities

The CLI provides:

- Authentication: Check status and log in with gcloud
- Query Execution: Run SQL queries with cost awareness and confirmation prompts
- Dry Run: Estimate query costs without execution
- Dataset Operations: List datasets in a project
- Table Operations: List, describe, insert, load, and manage external tables
- Template System: Named query templates with parameter substitution
- MCP Server: Semantic search via stdio or HTTP modes
- LSP Server: SQL language server for editor integration
## Authentication

### Check Authentication Status

```bash
# Check if authenticated and verify required scopes
bigquery auth check
```

This will show:

- Authentication status
- Active account
- BigQuery scopes availability
### Login with gcloud

```bash
# Authenticate with gcloud, including all required BigQuery scopes
bigquery auth login
```

This will:

1. Run `gcloud auth login`
2. Ensure all necessary BigQuery scopes are granted
3. Verify that authentication succeeded

**Best Practice**: Always run `bigquery auth check` first to verify authentication before other operations.
## Query Operations

### Running Queries

```bash
# Basic query execution (interactive cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"

# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"

# JSON output (default)
bigquery query "SELECT * FROM dataset.table LIMIT 5"

# Text/table output
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"
```

**Cost Awareness**: The query command automatically:

1. Estimates the query cost before execution
2. Displays the bytes to be processed
3. Prompts for confirmation (unless `--yes` is used)
4. Prevents accidentally running expensive queries

### Query Output Formats
```bash
# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
bigquery query --format json "SELECT * FROM dataset.table"

# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"
```
### Dry Run (Cost Estimation)

```bash
# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"
```

Returns:

- The bytes that would be processed
- The estimated cost
- No actual data

**Use dry-run to**:

- Estimate costs before running expensive queries
- Validate query syntax
- Check partition-pruning effectiveness
- Test queries in CI/CD pipelines
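In a CI/CD pipeline, the dry-run estimate can feed a byte budget gate. A minimal sketch, assuming you extract the byte count from the dry-run output yourself; the `check_budget` helper and the 10 GiB budget are illustrative, not part of the CLI:

```shell
# Fail a CI job when a query would scan more than a byte budget.
# The byte count would come from `bigquery dry-run` output; here it is
# passed in directly so the gate itself is self-contained.
MAX_BYTES=$((10 * 1024 * 1024 * 1024))   # 10 GiB budget (illustrative)

check_budget() {
  local bytes=$1
  if [ "$bytes" -gt "$MAX_BYTES" ]; then
    echo "query would scan $bytes bytes, over the $MAX_BYTES budget" >&2
    return 1
  fi
  echo "within budget ($bytes bytes)"
}

# In CI you might wire it up roughly as:
#   bytes=$(bigquery dry-run "$SQL" | <extract the byte count>)
#   check_budget "$bytes" || exit 1
check_budget 5368709120
# → within budget (5368709120 bytes)
```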
## Dataset Operations
### Listing Datasets

```bash
# List datasets in the current project (text format, default)
bigquery datasets list my-project

# JSON output
bigquery datasets list my-project --format json
```

Example output shows:

- Dataset ID
- Location
- Creation time
- Labels (if any)

**Note**: The dataset reference format is `project.dataset`; pass just `project` to list all datasets.
## Table Operations

### Listing Tables

```bash
# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset

# JSON output
bigquery tables list my-project.my-dataset --format json

# Limit results
bigquery tables list my-project.my-dataset --limit 20

# The maximum limit is 100
bigquery tables list my-project.my-dataset --limit 100
```

### Describing Table Schema

```bash
# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table

# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json
```

Output includes:

- Column names and types
- Nullability (NULLABLE, REQUIRED, REPEATED)
- Mode information
- Table metadata
### Inserting Rows (Small Datasets)

Best for <1000 rows. Uses the streaming insert API for immediate availability.

#### JSONL (Newline-Delimited JSON) Format

**From a JSONL file:**

```bash
# Create a sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "charlie@example.com", "age": 35}
EOF

# Insert from the JSONL file
bigquery tables insert my-project.dataset.users \
  --data users.jsonl --format json
```
**From a JSONL stream (stdin):**

```bash
# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "alice@example.com"}' | \
  bigquery tables insert my-project.dataset.users --data - --format json

# Stream from multiple sources (heredoc)
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format json
{"id": "1", "name": "Alice", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie", "email": "charlie@example.com", "age": 35}
EOF

# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events --data -

# Stream from a compressed file
gunzip -c logs.jsonl.gz | bigquery tables insert my-project.dataset.logs --data -

# Stream from a jq transformation
cat raw_data.json | jq -c '.records[]' | \
  bigquery tables insert my-project.dataset.processed --data -
```

**JSONL Format Requirements:**

- Each line is a separate JSON object
- Empty lines are automatically skipped
- No commas between objects
- Ideal for streaming and large datasets
- Format: `{"field1":"value1","field2":"value2"}\n`
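Because a single malformed line fails the batch (or, with `--skip-invalid`, silently drops a row), it can help to sanity-check a file before piping it in. A sketch using `python3` for the JSON parsing; the `validate_jsonl` helper and file name are our own, not part of the CLI:

```shell
# Check that every non-empty line of a file is standalone, valid JSON.
validate_jsonl() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
    [ -z "$line" ] && continue   # empty lines are skipped, matching the CLI
    printf '%s' "$line" | python3 -c 'import json,sys; json.load(sys.stdin)' 2>/dev/null \
      || { echo "line $n is not valid JSON" >&2; return 1; }
  done < "$1"
}

printf '%s\n' '{"id": "1", "name": "Alice"}' '' '{"id": "2", "name": "Bob"}' > users.jsonl
validate_jsonl users.jsonl && echo "users.jsonl OK"
# → users.jsonl OK
```

On success you can pipe the same file straight to `bigquery tables insert ... --data -`.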
#### CSV Format
**From a CSV file:**

```bash
# Create a sample CSV file
cat > users.csv <<EOF
id,name,email,age
1,Alice Johnson,alice@example.com,30
2,Bob Smith,bob@example.com,25
3,"Charlie Brown, Jr.",charlie@example.com,35
EOF

# Insert from the CSV file
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv
```
**From a CSV stream (stdin):**

```bash
# Stream from a heredoc
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format csv
id,name,email,age
1,Alice Johnson,alice@example.com,30
2,Bob Smith,bob@example.com,25
3,Charlie Brown,charlie@example.com,35
EOF

# Stream from application output
./generate_report.sh | bigquery tables insert my-project.dataset.reports --data - --format csv

# Stream from a compressed CSV
gunzip -c data.csv.gz | bigquery tables insert my-project.dataset.imports --data -

# Stream from a curl/API response
curl -s https://api.example.com/export.csv | \
  bigquery tables insert my-project.dataset.api_data --data - --format csv

# Transform and stream CSV
cat raw.csv | tail -n +2 | awk '{print tolower($0)}' | \
  bigquery tables insert my-project.dataset.cleaned --data - --format csv
```

**CSV Format Requirements:**

- The first row must contain column headers matching the BigQuery table schema
- Values are inserted as strings (BigQuery will coerce types)
- Supports quoted fields, escaped quotes, and embedded newlines (RFC 4180 compliant)
- Headers are case-sensitive and must match the table's column names
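Hand-rolling CSV quoting in shell is error-prone once a field contains commas or quotes; it is safer to let a CSV library do the quoting. A sketch using `python3`'s `csv` module (the file name is illustrative):

```shell
# Write RFC 4180-compliant CSV: fields with commas or quotes get quoted,
# and embedded quotes are doubled.
python3 - <<'EOF' > users.csv
import csv, sys
w = csv.writer(sys.stdout)
w.writerow(["id", "name", "email"])
w.writerow(["1", 'Charlie "Chuck" Brown, Jr.', "charlie@example.com"])
EOF

cat users.csv
# The second row comes out as:
#   1,"Charlie ""Chuck"" Brown, Jr.",charlie@example.com
```

The resulting file is safe to pass to `--data users.csv --format csv`.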
#### Additional Insert Options
```bash
# Insert inline JSON (single object)
bigquery tables insert my-project.dataset.users \
  --json '{"id": "1", "name": "Alice", "email": "alice@example.com"}'

# Insert an inline JSON array
bigquery tables insert my-project.dataset.users \
  --json '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'

# Dry-run validation (no data inserted)
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --dry-run

# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --skip-invalid

# Ignore unknown fields in the data
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --ignore-unknown

# Combine options for production pipelines
cat production_data.jsonl | \
  bigquery tables insert my-project.dataset.production \
    --data - --format json \
    --skip-invalid \
    --ignore-unknown
```

**Insert Options:**

- `--json <JSON>`: Inline JSON data (object or array)
- `--data <PATH>`: Path to a data file, or `-` for stdin
- `--format <FORMAT>`: Data format (json or csv; default: json)
- `--dry-run`: Validate without inserting
- `--skip-invalid`: Skip invalid rows instead of failing
- `--ignore-unknown`: Ignore unknown fields in the data
- `--yes`: Skip confirmation prompts
### Loading Data (Large Datasets)
Best for files >10MB or >1000 rows. Uses BigQuery load jobs.

⚠️ **IMPORTANT**: Loading from a local file requires a GCS staging bucket to be configured.

- If you get a "The specified bucket does not exist" error, use `tables insert` instead for datasets of <1000 rows
- For larger datasets, upload to GCS first, then use `bigquery tables load gs://...`
```bash
# Load from a Cloud Storage URI (RECOMMENDED - no bucket config needed)
bigquery tables load my-project.dataset.users \
  gs://my-bucket/data.csv --format csv

# Load from a local CSV file (requires a configured GCS staging bucket)
bigquery tables load my-project.dataset.users data.csv --format csv

# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table data.csv \
  --format csv --autodetect

# Load with the replace write disposition (truncates the table first)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace

# Load a JSON file
bigquery tables load my-project.dataset.events events.json \
  --format json

# Supported formats: csv, json, avro, parquet, orc
bigquery tables load my-project.dataset.table data.parquet \
  --format parquet

# Dry-run validation (no data loaded)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --dry-run

# Allow some bad records (skip up to 100 invalid rows)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --max-bad-records 100

# Ignore unknown fields
bigquery tables load my-project.dataset.users data.csv \
  --format csv --ignore-unknown

# Skip confirmation prompts (for automation/CI)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace --yes
```
**Load Job Features:**

- **GCS staging bucket required:** Loading local files needs a GCS bucket configured
- Real-time progress tracking with exponential backoff
- Automatic cleanup of temporary files after completion
- Write modes: `append` (default) or `replace` (truncate first)
- Safety confirmations for destructive operations
- Configurable error tolerance with `--max-bad-records`

**When to Use:**

- Large datasets (>1000 rows or >10MB)
- Data already in Cloud Storage
- Bulk data migrations

**When NOT to Use:**

- Small datasets (<1000 rows) → use `tables insert` instead (no GCS required)
- No GCS staging bucket configured → use `tables insert` or upload to GCS first

**Load Options:**

- `--format <FORMAT>`: csv, json, avro, parquet, orc (default: csv)
- `--write-disposition <DISPOSITION>`: append or replace (default: append)
- `--autodetect`: Auto-detect the schema from source files
- `--dry-run`: Validate without loading
- `--max-bad-records <N>`: Maximum bad records before failing
- `--ignore-unknown`: Ignore unknown fields
- `--yes`: Skip confirmation prompts
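The insert-vs-load thresholds above can be wired into scripts that pick an ingestion path automatically. A rough sketch; the `choose_ingest` helper and its cutoffs merely restate this guide's rule of thumb and are not a CLI feature:

```shell
# Route a file: small data -> streaming insert, large data -> load job.
choose_ingest() {
  local file=$1 bytes rows
  bytes=$(wc -c < "$file")
  rows=$(wc -l < "$file")
  if [ "$rows" -lt 1000 ] && [ "$bytes" -lt $((10 * 1024 * 1024)) ]; then
    echo "tables insert"
  else
    echo "tables load"
  fi
}

printf '{"id": "1"}\n{"id": "2"}\n' > small.jsonl
choose_ingest small.jsonl
# → tables insert
```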
### Extracting Data
Export table data to Cloud Storage in various formats:

```bash
# Extract a table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/exports/users.csv --format csv

# Extract as JSON
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.json --format json

# Extract with compression
bigquery tables extract my-project.dataset.large_table \
  gs://my-bucket/exports/data-*.csv.gz --format csv --compression gzip

# Extract as Avro with Snappy compression
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.avro --format avro --compression snappy

# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
  gs://my-bucket/exports/analytics.parquet --format parquet

# CSV with a custom delimiter and header
bigquery tables extract my-project.dataset.data \
  gs://my-bucket/data.csv \
  --format csv \
  --field-delimiter "|" \
  --print-header

# Dry-run to validate the configuration
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/users.csv --format csv --dry-run

# Skip the confirmation prompt
bigquery tables extract my-project.dataset.large \
  gs://my-bucket/export.csv --format csv --yes
```

**Supported Formats:** CSV, JSON (newline-delimited), Avro, Parquet

**Compression:** none, gzip, snappy (Avro/Parquet only)
## External Tables
External tables reference data in Cloud Storage without copying it to BigQuery.
### Creating External Tables

```bash
# Create a CSV external table
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Create with an auto-detected schema
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --autodetect

# Multiple source URIs (comma-separated)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri "gs://bucket/file1.csv,gs://bucket/file2.csv" \
  --format csv \
  --autodetect

# Multiple source URIs (repeated flags)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/file1.csv \
  --source-uri gs://bucket/file2.csv \
  --format csv \
  --autodetect

# CSV-specific options
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING" \
  --field-delimiter "," \
  --skip-leading-rows 1

# Other formats (Parquet, JSON, Avro, ORC)
bigquery tables create-external my-project.dataset.parquet_table \
  --source-uri gs://bucket/data.parquet \
  --format parquet \
  --autodetect

bigquery tables create-external my-project.dataset.json_table \
  --source-uri gs://bucket/data.jsonl \
  --format json \
  --autodetect
```

**External Table Options**:

- `--source-uri <URI>`: Cloud Storage URI(s) - required
- `--format <FORMAT>`: csv, json, avro, parquet, orc - required
- `--schema <SCHEMA>`: Schema definition (`column:type,column:type,...`)
- `--autodetect`: Auto-detect the schema from source files
- `--field-delimiter <DELIMITER>`: CSV field delimiter (default: `,`)
- `--skip-leading-rows <N>`: Number of CSV header rows to skip
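Typing a long `--schema` string by hand is tedious, and the first line of a CSV already names the columns. A convenience sketch that defaults every column to STRING (the `header_to_schema` helper is ours; adjust the types by hand afterwards):

```shell
# Build a column:TYPE,column:TYPE schema string from a CSV header row.
header_to_schema() {
  head -n 1 "$1" | awk -F, '{
    for (i = 1; i <= NF; i++) printf "%s%s:STRING", (i > 1 ? "," : ""), $i
    print ""
  }'
}

printf 'id,name,created_at\n1,Alice,2025-01-01\n' > data.csv
header_to_schema data.csv
# → id:STRING,name:STRING,created_at:STRING
```

The result can be pasted into `--schema`, e.g. after changing `created_at:STRING` to `created_at:TIMESTAMP`.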
### Updating External Tables
```bash
# Update the source URIs
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv

# Update the schema
bigquery tables update-external my-project.dataset.external_table \
  --schema "id:INTEGER,name:STRING,email:STRING"

# Update CSV options
bigquery tables update-external my-project.dataset.external_table \
  --field-delimiter "|" \
  --skip-leading-rows 2

# Update multiple properties at once
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv \
  --schema "id:INTEGER,name:STRING,updated_at:TIMESTAMP" \
  --skip-leading-rows 1
```

## Template System
Named query templates let you save frequently used queries with parameter placeholders.

### Listing Templates

```bash
# List all available templates (text format)
bigquery templates list

# JSON output
bigquery templates list --format json
```

Shows:

- Template name
- Description
- Parameters
- Query preview
### Searching Templates

```bash
# Search by name or description
bigquery templates search "customer"
bigquery templates search "daily metrics"

# JSON output
bigquery templates search "analytics" --format json
```
### Validating Templates

```bash
# Validate a template for parameter consistency
bigquery templates validate my-template
```

Checks that:

- Parameter definitions match the query placeholders
- Required parameters are defined
- Parameter types are valid
### Running Templates

```bash
# Run a template with its default parameters
bigquery templates run my-template

# Override parameters
bigquery templates run daily-report \
  --param date=2025-01-15 \
  --param region=US

# Multiple parameters
bigquery templates run customer-analysis \
  --param customer_id=CUST123 \
  --param start_date=2025-01-01 \
  --param end_date=2025-01-31

# JSON output
bigquery templates run my-template --format json

# Skip cost confirmation
bigquery templates run expensive-query --yes
```

**Template Run Options**:

- `--param <KEY=VALUE>`: Parameter override (can be used multiple times)
- `--format <FORMAT>`: Output format (json or text; default: json)
- `--yes`: Skip the cost confirmation prompt
**模板运行选项说明**:
- `--param <KEY=VALUE>`:覆盖模板参数(可多次使用)
- `--format <FORMAT>`:输出格式(json或text,默认json)
- `--yes`:跳过成本确认提示Template Workflow Example
模板工作流示例
```bash
# 1. Search for templates
bigquery templates search "revenue"

# 2. Validate template before running
bigquery templates validate monthly-revenue

# 3. Run with parameters
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --param min_amount=1000

# 4. Run in automation (skip confirmation)
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --yes \
  --format json > output.json
```

**Use templates for**:
- Standardized reporting queries
- Common analytics patterns
- Scheduled data pipelines
- Team query sharing
- Reducing query errors

## MCP Server Integration
The BigQuery MCP server provides semantic search and natural language query capabilities via Model Context Protocol.
### Starting MCP Server

**STDIO Mode** (for local clients):

```bash
# Start MCP server in stdio mode
bigquery mcp stdio
```

Server will:
- Accept MCP protocol messages on stdin
- Send responses on stdout
- Expose BigQuery tools to MCP clients
**HTTP Mode** (for network clients):

```bash
# Start HTTP MCP server on default port 8080
bigquery mcp http

# Specify custom port
bigquery mcp http --port 3000
```

Server provides:
- HTTP endpoint for MCP protocol
- JSON-RPC over HTTP
- Remote access to BigQuery tools
### MCP Server Capabilities

The MCP server exposes these tools through the Model Context Protocol:
- `semantic_search`: Search tables using natural language
- `execute_query`: Run SQL queries with automatic formatting
- `get_schema`: Retrieve table schemas
- `list_tables`: List available tables
- `list_datasets`: List available datasets
- `explain_query`: Get query execution plan
- `optimize_query`: Suggest query optimizations
- `run_template`: Execute named templates with parameters
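As a rough illustration of the wire format a client speaks, the following builds a JSON-RPC 2.0 `tools/call` message for the `semantic_search` tool. The method name follows the generic MCP convention; the exact argument schema of this server's tools is an assumption — list the tools from your MCP client to confirm before relying on it.

```shell
# Hedged sketch: a generic MCP tools/call request targeting semantic_search.
# The "arguments" shape is illustrative, not taken from this CLI's docs.
request='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"semantic_search","arguments":{"query":"customer purchase tables"}}}'
echo "$request"

# To exercise a running server over stdio, pipe the message in:
#   printf '%s\n' "$request" | bigquery mcp stdio
```

In stdio mode the server reads one JSON-RPC message per line from stdin and writes responses to stdout, which is why a simple pipe is enough for smoke-testing.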
### MCP Configuration

Configure in Claude Code or other MCP-enabled applications:

**STDIO Mode** (`.claude/mcp.json` or similar):

```json
{
  "mcpServers": {
    "bigquery": {
      "command": "bigquery",
      "args": ["mcp", "stdio"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "my-project"
      }
    }
  }
}
```

**HTTP Mode**:

```json
{
  "mcpServers": {
    "bigquery": {
      "url": "http://localhost:8080",
      "transport": "http"
    }
  }
}
```

### MCP Usage Patterns
When using BigQuery MCP through clients:

**Semantic Search**:
"Find all tables containing customer purchase data from the last 30 days"
→ MCP translates to appropriate SQL query

**Schema Discovery**:
"What columns are in the analytics.events table?"
→ MCP returns schema information

**Natural Language Queries**:
"Show me total revenue by region for Q1 2025"
→ MCP generates and executes SQL

**Template Execution**:
"Run the monthly revenue template for January 2025"
→ MCP executes template with parameters

## LSP Integration
The BigQuery LSP provides SQL language features in text editors.
### Starting LSP Server

```bash
# Start LSP server
bigquery lsp
```

Server provides:
- Language Server Protocol communication
- SQL syntax validation
- Schema-aware completions
- Query formatting
- Hover documentation
### LSP Features

- **SQL syntax highlighting**: Proper tokenization and highlighting
- **Schema completion**: Table and column suggestions based on project schema
- **Query validation**: Real-time syntax and semantic checks
- **Hover documentation**: Table and column info on hover
- **Go to definition**: Navigate to table definitions
- **Query formatting**: Auto-format SQL queries
- **Diagnostics**: Show errors and warnings inline
### Editor Configuration

**Neovim**:

```lua
-- In nvim/lua/bigquery-lsp.lua or init.lua
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "sql", "bq", "bigquery" },
  callback = function()
    vim.lsp.start({
      name = "bigquery-lsp",
      cmd = { "bigquery", "lsp" },
      root_dir = vim.fn.getcwd(),
    })
  end,
})
```

**VS Code** (in `settings.json` or language server config):

```json
{
  "bigquery-lsp": {
    "command": "bigquery",
    "args": ["lsp"],
    "filetypes": ["sql", "bq", "bigquery"]
  }
}
```

**Helix** (in `languages.toml`):

```toml
[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]

[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]
```

## Common Workflows
### Workflow 1: Exploratory Data Analysis

```bash
# 1. Verify authentication
bigquery auth check

# 2. List available datasets
bigquery datasets list my-project

# 3. List tables in dataset
bigquery tables list my-project.analytics

# 4. Check table schema
bigquery tables describe my-project.analytics.events

# 5. Preview data (text format for readability)
bigquery query --format text \
  "SELECT * FROM my-project.analytics.events LIMIT 10"

# 6. Get row count
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"

# 7. Check data distribution
bigquery query --format text "
SELECT
  DATE(timestamp) as date,
  COUNT(*) as events
FROM my-project.analytics.events
GROUP BY date
ORDER BY date DESC
LIMIT 30
"
```

### Workflow 2: Cost-Aware Query Development
```bash
# 1. Dry run to estimate cost
bigquery dry-run "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
"

# 2. If cost is acceptable, run query
bigquery query "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
"

# 3. For automation, skip confirmation
bigquery query --yes "
SELECT *
FROM my-project.large_dataset.table
WHERE date >= '2025-01-01'
" > results.json
```

### Workflow 3: Template-Based Reporting
```bash
# 1. Search for relevant templates
bigquery templates search "daily"

# 2. Validate template
bigquery templates validate daily-metrics

# 3. Run template with parameters
bigquery templates run daily-metrics \
  --param date=$(date +%Y-%m-%d) \
  --param region=US \
  --format json > daily-report.json

# 4. Schedule in cron or CI/CD
# 0 1 * * * bigquery templates run daily-metrics --param date=$(date +%Y-%m-%d) --yes
```

### Workflow 4: External Data Analysis
```bash
# 1. Create external table pointing to GCS
bigquery tables create-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-01-*.json \
  --format json \
  --autodetect

# 2. Query external table
bigquery query "
SELECT
  timestamp,
  user_id,
  action
FROM my-project.staging.raw_logs
WHERE action = 'purchase'
LIMIT 100
"

# 3. Update external table when new files arrive
bigquery tables update-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-02-*.json
```

### Workflow 5: Data Loading Pipeline
```bash
# 1. Load initial data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-01.csv \
  --format csv \
  --write-disposition replace

# 2. Append incremental data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-02.csv \
  --format csv \
  --write-disposition append

# 3. Verify data loaded
bigquery query "
SELECT
  DATE(timestamp) as date,
  COUNT(*) as count
FROM my-project.dataset.events
GROUP BY date
ORDER BY date
"
```

### Workflow 6: Real-Time Data Insertion
```bash
# 1. Insert single event (inline JSON)
bigquery tables insert my-project.dataset.events \
  --json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'

# 2. Stream JSONL from application
my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json

# 3. Insert batch from JSONL file
bigquery tables insert my-project.dataset.events \
  --data events.jsonl --format json

# 4. Stream with transformation and error handling
cat raw_events.json | jq -c '.events[]' |
  bigquery tables insert my-project.dataset.events \
    --data - --format json \
    --skip-invalid \
    --ignore-unknown
```

## Best Practices
### Query Development

- **Always dry-run first**: Use `bigquery dry-run` to estimate costs
- **Use templates**: Create templates for repeated queries
- **Validate before running**: Check syntax and cost before execution
- **Use text format for exploration**: `--format text` for human-readable tables
- **Use JSON for automation**: `--format json` for machine processing
- **Skip confirmations in scripts**: Use the `--yes` flag for automation
### Cost Management

- **Dry run expensive queries**: Always estimate with `bigquery dry-run`
- **Monitor bytes processed**: Check query cost estimates before running
- **Use partition pruning**: Filter on partitioned columns in WHERE clauses
- **Limit result sets**: Use LIMIT for exploratory queries
- **Use templates**: Standardize queries to avoid mistakes
- **Leverage external tables**: Avoid copying data when querying directly from GCS
### Authentication

- **Check auth first**: Run `bigquery auth check` before operations
- **Use service accounts**: For automation and CI/CD
- **Verify scopes**: Ensure all required BigQuery scopes are granted
- **Re-authenticate when needed**: `bigquery auth login` if the check fails
### Template Management

- **Use descriptive names**: Make templates easy to find
- **Document parameters**: Include parameter descriptions in templates
- **Validate before use**: Run `bigquery templates validate` before execution
- **Search before creating**: Check if a similar template exists
- **Version control templates**: Store template definitions in git
### Data Loading

- **Choose the right method**:
  - Use `insert` for <1000 rows (streaming insert API, immediate availability)
  - Use `load` for >10MB files or >1000 rows (load jobs with GCS upload)
- **Use JSONL for streaming**: Newline-delimited JSON is ideal for streaming pipelines
- **Stream from stdin**: Use `--data -` to pipe data from applications or transformations
- **Validate before loading**: Use the `--dry-run` flag to test configurations
- **Handle bad records**: Set `--max-bad-records` for messy data
- **Choose write disposition**: `replace` for full refresh, `append` for incremental
- **Use external tables**: For data that changes frequently in GCS (no data copying)
- **Use appropriate formats**: CSV for simple data, JSON/JSONL for complex, Parquet/Avro for large datasets
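The insert-vs-load decision above can be sketched as a small shell guard. The 10MB threshold comes from the guidance above; file, bucket, and table names are illustrative, and the chosen command is printed rather than executed so the sketch stays side-effect free.

```shell
# Hedged sketch: pick insert vs load by file size. Names are illustrative.
FILE=/tmp/events-sample.jsonl
printf '{"user_id":"U1","event":"click"}\n' > "$FILE"   # tiny sample file
TABLE=my-project.dataset.events

SIZE=$(wc -c < "$FILE")
if [ "$SIZE" -lt $((10 * 1024 * 1024)) ]; then
  # Small file: streaming insert, no GCS staging needed
  echo "bigquery tables insert $TABLE --data $FILE --format json"
else
  # Large file: stage in GCS, then run a load job
  echo "gsutil cp $FILE gs://staging-bucket/ && bigquery tables load $TABLE gs://staging-bucket/events-sample.jsonl --format json"
fi
```

In a real pipeline you would drop the `echo`s and also count rows, since the <1000-row guidance is about row count as much as byte size.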
### MCP Server

- **Use stdio for local**: Prefer stdio mode for local MCP clients
- **Use HTTP for remote**: Use HTTP mode for networked deployments
- **Secure HTTP endpoints**: Put the HTTP server behind authentication/firewall
- **Monitor server logs**: Check for errors and performance issues
- **Set appropriate port**: Choose a non-conflicting port for HTTP mode
### LSP Integration

- **Configure per-project**: Set up LSP for SQL files in your editor
- **Use schema completion**: Leverage auto-complete for table/column names
- **Check diagnostics**: Fix errors and warnings shown inline
- **Format queries**: Use LSP formatting for consistent style
## Configuration

### Environment Variables

```bash
# Set default project
export GOOGLE_CLOUD_PROJECT=my-project

# Set credentials (for service accounts)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc
```

### Authentication Methods
**User Credentials** (interactive):

```bash
bigquery auth login
```

Opens a browser for Google authentication.

**Service Account** (automation):

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check
```

**Application Default Credentials** (gcloud):

```bash
gcloud auth application-default login
bigquery auth check
```

## Troubleshooting
### Issue: "Not authenticated" or "Permission denied"

**Solution**: Check authentication and scopes

```bash
# Check current auth status
bigquery auth check

# Re-authenticate if needed
bigquery auth login

# Verify gcloud is set to correct project
gcloud config get-value project

# Set project if needed
gcloud config set project my-project
```

### Issue: "Table not found"
**Solution**: Use fully qualified table names

```bash
# Wrong - missing project/dataset
bigquery query "SELECT * FROM table"

# Correct - fully qualified
bigquery query "SELECT * FROM my-project.my-dataset.my-table"

# Or use backticks for reserved words (escaped inside double quotes)
bigquery query "SELECT * FROM \`my-project.my-dataset.my-table\`"
```

### Issue: "Query too expensive"
**Solution**: Check cost with dry-run and optimize

```bash
# Check estimated cost
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"

# Optimize with partition filters
bigquery dry-run "
SELECT * FROM large_table
WHERE _PARTITIONDATE = '2025-01-15'
"
```

### Issue: "Template not found"
**Solution**: Search for templates and verify the name

```bash
# List all templates
bigquery templates list

# Search for template
bigquery templates search "keyword"

# Use exact template name
bigquery templates run exact-template-name
```

### Issue: "The specified bucket does not exist"
**Cause**: `bigquery tables load` with a local file requires GCS staging bucket configuration.

**Solutions**:

1. **Preferred for small datasets (<1000 rows)**: Use `tables insert` instead (no GCS required)

   ```bash
   bigquery tables insert my-project.dataset.table \
     --data /tmp/data.jsonl \
     --format json
   ```

2. **For larger datasets**: Upload to GCS first, then load

   ```bash
   gsutil cp /tmp/large-file.jsonl gs://my-bucket/
   bigquery tables load my-project.dataset.table \
     gs://my-bucket/large-file.jsonl \
     --format json
   ```

3. **Last resort**: Configure a GCS staging bucket in the BigQuery CLI config (requires additional setup)

### Issue: "Invalid schema"
**Solution**: Check the schema format for external tables

```bash
# Schema format: column:type,column:type,...
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Or use autodetect
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --autodetect
```

### Issue: "MCP server not responding"
**Solution**: Check the server mode and connectivity

```bash
# For stdio mode, ensure client is using stdio transport
bigquery mcp stdio

# For HTTP mode, check port and firewall
bigquery mcp http --port 8080

# Test HTTP endpoint
```

### Issue: "LSP not starting in editor"
**Solution**: Verify the LSP configuration and binary path

```bash
# Check bigquery is in PATH
which bigquery

# Test LSP manually
bigquery lsp

# Verify editor configuration points to correct command
# Neovim: check cmd = { "bigquery", "lsp" }
# VS Code: check "command": "bigquery", "args": ["lsp"]
```

## Quick Reference
```bash
# Authentication
bigquery auth check                                   # Check auth status
bigquery auth login                                   # Login with gcloud

# Queries
bigquery query "SELECT ..."                           # Execute query
bigquery query --yes "SELECT ..."                     # Skip confirmation
bigquery query --format text "SELECT ..."             # Table output
bigquery dry-run "SELECT ..."                         # Estimate cost

# Datasets
bigquery datasets list PROJECT                        # List datasets

# Tables
bigquery tables list PROJECT.DATASET                  # List tables
bigquery tables describe PROJECT.DATASET.TABLE        # Show schema
bigquery tables insert TABLE --json '{"id": 1}'       # Insert rows (inline)
bigquery tables insert TABLE --data file.jsonl --format json  # Insert from JSONL
cat data.jsonl | bigquery tables insert TABLE --data -        # Stream insert
bigquery tables load TABLE file.csv                   # Load data (bulk)
bigquery tables load TABLE gs://bucket/file.csv       # Load from GCS
bigquery tables extract TABLE gs://bucket/output.csv  # Extract to GCS
bigquery tables create-external TABLE --source-uri ... # External table
bigquery tables update-external TABLE --source-uri ... # Update external

# Templates
bigquery templates list                               # List templates
bigquery templates search "keyword"                   # Search templates
bigquery templates validate TEMPLATE                  # Validate template
bigquery templates run TEMPLATE --param key=value     # Run template

# MCP Server
bigquery mcp stdio                                    # Start MCP (stdio mode)
bigquery mcp http                                     # Start MCP (HTTP mode)
bigquery mcp http --port 3000                         # Custom port

# LSP Server
bigquery lsp                                          # Start LSP server
```

## Integration Examples
### CI/CD Pipeline

```bash
#!/bin/bash
# daily-etl.sh

# Authenticate with service account
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1

# Run daily ETL template
bigquery templates run daily-etl \
  --param date=$(date +%Y-%m-%d) \
  --yes \
  --format json > /tmp/etl-result.json

# Check result
if [ $? -eq 0 ]; then
  echo "ETL completed successfully"
else
  echo "ETL failed"
  exit 1
fi
```

### Data Quality Checks
```bash
#!/bin/bash
# check-data-quality.sh

# Run data quality template
RESULT=$(bigquery templates run data-quality-check \
  --param table=my-project.dataset.table \
  --yes \
  --format json)

# Parse result and check quality metrics
INVALID_ROWS=$(echo "$RESULT" | jq '.invalid_rows')
if [ "$INVALID_ROWS" -gt 100 ]; then
  echo "Data quality check failed: $INVALID_ROWS invalid rows"
  exit 1
else
  echo "Data quality check passed"
fi
```

### Scheduled Reporting
```bash
#!/bin/bash
# generate-report.sh

# Generate weekly report
bigquery templates run weekly-revenue-report \
  --param week_start=$(date -d "last monday" +%Y-%m-%d) \
  --param week_end=$(date -d "next sunday" +%Y-%m-%d) \
  --yes \
  --format json > /reports/weekly-$(date +%Y-%m-%d).json

# Upload to GCS
gsutil cp /reports/weekly-*.json gs://reports-bucket/
```

## Summary
**Primary commands**:
- `bigquery auth {check,login}` - Authentication management
- `bigquery query` - Execute SQL with cost awareness
- `bigquery dry-run` - Estimate query costs
- `bigquery datasets list` - List datasets
- `bigquery tables {list,describe,insert,load,extract,create-external,update-external}` - Table operations
- `bigquery templates {list,search,validate,run}` - Named templates
- `bigquery mcp {stdio,http}` - MCP server modes
- `bigquery lsp` - LSP server

**Key features**:
- Cost-aware query execution with confirmation prompts
- Named query templates with parameter substitution
- Streaming insert API for real-time data (<1000 rows)
- Bulk load jobs for large datasets (>10MB or >1000 rows)
- JSONL streaming support with stdin (`--data -`)
- Data extraction to Cloud Storage (CSV, JSON, Avro, Parquet)
- External table support for GCS data
- MCP server with stdio and HTTP modes
- LSP integration for editor support

**Best practices**:
- Always check authentication first with `auth check`
- Use `dry-run` to estimate costs before expensive queries
- Create templates for frequently-used queries
- Use the `--yes` flag for automation and CI/CD
- Use `insert` for <1000 rows, `load` for larger datasets
- Use JSONL format for streaming pipelines
- Stream from stdin with `--data -` for data transformations
- Use external tables to avoid data duplication
- Configure MCP for natural language query capabilities
- Set up LSP in your editor for SQL development

**MCP Integration**:
- Semantic search across datasets
- Natural language to SQL translation
- Schema discovery and exploration
- Template execution via MCP tools
- Available in both stdio and HTTP modes