dbx-query

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Databricks SQL Query

Databricks SQL 查询

Execute a read-only SQL query against Databricks and interpret the results.

针对Databricks执行只读SQL查询并解读结果。

Task

任务

$ARGUMENTS

Constraints

约束条件

SELECT queries ONLY — no INSERT, UPDATE, DELETE, CREATE, DROP, ALTER
Always fully qualify table names:
```
catalog.schema.table
```
Always list columns explicitly — no
```
SELECT *
```
Include
```
LIMIT
```
for exploratory queries
Use CTEs for complex multi-step queries

仅允许SELECT查询——禁止使用INSERT、UPDATE、DELETE、CREATE、DROP、ALTER
始终使用完全限定表名：
```
catalog.schema.table
```
始终显式列出列——禁止使用
```
SELECT *
```
探索性查询需包含
```
LIMIT
```
复杂多步骤查询使用CTE

Verify-Query-Validate Cycle

验证-查询-校验循环

1. Plan (think before writing SQL)

1. 规划（编写SQL前思考）

Before writing a query, reason through:

What tables do I need?
Do I know the column names, or do I need to verify with
```
dbx describe
```
?
If joining tables, what is the join path and cardinality?
Check memory for previously discovered join paths

编写查询前，需理清以下几点：

我需要哪些表？
我是否知道列名，还是需要用
```
dbx describe
```
验证？
如果关联表，关联路径和基数是什么？
回忆之前发现的关联路径

2. Verify (non-optional for unfamiliar tables)

2. 验证（对不熟悉的表为必填步骤）

Run

dbx describe

on any table you're not certain about:

bash

dbx describe <catalog>.<schema>.<table>

Review output for exact column names, data types, and NULL constraints.

对任何不确定的表运行

dbx describe

：

bash

dbx describe <catalog>.<schema>.<table>

查看输出以确认精确列名、数据类型和NULL约束。

3. Query

3. 查询

Construct and execute:

bash

dbx query "SELECT col1, col2 FROM catalog.schema.table WHERE condition ORDER BY col1 LIMIT 100"

For SQL in a file:

dbx query -f query.sql

To save output:

dbx query "SELECT ..." --format csv -o results.csv

构建并执行：

bash

dbx query "SELECT col1, col2 FROM catalog.schema.table WHERE condition ORDER BY col1 LIMIT 100"

若SQL在文件中：

dbx query -f query.sql

保存输出：

dbx query "SELECT ..." --format csv -o results.csv

4. Validate

4. 验证结果

Before accepting results:

Is the row count reasonable (> 0 and not suspiciously high)?
Do sample values look like real data?
If joining, are there unexpected duplicates (wrong join cardinality)?

接受结果前：

行数是否合理（>0且未高得离谱）？
样本值是否看起来像真实数据？
如果关联表，是否存在意外重复（关联基数错误）？

5. Iterate if needed

5. 必要时迭代

If results are unexpected:

Check column types with
```
dbx describe
```
Sample raw data with
```
dbx sample
```
Adjust query and re-execute

若结果不符合预期：

用
```
dbx describe
```
检查列类型
用
```
dbx sample
```
采样原始数据
调整查询并重新执行

Example: Simple Aggregation

示例：简单聚合

User asks: "How many orders per month in 2025?"

Plan: Need the orders table. I know from memory that

main.analytics.orders

has

order_id

and

order_date

columns.

Query:

sql

SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(order_id) AS order_count
FROM main.analytics.orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month

Validate: 12 rows returned (Jan-Dec), counts look reasonable, no NULLs.

用户提问："2025年每月有多少订单？"

规划：需要订单表。我记得

main.analytics.orders

表包含

order_id

和

order_date

列。

查询：

sql

SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(order_id) AS order_count
FROM main.analytics.orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month

验证结果：返回12行（1月至12月），计数合理，无NULL值。

Example: Multi-Table Join

示例：多表关联

User asks: "Show customer names with their total order value"

Plan: Need customers and orders. Check memory for join path —

orders.customer_id = customers.customer_id

(many-to-one). Need to aggregate orders.

Verify: Run

dbx describe

on both tables to confirm column names.

Query:

sql

SELECT
  c.customer_id,
  c.name,
  SUM(o.order_amount) AS total_value
FROM main.analytics.customers c
JOIN main.analytics.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 20

Validate: 20 rows, values look reasonable, no duplicate customer_ids (GROUP BY working correctly).

用户提问："显示客户姓名及其订单总金额"

规划：需要客户表和订单表。回忆关联路径——

orders.customer_id = customers.customer_id

（多对一）。需要聚合订单数据。

验证：对两个表运行

dbx describe

以确认列名。

查询：

sql

SELECT
  c.customer_id,
  c.name,
  SUM(o.order_amount) AS total_value
FROM main.analytics.customers c
JOIN main.analytics.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 20

验证结果：返回20行，金额合理，无重复customer_id（GROUP BY生效）。

Output Format

输出格式

Query

查询语句

sql

-- The SQL that was executed

sql

-- The SQL that was executed

Results Summary

结果摘要

Plain-English interpretation of what the data shows.

用通俗易懂的语言解读数据所展示的内容。

Raw Results

原始结果

The actual query output.

查询的实际输出内容。