dbx-query

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Databricks SQL Query

Databricks SQL 查询

Execute a read-only SQL query against Databricks and interpret the results.
针对Databricks执行只读SQL查询并解读结果。

Task

任务

$ARGUMENTS
$ARGUMENTS

Constraints

约束条件

  • SELECT queries ONLY — no INSERT, UPDATE, DELETE, CREATE, DROP, ALTER
  • Always fully qualify table names:
    catalog.schema.table
  • Always list columns explicitly — no
    SELECT *
  • Include
    LIMIT
    for exploratory queries
  • Use CTEs for complex multi-step queries
  • 仅允许SELECT查询——禁止使用INSERT、UPDATE、DELETE、CREATE、DROP、ALTER
  • 始终使用完全限定表名:
    catalog.schema.table
  • 始终显式列出列——禁止使用
    SELECT *
  • 探索性查询需包含
    LIMIT
  • 复杂多步骤查询使用CTE

Verify-Query-Validate Cycle

验证-查询-校验循环

1. Plan (think before writing SQL)

1. 规划(编写SQL前思考)

Before writing a query, reason through:
  • What tables do I need?
  • Do I know the column names, or do I need to verify with
    dbx describe
    ?
  • If joining tables, what is the join path and cardinality?
  • Check memory for previously discovered join paths
编写查询前,需理清以下几点:
  • 我需要哪些表?
  • 我是否知道列名,还是需要用
    dbx describe
    验证?
  • 如果关联表,关联路径和基数是什么?
  • 回忆之前发现的关联路径

2. Verify (non-optional for unfamiliar tables)

2. 验证(对不熟悉的表为必填步骤)

Run
dbx describe
on any table you're not certain about:
bash
dbx describe <catalog>.<schema>.<table>
Review output for exact column names, data types, and NULL constraints.
对任何不确定的表运行
dbx describe
bash
dbx describe <catalog>.<schema>.<table>
查看输出以确认精确列名、数据类型和NULL约束。

3. Query

3. 查询

Construct and execute:
bash
dbx query "SELECT col1, col2 FROM catalog.schema.table WHERE condition ORDER BY col1 LIMIT 100"
For SQL in a file:
dbx query -f query.sql
To save output:
dbx query "SELECT ..." --format csv -o results.csv
构建并执行:
bash
dbx query "SELECT col1, col2 FROM catalog.schema.table WHERE condition ORDER BY col1 LIMIT 100"
若SQL在文件中:
dbx query -f query.sql
保存输出:
dbx query "SELECT ..." --format csv -o results.csv

4. Validate

4. 验证结果

Before accepting results:
  • Is the row count reasonable (> 0 and not suspiciously high)?
  • Do sample values look like real data?
  • If joining, are there unexpected duplicates (wrong join cardinality)?
接受结果前:
  • 行数是否合理(>0且未高得离谱)?
  • 样本值是否看起来像真实数据?
  • 如果关联表,是否存在意外重复(关联基数错误)?

5. Iterate if needed

5. 必要时迭代

If results are unexpected:
  1. Check column types with
    dbx describe
  2. Sample raw data with
    dbx sample
  3. Adjust query and re-execute
若结果不符合预期:
  1. dbx describe
    检查列类型
  2. dbx sample
    采样原始数据
  3. 调整查询并重新执行

Example: Simple Aggregation

示例:简单聚合

User asks: "How many orders per month in 2025?"
Plan: Need the orders table. I know from memory that
main.analytics.orders
has
order_id
and
order_date
columns.
Query:
sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(order_id) AS order_count
FROM main.analytics.orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month
Validate: 12 rows returned (Jan-Dec), counts look reasonable, no NULLs.
用户提问:"2025年每月有多少订单?"
规划:需要订单表。我记得
main.analytics.orders
表包含
order_id
order_date
列。
查询
sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(order_id) AS order_count
FROM main.analytics.orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month
验证结果:返回12行(1月至12月),计数合理,无NULL值。

Example: Multi-Table Join

示例:多表关联

User asks: "Show customer names with their total order value"
Plan: Need customers and orders. Check memory for join path —
orders.customer_id = customers.customer_id
(many-to-one). Need to aggregate orders.
Verify: Run
dbx describe
on both tables to confirm column names.
Query:
sql
SELECT
  c.customer_id,
  c.name,
  SUM(o.order_amount) AS total_value
FROM main.analytics.customers c
JOIN main.analytics.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 20
Validate: 20 rows, values look reasonable, no duplicate customer_ids (GROUP BY working correctly).
用户提问:"显示客户姓名及其订单总金额"
规划:需要客户表和订单表。回忆关联路径——
orders.customer_id = customers.customer_id
(多对一)。需要聚合订单数据。
验证:对两个表运行
dbx describe
以确认列名。
查询
sql
SELECT
  c.customer_id,
  c.name,
  SUM(o.order_amount) AS total_value
FROM main.analytics.customers c
JOIN main.analytics.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 20
验证结果:返回20行,金额合理,无重复customer_id(GROUP BY生效)。

Output Format

输出格式

Query

查询语句

sql
-- The SQL that was executed
sql
-- The SQL that was executed

Results Summary

结果摘要

Plain-English interpretation of what the data shows.
用通俗易懂的语言解读数据所展示的内容。

Raw Results

原始结果

The actual query output.
查询的实际输出内容。