dbx-query
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseDatabricks SQL Query
Databricks SQL 查询
Execute a read-only SQL query against Databricks and interpret the results.
针对Databricks执行只读SQL查询并解读结果。
Task
任务
$ARGUMENTS
$ARGUMENTS
Constraints
约束条件
- SELECT queries ONLY — no INSERT, UPDATE, DELETE, CREATE, DROP, ALTER
- Always fully qualify table names:
catalog.schema.table - Always list columns explicitly — no
SELECT * - Include for exploratory queries
LIMIT - Use CTEs for complex multi-step queries
- 仅允许SELECT查询——禁止使用INSERT、UPDATE、DELETE、CREATE、DROP、ALTER
- 始终使用完全限定表名:
catalog.schema.table - 始终显式列出列——禁止使用
SELECT * - 探索性查询需包含
LIMIT - 复杂多步骤查询使用CTE
Verify-Query-Validate Cycle
验证-查询-校验循环
1. Plan (think before writing SQL)
1. 规划(编写SQL前思考)
Before writing a query, reason through:
- What tables do I need?
- Do I know the column names, or do I need to verify with ?
dbx describe - If joining tables, what is the join path and cardinality?
- Check memory for previously discovered join paths
编写查询前,需理清以下几点:
- 我需要哪些表?
- 我是否知道列名,还是需要用验证?
dbx describe - 如果关联表,关联路径和基数是什么?
- 回忆之前发现的关联路径
2. Verify (non-optional for unfamiliar tables)
2. 验证(对不熟悉的表为必填步骤)
Run on any table you're not certain about:
dbx describebash
dbx describe <catalog>.<schema>.<table>Review output for exact column names, data types, and NULL constraints.
对任何不确定的表运行:
dbx describebash
dbx describe <catalog>.<schema>.<table>查看输出以确认精确列名、数据类型和NULL约束。
3. Query
3. 查询
Construct and execute:
bash
dbx query "SELECT col1, col2 FROM catalog.schema.table WHERE condition ORDER BY col1 LIMIT 100"For SQL in a file:
To save output:
dbx query -f query.sqldbx query "SELECT ..." --format csv -o results.csv构建并执行:
bash
dbx query "SELECT col1, col2 FROM catalog.schema.table WHERE condition ORDER BY col1 LIMIT 100"若SQL在文件中:
保存输出:
dbx query -f query.sqldbx query "SELECT ..." --format csv -o results.csv4. Validate
4. 验证结果
Before accepting results:
- Is the row count reasonable (> 0 and not suspiciously high)?
- Do sample values look like real data?
- If joining, are there unexpected duplicates (wrong join cardinality)?
接受结果前:
- 行数是否合理(>0且未高得离谱)?
- 样本值是否看起来像真实数据?
- 如果关联表,是否存在意外重复(关联基数错误)?
5. Iterate if needed
5. 必要时迭代
If results are unexpected:
- Check column types with
dbx describe - Sample raw data with
dbx sample - Adjust query and re-execute
若结果不符合预期:
- 用检查列类型
dbx describe - 用采样原始数据
dbx sample - 调整查询并重新执行
Example: Simple Aggregation
示例:简单聚合
User asks: "How many orders per month in 2025?"
Plan: Need the orders table. I know from memory that has and columns.
main.analytics.ordersorder_idorder_dateQuery:
sql
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(order_id) AS order_count
FROM main.analytics.orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY monthValidate: 12 rows returned (Jan-Dec), counts look reasonable, no NULLs.
用户提问:"2025年每月有多少订单?"
规划:需要订单表。我记得表包含和列。
main.analytics.ordersorder_idorder_date查询:
sql
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(order_id) AS order_count
FROM main.analytics.orders
WHERE order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month验证结果:返回12行(1月至12月),计数合理,无NULL值。
Example: Multi-Table Join
示例:多表关联
User asks: "Show customer names with their total order value"
Plan: Need customers and orders. Check memory for join path — (many-to-one). Need to aggregate orders.
orders.customer_id = customers.customer_idVerify: Run on both tables to confirm column names.
dbx describeQuery:
sql
SELECT
c.customer_id,
c.name,
SUM(o.order_amount) AS total_value
FROM main.analytics.customers c
JOIN main.analytics.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 20Validate: 20 rows, values look reasonable, no duplicate customer_ids (GROUP BY working correctly).
用户提问:"显示客户姓名及其订单总金额"
规划:需要客户表和订单表。回忆关联路径——(多对一)。需要聚合订单数据。
orders.customer_id = customers.customer_id验证:对两个表运行以确认列名。
dbx describe查询:
sql
SELECT
c.customer_id,
c.name,
SUM(o.order_amount) AS total_value
FROM main.analytics.customers c
JOIN main.analytics.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 20验证结果:返回20行,金额合理,无重复customer_id(GROUP BY生效)。
Output Format
输出格式
Query
查询语句
sql
-- The SQL that was executedsql
-- The SQL that was executedResults Summary
结果摘要
Plain-English interpretation of what the data shows.
用通俗易懂的语言解读数据所展示的内容。
Raw Results
原始结果
The actual query output.
查询的实际输出内容。