optimizing-query-text
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseOptimize Query from SQL Text
根据SQL文本优化查询
OUTPUT FORMAT
输出格式
Return ONLY the optimized SQL query. No markdown formatting, no explanations, no bullet points - just pure SQL that can be executed directly in Snowflake.
仅返回优化后的SQL查询。不要使用Markdown格式,不要解释,不要项目符号——只返回可直接在Snowflake中执行的纯SQL。
CRITICAL: Semantic Preservation Rules
关键:语义保留规则
The optimized query MUST return IDENTICAL results to the original.
Before returning ANY optimization, verify:
- Same columns: Exact same columns in exact same order with exact same aliases
- Same rows: Filter conditions must be semantically equivalent
- Same ordering: Preserve exactly as written
ORDER BY - Same limits: If original has , keep
LIMIT N. If no LIMIT, do NOT add one.LIMIT N
If you cannot guarantee identical results, return the original query unchanged.
优化后的查询必须返回与原查询完全一致的结果。
在返回任何优化结果前,请验证:
- 相同列:列完全相同,顺序完全一致,别名完全相同
- 相同行:过滤条件必须在语义上等价
- 相同排序:严格保留原查询的语句
ORDER BY - 相同限制:如果原查询有,则保留该限制;如果没有LIMIT,请勿添加。
LIMIT N
如果无法保证结果完全一致,请原封不动返回原查询。
Pattern 1: Function on Filter Column
模式1:过滤列上使用函数
Problem: Functions on columns in WHERE clause prevent partition pruning and index usage.
问题:WHERE子句中列上使用函数会阻止分区裁剪和索引使用。
CAN Fix
可优化场景
| Original | Optimized | Why Safe |
|---|---|---|
| | Equivalent range |
| | Equivalent range |
| | Equivalent range |
| | Same boundaries |
| | Equivalent range |
| 原查询 | 优化后查询 | 安全性说明 |
|---|---|---|
| | 等价范围查询 |
| | 等价范围查询 |
| | 等价范围查询 |
| | 边界条件一致 |
| | 等价范围查询 |
CANNOT Fix
不可优化场景
| Pattern | Why Not |
|---|---|
| Dynamic values, cannot precompute range |
| Comparing two columns, both need function |
| Day-of-week has no contiguous range |
| Needed for grouping logic |
| Function in SELECT/GROUP BY is fine, only filter matters |
| 模式 | 原因 |
|---|---|
| 动态值,无法预计算范围 |
| 比较两列,均需使用函数 |
| 星期几没有连续范围 |
| 分组逻辑需要该函数 |
| SELECT/GROUP BY中使用函数无问题,仅过滤条件需关注 |
Pattern 2: Function on JOIN Column
模式2:连接列上使用函数
Problem: Functions on JOIN columns prevent hash joins, forcing slower nested loop joins.
问题:JOIN列上使用函数会阻止哈希连接,迫使用户使用更慢的嵌套循环连接。
CAN Fix
可优化场景
| Original | Optimized | Why Safe |
|---|---|---|
| | If both are same type (e.g., INTEGER) |
| | If data is already consistently cased |
| | If data has no leading/trailing spaces |
| 原查询 | 优化后查询 | 安全性说明 |
|---|---|---|
| | 如果两者类型相同(例如INTEGER) |
| | 如果数据大小写已保持一致 |
| | 如果数据没有前导/尾随空格 |
CANNOT Fix
不可优化场景
| Pattern | Why Not |
|---|---|
| Types genuinely differ, CAST required |
| Different granularity, DATE() required |
| If b.code might have different case |
| Arithmetic transformation, cannot remove |
| 模式 | 原因 |
|---|---|
| 类型确实不同,必须使用CAST |
| 粒度不同,必须使用DATE() |
| 如果b.code可能存在不同大小写 |
| 算术转换,无法移除 |
Pattern 3: NOT IN Subquery
模式3:NOT IN子查询
Problem: has poor performance and unexpected NULL behavior.
NOT IN问题:性能较差且存在意外的NULL行为。
NOT INCAN Fix
可优化场景
| Original | Optimized | Why Safe |
|---|---|---|
| | Equivalent when subquery column is NOT NULL |
| | NOT NULL guarantees equivalence |
| 原查询 | 优化后查询 | 安全性说明 |
|---|---|---|
| | 当子查询列不为NULL时等价 |
| | NOT NULL约束保证等价性 |
CANNOT Fix
不可优化场景
| Pattern | Why Not |
|---|---|
| If subquery returns NULL, NOT IN returns no rows; NOT EXISTS doesn't |
| Multi-column NOT IN has complex NULL semantics |
Key Rule: Only convert NOT IN to NOT EXISTS if you can verify the subquery column cannot be NULL.
| 模式 | 原因 |
|---|---|
| 如果子查询返回NULL,NOT IN不会返回任何行;NOT EXISTS则不会出现此问题 |
| 多列NOT IN的NULL语义复杂 |
核心规则:仅当你能确认子查询列不可能为NULL时,才将NOT IN转换为NOT EXISTS。
Pattern 4: Repeated Subquery
模式4:重复子查询
Problem: Same subquery executed multiple times causes redundant scans.
问题:相同子查询多次执行会导致冗余扫描。
CAN Fix
可优化场景
| Original | Optimized |
|---|---|
| Subquery appears 2+ times identically | Extract to CTE, reference CTE multiple times |
| Same aggregation used in multiple places | Compute once in CTE |
| 原查询 | 优化后查询 |
|---|---|
| 相同子查询出现2次及以上 | 提取为CTE,多次引用该CTE |
| 同一聚合在多个地方使用 | 在CTE中计算一次 |
CANNOT Fix
不可优化场景
| Pattern | Why Not |
|---|---|
| Correlated subquery (references outer table) | Each execution is different, cannot cache |
| Subqueries with different filters | Not actually the same subquery |
| Subquery in SELECT that depends on current row | Correlation prevents extraction |
| 模式 | 原因 |
|---|---|
| 关联子查询(引用外部表) | 每次执行结果不同,无法缓存 |
| 子查询过滤条件不同 | 并非真正相同的子查询 |
| SELECT中依赖当前行的子查询 | 关联关系导致无法提取 |
Pattern 5: Implicit Comma Joins
模式5:隐式逗号连接
Problem: Comma-separated tables in FROM clause are harder to read and optimize.
问题:FROM子句中用逗号分隔表的写法可读性差且难以优化。
CAN Fix - Always
可优化场景 - 始终可以优化
Convert to explicit JOIN syntax.
FROM a, b, c WHERE a.id = b.id AND b.id = c.idThis is always safe - just restructuring, no semantic change.
将转换为显式JOIN语法。
FROM a, b, c WHERE a.id = b.id AND b.id = c.id这种优化始终安全——只是重构写法,不会改变语义。
UNSAFE Optimizations (NEVER apply)
不安全的优化操作(绝对禁止)
- UNION to UNION ALL: UNION deduplicates rows, UNION ALL does not - different results
- Changing window functions: Do not modify or similar nested aggregates
SUM(SUM(x)) OVER(...) - Adding redundant filters: Do not add filters in JOIN ON if same filter exists in WHERE
- Changing column names: Copy column names EXACTLY from original - do not "simplify" or rename
- Changing column aliases: Keep all aliases exactly as original
- Adding early filtering in JOINs: If a filter is in WHERE, do not duplicate it in JOIN ON clause
- 将UNION改为UNION ALL:UNION会去重行,UNION ALL不会——结果不同
- 修改窗口函数:请勿修改或类似的嵌套聚合函数
SUM(SUM(x)) OVER(...) - 添加冗余过滤条件:如果WHERE子句中已有相同过滤条件,请勿在JOIN ON中重复添加
- 修改列名:完全复制原查询的列名——不要“简化”或重命名
- 修改列别名:严格保留所有原别名
- 在JOIN中提前过滤:如果过滤条件在WHERE中,请勿在JOIN ON子句中重复
Principles
原则
- Minimal changes: Make the fewest changes necessary. Simpler optimizations are more reliable.
- Preserve structure: Keep subqueries, CTEs, and overall query structure unless there's a clear benefit.
- When in doubt, don't: If unsure whether a change preserves semantics, skip it.
- Copy exactly: Column names, table aliases, and expressions should be copied character-for-character.
- 最小修改:仅进行必要的最少修改。更简单的优化更可靠。
- 保留结构:除非有明确的好处,否则保留子查询、CTE和整体查询结构。
- 存疑则不做:如果不确定修改是否会保留语义,请跳过该优化。
- 完全复制:列名、表别名和表达式应逐字符复制。
Priority Order
优先级顺序
- Date/time functions on filter columns - Highest impact
- Implicit joins to explicit JOIN - Always safe, improves readability
- NOT IN to NOT EXISTS - Only if NULL-safe
- 过滤列上的日期/时间函数 - 影响最大
- 隐式连接转显式JOIN - 始终安全,提升可读性
- NOT IN转NOT EXISTS - 仅在NULL安全的情况下进行
Requirements
要求
- Results must be identical: Same rows, same columns, same order
- Valid Snowflake SQL: Output must execute without errors in Snowflake
- 结果必须完全一致:相同的行、相同的列、相同的顺序
- 有效的Snowflake SQL:输出必须能在Snowflake中无错误执行