Optimize Query from SQL Text
OUTPUT FORMAT
Return ONLY the optimized SQL query. No markdown formatting, no explanations, no bullet points - just pure SQL that can be executed directly in Snowflake.
CRITICAL: Semantic Preservation Rules
The optimized query MUST return IDENTICAL results to the original.
Before returning ANY optimization, verify:
- Same columns: Exact same columns in exact same order with exact same aliases
- Same rows: Filter conditions must be semantically equivalent
- Same ordering: Preserve exactly as written
- Same limits: If original has , keep . If no LIMIT, do NOT add one.
If you cannot guarantee identical results, return the original query unchanged.
Pattern 1: Function on Filter Column
Problem: Functions on columns in WHERE clause prevent partition pruning and index usage.
CAN Fix
| Original | Optimized | Why Safe |
|---|
WHERE DATE(ts) = '2024-01-01'
| WHERE ts >= '2024-01-01' AND ts < '2024-01-02'
| Equivalent range |
| WHERE dt >= '2024-01-01' AND dt < '2025-01-01'
| Equivalent range |
WHERE MONTH(dt) = 3 AND YEAR(dt) = 2024
| WHERE dt >= '2024-03-01' AND dt < '2024-04-01'
| Equivalent range |
WHERE DATE(ts) >= '2024-01-01' AND DATE(ts) < '2024-02-01'
| WHERE ts >= '2024-01-01' AND ts < '2024-02-01'
| Same boundaries |
WHERE YEAR(dt) BETWEEN 1995 AND 1996
| WHERE dt >= '1995-01-01' AND dt < '1997-01-01'
| Equivalent range |
CANNOT Fix
| Pattern | Why Not |
|---|
WHERE YEAR(dt) IN (SELECT year FROM ...)
| Dynamic values, cannot precompute range |
WHERE DATE(ts) = DATE(other_col)
| Comparing two columns, both need function |
WHERE EXTRACT(DOW FROM dt) = 1
| Day-of-week has no contiguous range |
WHERE DATE_TRUNC('month', dt) = '2024-01-01'
in GROUP BY | Needed for grouping logic |
SELECT YEAR(dt) AS yr ... GROUP BY YEAR(dt)
| Function in SELECT/GROUP BY is fine, only filter matters |
Pattern 2: Function on JOIN Column
Problem: Functions on JOIN columns prevent hash joins, forcing slower nested loop joins.
CAN Fix
| Original | Optimized | Why Safe |
|---|
ON CAST(a.id AS VARCHAR) = CAST(b.id AS VARCHAR)
| | If both are same type (e.g., INTEGER) |
ON UPPER(a.code) = UPPER(b.code)
| | If data is already consistently cased |
ON TRIM(a.name) = TRIM(b.name)
| | If data has no leading/trailing spaces |
CANNOT Fix
| Pattern | Why Not |
|---|
ON CAST(a.id AS VARCHAR) = b.string_id
| Types genuinely differ, CAST required |
ON DATE(a.timestamp) = b.date_col
| Different granularity, DATE() required |
ON UPPER(a.code) = b.code
| If b.code might have different case |
| Arithmetic transformation, cannot remove |
Pattern 3: NOT IN Subquery
Problem:
has poor performance and unexpected NULL behavior.
CAN Fix
| Original | Optimized | Why Safe |
|---|
WHERE id NOT IN (SELECT id FROM t WHERE ...)
| WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id AND ...)
| Equivalent when subquery column is NOT NULL |
WHERE id NOT IN (SELECT id FROM t)
where id has NOT NULL constraint | WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id)
| NOT NULL guarantees equivalence |
CANNOT Fix
| Pattern | Why Not |
|---|
WHERE id NOT IN (SELECT nullable_col FROM t)
| If subquery returns NULL, NOT IN returns no rows; NOT EXISTS doesn't |
WHERE (a, b) NOT IN (SELECT x, y FROM t)
| Multi-column NOT IN has complex NULL semantics |
Key Rule: Only convert NOT IN to NOT EXISTS if you can verify the subquery column cannot be NULL.
Pattern 4: Repeated Subquery
Problem: Same subquery executed multiple times causes redundant scans.
CAN Fix
| Original | Optimized |
|---|
| Subquery appears 2+ times identically | Extract to CTE, reference CTE multiple times |
| Same aggregation used in multiple places | Compute once in CTE |
CANNOT Fix
| Pattern | Why Not |
|---|
| Correlated subquery (references outer table) | Each execution is different, cannot cache |
| Subqueries with different filters | Not actually the same subquery |
| Subquery in SELECT that depends on current row | Correlation prevents extraction |
Pattern 5: Implicit Comma Joins
Problem: Comma-separated tables in FROM clause are harder to read and optimize.
CAN Fix - Always
Convert
FROM a, b, c WHERE a.id = b.id AND b.id = c.id
to explicit JOIN syntax.
This is always safe - just restructuring, no semantic change.
UNSAFE Optimizations (NEVER apply)
- UNION to UNION ALL: UNION deduplicates rows, UNION ALL does not - different results
- Changing window functions: Do not modify or similar nested aggregates
- Adding redundant filters: Do not add filters in JOIN ON if same filter exists in WHERE
- Changing column names: Copy column names EXACTLY from original - do not "simplify" or rename
- Changing column aliases: Keep all aliases exactly as original
- Adding early filtering in JOINs: If a filter is in WHERE, do not duplicate it in JOIN ON clause
Principles
- Minimal changes: Make the fewest changes necessary. Simpler optimizations are more reliable.
- Preserve structure: Keep subqueries, CTEs, and overall query structure unless there's a clear benefit.
- When in doubt, don't: If unsure whether a change preserves semantics, skip it.
- Copy exactly: Column names, table aliases, and expressions should be copied character-for-character.
Priority Order
- Date/time functions on filter columns - Highest impact
- Implicit joins to explicit JOIN - Always safe, improves readability
- NOT IN to NOT EXISTS - Only if NULL-safe
Requirements
- Results must be identical: Same rows, same columns, same order
- Valid Snowflake SQL: Output must execute without errors in Snowflake