data-sql

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Data SQL Skill

Data SQL Skill

Version: 1.0 Stack: SQL (PostgreSQL, Redshift, Athena, Spark SQL)
SQL is the only language where a working query can be 1,000x slower than an equivalent working query. A missing index turns a 50ms lookup into a 50-second full table scan.
SELECT *
breaks downstream code when someone adds a column. Correlated subqueries silently execute N+1 patterns that disappear in development and cripple production. The query planner doesn't warn you — it just does what you asked.
SQL written with index awareness, explicit columns, and CTEs is both faster to execute and easier to debug. Clarity and performance reinforce each other.

版本:1.0 技术栈:SQL(PostgreSQL、Redshift、Athena、Spark SQL)
SQL是唯一一种存在“可用查询比等效可用查询慢1000倍”情况的语言。缺失的索引会将50毫秒的查询变成50秒的全表扫描。当有人添加列时,
SELECT *
会破坏下游代码。关联子查询会悄悄执行N+1模式,在开发环境中不明显,但会拖垮生产环境。查询优化器不会警告你——它只会执行你的指令。
具备索引意识、使用显式列和CTE编写的SQL不仅执行速度更快,也更易于调试。清晰性和性能相辅相成。

Scope and Boundaries

范围与边界

This skill covers:
  • Query optimization patterns
  • Schema design and normalization
  • Data modeling (star schema, slowly changing dimensions)
  • CTEs and query organization
  • Index strategy
  • Window functions
Defers to other skills:
  • security
    : SQL injection prevention, parameterized queries
  • data-pipelines
    : When to use SQL vs. application code
Use this skill when: Writing SQL queries or designing schemas.

本技能涵盖:
  • 查询优化模式
  • 架构设计与规范化
  • 数据建模(星型架构、缓慢变化维度)
  • CTE与查询组织
  • 索引策略
  • 窗口函数
交由其他技能处理:
  • security
    :SQL注入防护、参数化查询
  • data-pipelines
    :何时使用SQL而非应用代码
使用本技能的场景: 编写SQL查询或设计架构时。

Core Principles

核心原则

  1. CTEs for Readability — Break complex queries into named steps.
  2. Filter Early — Push WHERE clauses as close to source as possible.
  3. Explicit Columns — Never
    SELECT *
    in production code.
  4. Index-Aware Queries — Know what's indexed, write queries that use them.
  5. Idempotent Operations — INSERT/UPDATE should be safe to retry.

  1. 使用CTE提升可读性——将复杂查询拆分为多个命名步骤。
  2. 尽早过滤——将WHERE子句尽可能靠近数据源。
  3. 使用显式列——生产代码中绝不使用
    SELECT *
  4. 索引感知查询——了解已创建的索引,编写能利用它们的查询。
  5. 幂等操作——INSERT/UPDATE操作应可安全重试。

Patterns

模式

CTE Organization

CTE组织

sql
-- Good - logical steps, easy to debug
WITH active_users AS (
    SELECT user_id, email
    FROM users
    WHERE status = 'active'
),
recent_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    u.email,
    COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);
sql
-- Good - logical steps, easy to debug
WITH active_users AS (
    SELECT user_id, email
    FROM users
    WHERE status = 'active'
),
recent_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    u.email,
    COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);

Upsert Pattern (PostgreSQL)

Upsert模式(PostgreSQL)

sql
INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
    quantity = EXCLUDED.quantity,
    updated_at = EXCLUDED.updated_at;
sql
INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
    quantity = EXCLUDED.quantity,
    updated_at = EXCLUDED.updated_at;

Window Functions

窗口函数

sql
-- Running total and row number
SELECT
    date,
    revenue,
    SUM(revenue) OVER (ORDER BY date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;

sql
-- Running total and row number
SELECT
    date,
    revenue,
    SUM(revenue) OVER (ORDER BY date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;

Anti-Patterns

反模式

Anti-PatternProblemFix
SELECT *
Breaks on schema changes, wastes bandwidthList explicit columns
Correlated subqueriesN+1 query behaviorUse JOINs or window functions
OR
in JOIN conditions
Prevents index useRestructure or use UNION
Functions on indexed columns
WHERE YEAR(date) = 2024
Use range:
date >= '2024-01-01'
Missing
GROUP BY
columns
Undefined behaviorInclude all non-aggregated columns

反模式问题修复方案
SELECT *
架构变更时会出错,浪费带宽列出显式列
关联子查询N+1查询行为使用JOIN或窗口函数
JOIN条件中的
OR
无法使用索引重构查询或使用UNION
对索引列使用函数
WHERE YEAR(date) = 2024
使用范围查询:
date >= '2024-01-01'
缺失
GROUP BY
行为未定义包含所有非聚合列

Checklist

检查清单

  • No
    SELECT *
    in production queries
  • CTEs used for complex queries
  • WHERE clauses filter early
  • JOINs prefer indexed columns
  • Upserts are idempotent
  • Large queries tested with EXPLAIN

  • 生产查询中无
    SELECT *
  • 复杂查询使用CTE
  • WHERE子句尽早过滤
  • JOIN优先使用索引列
  • Upserts是幂等的
  • 大型查询使用EXPLAIN测试

References

参考资料

  • references/query-optimization.md
    — Index usage, join optimization, performance
  • references/window-functions.md
    — Window function patterns and examples
  • references/query-optimization.md
    —— 索引使用、JOIN优化、性能调优
  • references/window-functions.md
    —— 窗口函数模式与示例

Assets

资源

  • assets/query-patterns.md
    — Common SQL patterns and templates
  • assets/query-patterns.md
    —— 常见SQL模式与模板