data-sql

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Data SQL Skill

Version: 1.0 Stack: SQL (PostgreSQL, Redshift, Athena, Spark SQL)

SQL is the only language where a working query can be 1,000x slower than an equivalent working query. A missing index turns a 50ms lookup into a 50-second full table scan.

SELECT *

breaks downstream code when someone adds a column. Correlated subqueries silently execute N+1 patterns that disappear in development and cripple production. The query planner doesn't warn you — it just does what you asked.

SQL written with index awareness, explicit columns, and CTEs is both faster to execute and easier to debug. Clarity and performance reinforce each other.

版本：1.0 技术栈：SQL（PostgreSQL、Redshift、Athena、Spark SQL）

SQL是唯一一种存在“可用查询比等效可用查询慢1000倍”情况的语言。缺失的索引会将50毫秒的查询变成50秒的全表扫描。当有人添加列时，

SELECT *

会破坏下游代码。关联子查询会悄悄执行N+1模式，在开发环境中不明显，但会拖垮生产环境。查询优化器不会警告你——它只会执行你的指令。

具备索引意识、使用显式列和CTE编写的SQL不仅执行速度更快，也更易于调试。清晰性和性能相辅相成。

Scope and Boundaries

范围与边界

This skill covers:

Query optimization patterns
Schema design and normalization
Data modeling (star schema, slowly changing dimensions)
CTEs and query organization
Index strategy
Window functions

Defers to other skills:

```
security
```
: SQL injection prevention, parameterized queries
```
data-pipelines
```
: When to use SQL vs. application code

Use this skill when: Writing SQL queries or designing schemas.

本技能涵盖：

查询优化模式
架构设计与规范化
数据建模（星型架构、缓慢变化维度）
CTE与查询组织
索引策略
窗口函数

交由其他技能处理：

```
security
```
：SQL注入防护、参数化查询
```
data-pipelines
```
：何时使用SQL而非应用代码

使用本技能的场景： 编写SQL查询或设计架构时。

Core Principles

核心原则

CTEs for Readability — Break complex queries into named steps.
Filter Early — Push WHERE clauses as close to source as possible.
Explicit Columns — Never
```
SELECT *
```
in production code.
Index-Aware Queries — Know what's indexed, write queries that use them.
Idempotent Operations — INSERT/UPDATE should be safe to retry.

使用CTE提升可读性——将复杂查询拆分为多个命名步骤。
尽早过滤——将WHERE子句尽可能靠近数据源。
使用显式列——生产代码中绝不使用
```
SELECT *
```
。
索引感知查询——了解已创建的索引，编写能利用它们的查询。
幂等操作——INSERT/UPDATE操作应可安全重试。

Patterns

模式

CTE Organization

CTE组织

sql

-- Good - logical steps, easy to debug
WITH active_users AS (
    SELECT user_id, email
    FROM users
    WHERE status = 'active'
),
recent_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    u.email,
    COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);

sql

-- Good - logical steps, easy to debug
WITH active_users AS (
    SELECT user_id, email
    FROM users
    WHERE status = 'active'
),
recent_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    u.email,
    COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);

Upsert Pattern (PostgreSQL)

Upsert模式（PostgreSQL）

sql

INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
    quantity = EXCLUDED.quantity,
    updated_at = EXCLUDED.updated_at;

sql

INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
    quantity = EXCLUDED.quantity,
    updated_at = EXCLUDED.updated_at;

Window Functions

窗口函数

sql

-- Running total and row number
SELECT
    date,
    revenue,
    SUM(revenue) OVER (ORDER BY date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;

sql

-- Running total and row number
SELECT
    date,
    revenue,
    SUM(revenue) OVER (ORDER BY date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;

Anti-Patterns

反模式

Anti-Pattern	Problem	Fix
`SELECT *`	Breaks on schema changes, wastes bandwidth	List explicit columns
Correlated subqueries	N+1 query behavior	Use JOINs or window functions
`OR` in JOIN conditions	Prevents index use	Restructure or use UNION
Functions on indexed columns	`WHERE YEAR(date) = 2024`	Use range: `date >= '2024-01-01'`
Missing `GROUP BY` columns	Undefined behavior	Include all non-aggregated columns

反模式	问题	修复方案
`SELECT *`	架构变更时会出错，浪费带宽	列出显式列
关联子查询	N+1查询行为	使用JOIN或窗口函数
JOIN条件中的 `OR`	无法使用索引	重构查询或使用UNION
对索引列使用函数	`WHERE YEAR(date) = 2024`	使用范围查询： `date >= '2024-01-01'`
缺失 `GROUP BY` 列	行为未定义	包含所有非聚合列

Checklist

检查清单

No
```
SELECT *
```
in production queries
CTEs used for complex queries
WHERE clauses filter early
JOINs prefer indexed columns
Upserts are idempotent
Large queries tested with EXPLAIN

References

参考资料

```
references/query-optimization.md
```
— Index usage, join optimization, performance
```
references/window-functions.md
```
— Window function patterns and examples

```
references/query-optimization.md
```
—— 索引使用、JOIN优化、性能调优
```
references/window-functions.md
```
—— 窗口函数模式与示例

Assets

资源

```
assets/query-patterns.md
```
— Common SQL patterns and templates

```
assets/query-patterns.md
```
—— 常见SQL模式与模板