data-sql
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseData SQL Skill
Data SQL Skill
Version: 1.0
Stack: SQL (PostgreSQL, Redshift, Athena, Spark SQL)
SQL is the only language where a working query can be 1,000x slower than an equivalent working query. A missing index turns a 50ms lookup into a 50-second full table scan. breaks downstream code when someone adds a column. Correlated subqueries silently execute N+1 patterns that disappear in development and cripple production. The query planner doesn't warn you — it just does what you asked.
SELECT *SQL written with index awareness, explicit columns, and CTEs is both faster to execute and easier to debug. Clarity and performance reinforce each other.
版本:1.0
技术栈:SQL(PostgreSQL、Redshift、Athena、Spark SQL)
SQL是唯一一种存在“可用查询比等效可用查询慢1000倍”情况的语言。缺失的索引会将50毫秒的查询变成50秒的全表扫描。当有人添加列时,会破坏下游代码。关联子查询会悄悄执行N+1模式,在开发环境中不明显,但会拖垮生产环境。查询优化器不会警告你——它只会执行你的指令。
SELECT *具备索引意识、使用显式列和CTE编写的SQL不仅执行速度更快,也更易于调试。清晰性和性能相辅相成。
Scope and Boundaries
范围与边界
This skill covers:
- Query optimization patterns
- Schema design and normalization
- Data modeling (star schema, slowly changing dimensions)
- CTEs and query organization
- Index strategy
- Window functions
Defers to other skills:
- : SQL injection prevention, parameterized queries
security - : When to use SQL vs. application code
data-pipelines
Use this skill when: Writing SQL queries or designing schemas.
本技能涵盖:
- 查询优化模式
- 架构设计与规范化
- 数据建模(星型架构、缓慢变化维度)
- CTE与查询组织
- 索引策略
- 窗口函数
交由其他技能处理:
- :SQL注入防护、参数化查询
security - :何时使用SQL而非应用代码
data-pipelines
使用本技能的场景: 编写SQL查询或设计架构时。
Core Principles
核心原则
- CTEs for Readability — Break complex queries into named steps.
- Filter Early — Push WHERE clauses as close to source as possible.
- Explicit Columns — Never in production code.
SELECT * - Index-Aware Queries — Know what's indexed, write queries that use them.
- Idempotent Operations — INSERT/UPDATE should be safe to retry.
- 使用CTE提升可读性——将复杂查询拆分为多个命名步骤。
- 尽早过滤——将WHERE子句尽可能靠近数据源。
- 使用显式列——生产代码中绝不使用。
SELECT * - 索引感知查询——了解已创建的索引,编写能利用它们的查询。
- 幂等操作——INSERT/UPDATE操作应可安全重试。
Patterns
模式
CTE Organization
CTE组织
sql
-- Good - logical steps, easy to debug
WITH active_users AS (
SELECT user_id, email
FROM users
WHERE status = 'active'
),
recent_orders AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
u.email,
COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);sql
-- Good - logical steps, easy to debug
WITH active_users AS (
SELECT user_id, email
FROM users
WHERE status = 'active'
),
recent_orders AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
u.email,
COALESCE(o.order_count, 0) as recent_orders
FROM active_users u
LEFT JOIN recent_orders o USING (user_id);Upsert Pattern (PostgreSQL)
Upsert模式(PostgreSQL)
sql
INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
quantity = EXCLUDED.quantity,
updated_at = EXCLUDED.updated_at;sql
INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC123', 100, NOW())
ON CONFLICT (sku) DO UPDATE SET
quantity = EXCLUDED.quantity,
updated_at = EXCLUDED.updated_at;Window Functions
窗口函数
sql
-- Running total and row number
SELECT
date,
revenue,
SUM(revenue) OVER (ORDER BY date) as running_total,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;sql
-- Running total and row number
SELECT
date,
revenue,
SUM(revenue) OVER (ORDER BY date) as running_total,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM daily_sales;Anti-Patterns
反模式
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Breaks on schema changes, wastes bandwidth | List explicit columns |
| Correlated subqueries | N+1 query behavior | Use JOINs or window functions |
| Prevents index use | Restructure or use UNION |
| Functions on indexed columns | | Use range: |
Missing | Undefined behavior | Include all non-aggregated columns |
| 反模式 | 问题 | 修复方案 |
|---|---|---|
| 架构变更时会出错,浪费带宽 | 列出显式列 |
| 关联子查询 | N+1查询行为 | 使用JOIN或窗口函数 |
JOIN条件中的 | 无法使用索引 | 重构查询或使用UNION |
| 对索引列使用函数 | | 使用范围查询: |
缺失 | 行为未定义 | 包含所有非聚合列 |
Checklist
检查清单
- No in production queries
SELECT * - CTEs used for complex queries
- WHERE clauses filter early
- JOINs prefer indexed columns
- Upserts are idempotent
- Large queries tested with EXPLAIN
- 生产查询中无
SELECT * - 复杂查询使用CTE
- WHERE子句尽早过滤
- JOIN优先使用索引列
- Upserts是幂等的
- 大型查询使用EXPLAIN测试
References
参考资料
- — Index usage, join optimization, performance
references/query-optimization.md - — Window function patterns and examples
references/window-functions.md
- —— 索引使用、JOIN优化、性能调优
references/query-optimization.md - —— 窗口函数模式与示例
references/window-functions.md
Assets
资源
- — Common SQL patterns and templates
assets/query-patterns.md
- —— 常见SQL模式与模板
assets/query-patterns.md