db-query-review

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Database Query Review

数据库查询审查

SQLAlchemy 2.x async on PostgreSQL is fast and powerful — and easy to misuse. The bugs surface at scale, not in tests. This skill is the pre-merge checklist.
SQLAlchemy 2.x async on PostgreSQL 快速且功能强大——但也容易被误用。这些问题不会在测试中暴露,只会在大规模场景下显现。本技能是合并前的检查清单。

The Checklist

检查清单

For every new or changed query, verify ALL of:
对于每一个新增或修改的查询,需验证以下所有项:

1. Tenant filter present

1. 存在租户过滤条件

Any query against a tenant-scoped table (anything with a
tenant_id
column) MUST include
WHERE tenant_id = :tenant_id
.
python
undefined
任何针对租户范围表(包含
tenant_id
列的表)的查询必须包含
WHERE tenant_id = :tenant_id
python
undefined

❌ Bad

❌ 错误示例

await db.execute(select(Event).where(Event.id == event_id))
await db.execute(select(Event).where(Event.id == event_id))

✅ Good

✅ 正确示例

await db.execute(select(Event).where(Event.id == event_id, Event.tenant_id == tenant_id))

Defer deep checks to the `tenancy-auditor` agent.
await db.execute(select(Event).where(Event.id == event_id, Event.tenant_id == tenant_id))

将深度检查委托给`tenancy-auditor` Agent。

2. Async session pattern correct

2. 异步会话模式正确

python
undefined
python
undefined

❌ Bad — sync session in async function

❌ 错误示例 —— 异步函数中使用同步会话

def get_user(db: Session, id: int): ...
def get_user(db: Session, id: int): ...

❌ Bad — session held across network call

❌ 错误示例 —— 会话在网络调用期间保持开启

async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) # session still open! user.last_synced = result.at await db.commit()
async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) # 会话仍处于开启状态! user.last_synced = result.at await db.commit()

✅ Good — close session before network, open new one for write

✅ 正确示例 —— 网络调用前关闭会话,开启新会话执行写入操作

async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) async with async_session() as db: user = await db.get(User, id) user.last_synced = result.at await db.commit()
undefined
async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) async with async_session() as db: user = await db.get(User, id) user.last_synced = result.at await db.commit()
undefined

3. No N+1 queries

3. 无N+1查询

python
undefined
python
undefined

❌ Bad

❌ 错误示例

segments = (await db.scalars(select(Segment))).all() for s in segments: members = (await db.scalars(select(Member).where(Member.segment_id == s.id))).all()
segments = (await db.scalars(select(Segment))).all() for s in segments: members = (await db.scalars(select(Member).where(Member.segment_id == s.id))).all()

✅ Good — eager load

✅ 正确示例 —— 预加载

stmt = select(Segment).options(selectinload(Segment.members)) segments = (await db.scalars(stmt)).all()

Watch for: `for x in result: query(x.related)`. Use `selectinload`, `joinedload`, or a single JOIN.
stmt = select(Segment).options(selectinload(Segment.members)) segments = (await db.scalars(stmt)).all()

注意此类模式:`for x in result: query(x.related)`。使用`selectinload`、`joinedload`或单次JOIN操作。

4. Index supports the WHERE / ORDER BY

4. 索引支持WHERE / ORDER BY子句

  • WHERE on a non-indexed column at scale = sequential scan = production fire.
  • Confirm there is an index on every column used in
    WHERE
    ,
    JOIN
    ,
    ORDER BY
    .
  • Composite indexes for multi-column filters: index
    (tenant_id, status, created_at)
    not three separate indexes.
Check existing indexes:
bash
grep -rn "Index\|index=True\|__table_args__" backend/app/models/<file>.py
  • 在非索引列上执行大规模WHERE查询 = 顺序扫描 = 生产事故。
  • 确认每个用于
    WHERE
    JOIN
    ORDER BY
    的列都有对应的索引。
  • 多列过滤使用复合索引:创建
    (tenant_id, status, created_at)
    复合索引,而非三个单独的索引。
检查现有索引:
bash
grep -rn "Index\|index=True\|__table_args__" backend/app/models/<file>.py

5. LIMIT on potentially-large result sets

5. 对潜在大数据集设置LIMIT

Anything that could return >1000 rows must have
.limit(...)
or pagination.
  • List endpoints: always paginate (
    limit
    ,
    offset
    or cursor).
  • Bulk reads in tasks: stream with
    .execution_options(stream_results=True)
    or batch via
    LIMIT/OFFSET
    .
任何可能返回超过1000行结果的查询必须设置
.limit(...)
或分页。
  • 列表接口:始终使用分页(
    limit
    offset
    或游标)。
  • 任务中的批量读取:使用
    .execution_options(stream_results=True)
    流式处理,或通过
    LIMIT/OFFSET
    分批处理。

6. No SELECT *

6. 避免使用SELECT *

Use explicit columns or load only needed relationships. Reduces wire size and lock contention.
使用明确的列或仅加载所需的关联关系。减少网络传输数据量和锁竞争。

7. UPDATE/DELETE has WHERE

7. UPDATE/DELETE语句包含WHERE子句

python
undefined
python
undefined

❌ Catastrophic

❌ 灾难性错误

await db.execute(delete(User))
await db.execute(delete(User))

✅ Good

✅ 正确示例

await db.execute(delete(User).where(User.id == user_id, User.tenant_id == tenant_id))

Sounds obvious. Has happened.
await db.execute(delete(User).where(User.id == user_id, User.tenant_id == tenant_id))

这听起来显而易见,但确实发生过此类事故。

8. Bulk operations use bulk APIs

8. 批量操作使用批量API

Looping
db.add(...)
then
commit()
for 10,000 rows = slow and may exceed connection timeout. Use
db.execute(insert(Model).values([...]))
or
bulk_insert_mappings
.
循环执行
db.add(...)
然后
commit()
处理10000行数据 = 速度缓慢且可能超出连接超时时间。使用
db.execute(insert(Model).values([...]))
bulk_insert_mappings

9. EXPLAIN ANALYZE before merge (for new hot-path queries)

9. 合并前执行EXPLAIN ANALYZE(针对新的热点路径查询)

bash
undefined
bash
undefined

Inside psql connected to dev DB

在连接到开发数据库的psql中执行

EXPLAIN (ANALYZE, BUFFERS) <your-query>;

- Look for `Seq Scan` on large tables → missing index.
- `Buffers: shared read=N` where N is large → not in cache, will be slow on cold start.
- `Rows Removed by Filter` >> `Rows` → query is over-fetching then filtering in memory.
EXPLAIN (ANALYZE, BUFFERS) <your-query>;

- 注意大型表上的`Seq Scan` → 缺失索引。
- `Buffers: shared read=N`中N值较大 → 数据不在缓存中,冷启动时会很慢。
- `Rows Removed by Filter`远大于`Rows` → 查询过度获取数据后在内存中过滤。

10. Migrations safe (delegate)

10. 迁移操作安全(委托处理)

If the change adds an index or alters a table, run the
migration-auditor
agent before merging.
如果变更涉及添加索引或修改表结构,在合并前运行
migration-auditor
Agent。

Quick scan commands

快速扫描命令

bash
undefined
bash
undefined

Find all queries in the diff

查找差异中的所有查询

git diff main...HEAD -- backend/app/ | grep -E '^+' | grep -iE 'select(|.scalars(|.execute(|insert(|update(|delete('
git diff main...HEAD -- backend/app/ | grep -E '^+' | grep -iE 'select(|.scalars(|.execute(|insert(|update(|delete('

Find loops with queries inside (N+1 candidates)

查找包含内部查询的循环(N+1问题候选)

git diff main...HEAD -- backend/app/ | grep -B 2 -A 5 -E '^+[[:space:]]*for ' | grep -E 'await.*execute|await.*scalars'
git diff main...HEAD -- backend/app/ | grep -B 2 -A 5 -E '^+[[:space:]]*for ' | grep -E 'await.*execute|await.*scalars'

Find sessions held across awaits (async-session misuse)

查找跨await语句持有会话的情况(异步会话误用)

grep -rn "async with .*session" backend/app/ | head
undefined
grep -rn "async with .*session" backend/app/ | head
undefined

Red flags worth a second look

需要重点关注的危险模式

PatternWhy it's risky
for x in items: await db.execute(...)
Almost always N+1
.all()
without
.limit()
Unbounded result set
func.lower(col) == ...
in WHERE
Index can't be used unless there's a functional index
OR
across two columns with separate indexes
Postgres often can't use either
created_at::date = ...
Same — kills index usage on
created_at
LIKE '%foo%'
Sequential scan unless you have a trigram index
Subquery in SELECT clauseOften an N+1 in disguise
模式风险原因
for x in items: await db.execute(...)
几乎肯定是N+1问题
.all()
未搭配
.limit()
使用
无限制的结果集
WHERE
子句中使用
func.lower(col) == ...
除非存在函数索引,否则无法使用现有索引
两个列使用单独索引且通过
OR
连接
PostgreSQL通常无法使用任何一个索引
created_at::date = ...
同样会导致
created_at
上的索引失效
LIKE '%foo%'
除非有 trigram 索引,否则会触发顺序扫描
SELECT子句中包含子查询通常是伪装的N+1问题

When to escalate

何时升级处理

  • Query touches a table with >1M rows and you're not sure about lock impact →
    migration-auditor
    .
  • Query crosses tenants intentionally (admin/aggregation) →
    tenancy-auditor
    to confirm safe.
  • Adding an index →
    migration-auditor
    (must be
    CONCURRENTLY
    ).
  • 查询涉及超过100万行数据的表,且不确定锁的影响 → 交给
    migration-auditor
    处理。
  • 查询有意跨租户(管理员/聚合场景)→ 交给
    tenancy-auditor
    确认安全性。
  • 添加索引 → 交给
    migration-auditor
    处理(必须使用
    CONCURRENTLY
    )。