db-query-review

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Database Query Review

数据库查询审查

SQLAlchemy 2.x async on PostgreSQL is fast and powerful — and easy to misuse. The bugs surface at scale, not in tests. This skill is the pre-merge checklist.

SQLAlchemy 2.x async on PostgreSQL 快速且功能强大——但也容易被误用。这些问题不会在测试中暴露，只会在大规模场景下显现。本技能是合并前的检查清单。

The Checklist

检查清单

For every new or changed query, verify ALL of:

对于每一个新增或修改的查询，需验证以下所有项：

1. Tenant filter present

1. 存在租户过滤条件

Any query against a tenant-scoped table (anything with a

tenant_id

column) MUST include

WHERE tenant_id = :tenant_id

python

undefined

任何针对租户范围表（包含

tenant_id

列的表）的查询必须包含

WHERE tenant_id = :tenant_id

。

python

undefined

❌ Bad

❌ 错误示例

await db.execute(select(Event).where(Event.id == event_id))

✅ Good

✅ 正确示例

await db.execute(select(Event).where(Event.id == event_id, Event.tenant_id == tenant_id))


Defer deep checks to the `tenancy-auditor` agent.

await db.execute(select(Event).where(Event.id == event_id, Event.tenant_id == tenant_id))


将深度检查委托给`tenancy-auditor` Agent。

2. Async session pattern correct

2. 异步会话模式正确

python

undefined

python

undefined

❌ Bad — sync session in async function

❌ 错误示例 —— 异步函数中使用同步会话

def get_user(db: Session, id: int): ...

❌ Bad — session held across network call

❌ 错误示例 —— 会话在网络调用期间保持开启

async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) # session still open! user.last_synced = result.at await db.commit()

async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) # 会话仍处于开启状态！ user.last_synced = result.at await db.commit()

✅ Good — close session before network, open new one for write

✅ 正确示例 —— 网络调用前关闭会话，开启新会话执行写入操作

async with async_session() as db: user = await db.get(User, id) result = await external_api.fetch(user.email) async with async_session() as db: user = await db.get(User, id) user.last_synced = result.at await db.commit()

undefined

undefined

3. No N+1 queries

3. 无N+1查询

python

undefined

python

undefined

❌ Bad

❌ 错误示例

segments = (await db.scalars(select(Segment))).all() for s in segments: members = (await db.scalars(select(Member).where(Member.segment_id == s.id))).all()

✅ Good — eager load

✅ 正确示例 —— 预加载

stmt = select(Segment).options(selectinload(Segment.members)) segments = (await db.scalars(stmt)).all()


Watch for: `for x in result: query(x.related)`. Use `selectinload`, `joinedload`, or a single JOIN.

stmt = select(Segment).options(selectinload(Segment.members)) segments = (await db.scalars(stmt)).all()


注意此类模式：`for x in result: query(x.related)`。使用`selectinload`、`joinedload`或单次JOIN操作。

4. Index supports the WHERE / ORDER BY

4. 索引支持WHERE / ORDER BY子句

WHERE on a non-indexed column at scale = sequential scan = production fire.
Confirm there is an index on every column used in
```
WHERE
```
,
```
JOIN
```
,
```
ORDER BY
```
.
Composite indexes for multi-column filters: index
```
(tenant_id, status, created_at)
```
not three separate indexes.

Check existing indexes:

bash

grep -rn "Index\|index=True\|__table_args__" backend/app/models/<file>.py

在非索引列上执行大规模WHERE查询 = 顺序扫描 = 生产事故。
确认每个用于
```
WHERE
```
、
```
JOIN
```
、
```
ORDER BY
```
的列都有对应的索引。
多列过滤使用复合索引：创建
```
(tenant_id, status, created_at)
```
复合索引，而非三个单独的索引。

检查现有索引：

bash

grep -rn "Index\|index=True\|__table_args__" backend/app/models/<file>.py

5. LIMIT on potentially-large result sets

5. 对潜在大数据集设置LIMIT

Anything that could return >1000 rows must have

.limit(...)

or pagination.

List endpoints: always paginate (
```
limit
```
,
```
offset
```
or cursor).

Bulk reads in tasks: stream with

.execution_options(stream_results=True)

or batch via

LIMIT/OFFSET

任何可能返回超过1000行结果的查询必须设置

.limit(...)

或分页。

列表接口：始终使用分页（
```
limit
```
、
```
offset
```
或游标）。
任务中的批量读取：使用
```
.execution_options(stream_results=True)
```
流式处理，或通过
```
LIMIT/OFFSET
```
分批处理。

6. No SELECT *

6. 避免使用SELECT *

Use explicit columns or load only needed relationships. Reduces wire size and lock contention.

使用明确的列或仅加载所需的关联关系。减少网络传输数据量和锁竞争。

7. UPDATE/DELETE has WHERE

7. UPDATE/DELETE语句包含WHERE子句

python

undefined

python

undefined

❌ Catastrophic

❌ 灾难性错误

await db.execute(delete(User))

✅ Good

✅ 正确示例

await db.execute(delete(User).where(User.id == user_id, User.tenant_id == tenant_id))


Sounds obvious. Has happened.

await db.execute(delete(User).where(User.id == user_id, User.tenant_id == tenant_id))


这听起来显而易见，但确实发生过此类事故。

8. Bulk operations use bulk APIs

8. 批量操作使用批量API

Looping

db.add(...)

then

commit()

for 10,000 rows = slow and may exceed connection timeout. Use

db.execute(insert(Model).values([...]))

bulk_insert_mappings

循环执行

db.add(...)

然后

commit()

处理10000行数据 = 速度缓慢且可能超出连接超时时间。使用

db.execute(insert(Model).values([...]))

或

bulk_insert_mappings

。

9. EXPLAIN ANALYZE before merge (for new hot-path queries)

9. 合并前执行EXPLAIN ANALYZE（针对新的热点路径查询）

bash

undefined

bash

undefined

Inside psql connected to dev DB

在连接到开发数据库的psql中执行

EXPLAIN (ANALYZE, BUFFERS) <your-query>;


- Look for `Seq Scan` on large tables → missing index.
- `Buffers: shared read=N` where N is large → not in cache, will be slow on cold start.
- `Rows Removed by Filter` >> `Rows` → query is over-fetching then filtering in memory.

EXPLAIN (ANALYZE, BUFFERS) <your-query>;


- 注意大型表上的`Seq Scan` → 缺失索引。
- `Buffers: shared read=N`中N值较大 → 数据不在缓存中，冷启动时会很慢。
- `Rows Removed by Filter`远大于`Rows` → 查询过度获取数据后在内存中过滤。

10. Migrations safe (delegate)

10. 迁移操作安全（委托处理）

If the change adds an index or alters a table, run the

migration-auditor

agent before merging.

如果变更涉及添加索引或修改表结构，在合并前运行

migration-auditor

Agent。

Quick scan commands

快速扫描命令

bash

undefined

bash

undefined

Find all queries in the diff

查找差异中的所有查询

Find loops with queries inside (N+1 candidates)

查找包含内部查询的循环（N+1问题候选）

git diff main...HEAD -- backend/app/ | grep -B 2 -A 5 -E '^+[[:space:]]*for ' | grep -E 'await.*execute|await.*scalars'

Find sessions held across awaits (async-session misuse)

查找跨await语句持有会话的情况（异步会话误用）

grep -rn "async with .*session" backend/app/ | head

undefined

grep -rn "async with .*session" backend/app/ | head

undefined

Red flags worth a second look

需要重点关注的危险模式

Pattern	Why it's risky
`for x in items: await db.execute(...)`	Almost always N+1
`.all()` without `.limit()`	Unbounded result set
`func.lower(col) == ...` in WHERE	Index can't be used unless there's a functional index
`OR` across two columns with separate indexes	Postgres often can't use either
`created_at::date = ...`	Same — kills index usage on `created_at`
`LIKE '%foo%'`	Sequential scan unless you have a trigram index
Subquery in SELECT clause	Often an N+1 in disguise

模式	风险原因
`for x in items: await db.execute(...)`	几乎肯定是N+1问题
`.all()` 未搭配 `.limit()` 使用	无限制的结果集
`WHERE` 子句中使用 `func.lower(col) == ...`	除非存在函数索引，否则无法使用现有索引
两个列使用单独索引且通过 `OR` 连接	PostgreSQL通常无法使用任何一个索引
`created_at::date = ...`	同样会导致 `created_at` 上的索引失效
`LIKE '%foo%'`	除非有 trigram 索引，否则会触发顺序扫描
SELECT子句中包含子查询	通常是伪装的N+1问题

When to escalate

何时升级处理

Query touches a table with >1M rows and you're not sure about lock impact →
```
migration-auditor
```
.
Query crosses tenants intentionally (admin/aggregation) →
```
tenancy-auditor
```
to confirm safe.
Adding an index →
```
migration-auditor
```
(must be
```
CONCURRENTLY
```
).

查询涉及超过100万行数据的表，且不确定锁的影响 → 交给
```
migration-auditor
```
处理。
查询有意跨租户（管理员/聚合场景）→ 交给
```
tenancy-auditor
```
确认安全性。
添加索引 → 交给
```
migration-auditor
```
处理（必须使用
```
CONCURRENTLY
```
）。