db-query-review
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseDatabase Query Review
数据库查询审查
SQLAlchemy 2.x async on PostgreSQL is fast and powerful — and easy to misuse. The bugs surface at scale, not in tests. This skill is the pre-merge checklist.
SQLAlchemy 2.x async on PostgreSQL 快速且功能强大——但也容易被误用。这些问题不会在测试中暴露,只会在大规模场景下显现。本技能是合并前的检查清单。
The Checklist
检查清单
For every new or changed query, verify ALL of:
对于每一个新增或修改的查询,需验证以下所有项:
1. Tenant filter present
1. 存在租户过滤条件
Any query against a tenant-scoped table (anything with a column) MUST include .
tenant_idWHERE tenant_id = :tenant_idpython
undefined任何针对租户范围表(包含列的表)的查询必须包含。
tenant_idWHERE tenant_id = :tenant_idpython
undefined❌ Bad
❌ 错误示例
await db.execute(select(Event).where(Event.id == event_id))
await db.execute(select(Event).where(Event.id == event_id))
✅ Good
✅ 正确示例
await db.execute(select(Event).where(Event.id == event_id, Event.tenant_id == tenant_id))
Defer deep checks to the `tenancy-auditor` agent.await db.execute(select(Event).where(Event.id == event_id, Event.tenant_id == tenant_id))
将深度检查委托给`tenancy-auditor` Agent。2. Async session pattern correct
2. 异步会话模式正确
python
undefinedpython
undefined❌ Bad — sync session in async function
❌ 错误示例 —— 异步函数中使用同步会话
def get_user(db: Session, id: int): ...
def get_user(db: Session, id: int): ...
❌ Bad — session held across network call
❌ 错误示例 —— 会话在网络调用期间保持开启
async with async_session() as db:
user = await db.get(User, id)
result = await external_api.fetch(user.email) # session still open!
user.last_synced = result.at
await db.commit()
async with async_session() as db:
user = await db.get(User, id)
result = await external_api.fetch(user.email) # 会话仍处于开启状态!
user.last_synced = result.at
await db.commit()
✅ Good — close session before network, open new one for write
✅ 正确示例 —— 网络调用前关闭会话,开启新会话执行写入操作
async with async_session() as db:
user = await db.get(User, id)
result = await external_api.fetch(user.email)
async with async_session() as db:
user = await db.get(User, id)
user.last_synced = result.at
await db.commit()
undefinedasync with async_session() as db:
user = await db.get(User, id)
result = await external_api.fetch(user.email)
async with async_session() as db:
user = await db.get(User, id)
user.last_synced = result.at
await db.commit()
undefined3. No N+1 queries
3. 无N+1查询
python
undefinedpython
undefined❌ Bad
❌ 错误示例
segments = (await db.scalars(select(Segment))).all()
for s in segments:
members = (await db.scalars(select(Member).where(Member.segment_id == s.id))).all()
segments = (await db.scalars(select(Segment))).all()
for s in segments:
members = (await db.scalars(select(Member).where(Member.segment_id == s.id))).all()
✅ Good — eager load
✅ 正确示例 —— 预加载
stmt = select(Segment).options(selectinload(Segment.members))
segments = (await db.scalars(stmt)).all()
Watch for: `for x in result: query(x.related)`. Use `selectinload`, `joinedload`, or a single JOIN.stmt = select(Segment).options(selectinload(Segment.members))
segments = (await db.scalars(stmt)).all()
注意此类模式:`for x in result: query(x.related)`。使用`selectinload`、`joinedload`或单次JOIN操作。4. Index supports the WHERE / ORDER BY
4. 索引支持WHERE / ORDER BY子句
- WHERE on a non-indexed column at scale = sequential scan = production fire.
- Confirm there is an index on every column used in ,
WHERE,JOIN.ORDER BY - Composite indexes for multi-column filters: index not three separate indexes.
(tenant_id, status, created_at)
Check existing indexes:
bash
grep -rn "Index\|index=True\|__table_args__" backend/app/models/<file>.py- 在非索引列上执行大规模WHERE查询 = 顺序扫描 = 生产事故。
- 确认每个用于、
WHERE、JOIN的列都有对应的索引。ORDER BY - 多列过滤使用复合索引:创建复合索引,而非三个单独的索引。
(tenant_id, status, created_at)
检查现有索引:
bash
grep -rn "Index\|index=True\|__table_args__" backend/app/models/<file>.py5. LIMIT on potentially-large result sets
5. 对潜在大数据集设置LIMIT
Anything that could return >1000 rows must have or pagination.
.limit(...)- List endpoints: always paginate (,
limitor cursor).offset - Bulk reads in tasks: stream with or batch via
.execution_options(stream_results=True).LIMIT/OFFSET
任何可能返回超过1000行结果的查询必须设置或分页。
.limit(...)- 列表接口:始终使用分页(、
limit或游标)。offset - 任务中的批量读取:使用流式处理,或通过
.execution_options(stream_results=True)分批处理。LIMIT/OFFSET
6. No SELECT *
6. 避免使用SELECT *
Use explicit columns or load only needed relationships. Reduces wire size and lock contention.
使用明确的列或仅加载所需的关联关系。减少网络传输数据量和锁竞争。
7. UPDATE/DELETE has WHERE
7. UPDATE/DELETE语句包含WHERE子句
python
undefinedpython
undefined❌ Catastrophic
❌ 灾难性错误
await db.execute(delete(User))
await db.execute(delete(User))
✅ Good
✅ 正确示例
await db.execute(delete(User).where(User.id == user_id, User.tenant_id == tenant_id))
Sounds obvious. Has happened.await db.execute(delete(User).where(User.id == user_id, User.tenant_id == tenant_id))
这听起来显而易见,但确实发生过此类事故。8. Bulk operations use bulk APIs
8. 批量操作使用批量API
Looping then for 10,000 rows = slow and may exceed connection timeout. Use or .
db.add(...)commit()db.execute(insert(Model).values([...]))bulk_insert_mappings循环执行然后处理10000行数据 = 速度缓慢且可能超出连接超时时间。使用或。
db.add(...)commit()db.execute(insert(Model).values([...]))bulk_insert_mappings9. EXPLAIN ANALYZE before merge (for new hot-path queries)
9. 合并前执行EXPLAIN ANALYZE(针对新的热点路径查询)
bash
undefinedbash
undefinedInside psql connected to dev DB
在连接到开发数据库的psql中执行
EXPLAIN (ANALYZE, BUFFERS) <your-query>;
- Look for `Seq Scan` on large tables → missing index.
- `Buffers: shared read=N` where N is large → not in cache, will be slow on cold start.
- `Rows Removed by Filter` >> `Rows` → query is over-fetching then filtering in memory.EXPLAIN (ANALYZE, BUFFERS) <your-query>;
- 注意大型表上的`Seq Scan` → 缺失索引。
- `Buffers: shared read=N`中N值较大 → 数据不在缓存中,冷启动时会很慢。
- `Rows Removed by Filter`远大于`Rows` → 查询过度获取数据后在内存中过滤。10. Migrations safe (delegate)
10. 迁移操作安全(委托处理)
If the change adds an index or alters a table, run the agent before merging.
migration-auditor如果变更涉及添加索引或修改表结构,在合并前运行 Agent。
migration-auditorQuick scan commands
快速扫描命令
bash
undefinedbash
undefinedFind all queries in the diff
查找差异中的所有查询
git diff main...HEAD -- backend/app/ | grep -E '^+' | grep -iE 'select(|.scalars(|.execute(|insert(|update(|delete('
git diff main...HEAD -- backend/app/ | grep -E '^+' | grep -iE 'select(|.scalars(|.execute(|insert(|update(|delete('
Find loops with queries inside (N+1 candidates)
查找包含内部查询的循环(N+1问题候选)
git diff main...HEAD -- backend/app/ | grep -B 2 -A 5 -E '^+[[:space:]]*for ' | grep -E 'await.*execute|await.*scalars'
git diff main...HEAD -- backend/app/ | grep -B 2 -A 5 -E '^+[[:space:]]*for ' | grep -E 'await.*execute|await.*scalars'
Find sessions held across awaits (async-session misuse)
查找跨await语句持有会话的情况(异步会话误用)
grep -rn "async with .*session" backend/app/ | head
undefinedgrep -rn "async with .*session" backend/app/ | head
undefinedRed flags worth a second look
需要重点关注的危险模式
| Pattern | Why it's risky |
|---|---|
| Almost always N+1 |
| Unbounded result set |
| Index can't be used unless there's a functional index |
| Postgres often can't use either |
| Same — kills index usage on |
| Sequential scan unless you have a trigram index |
| Subquery in SELECT clause | Often an N+1 in disguise |
| 模式 | 风险原因 |
|---|---|
| 几乎肯定是N+1问题 |
| 无限制的结果集 |
| 除非存在函数索引,否则无法使用现有索引 |
两个列使用单独索引且通过 | PostgreSQL通常无法使用任何一个索引 |
| 同样会导致 |
| 除非有 trigram 索引,否则会触发顺序扫描 |
| SELECT子句中包含子查询 | 通常是伪装的N+1问题 |
When to escalate
何时升级处理
- Query touches a table with >1M rows and you're not sure about lock impact → .
migration-auditor - Query crosses tenants intentionally (admin/aggregation) → to confirm safe.
tenancy-auditor - Adding an index → (must be
migration-auditor).CONCURRENTLY
- 查询涉及超过100万行数据的表,且不确定锁的影响 → 交给处理。
migration-auditor - 查询有意跨租户(管理员/聚合场景)→ 交给确认安全性。
tenancy-auditor - 添加索引 → 交给处理(必须使用
migration-auditor)。CONCURRENTLY