ln-651-query-efficiency-auditor


Query Efficiency Auditor (L3 Worker)

Specialized worker auditing database query patterns for redundancy, inefficiency, and misuse.

Purpose & Scope

目标与范围

  • Worker in ln-650 coordinator pipeline - invoked by ln-650-persistence-performance-auditor
  • Audit query efficiency (Priority: HIGH)
  • Check redundant fetches, batch operation misuse, caching scope problems
  • Return structured findings with severity, location, effort, recommendations
  • Calculate compliance score (X/10) for Query Efficiency category

Inputs (from Coordinator)

MANDATORY READ: Load `shared/references/task_delegation_pattern.md#audit-coordinator--worker-contract` for the contextStore structure.
Receives `contextStore` with: `tech_stack`, `best_practices`, `db_config` (database type, ORM settings), `codebase_root`.
Domain-aware: supports `domain_mode` + `current_domain`.

Workflow

  1. Parse context from contextStore
    • Extract tech_stack, best_practices, db_config
    • Determine scan_path (same logic as ln-624)
  2. Scan codebase for violations
    • All Grep/Glob patterns use `scan_path`
    • Trace call chains for redundant fetches (requires reading caller + callee)
  3. Collect findings with severity, location, effort, recommendation
  4. Calculate score using penalty algorithm
  5. Return JSON result to coordinator
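Steps 3-5 can be sketched as follows. The result fields match the Output Format section later in this document; the helper itself is hypothetical, not the worker's actual implementation:

```python
import json

def build_result(findings: list, score: int) -> str:
    """Assemble the worker's JSON result from collected findings (sketch)."""
    counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    for finding in findings:
        counts[finding["severity"].lower()] += 1
    return json.dumps({
        "category": "Query Efficiency",
        "score": score,
        "total_issues": len(findings),
        **counts,
        "findings": findings,
    })
```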

Audit Rules (Priority: HIGH)

1. Redundant Entity Fetch

What: Same entity fetched from the DB twice in a call chain
Detection:
  • Find a function A that calls `repo.get(id)` or `session.get(Model, id)`, then passes `id` (not the object) to function B
  • Function B also calls `repo.get(id)` or `session.get(Model, id)` for the same entity
  • Common pattern: `acquire_next_pending()` returns the job, but `_process_job(job_id)` re-fetches it
Detection patterns (Python/SQLAlchemy):
  • Grep for `repo.*get_by_id|session\.get\(|session\.query.*filter.*id` in service/handler files
  • Trace: if a function receives `entity_id: int/UUID` AND internally does `repo.get(entity_id)`, check whether the caller already has the entity object
  • Check the `expire_on_commit` setting: if `False`, objects remain valid after commit
Severity:
  • HIGH: Redundant fetch in an API request handler (adds latency per request)
  • MEDIUM: Redundant fetch in a background job (less critical)
Recommendation: Pass the entity object instead of the ID, or remove the second fetch when `expire_on_commit=False`
Effort: S (change the signature to accept the object instead of the ID)
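A minimal, self-contained sketch of the flagged pattern and its fix; `FakeRepo` and the handler names are hypothetical stand-ins for a real repository and request handler:

```python
class FakeRepo:
    """Stand-in repository that counts DB fetches."""
    def __init__(self):
        self.fetches = 0
        self._rows = {1: {"id": 1, "status": "pending"}}

    def get(self, entity_id):
        self.fetches += 1
        return self._rows[entity_id]

# Flagged: the caller fetches the job, then the callee re-fetches it by ID.
def _process_job_by_id(repo, job_id):
    job = repo.get(job_id)        # redundant second fetch of the same row
    job["status"] = "done"

def handler_redundant(repo):
    job = repo.get(1)             # acquire_next_pending-style first fetch
    _process_job_by_id(repo, job["id"])

# Fix: pass the entity object, not its ID.
def _process_job(job):
    job["status"] = "done"

def handler_fixed(repo):
    job = repo.get(1)
    _process_job(job)
```

With the redundant version the repository sees two fetches per job; the fixed version sees one.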

2. N-UPDATE/DELETE Loop

What: Loop of individual UPDATE/DELETE operations instead of a single batch query
Detection:
  • Pattern: `for item in items: await repo.update(item.id, ...)` or `for item in items: await repo.delete(item.id)`
  • Pattern: `for item in items: session.execute(update(Model).where(...))`
Detection patterns:
  • Grep for `for .* in .*:` followed by `repo\.(update|delete|reset|save|mark_)` within 1-3 lines
  • Grep for `for .* in .*:` followed by `session\.execute\(.*update\(` within 1-3 lines
Severity:
  • HIGH: Loop over >10 items (N separate round-trips to the DB)
  • MEDIUM: Loop over <=10 items
Recommendation: Replace with a single `UPDATE ... WHERE id IN (...)` or `session.execute(update(Model).where(Model.id.in_(ids)))`
Effort: M (rewrite query + test)
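The two grep patterns above can be combined into a small line-window scanner. This sketch is illustrative only, not the auditor's actual implementation:

```python
import re

LOOP_RE = re.compile(r"for .* in .*:")
MUTATION_RE = re.compile(r"repo\.(update|delete|reset|save|mark_)"
                         r"|session\.execute\(.*update\(")

def find_loop_mutations(source: str, window: int = 3) -> list:
    """Return 1-based line numbers of per-item UPDATE/DELETE calls
    that appear within `window` lines after a for-loop header."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if LOOP_RE.search(line):
            for j in range(i + 1, min(i + 1 + window, len(lines))):
                if MUTATION_RE.search(lines[j]):
                    hits.append(j + 1)  # 1-based line of the mutation
    return hits
```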

3. Unnecessary Resolve

What: Re-resolving a value from the DB when it is already available in the caller's scope
Detection:
  • A method receives `profile_id` and resolves the engine from it, but the caller already determined `engine`
  • A method receives `lang_code` and looks up `dialect_id`, but the caller already has both `lang` and `dialect`
  • Pattern: a function receives `X_id`, does `get(X_id)`, and extracts `.field`, when the caller already has `field`
Severity:
  • MEDIUM: Extra DB query per invocation, especially in high-frequency paths
Recommendation: Split the method into two variants, `with_known_value(value, ...)` and `resolving_value(id, ...)`, or pass the resolved value directly
Effort: S-M (refactor signature, update callers)
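The two-variant refactor can be sketched like this; `FakeProfileRepo` and the function names are hypothetical:

```python
class FakeProfileRepo:
    """Stand-in repository that counts resolution queries."""
    def __init__(self):
        self.queries = 0

    def get(self, profile_id):
        self.queries += 1
        return {"id": profile_id, "engine": "neural-v2"}

def synthesize_with_known_engine(engine: str, text: str) -> str:
    """Fast path: the caller already resolved the engine — no DB access."""
    return f"[{engine}] {text}"

def synthesize_resolving_engine(repo, profile_id: int, text: str) -> str:
    """Slow path: used only when the caller genuinely has just the ID."""
    engine = repo.get(profile_id)["engine"]   # one extra query
    return synthesize_with_known_engine(engine, text)
```

Callers that already hold the resolved value use the fast path and skip the query entirely.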

4. Over-Fetching

What: Loading a full ORM model when only a few fields are needed
Detection:
  • `session.query(Model)` or `select(Model)` without `.options(load_only(...))` for models with >10 columns
  • Especially in list/search endpoints that return many rows
  • Pattern: loading the full entity but only using 2-3 fields
Severity:
  • MEDIUM: Large models (>15 columns) in list endpoints
  • LOW: Small models (<10 columns) or single-entity endpoints
Recommendation: Use `load_only()`, `defer()`, or a raw `select(Model.col1, Model.col2)` for list queries
Effort: S (add load_only to the query)

5. Missing Bulk Operations

What: Sequential INSERT/DELETE/UPDATE instead of bulk operations
Detection:
  • `for item in items: session.add(item)` instead of `session.add_all(items)`
  • `for item in items: session.delete(item)` instead of a bulk delete
  • Pattern: loop with a single `INSERT` per iteration
Severity:
  • MEDIUM: Any sequential add/delete in a loop (missed batch optimization)
Recommendation: Use `session.add_all()`, `session.execute(insert(Model).values(list_of_dicts))`, or `bulk_save_objects()`
Effort: S (replace the loop with a bulk call)

6. Wrong Caching Scope

What: Request-scoped cache for data that rarely changes (should be app-scoped)
Detection:
  • A service registered as request-scoped (e.g., via FastAPI `Depends()`) with an internal cache (a `_cache` dict, a `_loaded` flag)
  • Cache populated by an expensive query (JOINs, aggregations) on every request
  • Data TTL >> request duration (e.g., engine configurations, language lists, feature flags)
Detection patterns:
  • Find classes with `_cache`, `_loaded`, `_initialized` attributes
  • Check whether the class is created per-request (via the DI registration scope)
  • Compare: data change frequency vs cache lifetime
Severity:
  • HIGH: Expensive query (JOINs, subqueries) cached only per-request
  • MEDIUM: Simple query cached per-request
Recommendation: Move the cache to an app-scoped service (singleton), add TTL-based invalidation, or use CacheService with a configurable TTL
Effort: M (change the DI scope, add TTL logic)
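A minimal sketch of the app-scoped fix, assuming a simple loader-plus-TTL design; the class and its names are hypothetical, not the CacheService mentioned above:

```python
import time

class AppScopedCache:
    """App-lifetime cache with TTL-based invalidation for rarely-changing
    data (e.g. engine configurations) — registered once, not per request."""

    def __init__(self, loader, ttl_seconds: float = 300.0):
        self._loader = loader            # the expensive query, run rarely
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = None
        self.loads = 0                   # exposed for observability/tests

    def get(self):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()     # refresh on miss or expiry
            self._loaded_at = now
            self.loads += 1
        return self._value
```

Registered as a singleton in the DI container, every request shares one warm copy instead of re-running the JOIN-heavy query per request.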

Scoring Algorithm

See `shared/references/audit_scoring.md` for the unified formula and score interpretation.

Output Format

Return JSON to coordinator:
```json
{
  "category": "Query Efficiency",
  "score": 6,
  "total_issues": 8,
  "critical": 0,
  "high": 3,
  "medium": 4,
  "low": 1,
  "findings": [
    {
      "severity": "HIGH",
      "location": "app/infrastructure/messaging/job_processor.py:434",
      "issue": "Redundant entity fetch: job re-fetched by ID after acquire_next_pending already returned it",
      "principle": "Query Efficiency / DRY Data Access",
      "recommendation": "Pass job object to _process_job instead of job_id",
      "effort": "S"
    }
  ]
}
```

Critical Rules

  • Do not auto-fix: Report only
  • Trace call chains: Rules 1 and 3 require reading both caller and callee
  • ORM-aware: Check `expire_on_commit`, `autoflush`, and the session scope before flagging redundant fetches
  • Context-aware: Small datasets or infrequent operations may justify simpler code
  • Exclude tests: Do not flag test fixtures or setup code

Definition of Done

  • contextStore parsed (tech_stack, db_config, ORM settings)
  • scan_path determined (domain path or codebase root)
  • All 6 checks completed:
    • redundant fetch, N-UPDATE loop, unnecessary resolve, over-fetching, bulk ops, caching scope
  • Findings collected with severity, location, effort, recommendation
  • Score calculated
  • JSON returned to coordinator

Version: 1.0.0 Last Updated: 2026-02-04