ln-651-query-efficiency-auditor


Query Efficiency Auditor (L3 Worker)

Specialized worker auditing database query patterns for redundancy, inefficiency, and misuse.

Purpose & Scope

目标与范围

  • Worker in ln-650 coordinator pipeline - invoked by ln-650-persistence-performance-auditor
  • Audit query efficiency (Priority: HIGH)
  • Check redundant fetches, batch operation misuse, caching scope problems
  • Return structured findings with severity, location, effort, recommendations
  • Calculate compliance score (X/10) for Query Efficiency category

Inputs (from Coordinator)

MANDATORY READ: Load `shared/references/task_delegation_pattern.md#audit-coordinator--worker-contract` for the contextStore structure.
Receives `contextStore` with: `tech_stack`, `best_practices`, `db_config` (database type, ORM settings), `codebase_root`.
Domain-aware: supports `domain_mode` + `current_domain`.

Workflow

  1. Parse context from contextStore
    • Extract tech_stack, best_practices, db_config
    • Determine scan_path (same logic as ln-624)
  2. Scan codebase for violations
    • All Grep/Glob patterns use `scan_path`
    • Trace call chains for redundant fetches (requires reading caller + callee)
  3. Collect findings with severity, location, effort, recommendation
  4. Calculate score using penalty algorithm
  5. Return JSON result to coordinator
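Steps 3-5 can be sketched as follows. The result fields match the Output Format section later in this document; the helper itself is hypothetical, not the worker's actual implementation:

```python
import json

def build_result(findings: list, score: int) -> str:
    """Assemble the worker's JSON result from collected findings (sketch)."""
    counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    for finding in findings:
        counts[finding["severity"].lower()] += 1
    return json.dumps({
        "category": "Query Efficiency",
        "score": score,
        "total_issues": len(findings),
        **counts,
        "findings": findings,
    })
```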

Audit Rules (Priority: HIGH)

1. Redundant Entity Fetch

What: Same entity fetched from the DB twice in a call chain
Detection:
  • Find a function A that calls `repo.get(id)` or `session.get(Model, id)`, then passes `id` (not the object) to function B
  • Function B also calls `repo.get(id)` or `session.get(Model, id)` for the same entity
  • Common pattern: `acquire_next_pending()` returns the job, but `_process_job(job_id)` re-fetches it
Detection patterns (Python/SQLAlchemy):
  • Grep for `repo.*get_by_id|session\.get\(|session\.query.*filter.*id` in service/handler files
  • Trace: if a function receives `entity_id: int/UUID` AND internally does `repo.get(entity_id)`, check whether the caller already has the entity object
  • Check the `expire_on_commit` setting: if `False`, objects remain valid after commit
Severity:
  • HIGH: Redundant fetch in an API request handler (adds latency per request)
  • MEDIUM: Redundant fetch in a background job (less critical)
Recommendation: Pass the entity object instead of the ID, or remove the second fetch when `expire_on_commit=False`
Effort: S (change the signature to accept the object instead of the ID)
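A minimal, self-contained sketch of the flagged pattern and its fix; `FakeRepo` and the handler names are hypothetical stand-ins for a real repository and request handler:

```python
class FakeRepo:
    """Stand-in repository that counts DB fetches."""
    def __init__(self):
        self.fetches = 0
        self._rows = {1: {"id": 1, "status": "pending"}}

    def get(self, entity_id):
        self.fetches += 1
        return self._rows[entity_id]

# Flagged: the caller fetches the job, then the callee re-fetches it by ID.
def _process_job_by_id(repo, job_id):
    job = repo.get(job_id)        # redundant second fetch of the same row
    job["status"] = "done"

def handler_redundant(repo):
    job = repo.get(1)             # acquire_next_pending-style first fetch
    _process_job_by_id(repo, job["id"])

# Fix: pass the entity object, not its ID.
def _process_job(job):
    job["status"] = "done"

def handler_fixed(repo):
    job = repo.get(1)
    _process_job(job)
```

With the redundant version the repository sees two fetches per job; the fixed version sees one.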

2. N-UPDATE/DELETE Loop

What: Loop of individual UPDATE/DELETE operations instead of a single batch query
Detection:
  • Pattern: `for item in items: await repo.update(item.id, ...)` or `for item in items: await repo.delete(item.id)`
  • Pattern: `for item in items: session.execute(update(Model).where(...))`
Detection patterns:
  • Grep for `for .* in .*:` followed by `repo\.(update|delete|reset|save|mark_)` within 1-3 lines
  • Grep for `for .* in .*:` followed by `session\.execute\(.*update\(` within 1-3 lines
Severity:
  • HIGH: Loop over >10 items (N separate round-trips to the DB)
  • MEDIUM: Loop over <=10 items
Recommendation: Replace with a single `UPDATE ... WHERE id IN (...)` or `session.execute(update(Model).where(Model.id.in_(ids)))`
Effort: M (rewrite query + test)
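The two grep patterns above can be combined into a small line-window scanner. This sketch is illustrative only, not the auditor's actual implementation:

```python
import re

LOOP_RE = re.compile(r"for .* in .*:")
MUTATION_RE = re.compile(r"repo\.(update|delete|reset|save|mark_)"
                         r"|session\.execute\(.*update\(")

def find_loop_mutations(source: str, window: int = 3) -> list:
    """Return 1-based line numbers of per-item UPDATE/DELETE calls
    that appear within `window` lines after a for-loop header."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if LOOP_RE.search(line):
            for j in range(i + 1, min(i + 1 + window, len(lines))):
                if MUTATION_RE.search(lines[j]):
                    hits.append(j + 1)  # 1-based line of the mutation
    return hits
```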

3. Unnecessary Resolve

What: Re-resolving a value from the DB when it is already available in the caller's scope
Detection:
  • A method receives `profile_id` and resolves the engine from it, but the caller already determined `engine`
  • A method receives `lang_code` and looks up `dialect_id`, but the caller already has both `lang` and `dialect`
  • Pattern: a function receives `X_id`, does `get(X_id)`, and extracts `.field`, when the caller already has `field`
Severity:
  • MEDIUM: Extra DB query per invocation, especially in high-frequency paths
Recommendation: Split the method into two variants, `with_known_value(value, ...)` and `resolving_value(id, ...)`, or pass the resolved value directly
Effort: S-M (refactor signature, update callers)
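The two-variant refactor can be sketched like this; `FakeProfileRepo` and the function names are hypothetical:

```python
class FakeProfileRepo:
    """Stand-in repository that counts resolution queries."""
    def __init__(self):
        self.queries = 0

    def get(self, profile_id):
        self.queries += 1
        return {"id": profile_id, "engine": "neural-v2"}

def synthesize_with_known_engine(engine: str, text: str) -> str:
    """Fast path: the caller already resolved the engine — no DB access."""
    return f"[{engine}] {text}"

def synthesize_resolving_engine(repo, profile_id: int, text: str) -> str:
    """Slow path: used only when the caller genuinely has just the ID."""
    engine = repo.get(profile_id)["engine"]   # one extra query
    return synthesize_with_known_engine(engine, text)
```

Callers that already hold the resolved value use the fast path and skip the query entirely.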

4. Over-Fetching

What: Loading a full ORM model when only a few fields are needed
Detection:
  • `session.query(Model)` or `select(Model)` without `.options(load_only(...))` for models with >10 columns
  • Especially in list/search endpoints that return many rows
  • Pattern: loading the full entity but only using 2-3 fields
Severity:
  • MEDIUM: Large models (>15 columns) in list endpoints
  • LOW: Small models (<10 columns) or single-entity endpoints
Recommendation: Use `load_only()`, `defer()`, or a raw `select(Model.col1, Model.col2)` for list queries
Effort: S (add load_only to the query)

5. Missing Bulk Operations

What: Sequential INSERT/DELETE/UPDATE instead of bulk operations
Detection:
  • `for item in items: session.add(item)` instead of `session.add_all(items)`
  • `for item in items: session.delete(item)` instead of a bulk delete
  • Pattern: loop with a single `INSERT` per iteration
Severity:
  • MEDIUM: Any sequential add/delete in a loop (missed batch optimization)
Recommendation: Use `session.add_all()`, `session.execute(insert(Model).values(list_of_dicts))`, or `bulk_save_objects()`
Effort: S (replace the loop with a bulk call)

6. Wrong Caching Scope

What: Request-scoped cache for data that rarely changes (should be app-scoped)
Detection:
  • A service registered as request-scoped (e.g., via FastAPI `Depends()`) with an internal cache (a `_cache` dict, a `_loaded` flag)
  • Cache populated by an expensive query (JOINs, aggregations) on every request
  • Data TTL >> request duration (e.g., engine configurations, language lists, feature flags)
Detection patterns:
  • Find classes with `_cache`, `_loaded`, `_initialized` attributes
  • Check whether the class is created per-request (via the DI registration scope)
  • Compare: data change frequency vs cache lifetime
Severity:
  • HIGH: Expensive query (JOINs, subqueries) cached only per-request
  • MEDIUM: Simple query cached per-request
Recommendation: Move the cache to an app-scoped service (singleton), add TTL-based invalidation, or use CacheService with a configurable TTL
Effort: M (change the DI scope, add TTL logic)
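A minimal sketch of the app-scoped fix, assuming a simple loader-plus-TTL design; the class and its names are hypothetical, not the CacheService mentioned above:

```python
import time

class AppScopedCache:
    """App-lifetime cache with TTL-based invalidation for rarely-changing
    data (e.g. engine configurations) — registered once, not per request."""

    def __init__(self, loader, ttl_seconds: float = 300.0):
        self._loader = loader            # the expensive query, run rarely
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = None
        self.loads = 0                   # exposed for observability/tests

    def get(self):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()     # refresh on miss or expiry
            self._loaded_at = now
            self.loads += 1
        return self._value
```

Registered as a singleton in the DI container, every request shares one warm copy instead of re-running the JOIN-heavy query per request.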

Scoring Algorithm

See `shared/references/audit_scoring.md` for the unified formula and score interpretation.

Output Format

Return JSON to coordinator:
```json
{
  "category": "Query Efficiency",
  "score": 6,
  "total_issues": 8,
  "critical": 0,
  "high": 3,
  "medium": 4,
  "low": 1,
  "findings": [
    {
      "severity": "HIGH",
      "location": "app/infrastructure/messaging/job_processor.py:434",
      "issue": "Redundant entity fetch: job re-fetched by ID after acquire_next_pending already returned it",
      "principle": "Query Efficiency / DRY Data Access",
      "recommendation": "Pass job object to _process_job instead of job_id",
      "effort": "S"
    }
  ]
}
```

Critical Rules

  • Do not auto-fix: Report only
  • Trace call chains: Rules 1 and 3 require reading both caller and callee
  • ORM-aware: Check `expire_on_commit`, `autoflush`, and the session scope before flagging redundant fetches
  • Context-aware: Small datasets or infrequent operations may justify simpler code
  • Exclude tests: Do not flag test fixtures or setup code

Definition of Done

  • contextStore parsed (tech_stack, db_config, ORM settings)
  • scan_path determined (domain path or codebase root)
  • All 6 checks completed:
    • redundant fetch, N-UPDATE loop, unnecessary resolve, over-fetching, bulk ops, caching scope
  • Findings collected with severity, location, effort, recommendation
  • Score calculated
  • JSON returned to coordinator

Version: 1.0.0 Last Updated: 2026-02-04