# MongoDB Query Patterns


## Overview


The best query is the one you never make. The second best is the one that fetches only what it needs in a single round-trip.
Before writing any database call, ask: **does this data already exist somewhere in the current request chain?** If a controller fetched a document and then calls three services, those services should receive the document — not re-fetch it. **Pass documents, not IDs.**
**Think in data flow, not individual queries.** Trace how data moves through the request (controller → services → response) and ensure each document makes that journey exactly once.

## When to Use


- Adding any new database operation to a service or controller
- Building a new endpoint that reads or writes data
- Wiring service calls together in a controller or orchestration function
- Adding or modifying Mongoose schemas
- Reviewing code that chains multiple service calls

## Data Flow Principles


### 1. Trace the Request Chain First


Before writing code, map the data flow:

```
Request → Controller → Service A → Service B → Service C → Response
                          ↓            ↓            ↓
                        needs        needs        needs
                        doc X        doc X        doc X
```

If multiple stops need the same document, fetch it once at the top and pass it down. Don't let each service independently query for what the caller already has.

### 2. Fetch Wide at the Top, Narrow Below


The first service in the chain may need a broader projection. Downstream services should accept the already-fetched document rather than re-querying with their own narrower projection. A few extra fields in memory cost nothing compared to an extra database round-trip.
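As a concrete sketch of this principle, the top-level fetch can select the union of the fields its downstream services need. The field lists and the `unionProjection` helper below are hypothetical, not from the original text:

```typescript
// Hypothetical field needs for three downstream services
const INVENTORY_FIELDS = ['items', 'quantities'];
const TAX_FIELDS = ['items', 'shippingAddress'];
const PAYMENT_FIELDS = ['total', 'paymentMethod'];

// Build one projection string covering every consumer; the Set dedupes shared fields
function unionProjection(...fieldLists: string[][]): string {
  return [...new Set(fieldLists.flat())].join(' ');
}

const projection = unionProjection(INVENTORY_FIELDS, TAX_FIELDS, PAYMENT_FIELDS);
// e.g. Order.findById(orderId).select(projection).lean()
```

The shared `items` field appears once in the result, so the single top-level query stays as narrow as the combined downstream needs allow.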

### 3. Every Query Must Justify Its Existence


When you're about to write a `findById` or `findOne`, ask:

- Is this document already available from the caller? → Accept it as a parameter
- Is this the same document another branch of the code fetches? → Hoist it above the branch
- Am I querying N documents one at a time in a loop? → Batch with `$in`
- Am I querying just to count or aggregate? → Can I derive it from data I already have?

If none of these apply, the query is justified — write it efficiently.

## Query Efficiency Rules


When a query is justified, make it lean:

| Rule | How | Why |
| --- | --- | --- |
| Project only needed fields | `.select('name email status')` | Full documents carry every field — list exactly what the caller needs, never widen "just in case" |
| Return plain objects for reads | `.lean()` | Skips Mongoose hydration — 2-5x faster for read-only paths |
| Ensure filter fields are indexed | `schema.index({ field: 1 })` | Without an index, MongoDB scans every document (COLLSCAN) |
| Batch related reads | `.find({ _id: { $in: ids } })` + Map | N round-trips collapse to 1 |
| Batch related writes | `Model.bulkWrite([...])` | N sequential updates collapse to 1 |

## Patterns


### Pass Data Through Service Boundaries


The most common source of waste: services that re-fetch documents the caller already has.
```typescript
// WRONG — each service queries the same document independently
async function checkout(orderId: string) {
  const order = await Order.findById(orderId).lean();
  await validateInventory(orderId);    // fetches order again
  await calculateTax(orderId);         // fetches order again
  await processPayment(orderId);       // fetches order again
}

// RIGHT — fetch once, pass through, always provide DB fallback
async function validateInventory(orderId: string, prefetched?: IOrder) {
  const order = prefetched
    || await Order.findById(orderId).select('items quantities').lean();
  // ...
}

async function checkout(orderId: string) {
  const order = await Order.findById(orderId).lean();
  await validateInventory(orderId, order);
  await calculateTax(orderId, order);
  await processPayment(orderId, order);
}
```
Always keep the DB fallback. Some callers (webhooks, cron jobs, background workers) won't have pre-fetched data.

### Batch Instead of Loop


When you need related data for a list of items, fetch everything in one query and index with a Map.
```typescript
// WRONG — N+1: one query per item
for (const order of orders) {
  const product = await Product.findById(order.productId);
  // ...
}

// RIGHT — 1 query + O(1) lookups
const productIds = orders.map(o => o.productId);
const products = await Product.find({ _id: { $in: productIds } })
  .select('name price image')
  .lean();
const productMap = new Map(products.map(p => [p._id.toString(), p]));

for (const order of orders) {
  const product = productMap.get(order.productId.toString());
  // ...
}
```
Always use a Map for lookups — never `array.find()` inside a loop (O(n²)).
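The Map-indexing step can be exercised on its own with plain objects, no database required (the product data here is made up):

```typescript
type Product = { _id: string; name: string; price: number };

// Stand-in for the result of Product.find({ _id: { $in: ids } }).lean()
const products: Product[] = [
  { _id: 'p1', name: 'Widget', price: 9 },
  { _id: 'p2', name: 'Gadget', price: 19 },
];

// Build the index once: O(n)
const productMap = new Map(products.map(p => [p._id, p] as const));

// Each lookup is O(1); array.find() here would rescan the array every time
const hit = productMap.get('p2');
console.log(hit?.name);
```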

### Hoist Common Queries


When the same query appears in multiple code branches, it should execute once before the branch.
```typescript
// WRONG — identical query in both branches
if (userProvidedId) {
  const doc = await Order.findById(orderId).select('status').lean();
  // validate...
} else {
  const doc = await Order.findById(orderId).select('status').lean();
  // use directly...
}

// RIGHT — query once
const doc = await Order.findById(orderId).select('status').lean();
if (userProvidedId) { /* validate */ } else { /* use */ }
```

### Consolidate Same-Collection Queries


If you need multiple aggregates from the same collection, fetch the data once and compute in memory.
```typescript
// WRONG — two round-trips to the same collection
const totalCount = await Booking.countDocuments({ eventId });
const perTypeCount = await Booking.aggregate([
  { $match: { eventId } },
  { $group: { _id: '$ticketType', count: { $sum: 1 } } },
]);

// RIGHT — one fetch, derive both
const bookings = await Booking.find({ eventId })
  .select('ticketType')
  .lean();
const totalCount = bookings.length;
const perTypeCount = new Map();
for (const b of bookings) {
  perTypeCount.set(b.ticketType, (perTypeCount.get(b.ticketType) || 0) + 1);
}
```
For bounded collections (e.g., bookings per event) this is safe. For unbounded collections, keep aggregation server-side.
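For the unbounded case, both numbers can still come back in a single round-trip by letting the server run two sub-pipelines with `$facet`. This is a sketch reusing the `Booking` example above; the helper name is made up:

```typescript
// One aggregation, two parallel sub-pipelines: total and per-type counts
function buildBookingStatsPipeline(eventId: string) {
  return [
    { $match: { eventId } },
    {
      $facet: {
        total: [{ $count: 'count' }],
        perType: [{ $group: { _id: '$ticketType', count: { $sum: 1 } } }],
      },
    },
  ];
}

// Usage (needs a live connection):
// const [stats] = await Booking.aggregate(buildBookingStatsPipeline(eventId));
// const totalCount = stats.total[0]?.count ?? 0;
```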

### Bulk Writes Over Loops


```typescript
// WRONG — N sequential round-trips
for (const item of items) {
  await Item.findByIdAndUpdate(item._id, { $set: { archived: true } });
}

// RIGHT — 1 round-trip
await Item.bulkWrite(
  items.map(i => ({
    updateOne: {
      filter: { _id: i._id },
      update: { $set: { archived: true } },
    },
  }))
);
```

### Avoid Hidden N+1 from `.populate()`


`.populate()` fires a separate query per populated path. Nested or chained populates on lists are silent N+1 bombs.
```typescript
// WRONG — if orders has 50 items, this fires 50+ hidden queries
const orders = await Order.find({ userId })
  .populate('product')
  .populate('seller')
  .populate('reviews');

// RIGHT — batch manually
const orders = await Order.find({ userId }).select('product seller').lean();
const productIds = orders.map(o => o.product);
const sellerIds = orders.map(o => o.seller);

const [products, sellers] = await Promise.all([
  Product.find({ _id: { $in: productIds } }).select('name price').lean(),
  Seller.find({ _id: { $in: sellerIds } }).select('name rating').lean(),
]);
const productMap = new Map(products.map(p => [p._id.toString(), p]));
const sellerMap = new Map(sellers.map(s => [s._id.toString(), s]));
```
`.populate()` is fine for single-document lookups. Avoid it when populating across a list.

### Update Directly — Don't Fetch to Modify


When updating a field, don't fetch the whole document just to change it and save it back.
```typescript
// WRONG — 2 round-trips, fetches entire document
const user = await User.findById(userId);
user.lastLogin = new Date();
await user.save();

// RIGHT — 1 round-trip, touches only the field
await User.updateOne(
  { _id: userId },
  { $set: { lastLogin: new Date() } }
);
```
Use `findById` + `.save()` only when you need validation, middleware hooks, or optimistic concurrency.

### Run Independent Queries Concurrently


When you need data from multiple collections and the queries don't depend on each other, run them in parallel.
```typescript
// WRONG — sequential, each waits for the previous
const users = await User.find({ _id: { $in: userIds } }).select('name email').lean();
const responses = await FormResponse.find({ eventId }).select('userId answers').lean();
const reviews = await Review.find({ eventId }).select('userId rating').lean();

// RIGHT — concurrent, all fire at once
const [users, responses, reviews] = await Promise.all([
  User.find({ _id: { $in: userIds } }).select('name email').lean(),
  FormResponse.find({ eventId }).select('userId answers').lean(),
  Review.find({ eventId }).select('userId rating').lean(),
]);
```
Use `Promise.all` whenever queries don't depend on each other's results. Three 100ms queries run concurrently take 100ms total, not 300ms.

### Index Every Filter Field


```typescript
schema.index({ order_id: 1 });                  // single field
schema.index({ userId: 1, status: 1 });         // compound — high-selectivity field first
schema.index({ event_id: 1, type: 1 });         // compound for filtered aggregations
```

- Mongoose auto-creates indexes on connection (`autoIndex: true`)
- `createIndex()` is idempotent — safe to define even if the index exists
- Verify with `explain('executionStats')` — look for `IXSCAN`, not `COLLSCAN`
- Don't index write-only fields — indexes slow inserts and updates
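Index verification can be automated by walking the winning plan for a `COLLSCAN` stage. The helper below is a hypothetical sketch; only the plan-tree shape (`stage`, `inputStage`, `inputStages`) comes from MongoDB's explain output:

```typescript
type PlanStage = {
  stage: string;
  inputStage?: PlanStage;
  inputStages?: PlanStage[];
};

// Recursively search an explain('executionStats') winning plan for a collection scan
function hasCollScan(stage: PlanStage): boolean {
  if (stage.stage === 'COLLSCAN') return true;
  if (stage.inputStage && hasCollScan(stage.inputStage)) return true;
  return (stage.inputStages ?? []).some(hasCollScan);
}

// Usage (needs a live connection):
// const plan: any = await Order.find({ userId }).explain('executionStats');
// if (hasCollScan(plan.queryPlanner.winningPlan)) console.warn('missing index');
```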

## Checklist


Before committing any code that touches the database:

- [ ] Traced the request chain — no document is fetched more than once across the call stack
- [ ] Services accept pre-fetched documents as optional params with DB fallback
- [ ] Passing documents between functions, not just IDs
- [ ] No query executes inside a loop — batched with `$in` where needed
- [ ] No `.populate()` on a list of documents — batch manually with `$in` + Map
- [ ] Every read query has `.select()` with only needed fields
- [ ] Every read-only query has `.lean()`
- [ ] Updates that don't need hooks use `updateOne`/`bulkWrite`, not `findById` + `save`
- [ ] Filter fields have corresponding `schema.index()` definitions
- [ ] Multiple writes use `bulkWrite`, not a loop
- [ ] Independent queries run concurrently with `Promise.all`
- [ ] Same query doesn't appear in multiple code branches — hoisted above
- [ ] `.select()` lists exactly the fields needed — not widened "just in case"

## Common Mistakes


| Mistake | Why It Breaks |
| --- | --- |
| Passing IDs between services instead of documents | Forces every downstream service to re-query — pass the document |
| `.populate()` on a list of documents | Each populate fires a separate query per document — silent N+1 |
| `findById` + modify + `.save()` for simple field update | 2 round-trips + full document fetch — use `updateOne` directly |
| `.select()` without `.lean()` | Still returns Mongoose document with full change tracking overhead |
| `.lean()` then calling `.save()` | `.lean()` returns plain objects — no Mongoose methods available |
| Batch `$in` + `array.find()` for lookup | O(n²) — always use a Map for O(1) lookups |
| Prefetch param without DB fallback | Breaks webhook/cron callers that don't have the pre-fetched data |
| Sequential independent queries | Use `Promise.all` — 3 queries at 100ms each take 100ms, not 300ms |
| Widening `.select()` "just in case" | Fetch only what the caller actually uses — extra fields waste bandwidth |
| Unbounded `.find()` without limit | Risk of loading entire collection into memory |
| Index on rarely-queried field | Indexes slow every write for no read benefit |