neo4j-query-tuning-skill


When to Use

  • Query takes unexpectedly long; need root-cause analysis
  • EXPLAIN/PROFILE output in hand — needs interpretation
  • Identifying which index is missing or unused
  • Deciding between slotted / pipelined / parallel runtimes
  • Monitoring live queries: SHOW QUERIES, SHOW TRANSACTIONS
  • Cardinality estimates wrong (plan replanning needed)

When NOT to Use

  • Writing Cypher from scratch → neo4j-cypher-skill
  • GDS algorithm performance → neo4j-gds-skill
  • Schema design / data modelling → neo4j-modeling-skill

EXPLAIN vs PROFILE

|                            | EXPLAIN | PROFILE         |
|----------------------------|---------|-----------------|
| Executes query?            | No      | Yes             |
| Returns data?              | No      | Yes             |
| Shows `rows` (actual)      | No      | Yes             |
| Shows `dbHits` (actual)    | No      | Yes             |
| Shows `estimatedRows`      | Yes     | Yes             |
| Cost                       | Zero    | Full query cost |

Run `PROFILE` twice: the first run warms the page cache; the second gives representative metrics.

```cypher
EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name
PROFILE MATCH (p:Person {email: $email}) RETURN p.name
```

Query API alternative (no driver):

```bash
curl -X POST https://<host>/db/<db>/query/v2 \
  -u <user>:<pass> -H "Content-Type: application/json" \
  -d '{"statement": "EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name", "parameters": {"email": "a@b.com"}}'
```
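For programmatic analysis, the nested plan JSON can be flattened bottom-up, so leaf operators that initiate retrieval come first. A minimal sketch, assuming the `operatorType`/`arguments`/`children` shape used in Neo4j plan descriptions (key names may differ by client):

```python
def flatten_plan(plan):
    """Return operators bottom-up: children (leaves) before their parent."""
    ops = []
    for child in plan.get("children", []):
        ops.extend(flatten_plan(child))
    ops.append((plan["operatorType"], plan.get("arguments", {})))
    return ops

# Hypothetical PROFILE response fragment for the query above:
profile = {
    "operatorType": "ProduceResults@neo4j",
    "children": [{
        "operatorType": "Projection@neo4j",
        "children": [{
            "operatorType": "NodeIndexSeek@neo4j",
            "arguments": {"Rows": 1, "DbHits": 2},
            "children": [],
        }],
    }],
}

for op, args in flatten_plan(profile):
    print(op, args)
```

The bottom-up order matches how the plan tables below should be read.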

Key Plan Metrics

| Metric              | Good                         | Investigate if                 |
|---------------------|------------------------------|--------------------------------|
| `dbHits`            | Low; drops after index added | High relative to `rows`        |
| `rows`              | Shrinks early in plan        | Large until final operator     |
| `estimatedRows`     | Close to `rows`              | >10× divergence from actual    |
| `pageCacheHitRatio` | >0.99                        | <0.90 (disk I/O bottleneck)    |
| `pageCacheHits`     | High                         |                                |
| `pageCacheMisses`   | Near 0                       | Rising (page cache too small)  |

Read plans bottom-up: leaf operators at the bottom initiate data retrieval.
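The thresholds in this table can be applied mechanically to PROFILE output. An illustrative sketch; the cutoffs mirror the table and should be tuned for your workload:

```python
def triage_metrics(db_hits, rows, estimated_rows, cache_hits, cache_misses):
    """Return human-readable findings for suspicious PROFILE metrics."""
    findings = []
    # dbHits far above rows usually means a scan where a seek should be.
    if rows and db_hits > 10 * rows:
        findings.append("dbHits high relative to rows: likely missing index")
    # >10x divergence between estimate and actual suggests stale statistics.
    if estimated_rows and rows:
        ratio = max(estimated_rows, rows) / max(min(estimated_rows, rows), 1)
        if ratio > 10:
            findings.append("estimatedRows >10x off: consider db.prepareForReplanning()")
    # Hit ratio below 0.90 indicates a disk I/O bottleneck.
    total = cache_hits + cache_misses
    hit_ratio = cache_hits / total if total else 1.0
    if hit_ratio < 0.90:
        findings.append(f"pageCacheHitRatio {hit_ratio:.2f} < 0.90: disk I/O bottleneck")
    return findings

print(triage_metrics(db_hits=5000, rows=10, estimated_rows=10000,
                     cache_hits=50, cache_misses=50))
```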

Operator Reference

| Operator | Good/Bad | Meaning | Fix |
|---|---|---|---|
| `NodeIndexSeek` | ✓ | Exact match via RANGE/LOOKUP index | |
| `NodeUniqueIndexSeek` | ✓ | Unique constraint index hit | |
| `NodeIndexContainsScan` | ✓ | TEXT index CONTAINS / STARTS WITH | |
| `NodeIndexScan` | ~ | Full index scan (no predicate) | Add WHERE predicate or composite index |
| `NodeByLabelScan` | ✗ | Scans all nodes of label | Add RANGE index on lookup property |
| `AllNodesScan` | ✗✗ | Scans entire node store | Add label + index to MATCH |
| `Expand(All)` | ~ | Traverse relationships from node | Normal; limit with LIMIT or WHERE |
| `Expand(Into)` | ~ | Find rels between two matched nodes | Normal for known-endpoint joins |
| `Filter` | ~ | Predicate applied after scan | Move predicate into WHERE with index |
| `CartesianProduct` | ✗ | No join predicate between two MATCH | Add WHERE join or use WITH between MATCHes |
| `NodeHashJoin` | ~ | Hash join on node IDs | Normal; planner chose hash join |
| `ValueHashJoin` | ~ | Hash join on values | Normal; watch memory for large inputs |
| `EagerAggregation` | ~ | Full aggregation (ORDER BY, count(*)) | Normal for aggregates |
| `Aggregation` | ✓ | Streaming aggregation | |
| `Eager` | ✗ | Read/write conflict; materialises all rows | See Eager fix strategies below |
| `Sort` | ~ | Full sort, O(n log n) | Add LIMIT before Sort; push LIMIT earlier |
| `Top` | ✓ | Sort+Limit combined, O(n log k) | Preferred over Sort+Limit |
| `Limit` | ✓ | Truncates rows early | Push as early as possible |
| `Skip` | ~ | Offset pagination | Use keyset pagination on large graphs |
| `ProduceResults` | | Final output operator | Root of tree |
| `UndirectedRelationshipByIdSeekPipe` | ~ | Lookup by relationship ID | Avoid `id(r)`; use `elementId(r)` |

Full operator reference → references/plan-operators.md
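The Good/Bad column can be encoded as a lookup for scanning plans mechanically. A sketch using illustrative severity labels; names are compared on the base operator name because servers may suffix them (e.g. `@neo4j`), and exact matching keeps `Eager` from also flagging `EagerAggregation`:

```python
# Severity labels derived from the table above (illustrative, not official).
SEVERITY = {
    "AllNodesScan": "critical",
    "NodeByLabelScan": "bad",
    "CartesianProduct": "bad",
    "Eager": "bad",
    "NodeIndexScan": "warn",
    "Filter": "warn",
    "Sort": "warn",
    "Skip": "warn",
}

def flag_operators(operator_names):
    """Return (name, severity) pairs for operators worth investigating."""
    flags = []
    for name in operator_names:
        base = name.split("@")[0]  # strip version suffixes like "@neo4j"
        if base in SEVERITY:
            flags.append((name, SEVERITY[base]))
    return flags

print(flag_operators(["AllNodesScan@neo4j", "ProduceResults@neo4j"]))
```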

Diagnostic Workflow (Agent Runbook)

Step 1 — Baseline Plan

```cypher
EXPLAIN <query>
```

Scan the output for `AllNodesScan`, `NodeByLabelScan`, `CartesianProduct`, `Eager`.

Step 2 — Check Indexes

```cypher
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE state = 'ONLINE'
```

Find whether the label/property from the bad operator has an index.
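Checking this by hand gets tedious once a database has many indexes; a small helper can match the needed label/property pair against the returned rows. A sketch assuming the rows arrive as dicts keyed by the YIELD columns above:

```python
def find_index(index_rows, label, prop):
    """Return the name of an ONLINE index covering (label, prop), else None."""
    for row in index_rows:
        if (row["state"] == "ONLINE"
                and label in (row["labelsOrTypes"] or [])
                and prop in (row["properties"] or [])):
            return row["name"]
    return None

# Example rows as a driver might return them:
rows = [
    {"name": "person_email", "type": "RANGE",
     "labelsOrTypes": ["Person"], "properties": ["email"], "state": "ONLINE"},
]
print(find_index(rows, "Person", "email"))
print(find_index(rows, "Person", "bio"))
```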

Step 3 — Create Missing Index

```cypher
// RANGE index for equality/range predicates:
CREATE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email)

// TEXT index for CONTAINS/ENDS WITH:
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)

// Composite index for multi-property lookup:
CREATE INDEX order_status_date IF NOT EXISTS FOR (n:Order) ON (n.status, n.createdAt)
```

Wait for `state = 'ONLINE'` before measuring.
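The wait itself can be a simple polling loop. A sketch with the `SHOW INDEXES` call injected as `fetch_state`, so the loop is client-agnostic; the function names here are illustrative:

```python
import time

def wait_for_online(fetch_state, timeout_s=60.0, poll_s=0.5):
    """Poll fetch_state() (e.g. SHOW INDEXES ... RETURN state) until ONLINE.

    Returns True once the index is ONLINE, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fetch_state() == "ONLINE":
            return True
        time.sleep(poll_s)
    return False

# Usage with a fake state source standing in for a real query:
states = iter(["POPULATING", "POPULATING", "ONLINE"])
print(wait_for_online(lambda: next(states), timeout_s=5, poll_s=0))
```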

Step 4 — Profile After Fix

```cypher
PROFILE <query>
```

Compare `dbHits` and elapsed milliseconds before and after. Target: `NodeIndexSeek` replaces the scan operators.

Step 5 — Stale Statistics (if estimatedRows wildly off)

```cypher
CALL db.prepareForReplanning()
// or resample a specific index:
CALL db.resampleIndex("person_email")
// or resample all outdated indexes:
CALL db.resampleOutdatedIndexes()
```

Config: `dbms.cypher.statistics_divergence_threshold` (default `0.75`; a cached plan expires when statistics change by more than 75%).
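The divergence check can be mirrored client-side to reason about when a replan is due. A sketch of the ratio test implied by the threshold; the server's exact formula is internal and may differ:

```python
def stats_diverged(old_count, new_count, threshold=0.75):
    """True when the relative change in a count statistic exceeds the threshold."""
    if old_count == 0:
        return new_count > 0
    return abs(new_count - old_count) / old_count > threshold

# A label count dropping from 1000 to 100 is a 90% change: past the default 0.75.
print(stats_diverged(1000, 100))
```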

Fixing Common Plan Problems

Missing Index → NodeByLabelScan / AllNodesScan

```cypher
// Force an index hint when the planner ignores it:
MATCH (p:Person {email: $email})
USING INDEX p:Person(email)
RETURN p.name

// Force a label scan (sometimes faster for high selectivity):
MATCH (p:Person {email: $email})
USING SCAN p:Person
RETURN p.name
```

Wrong Anchor — Planner Picks Wrong Starting Node

Reorder MATCH or use hints:

```cypher
// Force a join at a specific node:
MATCH (a:Author)-[:WROTE]->(b:Book)-[:IN_CATEGORY]->(c:Category {name: $cat})
USING JOIN ON b
RETURN a.name, b.title
```

CartesianProduct — Two Unconnected MATCHes

```cypher
// Bad (Cartesian product):
MATCH (a:Author {id: $aid})
MATCH (b:Book  {id: $bid})
RETURN a.name, b.title

// Good (explicit join or WITH):
MATCH (a:Author {id: $aid})-[:WROTE]->(b:Book {id: $bid})
RETURN a.name, b.title
// Or: WITH between them to reset planning context
```

Eager — Read/Write Conflict

Three strategies (pick the simplest):
  1. Add specific labels to MATCH nodes so the planner distinguishes read/write sets
  2. Collect-then-write: `WITH collect(n) AS nodes UNWIND nodes AS n SET n.x = 1`
  3. CALL IN TRANSACTIONS: isolates each batch in its own transaction

```cypher
CYPHER 25
MATCH (p:Person) WHERE p.score > 100
CALL (p) { SET p.tier = 'gold' } IN TRANSACTIONS OF 1000 ROWS
```
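Where `CALL IN TRANSACTIONS` is not an option, the same batch isolation can be done client-side. A sketch with `run_write` standing in for whatever executes one write transaction per call; the function names and query are illustrative:

```python
def batched(items, size=1000):
    """Yield successive fixed-size chunks of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def update_in_batches(ids, run_write, size=1000):
    """Run one write transaction per batch, mirroring IN TRANSACTIONS OF n ROWS."""
    for batch in batched(ids, size):
        run_write(
            "MATCH (p:Person) WHERE p.id IN $ids SET p.tier = 'gold'",
            {"ids": batch},
        )

# Usage with a fake writer that just records batch sizes:
calls = []
update_in_batches(list(range(2500)), lambda q, params: calls.append(len(params["ids"])))
print(calls)
```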

Expensive CONTAINS / ENDS WITH

```cypher
// Needs a TEXT index (RANGE does NOT support these predicates):
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
MATCH (p:Person) WHERE p.bio CONTAINS $keyword RETURN p.name
```

Over-Traversal — Push LIMIT Early

```cypher
// Bad: LIMIT applied after the expensive join
MATCH (a:Author)-[:WROTE]->(b:Book)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text LIMIT 10

// Good: limit the anchor rows before the fan-out
MATCH (a:Author)-[:WROTE]->(b:Book)
WITH a, b LIMIT 10
MATCH (b)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text
```

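Why pushing LIMIT early matters can be seen with a plain generator pipeline: truncating before the fan-out bounds the work done, the way `WITH a, b LIMIT 10` bounds the rows entering the second MATCH. An illustrative sketch:

```python
from itertools import islice

def demo(n_books=10_000, limit=10):
    """Count how many 'expansions' run when the limit is applied early."""
    expansions = 0

    def expand_reviews(book):
        nonlocal expansions
        expansions += 1  # stands in for traversing (b)-[:REVIEWED_BY]->(r)
        return f"review-of-{book}"

    books = (f"book-{i}" for i in range(n_books))
    # Limit pushed early: only `limit` books are ever expanded.
    results = [expand_reviews(b) for b in islice(books, limit)]
    return len(results), expansions

print(demo())  # expansions equals the limit, not n_books
```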

Cypher Runtime Selection

| Runtime | Select with | Best for | Avoid when |
|---|---|---|---|
| pipelined | `CYPHER runtime=pipelined` | Default OLTP; streaming, low memory | Unsupported operators fall back to slotted |
| slotted | `CYPHER runtime=slotted` | Guaranteed stable behavior; debugging | Performance-critical OLTP |
| parallel | `CYPHER 25 runtime=parallel` | Large analytical scans; aggregations | OLTP, writes, short queries, Aura Free |

Pipelined is the default for most queries. Parallel requires `dbms.cypher.parallel.worker_limit` to be configured; available on Enterprise and Aura Pro 2025+.

```cypher
// Force parallel runtime for a large aggregation:
CYPHER 25 runtime=parallel
MATCH (n:Transaction) WHERE n.amount > 1000
RETURN n.currency, count(*), sum(n.amount)
```
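The selection rules above can be condensed into a heuristic for tooling that prepends a runtime prefix. This is only an illustration of the table, not an official API, and the row-count cutoff is an assumption:

```python
def choose_runtime(is_write, expected_rows, debugging=False):
    """Pick a Cypher runtime prefix per the selection table (heuristic)."""
    if debugging:
        return "CYPHER runtime=slotted"      # stable behavior for debugging
    if not is_write and expected_rows > 1_000_000:
        return "CYPHER 25 runtime=parallel"  # large analytical scan
    return "CYPHER runtime=pipelined"        # default OLTP

print(choose_runtime(is_write=False, expected_rows=5_000_000))
```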

Query Monitoring Commands

```cypher
// Live queries + resource usage:
SHOW QUERIES YIELD query, queryId, elapsedTimeMillis, allocatedBytes, status, username

// Running transactions:
SHOW TRANSACTIONS YIELD transactionId, currentQuery, currentQueryProgress, elapsedTime, status, username, cpuTime, activeLockCount  // currentQueryProgress added [2026.03]

// Kill a specific transaction:
TERMINATE TRANSACTION $transactionId

// Kill a query:
TERMINATE QUERY $queryId

// Graph count stats (node/rel counts by label/type; fed into the planner):
CALL db.stats.retrieve('GRAPH COUNTS') YIELD section, data RETURN section, data

// Token stats (label/property/rel-type IDs):
CALL db.stats.retrieve('TOKENS') YIELD section, data RETURN section, data
```

Full monitoring reference → references/stats-and-monitoring.md
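A monitoring loop often just needs the ids of queries worth terminating. A sketch assuming `SHOW QUERIES` rows arrive as dicts keyed by the YIELD columns above; pass each id back to `TERMINATE QUERY $queryId` as a parameter:

```python
def long_running(rows, threshold_ms=30_000):
    """Return queryIds of running queries older than threshold_ms."""
    return [
        r["queryId"]
        for r in rows
        if r["status"] == "running" and r["elapsedTimeMillis"] > threshold_ms
    ]

# Example rows as a driver might return them:
rows = [
    {"queryId": "query-1", "status": "running", "elapsedTimeMillis": 45_000},
    {"queryId": "query-2", "status": "running", "elapsedTimeMillis": 120},
]
print(long_running(rows))
# Then, per id: run("TERMINATE QUERY $queryId", {"queryId": qid})
```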

Checklist

  • Run `EXPLAIN` first: identifies plan problems without execution cost
  • Check for `AllNodesScan` / `NodeByLabelScan`: missing index
  • Check for `CartesianProduct`: missing join predicate
  • Check for `Eager`: read/write conflict
  • `SHOW INDEXES`: confirm the relevant index exists and `state = 'ONLINE'`
  • Create the missing index; wait for ONLINE
  • Run `PROFILE` twice: the first run warms the cache, the second is representative
  • Compare `dbHits` before and after the fix
  • If `estimatedRows` is wildly off → `CALL db.prepareForReplanning()`
  • Push `LIMIT` / `WITH n LIMIT k` before high-fanout operations
  • For CONTAINS/ENDS WITH: TEXT index, not RANGE
  • For large analytical queries: consider `runtime=parallel`
  • Kill long-running queries with `TERMINATE TRANSACTION`