neo4j-query-tuning-skill


When to Use

  • Query takes unexpectedly long; need root-cause analysis
  • EXPLAIN/PROFILE output in hand — needs interpretation
  • Identifying which index is missing or unused
  • Deciding between slotted / pipelined / parallel runtimes
  • Monitoring live queries: SHOW QUERIES, SHOW TRANSACTIONS
  • Cardinality estimates wrong (plan replanning needed)

When NOT to Use

  • Writing Cypher from scratch → neo4j-cypher-skill
  • GDS algorithm performance → neo4j-gds-skill
  • Schema design / data modelling → neo4j-modeling-skill

EXPLAIN vs PROFILE

|                            | EXPLAIN | PROFILE         |
|----------------------------|---------|-----------------|
| Executes query?            | No      | Yes             |
| Returns data?              | No      | Yes             |
| Shows `rows` (actual)      | No      | Yes             |
| Shows `dbHits` (actual)    | No      | Yes             |
| Shows `estimatedRows`      | Yes     | Yes             |
| Cost                       | Zero    | Full query cost |

Run `PROFILE` twice: the first run warms the page cache; the second gives representative metrics.

```cypher
EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name
PROFILE MATCH (p:Person {email: $email}) RETURN p.name
```

Query API alternative (no driver):

```bash
curl -X POST https://<host>/db/<db>/query/v2 \
  -u <user>:<pass> -H "Content-Type: application/json" \
  -d '{"statement": "EXPLAIN MATCH (p:Person {email: $email}) RETURN p.name", "parameters": {"email": "a@b.com"}}'
```
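For programmatic analysis, the nested plan JSON can be flattened bottom-up, so leaf operators that initiate retrieval come first. A minimal sketch, assuming the `operatorType`/`arguments`/`children` shape used in Neo4j plan descriptions (key names may differ by client):

```python
def flatten_plan(plan):
    """Return operators bottom-up: children (leaves) before their parent."""
    ops = []
    for child in plan.get("children", []):
        ops.extend(flatten_plan(child))
    ops.append((plan["operatorType"], plan.get("arguments", {})))
    return ops

# Hypothetical PROFILE response fragment for the query above:
profile = {
    "operatorType": "ProduceResults@neo4j",
    "children": [{
        "operatorType": "Projection@neo4j",
        "children": [{
            "operatorType": "NodeIndexSeek@neo4j",
            "arguments": {"Rows": 1, "DbHits": 2},
            "children": [],
        }],
    }],
}

for op, args in flatten_plan(profile):
    print(op, args)
```

The bottom-up order matches how the plan tables below should be read.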

Key Plan Metrics

| Metric              | Good                         | Investigate if                 |
|---------------------|------------------------------|--------------------------------|
| `dbHits`            | Low; drops after index added | High relative to `rows`        |
| `rows`              | Shrinks early in plan        | Large until final operator     |
| `estimatedRows`     | Close to `rows`              | >10× divergence from actual    |
| `pageCacheHitRatio` | >0.99                        | <0.90 (disk I/O bottleneck)    |
| `pageCacheHits`     | High                         |                                |
| `pageCacheMisses`   | Near 0                       | Rising (page cache too small)  |

Read plans bottom-up: leaf operators at the bottom initiate data retrieval.
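The thresholds in this table can be applied mechanically to PROFILE output. An illustrative sketch; the cutoffs mirror the table and should be tuned for your workload:

```python
def triage_metrics(db_hits, rows, estimated_rows, cache_hits, cache_misses):
    """Return human-readable findings for suspicious PROFILE metrics."""
    findings = []
    # dbHits far above rows usually means a scan where a seek should be.
    if rows and db_hits > 10 * rows:
        findings.append("dbHits high relative to rows: likely missing index")
    # >10x divergence between estimate and actual suggests stale statistics.
    if estimated_rows and rows:
        ratio = max(estimated_rows, rows) / max(min(estimated_rows, rows), 1)
        if ratio > 10:
            findings.append("estimatedRows >10x off: consider db.prepareForReplanning()")
    # Hit ratio below 0.90 indicates a disk I/O bottleneck.
    total = cache_hits + cache_misses
    hit_ratio = cache_hits / total if total else 1.0
    if hit_ratio < 0.90:
        findings.append(f"pageCacheHitRatio {hit_ratio:.2f} < 0.90: disk I/O bottleneck")
    return findings

print(triage_metrics(db_hits=5000, rows=10, estimated_rows=10000,
                     cache_hits=50, cache_misses=50))
```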

Operator Reference

| Operator | Good/Bad | Meaning | Fix |
|---|---|---|---|
| `NodeIndexSeek` | ✓ | Exact match via RANGE/LOOKUP index | |
| `NodeUniqueIndexSeek` | ✓ | Unique constraint index hit | |
| `NodeIndexContainsScan` | ✓ | TEXT index CONTAINS / STARTS WITH | |
| `NodeIndexScan` | ~ | Full index scan (no predicate) | Add WHERE predicate or composite index |
| `NodeByLabelScan` | ✗ | Scans all nodes of label | Add RANGE index on lookup property |
| `AllNodesScan` | ✗✗ | Scans entire node store | Add label + index to MATCH |
| `Expand(All)` | ~ | Traverse relationships from node | Normal; limit with LIMIT or WHERE |
| `Expand(Into)` | ~ | Find rels between two matched nodes | Normal for known-endpoint joins |
| `Filter` | ~ | Predicate applied after scan | Move predicate into WHERE with index |
| `CartesianProduct` | ✗ | No join predicate between two MATCH | Add WHERE join or use WITH between MATCHes |
| `NodeHashJoin` | ~ | Hash join on node IDs | Normal; planner chose hash join |
| `ValueHashJoin` | ~ | Hash join on values | Normal; watch memory for large inputs |
| `EagerAggregation` | ~ | Full aggregation (ORDER BY, count(*)) | Normal for aggregates |
| `Aggregation` | ✓ | Streaming aggregation | |
| `Eager` | ✗ | Read/write conflict; materialises all rows | See Eager fix strategies below |
| `Sort` | ~ | Full sort, O(n log n) | Add LIMIT before Sort; push LIMIT earlier |
| `Top` | ✓ | Sort+Limit combined, O(n log k) | Preferred over Sort+Limit |
| `Limit` | ✓ | Truncates rows early | Push as early as possible |
| `Skip` | ~ | Offset pagination | Use keyset pagination on large graphs |
| `ProduceResults` | | Final output operator | Root of tree |
| `UndirectedRelationshipByIdSeekPipe` | ~ | Lookup by relationship ID | Avoid `id(r)`; use `elementId(r)` |

Full operator reference → references/plan-operators.md
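The Good/Bad column can be encoded as a lookup for scanning plans mechanically. A sketch using illustrative severity labels; names are compared on the base operator name because servers may suffix them (e.g. `@neo4j`), and exact matching keeps `Eager` from also flagging `EagerAggregation`:

```python
# Severity labels derived from the table above (illustrative, not official).
SEVERITY = {
    "AllNodesScan": "critical",
    "NodeByLabelScan": "bad",
    "CartesianProduct": "bad",
    "Eager": "bad",
    "NodeIndexScan": "warn",
    "Filter": "warn",
    "Sort": "warn",
    "Skip": "warn",
}

def flag_operators(operator_names):
    """Return (name, severity) pairs for operators worth investigating."""
    flags = []
    for name in operator_names:
        base = name.split("@")[0]  # strip version suffixes like "@neo4j"
        if base in SEVERITY:
            flags.append((name, SEVERITY[base]))
    return flags

print(flag_operators(["AllNodesScan@neo4j", "ProduceResults@neo4j"]))
```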

Diagnostic Workflow (Agent Runbook)

Step 1 — Baseline Plan

```cypher
EXPLAIN <query>
```

Scan the output for `AllNodesScan`, `NodeByLabelScan`, `CartesianProduct`, `Eager`.

Step 2 — Check Indexes

```cypher
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE state = 'ONLINE'
```

Find whether the label/property from the bad operator has an index.
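Checking this by hand gets tedious once a database has many indexes; a small helper can match the needed label/property pair against the returned rows. A sketch assuming the rows arrive as dicts keyed by the YIELD columns above:

```python
def find_index(index_rows, label, prop):
    """Return the name of an ONLINE index covering (label, prop), else None."""
    for row in index_rows:
        if (row["state"] == "ONLINE"
                and label in (row["labelsOrTypes"] or [])
                and prop in (row["properties"] or [])):
            return row["name"]
    return None

# Example rows as a driver might return them:
rows = [
    {"name": "person_email", "type": "RANGE",
     "labelsOrTypes": ["Person"], "properties": ["email"], "state": "ONLINE"},
]
print(find_index(rows, "Person", "email"))
print(find_index(rows, "Person", "bio"))
```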

Step 3 — Create Missing Index

```cypher
// RANGE index for equality/range predicates:
CREATE INDEX person_email IF NOT EXISTS FOR (n:Person) ON (n.email)

// TEXT index for CONTAINS/ENDS WITH:
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)

// Composite index for multi-property lookup:
CREATE INDEX order_status_date IF NOT EXISTS FOR (n:Order) ON (n.status, n.createdAt)
```

Wait for `state = 'ONLINE'` before measuring.
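The wait itself can be a simple polling loop. A sketch with the `SHOW INDEXES` call injected as `fetch_state`, so the loop is client-agnostic; the function names here are illustrative:

```python
import time

def wait_for_online(fetch_state, timeout_s=60.0, poll_s=0.5):
    """Poll fetch_state() (e.g. SHOW INDEXES ... RETURN state) until ONLINE.

    Returns True once the index is ONLINE, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fetch_state() == "ONLINE":
            return True
        time.sleep(poll_s)
    return False

# Usage with a fake state source standing in for a real query:
states = iter(["POPULATING", "POPULATING", "ONLINE"])
print(wait_for_online(lambda: next(states), timeout_s=5, poll_s=0))
```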

Step 4 — Profile After Fix

```cypher
PROFILE <query>
```

Compare `dbHits` and elapsed milliseconds before and after. Target: `NodeIndexSeek` replaces the scan operators.

Step 5 — Stale Statistics (if estimatedRows wildly off)

```cypher
CALL db.prepareForReplanning()
// or resample a specific index:
CALL db.resampleIndex("person_email")
// or resample all outdated indexes:
CALL db.resampleOutdatedIndexes()
```

Config: `dbms.cypher.statistics_divergence_threshold` (default `0.75`; a cached plan expires when statistics change by more than 75%).
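The divergence check can be mirrored client-side to reason about when a replan is due. A sketch of the ratio test implied by the threshold; the server's exact formula is internal and may differ:

```python
def stats_diverged(old_count, new_count, threshold=0.75):
    """True when the relative change in a count statistic exceeds the threshold."""
    if old_count == 0:
        return new_count > 0
    return abs(new_count - old_count) / old_count > threshold

# A label count dropping from 1000 to 100 is a 90% change: past the default 0.75.
print(stats_diverged(1000, 100))
```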

Fixing Common Plan Problems

Missing Index → NodeByLabelScan / AllNodesScan

```cypher
// Force an index hint when the planner ignores it:
MATCH (p:Person {email: $email})
USING INDEX p:Person(email)
RETURN p.name

// Force a label scan (sometimes faster for high selectivity):
MATCH (p:Person {email: $email})
USING SCAN p:Person
RETURN p.name
```

Wrong Anchor — Planner Picks Wrong Starting Node

Reorder MATCH or use hints:

```cypher
// Force a join at a specific node:
MATCH (a:Author)-[:WROTE]->(b:Book)-[:IN_CATEGORY]->(c:Category {name: $cat})
USING JOIN ON b
RETURN a.name, b.title
```

CartesianProduct — Two Unconnected MATCHes

```cypher
// Bad (Cartesian product):
MATCH (a:Author {id: $aid})
MATCH (b:Book  {id: $bid})
RETURN a.name, b.title

// Good (explicit join or WITH):
MATCH (a:Author {id: $aid})-[:WROTE]->(b:Book {id: $bid})
RETURN a.name, b.title
// Or: WITH between them to reset planning context
```

Eager — Read/Write Conflict

Three strategies (pick the simplest):
  1. Add specific labels to MATCH nodes so the planner distinguishes read/write sets
  2. Collect-then-write: `WITH collect(n) AS nodes UNWIND nodes AS n SET n.x = 1`
  3. CALL IN TRANSACTIONS: isolates each batch in its own transaction

```cypher
CYPHER 25
MATCH (p:Person) WHERE p.score > 100
CALL (p) { SET p.tier = 'gold' } IN TRANSACTIONS OF 1000 ROWS
```
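Where `CALL IN TRANSACTIONS` is not an option, the same batch isolation can be done client-side. A sketch with `run_write` standing in for whatever executes one write transaction per call; the function names and query are illustrative:

```python
def batched(items, size=1000):
    """Yield successive fixed-size chunks of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def update_in_batches(ids, run_write, size=1000):
    """Run one write transaction per batch, mirroring IN TRANSACTIONS OF n ROWS."""
    for batch in batched(ids, size):
        run_write(
            "MATCH (p:Person) WHERE p.id IN $ids SET p.tier = 'gold'",
            {"ids": batch},
        )

# Usage with a fake writer that just records batch sizes:
calls = []
update_in_batches(list(range(2500)), lambda q, params: calls.append(len(params["ids"])))
print(calls)
```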

Expensive CONTAINS / ENDS WITH

```cypher
// Needs a TEXT index (RANGE does NOT support these predicates):
CREATE TEXT INDEX person_bio IF NOT EXISTS FOR (n:Person) ON (n.bio)
MATCH (p:Person) WHERE p.bio CONTAINS $keyword RETURN p.name
```

Over-Traversal — Push LIMIT Early

```cypher
// Bad: LIMIT applied after the expensive join
MATCH (a:Author)-[:WROTE]->(b:Book)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text LIMIT 10

// Good: limit the anchor rows before the fan-out
MATCH (a:Author)-[:WROTE]->(b:Book)
WITH a, b LIMIT 10
MATCH (b)-[:REVIEWED_BY]->(r:Review)
RETURN a.name, b.title, r.text
```

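Why pushing LIMIT early matters can be seen with a plain generator pipeline: truncating before the fan-out bounds the work done, the way `WITH a, b LIMIT 10` bounds the rows entering the second MATCH. An illustrative sketch:

```python
from itertools import islice

def demo(n_books=10_000, limit=10):
    """Count how many 'expansions' run when the limit is applied early."""
    expansions = 0

    def expand_reviews(book):
        nonlocal expansions
        expansions += 1  # stands in for traversing (b)-[:REVIEWED_BY]->(r)
        return f"review-of-{book}"

    books = (f"book-{i}" for i in range(n_books))
    # Limit pushed early: only `limit` books are ever expanded.
    results = [expand_reviews(b) for b in islice(books, limit)]
    return len(results), expansions

print(demo())  # expansions equals the limit, not n_books
```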

Cypher Runtime Selection

| Runtime | Select with | Best for | Avoid when |
|---|---|---|---|
| pipelined | `CYPHER runtime=pipelined` | Default OLTP; streaming, low memory | Unsupported operators fall back to slotted |
| slotted | `CYPHER runtime=slotted` | Guaranteed stable behavior; debugging | Performance-critical OLTP |
| parallel | `CYPHER 25 runtime=parallel` | Large analytical scans; aggregations | OLTP, writes, short queries, Aura Free |

Pipelined is the default for most queries. Parallel requires `dbms.cypher.parallel.worker_limit` to be configured; available on Enterprise and Aura Pro 2025+.

```cypher
// Force parallel runtime for a large aggregation:
CYPHER 25 runtime=parallel
MATCH (n:Transaction) WHERE n.amount > 1000
RETURN n.currency, count(*), sum(n.amount)
```
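The selection rules above can be condensed into a heuristic for tooling that prepends a runtime prefix. This is only an illustration of the table, not an official API, and the row-count cutoff is an assumption:

```python
def choose_runtime(is_write, expected_rows, debugging=False):
    """Pick a Cypher runtime prefix per the selection table (heuristic)."""
    if debugging:
        return "CYPHER runtime=slotted"      # stable behavior for debugging
    if not is_write and expected_rows > 1_000_000:
        return "CYPHER 25 runtime=parallel"  # large analytical scan
    return "CYPHER runtime=pipelined"        # default OLTP

print(choose_runtime(is_write=False, expected_rows=5_000_000))
```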

Query Monitoring Commands

```cypher
// Live queries + resource usage:
SHOW QUERIES YIELD query, queryId, elapsedTimeMillis, allocatedBytes, status, username

// Running transactions:
SHOW TRANSACTIONS YIELD transactionId, currentQuery, currentQueryProgress, elapsedTime, status, username, cpuTime, activeLockCount  // currentQueryProgress added [2026.03]

// Kill a specific transaction:
TERMINATE TRANSACTION $transactionId

// Kill a query:
TERMINATE QUERY $queryId

// Graph count stats (node/rel counts by label/type; fed into the planner):
CALL db.stats.retrieve('GRAPH COUNTS') YIELD section, data RETURN section, data

// Token stats (label/property/rel-type IDs):
CALL db.stats.retrieve('TOKENS') YIELD section, data RETURN section, data
```

Full monitoring reference → references/stats-and-monitoring.md
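A monitoring loop often just needs the ids of queries worth terminating. A sketch assuming `SHOW QUERIES` rows arrive as dicts keyed by the YIELD columns above; pass each id back to `TERMINATE QUERY $queryId` as a parameter:

```python
def long_running(rows, threshold_ms=30_000):
    """Return queryIds of running queries older than threshold_ms."""
    return [
        r["queryId"]
        for r in rows
        if r["status"] == "running" and r["elapsedTimeMillis"] > threshold_ms
    ]

# Example rows as a driver might return them:
rows = [
    {"queryId": "query-1", "status": "running", "elapsedTimeMillis": 45_000},
    {"queryId": "query-2", "status": "running", "elapsedTimeMillis": 120},
]
print(long_running(rows))
# Then, per id: run("TERMINATE QUERY $queryId", {"queryId": qid})
```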

Checklist

  • Run `EXPLAIN` first: identifies plan problems without execution cost
  • Check for `AllNodesScan` / `NodeByLabelScan`: missing index
  • Check for `CartesianProduct`: missing join predicate
  • Check for `Eager`: read/write conflict
  • `SHOW INDEXES`: confirm the relevant index exists and `state = 'ONLINE'`
  • Create the missing index; wait for ONLINE
  • Run `PROFILE` twice: the first run warms the cache, the second is representative
  • Compare `dbHits` before and after the fix
  • If `estimatedRows` is wildly off → `CALL db.prepareForReplanning()`
  • Push `LIMIT` / `WITH n LIMIT k` before high-fanout operations
  • For CONTAINS/ENDS WITH: TEXT index, not RANGE
  • For large analytical queries: consider `runtime=parallel`
  • Kill long-running queries with `TERMINATE TRANSACTION`