helix-query-optimize

Helix Query Optimization

Optimize Helix queries by improving anchor choice, index alignment, filter timing, traversal breadth, and response shape.

When To Use

Use this skill when the task is to:
  • optimize a slow Helix query
  • review query shape for performance problems
  • improve index usage
  • tighten BM25 or vector search routes
  • reduce over-broad traversal or oversized result projections
  • decide whether a dynamic query should become a stored route

First Steps

Before suggesting any changes:
  1. Read the existing query and identify the first anchor.
  2. List the labels, edge labels, predicates, projections, and traversal steps it uses.
  3. Check which indexes already exist in the application.
  4. Determine whether the route is stored or dynamic.
  5. Identify whether the route is a normal read, a write, a text search, a vector search, or a repeat traversal.
Do not suggest optimizations until you understand the current anchor and which indexes the route can rely on.
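
The five steps above amount to filling in a small inventory before touching the query. The sketch below models that inventory as a Rust struct; RouteInventory, RouteKind, and ready_to_optimize are hypothetical names chosen for illustration, not part of Helix.

```rust
// Hypothetical review record: none of these names come from Helix.
// Each field corresponds to one of the five "First Steps".

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteKind {
    Read,
    Write,
    TextSearch,
    VectorSearch,
    RepeatTraversal,
}

#[derive(Debug, Default)]
pub struct RouteInventory {
    pub first_anchor: Option<String>,          // step 1
    pub labels: Vec<String>,                   // step 2: node labels
    pub edge_labels: Vec<String>,              // step 2
    pub predicates: Vec<String>,               // step 2
    pub projections: Vec<String>,              // step 2
    pub traversal_steps: Vec<String>,          // step 2
    pub existing_indexes: Option<Vec<String>>, // step 3 (None = not checked yet)
    pub stored: Option<bool>,                  // step 4: stored vs dynamic
    pub kind: Option<RouteKind>,               // step 5
}

impl RouteInventory {
    /// True once the anchor, index, stored/dynamic, and route-kind questions
    /// all have answers. The step 2 lists may legitimately be empty.
    pub fn ready_to_optimize(&self) -> bool {
        self.first_anchor.is_some()
            && self.existing_indexes.is_some()
            && self.stored.is_some()
            && self.kind.is_some()
    }
}
```

Recording "indexes not checked yet" as None rather than an empty list keeps "no indexes exist" distinguishable from "nobody looked".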

Optimization Workflow

1. Fix The Anchor First

Prefer this order:
  1. node ID or edge ID
  2. unique property lookup
  3. equality-indexed property lookup
  4. tenant-scoped label scan
  5. broad label scan
If the route already knows entityId, externalId, userId, tenantId, or another indexed identifier, use that before broad traversal.
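
One way to make the preference order concrete is to rank candidate anchors and pick the best one the route can use. This is a minimal sketch; the Anchor enum and best_anchor function are hypothetical and only encode the five-level order listed above.

```rust
// Hypothetical ranking of anchor kinds, not a Helix type. Variants are
// declared from most to least preferred, and the derived Ord follows
// declaration order, so a smaller value means a better anchor.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Anchor {
    NodeOrEdgeId,
    UniqueProperty,
    EqualityIndexedProperty,
    TenantScopedLabelScan,
    BroadLabelScan,
}

/// Returns the most preferred anchor among those the route can use,
/// or None if no anchor has been identified yet.
pub fn best_anchor(available: &[Anchor]) -> Option<Anchor> {
    available.iter().copied().min()
}
```

For a route that knows entityId through an equality index but could also scan the Entity label, best_anchor returns EqualityIndexedProperty.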

2. Match Query Shape To Existing Indexes

Check whether the query is aligned with:
  • equality indexes
  • range indexes
  • text indexes
  • vector indexes
  • tenant-scoped text or vector indexes
If the query shape is good but the needed index is missing, say so clearly instead of implying that a DSL rewrite alone can fix it.
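
To separate "rewrite the query" from "add an index", it can help to list each property the query relies on together with the kind of index it needs, then compare that against what the application declares. The sketch assumes a hypothetical map from property name to declared index kinds, and it treats tenant-scoped text and vector indexes as distinct kinds (an assumption) so that a plain text index does not count as covering a tenant-scoped search.

```rust
use std::collections::HashMap;

// Hypothetical index kinds mirroring the list above. Tenant-scoped text and
// vector indexes are separate variants by assumption, so a plain Text index
// does not satisfy a query that needs a TenantScopedText one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum IndexKind {
    Equality,
    Range,
    Text,
    Vector,
    TenantScopedText,
    TenantScopedVector,
}

/// Returns every (property, index kind) pair the query needs that the
/// application has not declared. `existing` maps a property name to the
/// index kinds declared on it.
pub fn missing_indexes<'a>(
    needed: &[(&'a str, IndexKind)],
    existing: &HashMap<&str, Vec<IndexKind>>,
) -> Vec<(&'a str, IndexKind)> {
    needed
        .iter()
        .copied()
        .filter(|(prop, kind)| {
            existing
                .get(*prop)
                .map_or(true, |kinds| !kinds.contains(kind))
        })
        .collect()
}
```

A non-empty result is the case described above: the query shape may already be fine, and the fix belongs in the schema rather than in the DSL.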

3. Move Filters Earlier

Apply scope and status filters before broad graph expansion when possible.
Key examples:
  • tenantId
  • userId
  • active-record filtering, such as requiring deletedAt to be empty or null
  • exact entity or relation identifiers
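
The cost of filtering late is easiest to see on a toy graph. The in-memory model below is not how Helix executes traversals; Node and both functions are hypothetical, and an unset deletedAt is modeled, as an assumption, with Option::None. Each function counts how many edges it touches while restricting results to one tenant's active records.

```rust
// Hypothetical in-memory graph node; not a Helix type.
#[derive(Debug, Clone)]
pub struct Node {
    pub id: u32,
    pub tenant_id: u32,
    pub deleted_at: Option<u64>, // None stands in for "empty or null"
    pub neighbors: Vec<u32>,
}

/// Filter late: expand every node, then keep only edges from in-scope,
/// active nodes. Returns (kept neighbor ids, edges touched).
pub fn expand_then_filter(nodes: &[Node], tenant: u32) -> (Vec<u32>, usize) {
    let mut touched = 0;
    let mut out = Vec::new();
    for n in nodes {
        for &nb in &n.neighbors {
            touched += 1; // every edge is visited, in scope or not
            if n.tenant_id == tenant && n.deleted_at.is_none() {
                out.push(nb);
            }
        }
    }
    (out, touched)
}

/// Filter early: drop out-of-scope and deleted nodes first, then expand.
pub fn filter_then_expand(nodes: &[Node], tenant: u32) -> (Vec<u32>, usize) {
    let mut touched = 0;
    let mut out = Vec::new();
    let in_scope = nodes
        .iter()
        .filter(|n| n.tenant_id == tenant && n.deleted_at.is_none());
    for n in in_scope {
        for &nb in &n.neighbors {
            touched += 1; // only edges of in-scope nodes are visited
            out.push(nb);
        }
    }
    (out, touched)
}
```

Both functions return the same neighbors; the difference is that the early-filter version only walks edges that can contribute to the answer.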

4. Shrink The Projection

Review what the route returns.
Prefer:
  • explicit project(...) for service-facing endpoints
  • omitting embeddings unless they are the payload
  • including $distance only when ranking metadata is needed
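
The projection guidance can be pictured as mapping a full search hit onto a smaller response type. The structs below are hypothetical stand-ins for a route's output, not Helix's response format; their fields echo the $id, title, and $distance properties used in the Smaller Search Projection example, and the embedding field represents the heavy payload to leave out.

```rust
// Hypothetical full hit as a search might produce it internally.
#[derive(Debug, Clone)]
pub struct SearchHit {
    pub id: String,
    pub title: String,
    pub body: String,
    pub embedding: Vec<f32>, // usually the heaviest field in the hit
    pub distance: f32,
}

// Hypothetical service-facing shape: only what the caller uses.
#[derive(Debug, Clone, PartialEq)]
pub struct HitSummary {
    pub id: String,
    pub title: String,
    pub distance: Option<f32>, // populated only when ranking metadata is needed
}

/// Explicit projection: copy the chosen fields, never the embedding.
pub fn project(hit: &SearchHit, with_distance: bool) -> HitSummary {
    HitSummary {
        id: hit.id.clone(),
        title: hit.title.clone(),
        distance: if with_distance { Some(hit.distance) } else { None },
    }
}
```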

5. Control Traversal Breadth

Inspect whether the route should use:
  • dedup()
  • limit(...)
  • range(...)
  • skip(...)
  • count()
  • first()
Use these intentionally. Do not add them blindly.
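
To reason about what each breadth control does to the result set, it can help to model the steps as operations on an ordered list of ids. This sketch uses plain Rust iterators rather than Helix. It assumes dedup keeps the first occurrence of each id and that range(start, end) selects the half-open interval start..end, which is consistent with range(0, limit) in the BM25 example but not confirmed against Helix itself; count() and first() correspond to len() and first() on the result.

```rust
use std::collections::HashSet;

// Iterator analogues of the breadth controls. They model the intent of each
// step on an id stream and are not Helix's implementation.

/// dedup(): drop repeated ids, keeping the first occurrence of each.
pub fn dedup(ids: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    // HashSet::insert returns false for ids already seen.
    ids.iter().copied().filter(|id| seen.insert(*id)).collect()
}

/// range(start, end): keep positions start..end (assumed half-open).
/// skip(start) is the skip(...) step; take(...) is the limit(...) step.
pub fn range(ids: &[u32], start: usize, end: usize) -> Vec<u32> {
    ids.iter()
        .copied()
        .skip(start)
        .take(end.saturating_sub(start))
        .collect()
}
```

Order matters: deduplicating before range means a page of end - start items holds distinct ids, while applying range first and dedup afterwards can return fewer.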

6. Review Search Routes Separately

For BM25 routes:
  • confirm the indexed property is correct
  • confirm tenant scope is preserved
  • when the search API cannot express the scope directly, consider over-fetching, post-filtering by scope, then trimming to the requested size
For vector routes:
  • confirm the vector index exists
  • confirm tenant scope is preserved
  • confirm embeddings are omitted from the returned projection unless needed
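
The over-fetch, post-filter, then trim pattern can be sketched on a result list that is already ranked. Hit, its fields, and over_fetch_then_trim are hypothetical; bm25_k plays the role of the bm25K parameter in the BM25 Over-Fetch Then Trim example, and limit is the page size.

```rust
// Hypothetical ranked search hit; not a Helix type.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: u32,
    pub tenant_id: u32,
    pub score: f32,
}

/// `ranked` is the engine's result list, best first.
pub fn over_fetch_then_trim(ranked: &[Hit], bm25_k: usize, tenant: u32, limit: usize) -> Vec<Hit> {
    ranked
        .iter()
        .take(bm25_k)                      // over-fetch: bm25_k is chosen larger than limit
        .filter(|h| h.tenant_id == tenant) // post-filter: restore tenant scope
        .take(limit)                       // trim to the requested page size
        .cloned()
        .collect()
}
```

If too many of the top bm25_k hits belong to other tenants, the page comes back short, so choosing k trades extra search work against the chance of an underfilled result.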

7. Prefer Stored Routes For Steady Traffic

If the route is stable and production-facing, favor stored queries over dynamic inline queries. Dynamic routes are more flexible, but they should not be the default choice for steady traffic.

8. Use Query Warming Only For Reads

Query warming can help prepopulate caches for known read routes. It is not valid for writes.
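
A warm-up step can enforce the read-only rule up front by rejecting any write route in its list. Access, Route, and warmup_plan below are hypothetical; the sketch does not show how Helix itself triggers warming.

```rust
// Hypothetical route descriptor; not a Helix type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

#[derive(Debug, Clone)]
pub struct Route {
    pub name: &'static str,
    pub access: Access,
}

/// Returns the route names to warm, or Err with the first write route
/// found, because warming is only valid for reads. Collecting into a
/// Result stops at the first Err.
pub fn warmup_plan(routes: &[Route]) -> Result<Vec<&'static str>, &'static str> {
    routes
        .iter()
        .map(|r| match r.access {
            Access::Read => Ok(r.name),
            Access::Write => Err(r.name),
        })
        .collect()
}
```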

Canonical Examples

Better Anchor Choice

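```rust
// weaker: anchors on status, which can match many Entity nodes,
// so RELATED_TO expands from every one of them
g().n_with_label("Entity")
    .where_(Predicate::eq_param("status", "status"))
    .both(Some("RELATED_TO"))

// stronger when entityId is already known: the traversal
// starts from the entity that identifier names
g().n_with_label("Entity")
    .where_(Predicate::eq_param("entityId", "entityId"))
    .both(Some("RELATED_TO"))
```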

Smaller Search Projection

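```rust
// weaker: returns the full property map, which can include the embedding
g().vector_search_nodes_with(...)
    .value_map(None::<Vec<&str>>)

// stronger: return only the fields the caller uses,
// plus $distance renamed for ranking
g().vector_search_nodes_with(...)
    .project(vec![
        PropertyProjection::new("$id"),
        PropertyProjection::new("title"),
        PropertyProjection::renamed("$distance", "distance"),
    ])
```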

BM25 Over-Fetch Then Trim

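```rust
// Over-fetch: ask the text search for bm25K candidates on Document.body
g().text_search_nodes_with(
    "Document",
    "body",
    PropertyInput::param("query"),
    Expr::param("bm25K"),
    None,
)
// Post-filter: keep only the caller's tenant
.where_(Predicate::eq_param("tenantId", "tenantId"))
// Trim: return at most `limit` results
.range(0, Expr::param("limit"))
```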

Anti-Patterns

Do not:
  • recommend a broad scan before checking for indexed identifiers
  • ignore tenant scope on text or vector search
  • return embeddings by default in search results
  • suggest dynamic routes for stable production traffic without a reason
  • add dedup, limit, or range without tying them to route semantics
  • focus on micro-tweaks before fixing the anchor or index alignment

Validation Checklist

Before finishing:
  • verify the first anchor is the narrowest practical indexed set
  • verify the route shape matches the indexes that exist or should exist
  • verify scope and status filters happen as early as possible
  • verify projections omit heavy fields unless required
  • verify traversal breadth is intentionally controlled
  • verify BM25 and vector routes preserve tenant scope
  • verify a stable production route is not using the dynamic path without a reason
  • verify warming recommendations are read-only
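
The checklist can also be kept as data so that a review reports exactly which items still fail. Review and failures are hypothetical names; each boolean corresponds to one checklist line above.

```rust
// Hypothetical review aid; not part of Helix.
#[derive(Debug, Default)]
pub struct Review {
    pub narrowest_indexed_anchor: bool,
    pub shape_matches_indexes: bool,
    pub filters_early: bool,
    pub heavy_fields_omitted: bool,
    pub breadth_controlled: bool,
    pub search_keeps_tenant_scope: bool,
    pub stable_route_not_dynamic_without_reason: bool,
    pub warming_read_only: bool,
}

impl Review {
    /// Short names of the checklist items that are not yet satisfied,
    /// in checklist order.
    pub fn failures(&self) -> Vec<&'static str> {
        [
            ("anchor", self.narrowest_indexed_anchor),
            ("index alignment", self.shape_matches_indexes),
            ("early filters", self.filters_early),
            ("projection", self.heavy_fields_omitted),
            ("traversal breadth", self.breadth_controlled),
            ("search tenant scope", self.search_keeps_tenant_scope),
            ("stored vs dynamic", self.stable_route_not_dynamic_without_reason),
            ("warming", self.warming_read_only),
        ]
        .iter()
        .filter(|(_, ok)| !*ok)
        .map(|(name, _)| *name)
        .collect()
    }
}
```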

Repo References

For shared references in this repo, see:
  • docs/optimization-checklist.md
  • docs/source-canon.md
  • examples/search-patterns.md
  • examples/optimization-patterns.md