pgvector for Semantic Search


Semantic search finds content by meaning rather than exact keywords. An embedding model converts text into high-dimensional vectors, where similar meanings map to nearby points. pgvector stores these vectors in PostgreSQL and uses approximate nearest neighbor (ANN) indexes to find the closest matches quickly—scaling to millions of rows without leaving the database. Store your text alongside its embedding, then query by converting your search text to a vector and returning the rows with the smallest distance.
This guide covers pgvector setup and tuning; it does not cover embedding model selection or text chunking, both of which significantly affect search quality. Requires pgvector 0.8.0+ for all features (`halfvec`, `binary_quantize`, iterative scan).
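The mechanics can be sketched without a database. A minimal, hypothetical Python example with made-up 3-dimensional "embeddings" (real models emit hundreds to thousands of dimensions), brute-forcing what `ORDER BY embedding <=> $1 LIMIT k` computes:

```python
import math

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-dim "embeddings"; the values are illustrative, not from a real model
corpus = {
    "feline pets": [0.9, 0.1, 0.0],
    "house cats":  [0.8, 0.2, 0.1],
    "tax filing":  [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of the search text "cats"

# Brute-force equivalent of: ORDER BY embedding <=> $1 LIMIT 2
top2 = sorted(corpus, key=lambda k: cosine_distance(query, corpus[k]))[:2]
print(top2)  # the two cat-related rows rank first
```

pgvector's ANN indexes approximate this exhaustive sort so it stays fast at millions of rows.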

Golden Path (Default Setup)


Use this configuration unless you have a specific reason not to.
  • Embedding column data type: `halfvec(N)`, where `N` is your embedding dimension (must match everywhere). Examples use 1536; replace with your dimension `N`.
  • Distance: cosine (`<=>`)
  • Index: HNSW (`m = 16`, `ef_construction = 64`). Use `halfvec_cosine_ops` and query with `<=>`.
  • Query-time recall: `SET hnsw.ef_search = 100` (a good starting point from published benchmarks; increase for higher recall at higher latency)
  • Query pattern: `ORDER BY embedding <=> $1::halfvec(N) LIMIT k`
This setup provides a strong speed–recall tradeoff for most text-embedding workloads.

Core Rules


  • Enable the extension in each database: `CREATE EXTENSION IF NOT EXISTS vector;`
  • Use HNSW indexes by default: superior speed-recall tradeoff, can be created on empty tables, no training step required. Only consider IVFFlat for write-heavy or memory-bound workloads.
  • Use `halfvec` by default: store and index as `halfvec` for 50% smaller storage and indexes with minimal recall loss.
  • Index after bulk loading initial data for best build performance.
  • Create indexes concurrently in production: `CREATE INDEX CONCURRENTLY ...`
  • Use cosine distance by default (`<=>`): for non-normalized embeddings, use cosine. For unit-normalized embeddings, cosine and inner product yield identical rankings; default to cosine.
  • Match the query operator to the index ops class: an index with `halfvec_cosine_ops` requires `<=>` in queries; `halfvec_l2_ops` requires `<->`. Mismatched operators won't use the index.
  • Always cast query vectors explicitly (`$1::halfvec(N)`) to avoid implicit-cast failures in prepared statements.
  • Always use the same embedding model for data and queries; similarity search only works when the same model generated both sets of vectors.
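The cosine/inner-product equivalence claim above can be checked numerically. A small Python sketch (the vectors and dimensions are arbitrary): pgvector's `<=>` computes cosine distance and `<#>` the negative inner product; for unit-normalized vectors the two produce the same ordering.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_dist(a, b):  # what <=> computes
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

def neg_inner_product(a, b):  # what <#> computes
    return -sum(x * y for x, y in zip(a, b))

random.seed(0)
vectors = [normalize([random.gauss(0, 1) for _ in range(8)]) for _ in range(50)]
query = normalize([random.gauss(0, 1) for _ in range(8)])

by_cosine = sorted(range(50), key=lambda i: cosine_dist(query, vectors[i]))
by_ip = sorted(range(50), key=lambda i: neg_inner_product(query, vectors[i]))
print(by_cosine == by_ip)  # identical rankings on unit-normalized vectors
```

For unit vectors, cosine distance is `1 - dot` and negative inner product is `-dot`; both are monotone in the same quantity, so any sort order they induce is the same.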

Type Rules


  • Store embeddings as `halfvec(N)`
  • Cast query vectors to `halfvec(N)`
  • Store binary-quantized vectors as `bit(N)` in a generated column
  • Do not mix `vector`/`halfvec`/`bit` without explicit casts
  • Never call `binary_quantize()` on table columns inside `ORDER BY`; store the quantized value instead
  • Dimensions must match: a `halfvec(1536)` column requires query vectors cast as `::halfvec(1536)`.

Standard Pattern


```sql
-- Store and index as halfvec
CREATE TABLE items (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  contents TEXT NOT NULL,
  embedding halfvec(1536) NOT NULL  -- NOT NULL requires embeddings generated before insert, not async
);
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);

-- Query: returns the 10 closest items. $1 is the embedding of your search text.
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
```
For other distance operators (L2, inner product, etc.), see the pgvector README.

HNSW Index


The recommended index type. Creates a multilayer navigable graph with superior speed-recall tradeoff. Can be created on empty tables (no training step required).
```sql
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);

-- With tuning parameters
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops) WITH (m = 16, ef_construction = 64);
```

HNSW Parameters


| Parameter | Default | Description |
|---|---|---|
| `m` | 16 | Max connections per layer. Higher = better recall, more memory |
| `ef_construction` | 64 | Build-time candidate list. Higher = better graph quality, slower build |
| `hnsw.ef_search` | 40 | Query-time candidate list. Higher = better recall, slower queries. Should be ≥ LIMIT. |
`ef_search` tuning (rough guidelines; actual results vary by dataset):

| ef_search | Approx. recall | Relative speed |
|---|---|---|
| 40 | lower (~95% on some benchmarks) | 1x (baseline) |
| 100 | higher | ~2x slower |
| 200 | very high | ~4x slower |
| 400 | near-exact | ~8x slower |
```sql
-- Set search parameter for the session
SET hnsw.ef_search = 100;

-- Set for a single query
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;
```

IVFFlat Index (Generally Not Recommended)


Default to HNSW. Use IVFFlat only when HNSW’s operational costs matter more than peak recall.
Choose IVFFlat if:
  • Write-heavy or constantly changing data AND you're willing to rebuild the index frequently
  • You rebuild indexes often and want predictable build time and memory usage
  • Memory is tight and you cannot keep an HNSW graph mostly resident
  • Data is partitioned or tiered, and this index lives on colder partitions
Avoid IVFFlat if you need:
  • highest recall at low latency
  • minimal tuning
  • a “set and forget” index
Notes:
  • IVFFlat requires data to exist before index creation.
  • Recall depends on `lists` and `ivfflat.probes`; higher probes = better recall, slower queries.
Starter config:
```sql
CREATE INDEX ON items
USING ivfflat (embedding halfvec_cosine_ops)
WITH (lists = 1000);

SET ivfflat.probes = 10;
```

Quantization Strategies


  • Quantization is a memory decision, not a recall decision.
  • Use `halfvec` by default for storage and indexing.
  • Estimate HNSW index footprint as ~4–6 KB per 1536-dim `halfvec` row at m=16 (order-of-magnitude); 3072-dim is ~2x; m=32 roughly doubles the HNSW link/graph overhead.
  • If p95/p99 latency rises while CPU is mostly idle, the HNSW index is likely no longer resident in memory.
  • If `halfvec` doesn't fit, use binary quantization + re-ranking.

Guidelines for 1536-dim vectors


Approximate `halfvec` capacity at `m=16`, 1536-dim (assumes RAM mostly available for index caching):

| RAM | Approx. max halfvec vectors |
|---|---|
| 16 GB | ~2–3M |
| 32 GB | ~4–6M |
| 64 GB | ~8–12M |
| 128 GB | ~16–25M |

For 3072-dim embeddings, divide these numbers by ~2. For `m=32`, also divide capacity by ~2. If the index cannot fit in memory at this scale, use binary quantization.
These are ranges, not guarantees. Validate by monitoring cache residency and p95/p99 latency under load.
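The table above is back-of-envelope arithmetic, which a short Python sketch can reproduce. The 4–6 KB/row figure comes from the footprint estimate in the Quantization section; the 80% usable-RAM fraction is an assumption (OS, `shared_buffers`, and heap pages take the rest), not a measured value:

```python
GiB = 1024 ** 3

def capacity_range(ram_gib, kb_per_row=(4, 6), usable_fraction=0.8):
    # usable_fraction: share of RAM realistically left for index caching (assumed)
    usable = ram_gib * GiB * usable_fraction
    low = int(usable // (kb_per_row[1] * 1024))   # pessimistic: 6 KB/row
    high = int(usable // (kb_per_row[0] * 1024))  # optimistic: 4 KB/row
    return low, high

for ram in (16, 32, 64, 128):
    low, high = capacity_range(ram)
    print(f"{ram:>3} GB RAM: ~{low / 1e6:.1f}M to {high / 1e6:.1f}M halfvec vectors")
```

The output lands close to the ranges in the table; treat it the same way, as a sizing heuristic to validate under load.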

Binary Quantization (For Very Large Datasets)


32× memory reduction. Use with re-ranking for acceptable recall.
```sql
-- Table with a generated column for binary quantization
CREATE TABLE items (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  contents TEXT NOT NULL,
  embedding halfvec(1536) NOT NULL,
  embedding_bq bit(1536) GENERATED ALWAYS AS (binary_quantize(embedding)::bit(1536)) STORED
);

CREATE INDEX ON items USING hnsw (embedding_bq bit_hamming_ops);

-- Query with re-ranking for better recall
-- ef_search must be >= the inner LIMIT to retrieve enough candidates
SET hnsw.ef_search = 800;
WITH q AS (
  SELECT binary_quantize($1::halfvec(1536))::bit(1536) AS qb
)
SELECT *
FROM (
  SELECT i.id, i.contents, i.embedding
  FROM items i, q
  ORDER BY i.embedding_bq <~> q.qb -- binary (Hamming) distance, uses the index
  LIMIT 800
) candidates
ORDER BY candidates.embedding <=> $1::halfvec(1536) -- exact halfvec distance (no index), more accurate
LIMIT 10;
```
The 80× oversampling ratio (800 candidates for 10 results) is a reasonable starting point. Binary quantization loses precision, so more candidates are needed to find true nearest neighbors during re-ranking. Increase if recall is insufficient; decrease if re-ranking latency is too high.
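Why re-ranking needs oversampling can be simulated outside the database. A Python sketch of the two-stage pattern on random synthetic vectors (the dataset size, dimension, and 40x oversampling factor are illustrative assumptions; real recall depends on your embeddings):

```python
import math
import random

def cosine_dist(a, b):  # what <=> computes
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

def binary_quantize(v):  # pgvector semantics: 1 if component > 0, else 0
    return [1 if x > 0 else 0 for x in v]

def hamming(a, b):  # the distance behind bit_hamming_ops / <~>
    return sum(x != y for x, y in zip(a, b))

random.seed(1)
dim, n, k, oversample = 256, 2000, 10, 40
data = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
bits = [binary_quantize(v) for v in data]
query = [random.gauss(0, 1) for _ in range(dim)]
qb = binary_quantize(query)

# Stage 1: cheap Hamming search over quantized vectors, oversampled k * 40
candidates = sorted(range(n), key=lambda i: hamming(qb, bits[i]))[:k * oversample]
# Stage 2: exact re-rank of the candidates with full-precision cosine
topk = sorted(candidates, key=lambda i: cosine_dist(query, data[i]))[:k]

true_topk = sorted(range(n), key=lambda i: cosine_dist(query, data[i]))[:k]
recall = len(set(topk) & set(true_topk)) / k
print(f"recall@{k} with re-ranking: {recall:.0%}")
```

Hamming distance on sign bits only approximates the angle between vectors, so the true nearest neighbors are scattered through the binary ranking; a wide candidate pool plus exact re-ranking recovers most of them.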

Performance by Dataset Size


| Scale | Vectors | Config | Notes |
|---|---|---|---|
| Small | <100K | Defaults | Index optional but improves tail latency |
| Medium | 100K–5M | Defaults | Monitor p95 latency; most common production range |
| Large | 5M+ | `ef_construction=100+` | Memory residency critical |
| Very large | 10M+ | Binary quantization + re-ranking | Add RAM or partition first if possible |

Tune `ef_search` first for recall; only increase `m` if recall plateaus and memory allows. Under concurrency, tail latency spikes when the index doesn't fit in memory. Binary quantization is an escape hatch; prefer adding RAM or partitioning first.

Filtering Best Practices


Filtered vector search requires care. Depending on filter selectivity and query shape, filters can cause early termination (too few rows, missing results) or increase work (latency).

Iterative scan (recommended when filters are selective)


By default, HNSW may stop early when a WHERE clause is present, which can lead to fewer results than expected. Iterative scan allows HNSW to continue searching until enough filtered rows are found.
Enable iterative scan when filters materially reduce the result set.
```sql
-- Enable iterative scans for filtered queries
SET hnsw.iterative_scan = relaxed_order;

SELECT id, contents
FROM items
WHERE category_id = 123
ORDER BY embedding <=> $1::halfvec(1536)
LIMIT 10;
```
If results are still sparse, increase the scan budget:
```sql
SET hnsw.max_scan_tuples = 50000;
```
Trade-off: increasing `hnsw.max_scan_tuples` improves recall but can significantly increase latency.
When iterative scan is not needed:
  • The filter matches a large portion of the table (low selectivity)
  • You are prefiltering via a B-tree index
  • You are querying a single partition or partial index

Choose the right filtering strategy


Highly selective filters (under ~10k matching rows): use a B-tree index on the filter column so Postgres can prefilter before the ANN search.
```sql
CREATE INDEX ON items (category_id);
```
Low-cardinality filters (few distinct values): use partial HNSW indexes per filter value.
```sql
CREATE INDEX ON items
USING hnsw (embedding halfvec_cosine_ops)
WHERE category_id = 11;
```
Many filter values or large datasets: partition by the filter key to keep each ANN index small.
```sql
CREATE TABLE items (
  embedding halfvec(1536),
  category_id int
) PARTITION BY LIST (category_id);
```

Key rules


  • Filters that match few rows require prefiltering, partitioning, or iterative scan.
  • Always validate filtered queries by measuring p95/p99 latency and tuples visited under realistic load.

Alternative: pgvectorscale for label-based filtering


For large datasets with label-based filters, pgvectorscale's StreamingDiskANN index supports filtered indexes on `smallint[]` label columns. Labels are indexed alongside vectors, enabling efficient filtered search without the accuracy tradeoffs of HNSW post-filtering. See the pgvectorscale documentation for setup details.

Bulk Loading


```sql
-- COPY is fastest; binary format is faster still but requires proper encoding
-- Text format: '[0.1, 0.2, ...]'
COPY items (contents, embedding) FROM STDIN;
-- Binary format (if your client supports it):
COPY items (contents, embedding) FROM STDIN WITH (FORMAT BINARY);

-- Add indexes AFTER loading
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 7;
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
```
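For text-format COPY, the client streams tab-separated rows with the vector rendered in pgvector's `[x,y,z]` text format. A minimal Python helper for building such rows (the escaping covers the common COPY text-format specials; adapt to your client library, e.g. psycopg's COPY support):

```python
def copy_row(contents: str, embedding: list) -> str:
    # Render the vector in pgvector's text input format: '[x,y,z]'
    vec = "[" + ",".join(str(x) for x in embedding) + "]"
    # Escape characters that are special in COPY text format
    text = (contents.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n"))
    return f"{text}\t{vec}\n"

print(copy_row("a cat sat", [0.1, 0.2, 0.3]), end="")
```

Each returned string is one line fed to `COPY items (contents, embedding) FROM STDIN`.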

Maintenance


  • VACUUM regularly after updates/deletes—stale entries may persist until vacuumed
  • REINDEX if performance degrades after high churn (rebuilds the graph from scratch)
  • For write-heavy workloads with frequent deletes, consider IVFFlat or partitioning by time using hypertables

Monitoring & Debugging


```sql
-- Check index size
SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));

-- Debug query performance
EXPLAIN (ANALYZE, BUFFERS) SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;

-- Monitor index build progress
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;

-- Compare approximate vs exact recall
BEGIN;
SET LOCAL enable_indexscan = off;  -- Force exact search
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;

-- Force index use for debugging
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;
```

Common Issues (Symptom → Fix)


| Symptom | Likely cause | Fix |
|---|---|---|
| Query does not use the ANN index | Missing `ORDER BY` + `LIMIT`, operator mismatch, or implicit casts | Use `ORDER BY` with a distance operator that matches the index ops class; explicitly cast query vectors |
| Fewer results than expected (filtered query) | HNSW stops early due to the filter | Enable iterative scan; increase `hnsw.max_scan_tuples`; or prefilter (B-tree), use partial indexes, or partition |
| Fewer results than expected (unfiltered query) | ANN recall too low | Increase `hnsw.ef_search` |
| High latency with low CPU usage | HNSW index not resident in memory | Use `halfvec`, reduce `m`/`ef_construction`, add RAM, partition, or use binary quantization |
| Slow index builds | Insufficient build memory or parallelism | Increase `maintenance_work_mem` and `max_parallel_maintenance_workers`; build after bulk load |
| Out-of-memory errors | Index too large for available RAM | Use `halfvec`, reduce index parameters, or switch to binary quantization with re-ranking |
| Zero or missing results | NULL or zero vectors | Avoid NULL embeddings; do not use zero vectors with cosine distance |