pgvector for Semantic Search


Semantic search finds content by meaning rather than exact keywords. An embedding model converts text into high-dimensional vectors, where similar meanings map to nearby points. pgvector stores these vectors in PostgreSQL and uses approximate nearest neighbor (ANN) indexes to find the closest matches quickly—scaling to millions of rows without leaving the database. Store your text alongside its embedding, then query by converting your search text to a vector and returning the rows with the smallest distance.
This guide covers pgvector setup and tuning; it does not cover embedding model selection or text chunking, both of which significantly affect search quality. Requires pgvector 0.8.0+ for all features (`halfvec`, `binary_quantize`, iterative scan).
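The mechanics can be sketched without a database. A minimal, hypothetical Python example with made-up 3-dimensional "embeddings" (real models emit hundreds to thousands of dimensions), brute-forcing what `ORDER BY embedding <=> $1 LIMIT k` computes:

```python
import math

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-dim "embeddings"; the values are illustrative, not from a real model
corpus = {
    "feline pets": [0.9, 0.1, 0.0],
    "house cats":  [0.8, 0.2, 0.1],
    "tax filing":  [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of the search text "cats"

# Brute-force equivalent of: ORDER BY embedding <=> $1 LIMIT 2
top2 = sorted(corpus, key=lambda k: cosine_distance(query, corpus[k]))[:2]
print(top2)  # the two cat-related rows rank first
```

pgvector's ANN indexes approximate this exhaustive sort so it stays fast at millions of rows.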

Golden Path (Default Setup)


Use this configuration unless you have a specific reason not to.
  • Embedding column data type: `halfvec(N)`, where `N` is your embedding dimension (must match everywhere). Examples use 1536; replace with your dimension `N`.
  • Distance: cosine (`<=>`)
  • Index: HNSW (`m = 16`, `ef_construction = 64`). Use `halfvec_cosine_ops` and query with `<=>`.
  • Query-time recall: `SET hnsw.ef_search = 100` (a good starting point from published benchmarks; increase for higher recall at higher latency)
  • Query pattern: `ORDER BY embedding <=> $1::halfvec(N) LIMIT k`
This setup provides a strong speed–recall tradeoff for most text-embedding workloads.

Core Rules


  • Enable the extension in each database: `CREATE EXTENSION IF NOT EXISTS vector;`
  • Use HNSW indexes by default: superior speed-recall tradeoff, can be created on empty tables, no training step required. Only consider IVFFlat for write-heavy or memory-bound workloads.
  • Use `halfvec` by default: store and index as `halfvec` for 50% smaller storage and indexes with minimal recall loss.
  • Index after bulk loading initial data for best build performance.
  • Create indexes concurrently in production: `CREATE INDEX CONCURRENTLY ...`
  • Use cosine distance by default (`<=>`): for non-normalized embeddings, use cosine. For unit-normalized embeddings, cosine and inner product yield identical rankings; default to cosine.
  • Match the query operator to the index ops class: an index with `halfvec_cosine_ops` requires `<=>` in queries; `halfvec_l2_ops` requires `<->`. Mismatched operators won't use the index.
  • Always cast query vectors explicitly (`$1::halfvec(N)`) to avoid implicit-cast failures in prepared statements.
  • Always use the same embedding model for data and queries; similarity search only works when the same model generated both sets of vectors.
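The cosine/inner-product equivalence claim above can be checked numerically. A small Python sketch (the vectors and dimensions are arbitrary): pgvector's `<=>` computes cosine distance and `<#>` the negative inner product; for unit-normalized vectors the two produce the same ordering.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_dist(a, b):  # what <=> computes
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

def neg_inner_product(a, b):  # what <#> computes
    return -sum(x * y for x, y in zip(a, b))

random.seed(0)
vectors = [normalize([random.gauss(0, 1) for _ in range(8)]) for _ in range(50)]
query = normalize([random.gauss(0, 1) for _ in range(8)])

by_cosine = sorted(range(50), key=lambda i: cosine_dist(query, vectors[i]))
by_ip = sorted(range(50), key=lambda i: neg_inner_product(query, vectors[i]))
print(by_cosine == by_ip)  # identical rankings on unit-normalized vectors
```

For unit vectors, cosine distance is `1 - dot` and negative inner product is `-dot`; both are monotone in the same quantity, so any sort order they induce is the same.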

Type Rules


  • Store embeddings as `halfvec(N)`
  • Cast query vectors to `halfvec(N)`
  • Store binary-quantized vectors as `bit(N)` in a generated column
  • Do not mix `vector`/`halfvec`/`bit` without explicit casts
  • Never call `binary_quantize()` on table columns inside `ORDER BY`; store the quantized value instead
  • Dimensions must match: a `halfvec(1536)` column requires query vectors cast as `::halfvec(1536)`.

Standard Pattern


```sql
-- Store and index as halfvec
CREATE TABLE items (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  contents TEXT NOT NULL,
  embedding halfvec(1536) NOT NULL  -- NOT NULL requires embeddings generated before insert, not async
);
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);

-- Query: returns the 10 closest items. $1 is the embedding of your search text.
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
```
For other distance operators (L2, inner product, etc.), see the pgvector README.

HNSW Index


The recommended index type. Creates a multilayer navigable graph with superior speed-recall tradeoff. Can be created on empty tables (no training step required).
```sql
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);

-- With tuning parameters
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops) WITH (m = 16, ef_construction = 64);
```

HNSW Parameters


| Parameter | Default | Description |
|---|---|---|
| `m` | 16 | Max connections per layer. Higher = better recall, more memory |
| `ef_construction` | 64 | Build-time candidate list. Higher = better graph quality, slower build |
| `hnsw.ef_search` | 40 | Query-time candidate list. Higher = better recall, slower queries. Should be ≥ LIMIT. |
`ef_search` tuning (rough guidelines; actual results vary by dataset):

| ef_search | Approx. recall | Relative speed |
|---|---|---|
| 40 | lower (~95% on some benchmarks) | 1x (baseline) |
| 100 | higher | ~2x slower |
| 200 | very high | ~4x slower |
| 400 | near-exact | ~8x slower |
```sql
-- Set search parameter for the session
SET hnsw.ef_search = 100;

-- Set for a single query
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;
```

IVFFlat Index (Generally Not Recommended)


Default to HNSW. Use IVFFlat only when HNSW’s operational costs matter more than peak recall.
Choose IVFFlat if:
  • Write-heavy or constantly changing data AND you're willing to rebuild the index frequently
  • You rebuild indexes often and want predictable build time and memory usage
  • Memory is tight and you cannot keep an HNSW graph mostly resident
  • Data is partitioned or tiered, and this index lives on colder partitions
Avoid IVFFlat if you need:
  • highest recall at low latency
  • minimal tuning
  • a “set and forget” index
Notes:
  • IVFFlat requires data to exist before index creation.
  • Recall depends on `lists` and `ivfflat.probes`; higher probes = better recall, slower queries.
Starter config:
```sql
CREATE INDEX ON items
USING ivfflat (embedding halfvec_cosine_ops)
WITH (lists = 1000);

SET ivfflat.probes = 10;
```

Quantization Strategies


  • Quantization is a memory decision, not a recall decision.
  • Use `halfvec` by default for storage and indexing.
  • Estimate HNSW index footprint as ~4–6 KB per 1536-dim `halfvec` row at m=16 (order-of-magnitude); 3072-dim is ~2x; m=32 roughly doubles the HNSW link/graph overhead.
  • If p95/p99 latency rises while CPU is mostly idle, the HNSW index is likely no longer resident in memory.
  • If `halfvec` doesn't fit, use binary quantization + re-ranking.

Guidelines for 1536-dim vectors


Approximate `halfvec` capacity at `m=16`, 1536-dim (assumes RAM mostly available for index caching):

| RAM | Approx. max halfvec vectors |
|---|---|
| 16 GB | ~2–3M |
| 32 GB | ~4–6M |
| 64 GB | ~8–12M |
| 128 GB | ~16–25M |

For 3072-dim embeddings, divide these numbers by ~2. For `m=32`, also divide capacity by ~2. If the index cannot fit in memory at this scale, use binary quantization.
These are ranges, not guarantees. Validate by monitoring cache residency and p95/p99 latency under load.
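The table above is back-of-envelope arithmetic, which a short Python sketch can reproduce. The 4–6 KB/row figure comes from the footprint estimate in the Quantization section; the 80% usable-RAM fraction is an assumption (OS, `shared_buffers`, and heap pages take the rest), not a measured value:

```python
GiB = 1024 ** 3

def capacity_range(ram_gib, kb_per_row=(4, 6), usable_fraction=0.8):
    # usable_fraction: share of RAM realistically left for index caching (assumed)
    usable = ram_gib * GiB * usable_fraction
    low = int(usable // (kb_per_row[1] * 1024))   # pessimistic: 6 KB/row
    high = int(usable // (kb_per_row[0] * 1024))  # optimistic: 4 KB/row
    return low, high

for ram in (16, 32, 64, 128):
    low, high = capacity_range(ram)
    print(f"{ram:>3} GB RAM: ~{low / 1e6:.1f}M to {high / 1e6:.1f}M halfvec vectors")
```

The output lands close to the ranges in the table; treat it the same way, as a sizing heuristic to validate under load.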

Binary Quantization (For Very Large Datasets)


32× memory reduction. Use with re-ranking for acceptable recall.
```sql
-- Table with a generated column for binary quantization
CREATE TABLE items (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  contents TEXT NOT NULL,
  embedding halfvec(1536) NOT NULL,
  embedding_bq bit(1536) GENERATED ALWAYS AS (binary_quantize(embedding)::bit(1536)) STORED
);

CREATE INDEX ON items USING hnsw (embedding_bq bit_hamming_ops);

-- Query with re-ranking for better recall
-- ef_search must be >= the inner LIMIT to retrieve enough candidates
SET hnsw.ef_search = 800;
WITH q AS (
  SELECT binary_quantize($1::halfvec(1536))::bit(1536) AS qb
)
SELECT *
FROM (
  SELECT i.id, i.contents, i.embedding
  FROM items i, q
  ORDER BY i.embedding_bq <~> q.qb -- binary (Hamming) distance, uses the index
  LIMIT 800
) candidates
ORDER BY candidates.embedding <=> $1::halfvec(1536) -- exact halfvec distance (no index), more accurate
LIMIT 10;
```
The 80× oversampling ratio (800 candidates for 10 results) is a reasonable starting point. Binary quantization loses precision, so more candidates are needed to find true nearest neighbors during re-ranking. Increase if recall is insufficient; decrease if re-ranking latency is too high.
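Why re-ranking needs oversampling can be simulated outside the database. A Python sketch of the two-stage pattern on random synthetic vectors (the dataset size, dimension, and 40x oversampling factor are illustrative assumptions; real recall depends on your embeddings):

```python
import math
import random

def cosine_dist(a, b):  # what <=> computes
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

def binary_quantize(v):  # pgvector semantics: 1 if component > 0, else 0
    return [1 if x > 0 else 0 for x in v]

def hamming(a, b):  # the distance behind bit_hamming_ops / <~>
    return sum(x != y for x, y in zip(a, b))

random.seed(1)
dim, n, k, oversample = 256, 2000, 10, 40
data = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
bits = [binary_quantize(v) for v in data]
query = [random.gauss(0, 1) for _ in range(dim)]
qb = binary_quantize(query)

# Stage 1: cheap Hamming search over quantized vectors, oversampled k * 40
candidates = sorted(range(n), key=lambda i: hamming(qb, bits[i]))[:k * oversample]
# Stage 2: exact re-rank of the candidates with full-precision cosine
topk = sorted(candidates, key=lambda i: cosine_dist(query, data[i]))[:k]

true_topk = sorted(range(n), key=lambda i: cosine_dist(query, data[i]))[:k]
recall = len(set(topk) & set(true_topk)) / k
print(f"recall@{k} with re-ranking: {recall:.0%}")
```

Hamming distance on sign bits only approximates the angle between vectors, so the true nearest neighbors are scattered through the binary ranking; a wide candidate pool plus exact re-ranking recovers most of them.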

Performance by Dataset Size


| Scale | Vectors | Config | Notes |
|---|---|---|---|
| Small | <100K | Defaults | Index optional but improves tail latency |
| Medium | 100K–5M | Defaults | Monitor p95 latency; most common production range |
| Large | 5M+ | `ef_construction=100+` | Memory residency critical |
| Very large | 10M+ | Binary quantization + re-ranking | Add RAM or partition first if possible |

Tune `ef_search` first for recall; only increase `m` if recall plateaus and memory allows. Under concurrency, tail latency spikes when the index doesn't fit in memory. Binary quantization is an escape hatch; prefer adding RAM or partitioning first.

Filtering Best Practices


Filtered vector search requires care. Depending on filter selectivity and query shape, filters can cause early termination (too few rows, missing results) or increase work (latency).

Iterative scan (recommended when filters are selective)


By default, HNSW may stop early when a WHERE clause is present, which can lead to fewer results than expected. Iterative scan allows HNSW to continue searching until enough filtered rows are found.
Enable iterative scan when filters materially reduce the result set.
```sql
-- Enable iterative scans for filtered queries
SET hnsw.iterative_scan = relaxed_order;

SELECT id, contents
FROM items
WHERE category_id = 123
ORDER BY embedding <=> $1::halfvec(1536)
LIMIT 10;
```
If results are still sparse, increase the scan budget:
```sql
SET hnsw.max_scan_tuples = 50000;
```
Trade-off: increasing `hnsw.max_scan_tuples` improves recall but can significantly increase latency.
When iterative scan is not needed:
  • The filter matches a large portion of the table (low selectivity)
  • You are prefiltering via a B-tree index
  • You are querying a single partition or partial index

Choose the right filtering strategy


Highly selective filters (under ~10k matching rows): use a B-tree index on the filter column so Postgres can prefilter before the ANN search.
```sql
CREATE INDEX ON items (category_id);
```
Low-cardinality filters (few distinct values): use partial HNSW indexes per filter value.
```sql
CREATE INDEX ON items
USING hnsw (embedding halfvec_cosine_ops)
WHERE category_id = 11;
```
Many filter values or large datasets: partition by the filter key to keep each ANN index small.
```sql
CREATE TABLE items (
  embedding halfvec(1536),
  category_id int
) PARTITION BY LIST (category_id);
```

Key rules


  • Filters that match few rows require prefiltering, partitioning, or iterative scan.
  • Always validate filtered queries by measuring p95/p99 latency and tuples visited under realistic load.

Alternative: pgvectorscale for label-based filtering


For large datasets with label-based filters, pgvectorscale's StreamingDiskANN index supports filtered indexes on `smallint[]` label columns. Labels are indexed alongside vectors, enabling efficient filtered search without the accuracy tradeoffs of HNSW post-filtering. See the pgvectorscale documentation for setup details.

Bulk Loading


```sql
-- COPY is fastest; binary format is faster still but requires proper encoding
-- Text format: '[0.1, 0.2, ...]'
COPY items (contents, embedding) FROM STDIN;
-- Binary format (if your client supports it):
COPY items (contents, embedding) FROM STDIN WITH (FORMAT BINARY);

-- Add indexes AFTER loading
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 7;
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
```
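For text-format COPY, the client streams tab-separated rows with the vector rendered in pgvector's `[x,y,z]` text format. A minimal Python helper for building such rows (the escaping covers the common COPY text-format specials; adapt to your client library, e.g. psycopg's COPY support):

```python
def copy_row(contents: str, embedding: list) -> str:
    # Render the vector in pgvector's text input format: '[x,y,z]'
    vec = "[" + ",".join(str(x) for x in embedding) + "]"
    # Escape characters that are special in COPY text format
    text = (contents.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n"))
    return f"{text}\t{vec}\n"

print(copy_row("a cat sat", [0.1, 0.2, 0.3]), end="")
```

Each returned string is one line fed to `COPY items (contents, embedding) FROM STDIN`.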

Maintenance


  • VACUUM regularly after updates/deletes—stale entries may persist until vacuumed
  • REINDEX if performance degrades after high churn (rebuilds the graph from scratch)
  • For write-heavy workloads with frequent deletes, consider IVFFlat or partitioning by time using hypertables

Monitoring & Debugging


```sql
-- Check index size
SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));

-- Debug query performance
EXPLAIN (ANALYZE, BUFFERS) SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;

-- Monitor index build progress
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;

-- Compare approximate vs exact recall
BEGIN;
SET LOCAL enable_indexscan = off;  -- Force exact search
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;

-- Force index use for debugging
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;
```

Common Issues (Symptom → Fix)


| Symptom | Likely cause | Fix |
|---|---|---|
| Query does not use the ANN index | Missing `ORDER BY` + `LIMIT`, operator mismatch, or implicit casts | Use `ORDER BY` with a distance operator that matches the index ops class; explicitly cast query vectors |
| Fewer results than expected (filtered query) | HNSW stops early due to the filter | Enable iterative scan; increase `hnsw.max_scan_tuples`; or prefilter (B-tree), use partial indexes, or partition |
| Fewer results than expected (unfiltered query) | ANN recall too low | Increase `hnsw.ef_search` |
| High latency with low CPU usage | HNSW index not resident in memory | Use `halfvec`, reduce `m`/`ef_construction`, add RAM, partition, or use binary quantization |
| Slow index builds | Insufficient build memory or parallelism | Increase `maintenance_work_mem` and `max_parallel_maintenance_workers`; build after bulk load |
| Out-of-memory errors | Index too large for available RAM | Use `halfvec`, reduce index parameters, or switch to binary quantization with re-ranking |
| Zero or missing results | NULL or zero vectors | Avoid NULL embeddings; do not use zero vectors with cosine distance |