pgvector-semantic-search
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
Chinesepgvector for Semantic Search
基于pgvector的语义搜索
Semantic search finds content by meaning rather than exact keywords. An embedding model converts text into high-dimensional vectors, where similar meanings map to nearby points. pgvector stores these vectors in PostgreSQL and uses approximate nearest neighbor (ANN) indexes to find the closest matches quickly—scaling to millions of rows without leaving the database. Store your text alongside its embedding, then query by converting your search text to a vector and returning the rows with the smallest distance.
This guide covers pgvector setup and tuning—not embedding model selection or text chunking, which significantly affect search quality. Requires pgvector 0.8.0+ for all features (, , iterative scan).
halfvecbinary_quantize语义搜索根据内容含义而非精确关键词查找内容。嵌入模型会将文本转换为高维向量,含义相似的文本对应的向量在空间中位置相近。pgvector将这些向量存储在PostgreSQL中,并使用近似最近邻(ANN)索引快速找到最匹配的结果——无需离开数据库即可扩展至数百万行数据。将文本与其嵌入向量一同存储,然后将搜索文本转换为向量,返回距离最小的行即可完成查询。
本指南涵盖pgvector的设置与调优——不涉及嵌入模型选择或文本分块,这两者对搜索质量影响显著。所有功能需要pgvector 0.8.0+版本支持(包括、、迭代扫描)。
halfvecbinary_quantizeGolden Path (Default Setup)
推荐路径(默认配置)
Use this configuration unless you have a specific reason not to.
- Embedding column data type: where
halfvec(N)is your embedding dimension (must match everywhere). Examples use 1536; replace with your dimensionN.N - Distance: cosine ()
<=> - Index: HNSW (,
m = 16). Useef_construction = 64and query withhalfvec_cosine_ops.<=> - Query-time recall: (good starting point from published benchmarks, increase for higher recall at higher latency)
SET hnsw.ef_search = 100 - Query pattern:
ORDER BY embedding <=> $1::halfvec(N) LIMIT k
This setup provides a strong speed–recall tradeoff for most text-embedding workloads.
除非有特殊需求,否则请使用以下配置:
- 嵌入列数据类型:,其中
halfvec(N)为嵌入维度(所有位置必须保持一致)。示例中使用1536;请替换为你的实际维度N。N - 距离计算:余弦距离()
<=> - 索引:HNSW(,
m = 16)。使用ef_construction = 64,查询时使用halfvec_cosine_ops。<=> - 查询时召回率:(这是公开基准测试中的良好起始值,若要提升召回率可增大该值,但会增加延迟)
SET hnsw.ef_search = 100 - 查询模式:
ORDER BY embedding <=> $1::halfvec(N) LIMIT k
该配置为大多数文本嵌入工作负载提供了出色的速度-召回率平衡。
Core Rules
核心规则
- Enable the extension in each database:
CREATE EXTENSION IF NOT EXISTS vector; - Use HNSW indexes by default—superior speed-recall tradeoff, can be created on empty tables, no training step required. Only consider IVFFlat for write-heavy or memory-bound workloads.
- Use by default—store and index as
halfvecfor 50% smaller storage and indexes with minimal recall loss.halfvec - Index after bulk loading initial data for best build performance.
- Create indexes concurrently in production:
CREATE INDEX CONCURRENTLY ... - Use cosine distance by default (): For non-normalized embeddings, use cosine. For unit-normalized embeddings, cosine and inner product yield identical rankings; default to cosine.
<=> - Match query operator to index ops: Index with requires
halfvec_cosine_opsin queries;<=>requireshalfvec_l2_ops; mismatched operators won't use the index.<-> - Always cast query vectors explicitly () to avoid implicit-cast failures in prepared statements.
$1::halfvec(N) - Always use the same embedding model for data and queries. Similarity search only works when the model generating the vectors is the same.
- 在每个数据库中启用扩展:
CREATE EXTENSION IF NOT EXISTS vector; - 默认使用HNSW索引——速度-召回率平衡更优,可在空表上创建,无需训练步骤。仅在写入密集型或内存受限的工作负载中考虑使用IVFFlat。
- 默认使用——以
halfvec格式存储和索引可将存储和索引大小减少50%,且召回率损失极小。halfvec - 批量加载初始数据后再创建索引,以获得最佳构建性能。
- 生产环境中并发创建索引:
CREATE INDEX CONCURRENTLY ... - 默认使用余弦距离():对于未归一化的嵌入向量,使用余弦距离;对于单位归一化的嵌入向量,余弦距离与内积的排名结果一致,默认使用余弦距离。
<=> - 查询运算符与索引运算符类匹配:使用创建的索引要求查询时使用
halfvec_cosine_ops;<=>要求使用halfvec_l2_ops;运算符不匹配将无法使用索引。<-> - 始终显式转换查询向量(),以避免预准备语句中出现隐式转换失败。
$1::halfvec(N) - 数据与查询始终使用相同的嵌入模型。只有当生成向量的模型相同时,相似性搜索才能正常工作。
Type Rules
类型规则
- Store embeddings as
halfvec(N) - Cast query vectors to
halfvec(N) - Store binary quantized vectors as in a generated column
bit(N) - Do not mix /
vector/halfvecwithout explicit castsbit - Never call on table columns inside
binary_quantize(); store it insteadORDER BY - Dimensions must match: a column requires query vectors cast as
halfvec(1536).::halfvec(1536)
- 嵌入向量存储为
halfvec(N) - 查询向量转换为
halfvec(N) - 二进制量化向量存储为生成列中的类型
bit(N) - 不要在未显式转换的情况下混合使用/
vector/halfvecbit - 切勿在中对表列调用
ORDER BY;应提前存储量化结果binary_quantize() - 维度必须匹配:列要求查询向量转换为
halfvec(1536)。::halfvec(1536)
Standard Pattern
标准模式
sql
-- Store and index as halfvec
CREATE TABLE items (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
contents TEXT NOT NULL,
embedding halfvec(1536) NOT NULL -- NOT NULL requires embeddings generated before insert, not async
);
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
-- Query: returns 10 closest items. $1 is the embedding of your search text.
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;For other distance operators (L2, inner product, etc.), see the pgvector README.
sql
-- 以halfvec格式存储并创建索引
CREATE TABLE items (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
contents TEXT NOT NULL,
embedding halfvec(1536) NOT NULL -- NOT NULL要求在插入前生成嵌入向量,而非异步生成
);
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
-- 查询:返回10个最匹配的条目。$1为搜索文本的嵌入向量。
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;如需使用其他距离运算符(L2、内积等),请查看pgvector README。
HNSW Index
HNSW索引
The recommended index type. Creates a multilayer navigable graph with superior speed-recall tradeoff. Can be created on empty tables (no training step required).
sql
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
-- With tuning parameters
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops) WITH (m = 16, ef_construction = 64);推荐使用的索引类型。创建多层可导航图,速度-召回率平衡更优。可在空表上创建(无需训练步骤)。
sql
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
-- 带调优参数的创建语句
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops) WITH (m = 16, ef_construction = 64);HNSW Parameters
HNSW参数
| Parameter | Default | Description |
|---|---|---|
| 16 | Max connections per layer. Higher = better recall, more memory |
| 64 | Build-time candidate list. Higher = better graph quality, slower build |
| 40 | Query-time candidate list. Higher = better recall, slower queries. Should be ≥ LIMIT. |
ef_search tuning (rough guidelines—actual results vary by dataset):
| ef_search | Approx Recall | Relative Speed |
|---|---|---|
| 40 | lower (~95% on some benchmarks) | 1x (baseline) |
| 100 | higher | ~2x slower |
| 200 | very-high | ~4x slower |
| 400 | near-exact | ~8x slower |
sql
-- Set search parameter for session
SET hnsw.ef_search = 100;
-- Set for single query
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;| 参数 | 默认值 | 描述 |
|---|---|---|
| 16 | 每层最大连接数。值越大,召回率越高,占用内存越多 |
| 64 | 构建时的候选列表大小。值越大,图质量越好,构建速度越慢 |
| 40 | 查询时的候选列表大小。值越大,召回率越高,查询速度越慢。该值应≥LIMIT。 |
ef_search调优(大致指南——实际结果因数据集而异):
| ef_search | 近似召回率 | 相对速度 |
|---|---|---|
| 40 | 较低(部分基准测试中约95%) | 1倍(基准线) |
| 100 | 较高 | 约慢2倍 |
| 200 | 极高 | 约慢4倍 |
| 400 | 接近精确 | 约慢8倍 |
sql
-- 为会话设置搜索参数
SET hnsw.ef_search = 100;
-- 为单个查询设置参数
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;IVFFlat Index (Generally Not Recommended)
IVFFlat索引(通常不推荐)
Default to HNSW. Use IVFFlat only when HNSW’s operational costs matter more than peak recall.
Choose IVFFlat if:
- Write-heavy or constantly changing data AND you're willing to rebuild the index frequently
- You rebuild indexes often and want predictable build time and memory usage
- Memory is tight and you cannot keep an HNSW graph mostly resident
- Data is partitioned or tiered, and this index lives on colder partitions
Avoid IVFFlat if you need:
- highest recall at low latency
- minimal tuning
- a “set and forget” index
Notes:
- IVFFlat requires data to exist before index creation.
- Recall depends on and
lists; higher probes = better recall, slower queries.ivfflat.probes
Starter config:
sql
CREATE INDEX ON items
USING ivfflat (embedding halfvec_cosine_ops)
WITH (lists = 1000);
SET ivfflat.probes = 10;默认使用HNSW。仅当HNSW的运维成本比峰值召回率更重要时,才使用IVFFlat。
在以下场景选择IVFFlat:
- 写入密集型或数据频繁变化的工作负载,且你愿意频繁重建索引
- 你需要经常重建索引,且希望构建时间和内存使用可预测
- 内存紧张,无法将HNSW图大部分驻留在内存中
- 数据已分区或分层,且该索引位于较冷的分区上
在以下场景避免使用IVFFlat:
- 需要低延迟下的最高召回率
- 希望调优工作最少
- 需要“一劳永逸”的索引
注意事项:
- IVFFlat要求创建索引前数据已存在。
- 召回率取决于和
lists;probes值越大,召回率越高,查询速度越慢。ivfflat.probes
初始配置:
sql
CREATE INDEX ON items
USING ivfflat (embedding halfvec_cosine_ops)
WITH (lists = 1000);
SET ivfflat.probes = 10;Quantization Strategies
量化策略
- Quantization is a memory decision, not a recall decision.
- Use by default for storage and indexing.
halfvec - Estimate HNSW index footprint as ~4–6 KB per 1536-dim (m=16) (order-of-magnitude); 3072-dim is ~2×; m=32 roughly doubles HNSW link/graph overhead.
halfvec - If p95/p99 latency rises while CPU is mostly idle, the HNSW index is likely no longer resident in memory.
- If doesn’t fit, use binary quantization + re-ranking.
halfvec
- 量化是内存优化决策,而非召回率优化决策。
- 默认使用进行存储和索引。
halfvec - 估算HNSW索引占用空间:对于1536维的(m=16),约为每行4–6 KB(数量级);3072维约为2倍;m=32时,HNSW链接/图开销大致翻倍。
halfvec - 如果p95/p99延迟上升但CPU大多处于空闲状态,说明HNSW索引可能不再驻留在内存中。
- 如果仍无法满足内存需求,使用二进制量化+重排序。
halfvec
Guidelines for 1536-dim vectors
1536维向量指南
Approximate capacity at , 1536-dim (assumes RAM mostly available for index caching):
halfvecm=16| RAM | Approx max halfvec vectors |
|---|---|
| 16 GB | ~2–3M vectors |
| 32 GB | ~4–6M vectors |
| 64 GB | ~8–12M vectors |
| 128 GB | ~16–25M vectors |
For 3072-dim embeddings, divide these numbers by ~2.
For, also divide capacity by ~2.
For
m=32If the index cannot fit in memory at this scale, use binary quantization.
These are ranges, not guarantees. Validate by monitoring cache residency and p95/p99 latency under load.
在m=16、1536维的情况下,的近似容量(假设内存主要用于索引缓存):
halfvec| 内存 | 近似最大halfvec向量数量 |
|---|---|
| 16 GB | ~2–3M 向量 |
| 32 GB | ~4–6M 向量 |
| 64 GB | ~8–12M 向量 |
| 128 GB | ~16–25M 向量 |
对于3072维嵌入向量,上述数值需除以约2。
对于,容量也需除以约2。
对于
m=32如果索引无法在此规模下放入内存,使用二进制量化。
这些是范围值,而非保证值。需通过监控缓存驻留率和负载下的p95/p99延迟进行验证。
Binary Quantization (For Very Large Datasets)
二进制量化(适用于超大型数据集)
32× memory reduction. Use with re-ranking for acceptable recall.
sql
-- Table with generated column for binary quantization
CREATE TABLE items (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
contents TEXT NOT NULL,
embedding halfvec(1536) NOT NULL,
embedding_bq bit(1536) GENERATED ALWAYS AS (binary_quantize(embedding)::bit(1536)) STORED
);
CREATE INDEX ON items USING hnsw (embedding_bq bit_hamming_ops);
-- Query with re-ranking for better recall
-- ef_search must be >= inner LIMIT to retrieve enough candidates
SET hnsw.ef_search = 800;
WITH q AS (
SELECT binary_quantize($1::halfvec(1536))::bit(1536) AS qb
)
SELECT *
FROM (
SELECT i.id, i.contents, i.embedding
FROM items i, q
ORDER BY i.embedding_bq <~> q.qb -- computes binary distance using index
LIMIT 800
) candidates
ORDER BY candidates.embedding <=> $1::halfvec(1536) -- computes halfvec distance (no index), more accurate than binary
LIMIT 10;The 80× oversampling ratio (800 candidates for 10 results) is a reasonable starting point. Binary quantization loses precision, so more candidates are needed to find true nearest neighbors during re-ranking. Increase if recall is insufficient; decrease if re-ranking latency is too high.
可减少32倍内存占用。结合重排序可获得可接受的召回率。
sql
-- 包含二进制量化生成列的表
CREATE TABLE items (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
contents TEXT NOT NULL,
embedding halfvec(1536) NOT NULL,
embedding_bq bit(1536) GENERATED ALWAYS AS (binary_quantize(embedding)::bit(1536)) STORED
);
CREATE INDEX ON items USING hnsw (embedding_bq bit_hamming_ops);
-- 带重排序的查询以提升召回率
-- ef_search必须≥内部LIMIT,以获取足够的候选结果
SET hnsw.ef_search = 800;
WITH q AS (
SELECT binary_quantize($1::halfvec(1536))::bit(1536) AS qb
)
SELECT *
FROM (
SELECT i.id, i.contents, i.embedding
FROM items i, q
ORDER BY i.embedding_bq <~> q.qb -- 使用索引计算二进制距离
LIMIT 800
) candidates
ORDER BY candidates.embedding <=> $1::halfvec(1536) -- 计算halfvec距离(不使用索引),比二进制距离更准确
LIMIT 10;80倍过采样率(为10个结果获取800个候选)是合理的起始值。二进制量化会损失精度,因此需要更多候选结果才能在重排序时找到真正的最近邻。如果召回率不足,可增大该值;如果重排序延迟过高,可减小该值。
Performance by Dataset Size
按数据集规模优化性能
| Scale | Vectors | Config | Notes |
|---|---|---|---|
| Small | <100K | Defaults | Index optional but improves tail latency |
| Medium | 100K–5M | Defaults | Monitor p95 latency; most common production range |
| Large | 5M+ | | Memory residency critical |
| Very Large | 10M+ | Binary quantization + re-ranking | Add RAM or partition first if possible |
Tune first for recall; only increase if recall plateaus and memory allows. Under concurrency, tail latency spikes when the index doesn't fit in memory. Binary quantization is an escape hatch—prefer adding RAM or partitioning first.
ef_searchm| 规模 | 向量数量 | 配置 | 说明 |
|---|---|---|---|
| 小型 | <100K | 默认配置 | 索引可选,但可改善尾部延迟 |
| 中型 | 100K–5M | 默认配置 | 监控p95延迟;这是最常见的生产环境规模 |
| 大型 | 5M+ | | 内存驻留至关重要 |
| 超大型 | 10M+ | 二进制量化+重排序 | 如有可能,优先增加内存或分区 |
优先调优以提升召回率;仅当召回率进入瓶颈且内存允许时,再增大。在并发场景下,当索引无法放入内存时,尾部延迟会骤增。二进制量化是最后的手段——优先选择增加内存或分区。
ef_searchmFiltering Best Practices
过滤最佳实践
Filtered vector search requires care. Depending on filter selectivity and query shape, filters can cause early termination (too few rows, missing results) or increase work (latency).
带过滤条件的向量搜索需要谨慎处理。根据过滤条件的选择性和查询形状,过滤可能导致提前终止(结果过少、丢失结果)或增加计算量(延迟上升)。
Iterative scan (recommended when filters are selective)
迭代扫描(过滤条件选择性高时推荐)
By default, HNSW may stop early when a WHERE clause is present, which can lead to fewer results than expected. Iterative scan allows HNSW to continue searching until enough filtered rows are found.
Enable iterative scan when filters materially reduce the result set.
sql
-- Enable iterative scans for filtered queries
SET hnsw.iterative_scan = relaxed_order;
SELECT id, contents
FROM items
WHERE category_id = 123
ORDER BY embedding <=> $1::halfvec(1536)
LIMIT 10;If results are still sparse, increase the scan budget:
sql
SET hnsw.max_scan_tuples = 50000;Trade-off: increasing improves recall but can significantly increase latency.
hnsw.max_scan_tuplesWhen iterative scan is not needed:
- The filter matches a large portion of the table (low selectivity)
- You are prefiltering via a B-tree index
- You are querying a single partition or partial index
默认情况下,当存在WHERE子句时,HNSW可能提前终止,导致结果数量少于预期。迭代扫描允许HNSW继续搜索,直到找到足够多的符合过滤条件的行。
当过滤条件显著减少结果集时,启用迭代扫描。
sql
-- 为带过滤条件的查询启用迭代扫描
SET hnsw.iterative_scan = relaxed_order;
SELECT id, contents
FROM items
WHERE category_id = 123
ORDER BY embedding <=> $1::halfvec(1536)
LIMIT 10;如果结果仍然稀疏,增大扫描预算:
sql
SET hnsw.max_scan_tuples = 50000;权衡:增大可提升召回率,但会显著增加延迟。
hnsw.max_scan_tuples无需使用迭代扫描的场景:
- 过滤条件匹配表中大部分数据(低选择性)
- 你已通过B-tree索引预过滤
- 你正在查询单个分区或部分索引
Choose the right filtering strategy
选择合适的过滤策略
Highly selective filters (under ~10k rows)
Use a B-tree index on the filter column so Postgres can prefilter before ANN.
sql
CREATE INDEX ON items (category_id);Low-cardinality filters (few distinct values)
Use partial HNSW indexes per filter value.
sql
CREATE INDEX ON items
USING hnsw (embedding halfvec_cosine_ops)
WHERE category_id = 11;Many filter values or large datasets
Partition by the filter key to keep each ANN index small.
sql
CREATE TABLE items (
embedding halfvec(1536),
category_id int
) PARTITION BY LIST (category_id);高选择性过滤条件(匹配行数少于~10k)
在过滤列上创建B-tree索引,使Postgres可在ANN索引查询前预过滤数据。
sql
CREATE INDEX ON items (category_id);低基数过滤条件(不同值数量少)
为每个过滤值创建部分HNSW索引。
sql
CREATE INDEX ON items
USING hnsw (embedding halfvec_cosine_ops)
WHERE category_id = 11;多过滤值或大型数据集
按过滤键分区,使每个ANN索引保持较小规模。
sql
CREATE TABLE items (
embedding halfvec(1536),
category_id int
) PARTITION BY LIST (category_id);Key rules
关键规则
- Filters that match few rows require prefiltering, partitioning, or iterative scan.
- Always validate filtered queries by measuring p95/p99 latency and tuples visited under realistic load.
- 匹配行数少的过滤条件需要预过滤、分区或迭代扫描。
- 始终在真实负载下通过测量p95/p99延迟和访问的元组数量来验证带过滤条件的查询。
Alternative: pgvectorscale for label-based filtering
替代方案:使用pgvectorscale进行基于标签的过滤
For large datasets with label-based filters, pgvectorscale's StreamingDiskANN index supports filtered indexes on columns. Labels are indexed alongside vectors, enabling efficient filtered search without the accuracy tradeoffs of HNSW post-filtering. See the pgvectorscale documentation for setup details.
smallint[]对于带标签过滤的大型数据集,pgvectorscale的StreamingDiskANN索引支持在列上创建带过滤条件的索引。标签与向量一同索引,无需HNSW后过滤的精度权衡即可实现高效的带过滤条件搜索。设置细节请查看pgvectorscale文档。
smallint[]Bulk Loading
批量加载
sql
-- COPY is fastest; binary format is faster but requires proper encoding
-- Text format: '[0.1, 0.2, ...]'
COPY items (contents, embedding) FROM STDIN;
-- Binary format (if your client supports it):
COPY items (contents, embedding) FROM STDIN WITH (FORMAT BINARY);
-- Add indexes AFTER loading
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 7;
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);sql
-- COPY是最快的方式;二进制格式更快,但需要正确编码
-- 文本格式:'[0.1, 0.2, ...]'
COPY items (contents, embedding) FROM STDIN;
-- 二进制格式(如果你的客户端支持):
COPY items (contents, embedding) FROM STDIN WITH (FORMAT BINARY);
-- 加载完成后再添加索引
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 7;
CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);Maintenance
维护
- VACUUM regularly after updates/deletes—stale entries may persist until vacuumed
- REINDEX if performance degrades after high churn (rebuilds the graph from scratch)
- For write-heavy workloads with frequent deletes, consider IVFFlat or partitioning by time using hypertables
- 定期执行VACUUM——更新/删除后,过期条目可能会保留到执行VACUUM为止
- REINDEX——如果高 churn后性能下降(从头重建图)
- 对于写入密集型且频繁删除的工作负载,考虑使用IVFFlat或使用hypertables按时间分区
Monitoring & Debugging
监控与调试
sql
-- Check index size
SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));
-- Debug query performance
EXPLAIN (ANALYZE, BUFFERS) SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
-- Monitor index build progress
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;
-- Compare approximate vs exact recall
BEGIN;
SET LOCAL enable_indexscan = off; -- Force exact search
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;
-- Force index use for debugging
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;sql
-- 检查索引大小
SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));
-- 调试查询性能
EXPLAIN (ANALYZE, BUFFERS) SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
-- 监控索引构建进度
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;
-- 比较近似搜索与精确搜索的召回率
BEGIN;
SET LOCAL enable_indexscan = off; -- 强制精确搜索
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;
-- 强制使用索引进行调试
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT id, contents FROM items ORDER BY embedding <=> $1::halfvec(1536) LIMIT 10;
COMMIT;Common Issues (Symptom → Fix)
常见问题(症状→解决方案)
| Symptom | Likely Cause | Fix |
|---|---|---|
| Query does not use ANN index | Missing | Use |
| Fewer results than expected (filtered query) | HNSW stops early due to filter | Enable iterative scan; increase |
| Fewer results than expected (unfiltered query) | ANN recall too low | Increase |
| High latency with low CPU usage | HNSW index not resident in memory | Use |
| Slow index builds | Insufficient build memory or parallelism | Increase |
| Out-of-memory errors | Index too large for available RAM | Use |
| Zero or missing results | NULL or zero vectors | Avoid NULL embeddings; do not use zero vectors with cosine distance |
| 症状 | 可能原因 | 解决方案 |
|---|---|---|
| 查询未使用ANN索引 | 缺少 | 使用与索引运算符类匹配的距离运算符进行 |
| 带过滤条件的查询结果少于预期 | HNSW因过滤条件提前终止 | 启用迭代扫描;增大 |
| 无过滤条件的查询结果少于预期 | ANN召回率过低 | 增大 |
| 延迟高但CPU使用率低 | HNSW索引未驻留在内存中 | 使用 |
| 索引构建缓慢 | 构建内存或并行度不足 | 增大 |
| 内存不足错误 | 索引超出可用内存 | 使用 |
| 结果为零或缺失 | 向量为NULL或零向量 | 避免NULL嵌入向量;不要对零向量使用余弦距离 |