redis-vector-search
Redis Vector Search
Guidance for storing and searching embeddings in Redis. Covers index configuration, algorithm selection, hybrid filtering, and the RAG retrieval pattern with RedisVL.
When to apply
- Defining a `VECTOR` field in `FT.CREATE` (raw RQE) or a RedisVL `IndexSchema`.
- Choosing HNSW vs FLAT and tuning HNSW parameters.
- Adding category, date, or tenant filters to a vector query.
- Building a retrieval-augmented generation (RAG) pipeline on top of Redis.
This skill builds on the `redis-query-engine` skill: vector fields live inside RQE indexes and share the same `FT.CREATE` / `FT.SEARCH` machinery.
1. Configure the vector index properly
Three settings must match the embedding model:
- `DIM`: the model's output dimensionality (e.g. 1536 for OpenAI `text-embedding-3-small`). A mismatch produces silent garbage.
- `DISTANCE_METRIC`: `COSINE` for normalized text embeddings (the common case), `IP` for unnormalized inner-product, `L2` for raw Euclidean.
- `TYPE` / `datatype`: usually `FLOAT32`. Use `FLOAT16` or quantized variants only when memory cost is a hard constraint.
Raw RQE:
```
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
  SCHEMA
    content TEXT
    embedding VECTOR HNSW 6
      TYPE FLOAT32
      DIM 1536
      DISTANCE_METRIC COSINE
```

RedisVL:

```python
from redisvl.schema import IndexSchema

# Same index as the raw RQE command above, expressed as a RedisVL schema.
schema = IndexSchema.from_dict({
    "index": {"name": "idx:docs", "prefix": "doc:"},
    "fields": [
        {"name": "content", "type": "text"},
        {"name": "embedding", "type": "vector", "attrs": {
            "dims": 1536, "algorithm": "HNSW",
            "datatype": "FLOAT32", "distance_metric": "COSINE",
        }},
    ]
})
```

See references/index-creation.md for redis-py and RedisVL variants.
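When vectors are written to a HASH with raw commands, the field value is a binary blob whose length has to agree with `DIM` and `TYPE`. The following is a minimal, stdlib-only sketch of that packing step with a dimension check in front of it. The helper name `to_float32_blob` is hypothetical, and the little-endian layout is an assumption that matches what common hardware produces natively.

```python
import struct

EXPECTED_DIM = 1536  # assumption: must equal the DIM declared in the index


def to_float32_blob(vector, expected_dim=EXPECTED_DIM):
    """Pack a sequence of floats into little-endian FLOAT32 bytes.

    Raises ValueError when the length differs from the index DIM, so an
    embedding from the wrong model fails loudly before it is stored.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dims, index expects {expected_dim}"
        )
    return struct.pack(f"<{expected_dim}f", *vector)


blob = to_float32_blob([0.0] * EXPECTED_DIM)
print(len(blob))  # 1536 floats * 4 bytes each = 6144
```

Running the check at write time is one way to turn a silent dimension mismatch into an immediate error.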
2. HNSW vs FLAT
| Algorithm | Speed | Accuracy | Memory | Best for |
|---|---|---|---|---|
| HNSW | Fast (approximate) | ~95%+ recall (tunable) | Higher | Large datasets (>10k vectors), latency-sensitive |
| FLAT | Slow (exact) | 100% | Lower | Small datasets (<10k), accuracy-critical |
Default to HNSW for any production-scale workload. Tuning levers:
- `M`: connections per node (16–64). Higher = better recall, more memory.
- `EF_CONSTRUCTION`: build-time graph quality (100–500). Higher = better index, slower build.
- `EF_RUNTIME`: query-time candidate-list size. Higher = better recall, slower queries.
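In a raw `FT.CREATE`, these levers are extra name/value pairs after `VECTOR HNSW`, and the integer that follows `HNSW` counts every token after it (6 when only `TYPE`, `DIM`, and `DISTANCE_METRIC` are given). The sketch below builds that token list so the count stays correct as attributes are added. The helper `hnsw_vector_args` is hypothetical and the tuning values are illustrative, not recommendations.

```python
def hnsw_vector_args(dim, metric="COSINE", dtype="FLOAT32", **tuning):
    """Return the tokens that follow `<field> VECTOR` in FT.CREATE."""
    pairs = [("TYPE", dtype), ("DIM", dim), ("DISTANCE_METRIC", metric)]
    # Optional HNSW tuning attributes, e.g. m=32, ef_construction=300.
    pairs += [(name.upper(), value) for name, value in tuning.items()]
    flat = [str(token) for pair in pairs for token in pair]
    # The integer after HNSW counts every token that follows it.
    return ["HNSW", str(len(flat)), *flat]


args = hnsw_vector_args(1536, m=32, ef_construction=300, ef_runtime=50)
print(" ".join(args))
# HNSW 12 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE M 32 EF_CONSTRUCTION 300 EF_RUNTIME 50
```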
Use FLAT when the corpus is small and you need exact results (e.g. semantic dedup over a few thousand items).
See references/algorithm-choice.md.
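Recall in the table above is the share of the true nearest neighbours that an approximate search returns. As a rough illustration, the sketch below computes the exact top-k by brute force (the kind of full scan a FLAT index performs) and scores a simulated approximate result against it. The data is random, and the "approximate" result is a hand-made stand-in rather than real HNSW output.

```python
import math
import random


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(query, vectors, k):
    """Brute-force scan: score every stored vector, then keep the k closest."""
    ranked = sorted(vectors, key=lambda vid: cosine_distance(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


rng = random.Random(0)
vectors = {f"doc:{i}": [rng.gauss(0, 1) for _ in range(8)] for i in range(200)}
query = [rng.gauss(0, 1) for _ in range(8)]

truth = exact_top_k(query, vectors, k=10)
# Stand-in for an approximate result that missed one true neighbour.
approx = truth[:9] + ["doc:not-a-neighbour"]
print(recall_at_k(truth, approx))  # 0.9
```

Measuring recall this way on a sample of real queries is one option for deciding whether `EF_RUNTIME` or `M` needs raising.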
3. Hybrid search — filter before vector
Apply attribute filters (TAG / NUMERIC) so the engine narrows the search space before the vector comparison. Don't fetch a wide result set and then filter client-side — that's slower and less accurate.
```python
from redisvl.query import VectorQuery
from redisvl.query.filter import Num, Tag

# Both conditions are evaluated by the engine as part of the vector query.
filters = (Tag("category") == "technology") & (Num("date") >= 2024)

query = VectorQuery(
    vector=query_embedding,  # embedding of the user's query text
    vector_field_name="embedding",
    return_fields=["content", "category", "date"],
    num_results=10,
    filter_expression=filters,
)
results = index.query(query)  # index: a connected RedisVL SearchIndex
```

For text + vector fusion (BM25-weighted text scoring combined with vector similarity), use `HybridQuery` on Redis ≥ 8.4 with redis-py ≥ 7.1, or `AggregateHybridQuery` on older Redis. That's a different "hybrid" from filtered vector search above.

See references/hybrid-search.md.
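At the raw RQE level, a filtered vector search places the attribute filter to the left of `=>` and the KNN clause to the right, so the filter and the vector comparison travel in one query. The sketch below builds such a query string for the same category and date filter. The helper names are hypothetical, and tag values containing spaces or punctuation would need escaping, which this sketch does not handle.

```python
def tag_filter(field, value):
    """Exact-match filter on a TAG field, e.g. @category:{technology}."""
    return f"@{field}:{{{value}}}"


def numeric_min_filter(field, minimum):
    """Lower-bounded range on a NUMERIC field, e.g. @date:[2024 +inf]."""
    return f"@{field}:[{minimum} +inf]"


def filtered_knn_query(filters, k, vector_field, param="vec",
                       score_alias="vector_distance"):
    """Combine attribute filters with a KNN clause in one query string."""
    knn = f"[KNN {k} @{vector_field} ${param} AS {score_alias}]"
    if not filters:
        return f"*=>{knn}"  # no filter: search the whole index
    return f"({' '.join(filters)})=>{knn}"


query_string = filtered_knn_query(
    [tag_filter("category", "technology"), numeric_min_filter("date", 2024)],
    k=10,
    vector_field="embedding",
)
print(query_string)
# (@category:{technology} @date:[2024 +inf])=>[KNN 10 @embedding $vec AS vector_distance]
```

The `$vec` placeholder is bound to the query vector's bytes through a query parameter when the command is sent.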
4. RAG pattern
Standard pipeline: embed the user query → vector search Redis → pass top-K context to the LLM.
```python
from redisvl.query import VectorQuery

# Index documents with embeddings.
# Lists of floats suit JSON storage; HASH storage typically expects the
# vector as bytes instead.
records = [{"content": doc.content,
            "embedding": embed_model.encode(doc.content).tolist(),
            "source": doc.source}
           for doc in documents]
index.load(records)

# Retrieve relevant context for a user question
q_emb = embed_model.encode(user_question).tolist()
results = index.query(VectorQuery(
    vector=q_emb,
    vector_field_name="embedding",
    return_fields=["content", "source"],
    num_results=5,
))

# Generate with retrieved context
context = "\n".join(r["content"] for r in results)
response = llm.generate(f"Context: {context}\n\nQuestion: {user_question}")
```
Practical tips:
- **Match metric to model.** Most modern text embedding models pair best with `COSINE`.
- **Chunk long documents** before indexing — retrieval over 200–500-token chunks usually beats indexing whole pages.
- **Batch inserts** with `index.load([...])` instead of one call per record.
- **Pre-filter with attributes** (tenant, recency, document type) before the vector search.
See [references/rag-pattern.md](references/rag-pattern.md).
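The chunking tip can be sketched with a simple sliding window. This version counts whitespace-separated words as a rough stand-in for model tokens, which is an assumption: a real pipeline would more likely count tokens with the embedding model's own tokenizer. The function name and default sizes are illustrative.

```python
def chunk_words(text, chunk_size=300, overlap=30):
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(document)
print([len(c.split()) for c in chunks])  # [300, 300, 160]
```

Each chunk would then become one record (with its own embedding and a reference back to the source document) in the batch passed to `index.load`.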