# FAISS - Efficient Similarity Search
Facebook AI's library for billion-scale vector similarity search.
## When to use FAISS

Use FAISS when:
- You need fast similarity search over large vector datasets (millions to billions of vectors)
- GPU acceleration is required
- Pure vector similarity is enough (no metadata filtering needed)
- High throughput and low latency are critical
- You do offline/batch processing of embeddings

Metrics:
- 31,700+ GitHub stars
- Developed by Meta/Facebook AI Research
- Handles billions of vectors
- C++ core with Python bindings

Use alternatives instead:
- Chroma/Pinecone: need metadata filtering
- Weaviate: need full database features
- Annoy: simpler tool with fewer features
## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support
pip install faiss-gpu
```

### Basic usage
```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)            # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if vectors are normalized)
index = faiss.IndexFlatIP(d)
```

Slowest to search, but exact (100% accuracy).
### 2. IVF (inverted file) - fast approximate

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = number of clusters to probe)
index.nprobe = 10
distances, indices = index.search(query, k)
```
### 3. HNSW (Hierarchical Navigable Small World) - best quality/speed trade-off

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```
### 4. Product Quantization (PQ) - memory efficient

```python
# PQ reduces memory by 16-32×
m = 8      # Number of subquantizers
nbits = 8  # Bits per subquantizer code
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
## Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```
## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
```

Typically 10-100× faster than CPU search.
## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True
)

# Search
results = vectorstore.similarity_search("query", k=5)
```
## LlamaIndex integration

```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index
d = 1536
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
```
## Best practices

- Choose the right index type - Flat for <10K vectors, IVF for 10K-1M, HNSW for best quality
- Normalize for cosine similarity - use IndexFlatIP with L2-normalized vectors
- Use GPU for large datasets - 10-100× faster
- Save trained indices - training is expensive
- Tune nprobe/efSearch - balance speed against accuracy
- Monitor memory - use PQ for large datasets
- Batch queries - better GPU utilization
## Performance

| Index Type | Build Time | Search Time | Memory | Accuracy |
|---|---|---|---|---|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |
## Resources

- GitHub: https://github.com/facebookresearch/faiss ⭐ 31,700+
- Wiki: https://github.com/facebookresearch/faiss/wiki
- License: MIT