# FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

## When to use FAISS

Use FAISS when:

- You need fast similarity search on large vector datasets (millions to billions of vectors)
- GPU acceleration is required
- You need pure vector similarity (no metadata filtering)
- High throughput and low latency are critical
- You do offline/batch processing of embeddings

Metrics:

- 31,700+ GitHub stars
- Developed by Meta/Facebook AI Research
- Handles billions of vectors
- C++ with Python bindings

Use alternatives instead:

- Chroma/Pinecone: need metadata filtering
- Weaviate: need full database features
- Annoy: simpler, fewer features

## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support
pip install faiss-gpu
```

### Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)            # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```

## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if vectors are normalized)
index = faiss.IndexFlatIP(d)

# Slowest, most accurate
```

### 2. IVF (inverted file) - fast approximate search

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = number of clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```

### 3. HNSW (Hierarchical Navigable Small World) - best quality/speed trade-off

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

### 4. Product Quantization - memory efficient

```python
# PQ reduces memory by 16-32x
m = 8      # Number of subquantizers
nbits = 8  # Bits per subquantizer code
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
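The compression figure is easy to sanity-check with back-of-the-envelope arithmetic. A sketch (it ignores the small fixed overhead of the codebooks and vector IDs):

```python
# Memory per vector: raw float32 storage vs. compressed PQ codes.
d = 128
flat_bytes = d * 4  # float32 = 4 bytes per dimension -> 512 bytes/vector

def pq_code_bytes(m: int, nbits: int) -> int:
    """Bytes per vector for PQ codes: m subquantizers, nbits bits each."""
    return m * nbits // 8

print(flat_bytes / pq_code_bytes(16, 8))  # 32.0 -> the 32x end of the range
print(flat_bytes / pq_code_bytes(8, 8))   # 64.0 -> even smaller codes with m=8
```

Larger `m` keeps more detail per vector (better accuracy) at the cost of bigger codes, which is the knob behind the quoted 16-32x range.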

## Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```

## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
```

GPU indexes are typically 10-100x faster than CPU.

## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,
)

# Search
results = vectorstore.similarity_search("query", k=5)
```

## LlamaIndex integration

```python
import faiss
from llama_index.vector_stores.faiss import FaissVectorStore

# Create FAISS index
d = 1536
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
```

## Best practices

1. Choose the right index type - Flat for <10K vectors, IVF for 10K-1M, HNSW when quality matters
2. Normalize for cosine similarity - use IndexFlatIP with normalized vectors
3. Use GPU for large datasets - 10-100x faster than CPU
4. Save trained indices - training is expensive
5. Tune nprobe/efSearch - balance speed against accuracy
6. Monitor memory - use PQ for large datasets
7. Batch queries - better GPU utilization
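Practice 2 rests on a simple identity: once rows are L2-normalized, inner product equals cosine similarity. A numpy-only sketch with a toy 2-D pair of vectors (`faiss.normalize_L2(x)` performs the same normalization in place):

```python
import numpy as np

def normalize_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that inner product == cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero rows

a = np.array([[3.0, 4.0]], dtype='float32')  # toy "embedding"
b = np.array([[6.0, 8.0]], dtype='float32')  # parallel to a

ip = float((normalize_rows(a) * normalize_rows(b)).sum())
print(round(ip, 5))  # 1.0 -- parallel vectors have cosine similarity 1
```

With `IndexFlatIP`, normalize both the stored vectors (before `add`) and the queries (before `search`) for the returned scores to be cosine similarities.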

## Performance

| Index type | Build time | Search time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat       | Fast       | Slow        | High   | 100%     |
| IVF        | Medium     | Fast        | Medium | 95-99%   |
| HNSW       | Slow       | Fastest     | High   | 99%      |
| PQ         | Medium     | Fast        | Low    | 90-95%   |

## Resources