# FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

## When to use FAISS

Use FAISS when:

- You need fast similarity search on large vector datasets (millions to billions of vectors)
- GPU acceleration is required
- You need pure vector similarity (no metadata filtering)
- High throughput and low latency are critical
- You do offline/batch processing of embeddings

Metrics:

- 31,700+ GitHub stars
- Developed by Meta/Facebook AI Research
- Handles billions of vectors
- C++ with Python bindings

Use alternatives instead:

- Chroma/Pinecone: need metadata filtering
- Weaviate: need full database features
- Annoy: simpler, fewer features

## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support
pip install faiss-gpu
```

### Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)            # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```

## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if vectors are normalized)
index = faiss.IndexFlatIP(d)

# Slowest, most accurate
```

### 2. IVF (inverted file) - fast approximate search

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = number of clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```

### 3. HNSW (Hierarchical Navigable Small World) - best quality/speed trade-off

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

### 4. Product Quantization - memory efficient

```python
# PQ reduces memory by 16-32x
m = 8      # Number of subquantizers
nbits = 8  # Bits per subquantizer code
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
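The compression figure is easy to sanity-check with back-of-the-envelope arithmetic. A sketch (it ignores the small fixed overhead of the codebooks and vector IDs):

```python
# Memory per vector: raw float32 storage vs. compressed PQ codes.
d = 128
flat_bytes = d * 4  # float32 = 4 bytes per dimension -> 512 bytes/vector

def pq_code_bytes(m: int, nbits: int) -> int:
    """Bytes per vector for PQ codes: m subquantizers, nbits bits each."""
    return m * nbits // 8

print(flat_bytes / pq_code_bytes(16, 8))  # 32.0 -> the 32x end of the range
print(flat_bytes / pq_code_bytes(8, 8))   # 64.0 -> even smaller codes with m=8
```

Larger `m` keeps more detail per vector (better accuracy) at the cost of bigger codes, which is the knob behind the quoted 16-32x range.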

## Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```

## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
```

GPU indexes are typically 10-100x faster than CPU.

## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,
)

# Search
results = vectorstore.similarity_search("query", k=5)
```

## LlamaIndex integration

```python
import faiss
from llama_index.vector_stores.faiss import FaissVectorStore

# Create FAISS index
d = 1536
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
```

## Best practices

1. Choose the right index type - Flat for <10K vectors, IVF for 10K-1M, HNSW when quality matters
2. Normalize for cosine similarity - use IndexFlatIP with normalized vectors
3. Use GPU for large datasets - 10-100x faster than CPU
4. Save trained indices - training is expensive
5. Tune nprobe/efSearch - balance speed against accuracy
6. Monitor memory - use PQ for large datasets
7. Batch queries - better GPU utilization
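Practice 2 rests on a simple identity: once rows are L2-normalized, inner product equals cosine similarity. A numpy-only sketch with a toy 2-D pair of vectors (`faiss.normalize_L2(x)` performs the same normalization in place):

```python
import numpy as np

def normalize_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that inner product == cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero rows

a = np.array([[3.0, 4.0]], dtype='float32')  # toy "embedding"
b = np.array([[6.0, 8.0]], dtype='float32')  # parallel to a

ip = float((normalize_rows(a) * normalize_rows(b)).sum())
print(round(ip, 5))  # 1.0 -- parallel vectors have cosine similarity 1
```

With `IndexFlatIP`, normalize both the stored vectors (before `add`) and the queries (before `search`) for the returned scores to be cosine similarities.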

## Performance

| Index type | Build time | Search time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat       | Fast       | Slow        | High   | 100%     |
| IVF        | Medium     | Fast        | Medium | 95-99%   |
| HNSW       | Slow       | Fastest     | High   | 99%      |
| PQ         | Medium     | Fast        | Low    | 90-95%   |

## Resources