using-vector-databases
Vector Databases for AI Applications
When to Use This Skill
Use this skill when implementing:
- RAG (Retrieval-Augmented Generation) systems for AI chatbots
- Semantic search capabilities (meaning-based, not just keyword)
- Recommendation systems based on similarity
- Multi-modal AI (unified search across text, images, audio)
- Document similarity and deduplication
- Question answering over private knowledge bases
Quick Decision Framework
1. Vector Database Selection
START: Choosing a Vector Database
EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│ └─ pgvector (<10M vectors, tight budget)
│ See: references/pgvector.md
│
└─ No existing vector database?
│
├─ OPERATIONAL PREFERENCE?
│ │
│ ├─ Zero-ops managed only
│ │ └─ Pinecone (fully managed, excellent DX)
│ │ See: references/pinecone.md
│ │
│ └─ Flexible (self-hosted or managed)
│ │
│ ├─ SCALE: <100M vectors + complex filtering ⭐
│ │ └─ Qdrant (RECOMMENDED)
│ │ • Best metadata filtering
│ │ • Built-in hybrid search (BM25 + Vector)
│ │ • Self-host: Docker/K8s
│ │ • Managed: Qdrant Cloud
│ │ See: references/qdrant.md
│ │
│ ├─ SCALE: >100M vectors + GPU acceleration
│ │ └─ Milvus / Zilliz Cloud
│ │ See: references/milvus.md
│ │
│ ├─ Embedded / No server
│ │ └─ LanceDB (serverless, edge deployment)
│ │
│ └─ Local prototyping
│ └─ Chroma (simple API, in-memory)
2. Embedding Model Selection
REQUIREMENTS?
├─ Best quality (cost no object)
│ └─ Voyage AI voyage-3 (1024d)
│ • 9.74% better than OpenAI on MTEB
│ • ~$0.12/1M tokens
│ See: references/embedding-strategies.md
│
├─ Enterprise reliability
│ └─ OpenAI text-embedding-3-large (3072d)
│ • Industry standard
│ • ~$0.13/1M tokens
│ • Matryoshka shortening: reduce to 256/512/1024d
│
├─ Cost-optimized
│ └─ OpenAI text-embedding-3-small (1536d)
│ • ~$0.02/1M tokens (6x cheaper)
│ • 90-95% of large model performance
│
├─ Multilingual (100+ languages)
│ └─ Cohere embed-v3 (1024d)
│ • ~$0.10/1M tokens
│
└─ Self-hosted / Privacy-critical
├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
└─ Long docs: jina-embeddings-v2 (768d, 8K context)
Core Concepts
Document Chunking Strategy
Recommended defaults for most RAG systems:
- Chunk size: 512 tokens (not characters)
- Overlap: 50 tokens (10% overlap)
Why these numbers?
- 512 tokens balances context vs. precision
- Too small (128-256): Fragments concepts, loses context
- Too large (1024-2048): Dilutes relevance, wastes LLM tokens
- 50 token overlap ensures sentences aren't split mid-context
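The defaults above amount to a sliding window over a token list. A minimal sketch (assuming tokens are already produced by your embedding model's tokenizer, e.g. tiktoken, rather than the whitespace split a toy example might use):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Slide a window of `size` tokens forward by `size - overlap` each step,
    so consecutive chunks share `overlap` tokens of context."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

With the 512/50 defaults, a 1,000-token document yields chunks starting at tokens 0, 462, and 924, each pair sharing 50 tokens.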
See `references/chunking-patterns.md` for advanced strategies by content type.
Hybrid Search (Vector + Keyword)
Hybrid Search = Vector Similarity + BM25 Keyword Matching
User Query: "OAuth refresh token implementation"
│
┌──────┴──────┐
│ │
Vector Search Keyword Search
(Semantic) (BM25)
│ │
Top 20 docs Top 20 docs
│ │
└──────┬──────┘
│
Reciprocal Rank Fusion
(Merge + Re-rank)
│
Final Top 5 Results
Why hybrid matters:
- Vector captures semantic meaning ("OAuth refresh" ≈ "token renewal")
- Keyword ensures exact matches ("refresh_token" literal)
- Combined provides best retrieval quality
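The Reciprocal Rank Fusion step in the diagram is only a few lines. A sketch that merges any number of ranked ID lists, using the conventional RRF constant k=60:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc IDs: each doc scores sum(1 / (k + rank))
    across the lists it appears in; higher combined score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic top results
keyword_hits = ["doc_b", "doc_c", "doc_d"]  # BM25 top results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents appearing in both lists (doc_b, doc_c) outrank those in only one, which is exactly the behavior that makes hybrid search robust.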
See `references/hybrid-search.md` for implementation details.
Getting Started
Python + Qdrant Example
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# 1. Initialize client
client = QdrantClient("localhost", port=6333)

# 2. Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
)

# 3. Insert documents with embeddings
points = [
    PointStruct(
        id=idx,
        vector=embedding,  # From OpenAI/Voyage/etc
        payload={
            "text": chunk_text,
            "source": "docs/api.md",
            "section": "Authentication"
        }
    )
    for idx, (embedding, chunk_text) in enumerate(chunks)
]
client.upsert(collection_name="documents", points=points)

# 4. Search with metadata filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter={
        "must": [
            {"key": "section", "match": {"value": "Authentication"}}
        ]
    }
)
```

For complete examples, see `examples/qdrant-python/`.
TypeScript + Qdrant Example
```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: { size: 1024, distance: 'Cosine' }
});

// Insert documents
await client.upsert('documents', {
  points: chunks.map((chunk, idx) => ({
    id: idx,
    vector: chunk.embedding,
    payload: {
      text: chunk.text,
      source: chunk.source
    }
  }))
});

// Search
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      { key: 'source', match: { value: 'docs/api.md' } }
    ]
  }
});
```

For complete examples, see `examples/typescript-rag/`.
RAG Pipeline Architecture
Complete Pipeline Components
1. INGESTION
├─ Document Loading (PDF, web, code, Office)
├─ Text Extraction & Cleaning
├─ Chunking (semantic, recursive, code-aware)
└─ Embedding Generation (batch, rate-limited)
2. INDEXING
├─ Vector Store Insertion (batch upsert)
├─ Index Configuration (HNSW, distance metric)
└─ Keyword Index (BM25 for hybrid search)
3. RETRIEVAL (Query Time)
├─ Query Processing (expansion, embedding)
├─ Hybrid Search (vector + keyword)
├─ Filtering & Post-Processing (metadata, MMR)
└─ Re-Ranking (cross-encoder, LLM-based)
4. GENERATION
├─ Context Construction (format chunks, citations)
├─ Prompt Engineering (system + context + query)
├─ LLM Inference (streaming, temperature tuning)
└─ Response Post-Processing (citations, validation)
5. EVALUATION (Production Critical)
├─ Retrieval Metrics (precision, recall, relevancy)
├─ Generation Metrics (faithfulness, correctness)
└─ System Metrics (latency, cost, satisfaction)
Essential Metadata for Production RAG
Critical for filtering and relevance:
```python
metadata = {
    # SOURCE TRACKING
    "source": "docs/api-reference.md",
    "source_type": "documentation",  # code, docs, logs, chat
    "last_updated": "2025-12-01T12:00:00Z",

    # HIERARCHICAL CONTEXT
    "section": "Authentication",
    "subsection": "OAuth 2.1",
    "heading_hierarchy": ["API Reference", "Authentication", "OAuth 2.1"],

    # CONTENT CLASSIFICATION
    "content_type": "code_example",  # prose, code, table, list
    "programming_language": "python",

    # FILTERING DIMENSIONS
    "product_version": "v2.0",
    "audience": "enterprise",  # free, pro, enterprise

    # RETRIEVAL HINTS
    "chunk_index": 3,
    "total_chunks": 12,
    "has_code": True
}
```

Why metadata matters:
- Enables filtering BEFORE vector search (reduces search space)
- Improves relevance through targeted retrieval
- Supports multi-tenant systems (filter by user/org)
- Enables versioned documentation (filter by product version)
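To make "filtering BEFORE vector search" concrete, here is a toy brute-force version of what the engine does internally. The helper name and shapes are illustrative (numpy only); a real vector database applies the filter inside its index rather than scanning:

```python
import numpy as np

def filtered_search(query_vec, vectors, payloads, limit=5, **conditions):
    """Drop every point whose payload fails a condition, then cosine-score
    only the survivors. Returns (index, score) pairs, best first."""
    keep = [i for i, p in enumerate(payloads)
            if all(p.get(k) == v for k, v in conditions.items())]
    if not keep:
        return []
    sub = np.asarray([vectors[i] for i in keep], dtype=np.float32)
    sub /= np.linalg.norm(sub, axis=1, keepdims=True)
    q = np.asarray(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    scores = sub @ q
    order = np.argsort(-scores)[:limit]
    return [(keep[i], float(scores[i])) for i in order]
```

Filtering first means the similarity computation (and, in a real index, graph traversal) touches only the tenant's or version's subset.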
Evaluation with RAGAS
Use `scripts/evaluate_rag.py` for automated evaluation:
```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,       # Answer grounded in context
    answer_relevancy,   # Answer addresses query
    context_recall,     # Retrieved docs cover ground truth
    context_precision   # Retrieved docs are relevant
)

# Test dataset
test_data = {
    "question": ["How do I refresh OAuth tokens?"],
    "answer": ["Use /token with refresh_token grant..."],
    "contexts": [["OAuth refresh documentation..."]],
    "ground_truth": ["POST to /token with grant_type=refresh_token"]
}

# Evaluate
results = evaluate(Dataset.from_dict(test_data), metrics=[
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision
])
```

Production targets:
- faithfulness: >0.90 (minimal hallucination)
- answer_relevancy: >0.85 (addresses user query)
- context_recall: >0.80 (sufficient context retrieved)
- context_precision: >0.75 (minimal noise)
Performance Optimization
Embedding Generation
- Batch processing: 100-500 chunks per batch
- Caching: Cache embeddings by content hash
- Rate limiting: Respect API provider limits (exponential backoff)
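A minimal sketch of batched, cache-aware embedding. The `embed_fn` callable and `cache` dict are stand-ins for your provider API call and cache store; hash-keyed caching means re-ingesting unchanged documents costs nothing:

```python
import hashlib

def embed_with_cache(texts, embed_fn, cache, batch_size=100):
    """Embed only cache misses, in batches, keyed by content hash."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    misses = [(k, t) for k, t in zip(keys, texts) if k not in cache]
    # De-duplicate misses so identical chunks are embedded once
    misses = list(dict(misses).items())
    for i in range(0, len(misses), batch_size):
        batch = misses[i:i + batch_size]
        for (key, _), vec in zip(batch, embed_fn([t for _, t in batch])):
            cache[key] = vec
    return [cache[k] for k in keys]
```

In production the dict would be Redis or a database table, and `embed_fn` would wrap the provider call with exponential backoff.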
Vector Search
- Index type: HNSW (Hierarchical Navigable Small World) for most cases
- Distance metric: Cosine for normalized embeddings
- Pre-filtering: Apply metadata filters before vector search
- Result diversity: Use MMR (Maximal Marginal Relevance) to reduce redundancy
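MMR can be sketched in a few lines over unit-normalized vectors: greedily pick the candidate that best trades off query relevance against redundancy with already-selected results (`lam` near 1 favors relevance, near 0 favors diversity):

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    """Greedy Maximal Marginal Relevance: repeatedly pick the candidate
    maximizing lam * sim(query) - (1 - lam) * max sim(selected)."""
    doc_vecs = np.asarray(doc_vecs, dtype=np.float32)
    sims = doc_vecs @ np.asarray(query_vec, dtype=np.float32)
    selected = [int(np.argmax(sims))]
    candidates = [i for i in range(len(doc_vecs)) if i != selected[0]]
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(float(doc_vecs[i] @ doc_vecs[j]) for j in selected)
            return lam * float(sims[i]) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, a near-duplicate of an already-selected document is skipped in favor of a less similar but novel one.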
Cost Optimization
- Embedding model: Consider text-embedding-3-small for budget constraints
- Dimension reduction: Use Matryoshka shortening (3072d → 1024d)
- Caching: Implement semantic caching for repeated queries
- Batch operations: Group insertions/updates for efficiency
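Shortening works because Matryoshka-trained embeddings (including OpenAI's text-embedding-3 models, via the API's `dimensions` parameter) concentrate information in the leading dimensions. Done client-side, it is just truncate-and-renormalize:

```python
import numpy as np

def shorten_embedding(vec, dim=1024):
    """Keep the first `dim` dimensions and re-normalize so cosine/dot
    similarity still behaves as expected."""
    v = np.asarray(vec, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)
```

Storing 1024d instead of 3072d cuts vector storage and index memory by roughly 3x, usually at a small retrieval-quality cost.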
Common Workflows
1. Building a RAG Chatbot
- Vector database: Qdrant (self-hosted or cloud)
- Embeddings: OpenAI text-embedding-3-large
- Chunking: 512 tokens, 50 overlap, semantic splitter
- Search: Hybrid (vector + BM25)
- Integration: Frontend with ai-chat skill
See `examples/qdrant-python/` for a complete implementation.
2. Semantic Search Engine
- Vector database: Qdrant or Pinecone
- Embeddings: Voyage AI voyage-3 (best quality)
- Chunking: Content-type specific (see chunking-patterns.md)
- Search: Hybrid with re-ranking
- Filtering: Pre-filter by metadata (date, category, etc.)
3. Code Search
- Vector database: Qdrant
- Embeddings: OpenAI text-embedding-3-large
- Chunking: AST-based (function/class boundaries)
- Metadata: Language, file path, imports
- Search: Hybrid with language filtering
See `examples/qdrant-python/` for a code-specific implementation.
Integration with Other Skills
Frontend Skills
- ai-chat: Vector DB powers RAG pipeline behind chat interface
- search-filter: Replace keyword search with semantic search
- data-viz: Visualize embedding spaces, similarity scores
Backend Skills
- databases-relational: Hybrid approach using pgvector extension
- api-patterns: Expose semantic search via REST/GraphQL
- observability: Monitor embedding quality and retrieval metrics
Multi-Language Support
Python (Primary)
- Client: `qdrant-client`
- Framework: LangChain, LlamaIndex
- See: `examples/qdrant-python/`
Rust
- Client: `qdrant-client` (1,549 code snippets in Context7)
- Framework: Raw Rust for performance-critical systems
- See: `examples/rust-axum-vector/`
TypeScript
- Client: `@qdrant/js-client-rest`
- Framework: LangChain.js, integration with Next.js
- See: `examples/typescript-rag/`
Go
- Client: `qdrant-go`
- Use case: High-performance microservices
Troubleshooting
Poor Retrieval Quality
- Check chunking strategy (too large/small?)
- Verify metadata filtering (too restrictive?)
- Try hybrid search instead of vector-only
- Implement re-ranking stage
- Evaluate with RAGAS metrics
Slow Performance
- Use HNSW index (not Flat)
- Pre-filter with metadata before vector search
- Reduce vector dimensions (Matryoshka shortening)
- Batch operations (insertions, searches)
- Consider GPU acceleration (Milvus)
High Costs
- Switch to text-embedding-3-small
- Implement semantic caching
- Reduce chunk overlap
- Use self-hosted embeddings (nomic, bge-m3)
- Batch embedding generation
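A semantic cache can be sketched as a nearest-neighbor lookup over past query embeddings. A toy in-memory version (the 0.95 default threshold is an assumption to tune against your traffic):

```python
import numpy as np

class SemanticCache:
    """Return a stored answer when a new query's embedding is close enough
    (cosine similarity) to one answered before; otherwise miss."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.vectors = []
        self.answers = []

    def _normalize(self, vec):
        v = np.asarray(vec, dtype=np.float32)
        return v / np.linalg.norm(v)

    def get(self, query_vec):
        if not self.vectors:
            return None
        sims = np.asarray(self.vectors) @ self._normalize(query_vec)
        best = int(np.argmax(sims))
        return self.answers[best] if sims[best] >= self.threshold else None

    def put(self, query_vec, answer):
        self.vectors.append(self._normalize(query_vec))
        self.answers.append(answer)
```

Paraphrased repeats of common questions then skip both retrieval and LLM inference entirely; at scale the lookup itself would live in the vector database.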
Qdrant Context7 Documentation
Primary resource: `/llmstxt/qdrant_tech_llms-full_txt`
- Trust score: High
- Code snippets: 10,154
- Quality score: 83.1
Access via Context7:
resolve-library-id({ libraryName: "Qdrant" })
get-library-docs({
  context7CompatibleLibraryID: "/llmstxt/qdrant_tech_llms-full_txt",
  topic: "hybrid search collections python",
  mode: "code"
})
Additional Resources
Reference Documentation
- `references/qdrant.md` - Comprehensive Qdrant guide
- `references/pgvector.md` - PostgreSQL pgvector extension
- `references/milvus.md` - Milvus/Zilliz for billion-scale deployments
- `references/embedding-strategies.md` - Embedding model comparison
- `references/chunking-patterns.md` - Advanced chunking techniques
Code Examples
- `examples/qdrant-python/` - FastAPI + Qdrant RAG pipeline
- `examples/pgvector-prisma/` - PostgreSQL + Prisma integration
- `examples/typescript-rag/` - TypeScript RAG with Hono
Automation Scripts
- `scripts/generate_embeddings.py` - Batch embedding generation
- `scripts/benchmark_similarity.py` - Performance benchmarking
- `scripts/evaluate_rag.py` - RAGAS-based evaluation
Next Steps:
- Choose vector database based on scale and infrastructure
- Select embedding model based on quality vs. cost trade-off
- Implement chunking strategy for the content type
- Set up hybrid search for production quality
- Evaluate with RAGAS metrics
- Optimize for performance and cost