using-vector-databases


Vector Databases for AI Applications


When to Use This Skill


Use this skill when implementing:
  • RAG (Retrieval-Augmented Generation) systems for AI chatbots
  • Semantic search capabilities (meaning-based, not just keyword)
  • Recommendation systems based on similarity
  • Multi-modal AI (unified search across text, images, audio)
  • Document similarity and deduplication
  • Question answering over private knowledge bases

Quick Decision Framework


1. Vector Database Selection


START: Choosing a Vector Database

EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│  └─ pgvector (<10M vectors, tight budget)
│      See: references/pgvector.md
└─ No existing vector database?
   ├─ OPERATIONAL PREFERENCE?
   │  │
   │  ├─ Zero-ops managed only
   │  │  └─ Pinecone (fully managed, excellent DX)
   │  │      See: references/pinecone.md
   │  │
   │  └─ Flexible (self-hosted or managed)
   │     │
   │     ├─ SCALE: <100M vectors + complex filtering ⭐
   │     │  └─ Qdrant (RECOMMENDED)
   │     │      • Best metadata filtering
   │     │      • Built-in hybrid search (BM25 + Vector)
   │     │      • Self-host: Docker/K8s
   │     │      • Managed: Qdrant Cloud
   │     │      See: references/qdrant.md
   │     │
   │     ├─ SCALE: >100M vectors + GPU acceleration
   │     │  └─ Milvus / Zilliz Cloud
   │     │      See: references/milvus.md
   │     │
   │     ├─ Embedded / No server
   │     │  └─ LanceDB (serverless, edge deployment)
   │     │
   │     └─ Local prototyping
   │        └─ Chroma (simple API, in-memory)

2. Embedding Model Selection


REQUIREMENTS?

├─ Best quality (cost no object)
│  └─ Voyage AI voyage-3 (1024d)
│      • 9.74% better than OpenAI on MTEB
│      • ~$0.12/1M tokens
│      See: references/embedding-strategies.md
├─ Enterprise reliability
│  └─ OpenAI text-embedding-3-large (3072d)
│      • Industry standard
│      • ~$0.13/1M tokens
│      • Matryoshka shortening: reduce to 256/512/1024d
├─ Cost-optimized
│  └─ OpenAI text-embedding-3-small (1536d)
│      • ~$0.02/1M tokens (6x cheaper)
│      • 90-95% of large model performance
├─ Multilingual (100+ languages)
│  └─ Cohere embed-v3 (1024d)
│      • ~$0.10/1M tokens
└─ Self-hosted / Privacy-critical
   ├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
   ├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
   └─ Long docs: jina-embeddings-v2 (768d, 8K context)

Core Concepts


Document Chunking Strategy


Recommended defaults for most RAG systems:
  • Chunk size: 512 tokens (not characters)
  • Overlap: 50 tokens (10% overlap)
Why these numbers?
  • 512 tokens balances context vs. precision
    • Too small (128-256): Fragments concepts, loses context
    • Too large (1024-2048): Dilutes relevance, wastes LLM tokens
  • 50 token overlap ensures sentences aren't split mid-context
See `references/chunking-patterns.md` for advanced strategies by content type.
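The overlap arithmetic above can be sketched with a plain token-list splitter. This is a minimal sketch: words stand in for tokens here for demonstration only, and production code should count real tokenizer tokens (e.g. via tiktoken).

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token sequence into overlapping chunks.

    Consecutive chunks share `overlap` tokens, so the stride
    between chunk starts is chunk_size - overlap.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end
    return chunks

# Words stand in for tokens purely for demonstration.
words = [f"w{i}" for i in range(1200)]
parts = chunk_tokens(words, chunk_size=512, overlap=50)
```

With 1,200 tokens this yields three chunks, and the last 50 tokens of each chunk reappear at the start of the next, so no sentence boundary is lost entirely.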

Hybrid Search (Vector + Keyword)


Hybrid Search = Vector Similarity + BM25 Keyword Matching
User Query: "OAuth refresh token implementation"
    ┌──────┴──────┐
    │             │
Vector Search   Keyword Search
(Semantic)      (BM25)
    │             │
Top 20 docs   Top 20 docs
    │             │
    └──────┬──────┘
   Reciprocal Rank Fusion
   (Merge + Re-rank)
    Final Top 5 Results
Why hybrid matters:
  • Vector captures semantic meaning ("OAuth refresh" ≈ "token renewal")
  • Keyword ensures exact matches ("refresh_token" literal)
  • Combined provides best retrieval quality
See `references/hybrid-search.md` for implementation details.
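The fusion step in the diagram can be sketched in a few lines using the standard RRF formula, score(d) = Σ 1/(k + rank), with the conventional k = 60. This is an illustrative sketch; Qdrant and most engines ship their own fusion implementations.

```python
def rrf_fuse(rankings, k=60, top_n=5):
    """Reciprocal Rank Fusion over several ranked lists of doc ids."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["d3", "d1", "d7", "d2"]    # top docs from vector search
keyword_hits = ["d1", "d9", "d3", "d5"]   # top docs from BM25
fused = rrf_fuse([vector_hits, keyword_hits])
```

`d1` and `d3` rise to the top because both lists rank them highly, even though neither is first in both — exactly the behavior hybrid search relies on.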

Getting Started


Python + Qdrant Example


```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# 1. Initialize client
client = QdrantClient("localhost", port=6333)

# 2. Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

# 3. Insert documents with embeddings
points = [
    PointStruct(
        id=idx,
        vector=embedding,  # From OpenAI/Voyage/etc.
        payload={
            "text": chunk_text,
            "source": "docs/api.md",
            "section": "Authentication",
        },
    )
    for idx, (embedding, chunk_text) in enumerate(chunks)
]
client.upsert(collection_name="documents", points=points)

# 4. Search with metadata filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter={
        "must": [
            {"key": "section", "match": {"value": "Authentication"}}
        ]
    },
)
```

For complete examples, see `examples/qdrant-python/`.

TypeScript + Qdrant Example


```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: { size: 1024, distance: 'Cosine' }
});

// Insert documents
await client.upsert('documents', {
  points: chunks.map((chunk, idx) => ({
    id: idx,
    vector: chunk.embedding,
    payload: {
      text: chunk.text,
      source: chunk.source
    }
  }))
});

// Search
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      { key: 'source', match: { value: 'docs/api.md' } }
    ]
  }
});
```

For complete examples, see `examples/typescript-rag/`.

RAG Pipeline Architecture


Complete Pipeline Components


1. INGESTION
   ├─ Document Loading (PDF, web, code, Office)
   ├─ Text Extraction & Cleaning
   ├─ Chunking (semantic, recursive, code-aware)
   └─ Embedding Generation (batch, rate-limited)

2. INDEXING
   ├─ Vector Store Insertion (batch upsert)
   ├─ Index Configuration (HNSW, distance metric)
   └─ Keyword Index (BM25 for hybrid search)

3. RETRIEVAL (Query Time)
   ├─ Query Processing (expansion, embedding)
   ├─ Hybrid Search (vector + keyword)
   ├─ Filtering & Post-Processing (metadata, MMR)
   └─ Re-Ranking (cross-encoder, LLM-based)

4. GENERATION
   ├─ Context Construction (format chunks, citations)
   ├─ Prompt Engineering (system + context + query)
   ├─ LLM Inference (streaming, temperature tuning)
   └─ Response Post-Processing (citations, validation)

5. EVALUATION (Production Critical)
   ├─ Retrieval Metrics (precision, recall, relevancy)
   ├─ Generation Metrics (faithfulness, correctness)
   └─ System Metrics (latency, cost, satisfaction)
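Stages 1–3 can be reduced to a toy in-memory sketch to make the data flow concrete. Everything here is illustrative: `ToyRAG` is a hypothetical name, the character-histogram `toy_embed` is a stand-in for a real embedding API, and the brute-force cosine ranking stands in for an ANN index like HNSW.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyRAG:
    def __init__(self, embed):
        self.embed = embed
        self.index = []  # list of (vector, text) pairs

    def ingest(self, texts):
        # 1. INGESTION + 2. INDEXING (chunking omitted for brevity)
        self.index = [(self.embed(t), t) for t in texts]

    def retrieve(self, query, limit=2):
        # 3. RETRIEVAL: brute-force cosine ranking
        qv = self.embed(query)
        ranked = sorted(self.index, key=lambda p: cosine(qv, p[0]),
                        reverse=True)
        return [text for _, text in ranked[:limit]]

def toy_embed(text):
    # Character histogram: demonstration only, not a real embedding.
    v = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - 97] += 1.0
    return v

rag = ToyRAG(toy_embed)
rag.ingest(["oauth token refresh", "postgres index tuning"])
top = rag.retrieve("refresh oauth token", limit=1)
```

Stage 4 would pass `top` into the LLM prompt, and stage 5 would score the whole loop with RAGAS as described below in this document.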

Essential Metadata for Production RAG


Critical for filtering and relevance:
```python
metadata = {
    # SOURCE TRACKING
    "source": "docs/api-reference.md",
    "source_type": "documentation",  # code, docs, logs, chat
    "last_updated": "2025-12-01T12:00:00Z",

    # HIERARCHICAL CONTEXT
    "section": "Authentication",
    "subsection": "OAuth 2.1",
    "heading_hierarchy": ["API Reference", "Authentication", "OAuth 2.1"],

    # CONTENT CLASSIFICATION
    "content_type": "code_example",  # prose, code, table, list
    "programming_language": "python",

    # FILTERING DIMENSIONS
    "product_version": "v2.0",
    "audience": "enterprise",  # free, pro, enterprise

    # RETRIEVAL HINTS
    "chunk_index": 3,
    "total_chunks": 12,
    "has_code": True
}
```
Why metadata matters:
  • Enables filtering BEFORE vector search (reduces search space)
  • Improves relevance through targeted retrieval
  • Supports multi-tenant systems (filter by user/org)
  • Enables versioned documentation (filter by product version)
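Conceptually, filtering before vector search is just an equality test on payloads. A minimal stand-alone sketch (`prefilter` is a hypothetical helper; real engines like Qdrant evaluate the equivalent `must` conditions inside the index itself):

```python
def prefilter(chunks, **conditions):
    """Keep only chunks whose metadata matches every condition,
    shrinking the candidate set before any vector scoring."""
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in conditions.items())
    ]

corpus = [
    {"text": "OAuth flow",
     "metadata": {"audience": "enterprise", "product_version": "v2.0"}},
    {"text": "Quick start",
     "metadata": {"audience": "free", "product_version": "v2.0"}},
]
candidates = prefilter(corpus, audience="enterprise", product_version="v2.0")
```

The same pattern drives multi-tenancy: add a `tenant_id` condition and each query only ever scores that tenant's chunks.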

Evaluation with RAGAS


Use `scripts/evaluate_rag.py` for automated evaluation:

```python
from ragas import evaluate
from ragas.metrics import (
    faithfulness,       # Answer grounded in context
    answer_relevancy,   # Answer addresses query
    context_recall,     # Retrieved docs cover ground truth
    context_precision   # Retrieved docs are relevant
)

# Test dataset
test_data = {
    "question": ["How do I refresh OAuth tokens?"],
    "answer": ["Use /token with refresh_token grant..."],
    "contexts": [["OAuth refresh documentation..."]],
    "ground_truth": ["POST to /token with grant_type=refresh_token"]
}

# Evaluate
results = evaluate(
    test_data,
    metrics=[faithfulness, answer_relevancy, context_recall, context_precision]
)
```

Production targets:
  • faithfulness: >0.90 (minimal hallucination)
  • answer_relevancy: >0.85 (addresses user query)
  • context_recall: >0.80 (sufficient context retrieved)
  • context_precision: >0.75 (minimal noise)
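In CI, targets like these can be enforced with a simple gate. A sketch: the thresholds mirror the production targets above, and `scores` stands in for the metric values pulled out of a RAGAS result.

```python
TARGETS = {
    "faithfulness": 0.90,
    "answer_relevancy": 0.85,
    "context_recall": 0.80,
    "context_precision": 0.75,
}

def failing_metrics(scores, targets=TARGETS):
    """Return the metrics that fall below their production target."""
    return {m: scores[m] for m, t in targets.items()
            if scores.get(m, 0.0) < t}

scores = {"faithfulness": 0.93, "answer_relevancy": 0.88,
          "context_recall": 0.77, "context_precision": 0.81}
bad = failing_metrics(scores)
```

A non-empty result fails the pipeline, so retrieval regressions surface before deployment rather than in user conversations.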


Performance Optimization


Embedding Generation


  • Batch processing: 100-500 chunks per batch
  • Caching: Cache embeddings by content hash
  • Rate limiting: Respect API provider limits (exponential backoff)
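The first two bullets combine naturally: hash each chunk, embed only cache misses, and send the misses to the provider in batches. A sketch: `embed_batch` stands in for your provider's batch endpoint, and exponential backoff on rate-limit errors is omitted for brevity.

```python
import hashlib

_cache = {}  # sha256(content) -> embedding vector

def embed_with_cache(texts, embed_batch, batch_size=100):
    """Embed texts, skipping anything already cached by content hash."""
    results, missing = {}, []
    for t in texts:
        key = hashlib.sha256(t.encode("utf-8")).hexdigest()
        if key in _cache:
            results[t] = _cache[key]   # cache hit: no API call
        else:
            missing.append((key, t))
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        vectors = embed_batch([t for _, t in batch])
        for (key, t), vec in zip(batch, vectors):
            _cache[key] = vec
            results[t] = vec
    return [results[t] for t in texts]

calls = []
def fake_embed(batch):                 # stand-in provider, counts calls
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]

embed_with_cache(["alpha", "beta"], fake_embed)
second = embed_with_cache(["alpha", "gamma"], fake_embed)  # "alpha" cached
```

On re-ingestion of a mostly unchanged corpus, only new or edited chunks hit the API, which is where the bulk of embedding cost savings come from.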

Vector Search


  • Index type: HNSW (Hierarchical Navigable Small World) for most cases
  • Distance metric: Cosine for normalized embeddings
  • Pre-filtering: Apply metadata filters before vector search
  • Result diversity: Use MMR (Maximal Marginal Relevance) to reduce redundancy
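The MMR bullet can be made concrete with the greedy formulation, score(d) = λ·sim(q, d) − (1 − λ)·max sim(d, s) over the already-selected set. A sketch with tiny 2-d vectors; `lam` trades relevance against diversity.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, candidates, lam=0.5, top_n=3):
    """Greedy Maximal Marginal Relevance over (id, vector) pairs."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < top_n:
        def score(item):
            relevance = cosine(query_vec, item[1])
            redundancy = max(
                (cosine(item[1], s[1]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _ in selected]

docs = [("a", [1.0, 0.0]),   # relevant
        ("b", [1.0, 0.0]),   # exact duplicate of "a"
        ("c", [0.0, 1.0])]   # less relevant but novel
picked = mmr([1.0, 0.2], docs, lam=0.5, top_n=2)
```

Plain top-k would return the duplicate pair `a`, `b`; MMR penalizes the redundancy and surfaces `c` instead.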

Cost Optimization


  • Embedding model: Consider text-embedding-3-small for budget constraints
  • Dimension reduction: Use Matryoshka shortening (3072d → 1024d)
  • Caching: Implement semantic caching for repeated queries
  • Batch operations: Group insertions/updates for efficiency
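Matryoshka shortening is just truncation plus L2 renormalization. A sketch with a toy 4-d vector standing in for a 3072-d embedding; this is only valid for models trained with Matryoshka representation learning (e.g. OpenAI's text-embedding-3 family).

```python
import math

def shorten(vec, dims=1024):
    """Truncate a Matryoshka-trained embedding, then renormalize
    to unit length so cosine similarity remains well-behaved."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

v = [1.0, 2.0, 2.0, 0.5]   # toy vector; real input would be 3072-d
short = shorten(v, dims=3)
```

Shorter vectors mean smaller indexes and faster distance computations at a modest quality cost, which is why this pairs well with the cheaper embedding models above.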

Common Workflows


1. Building a RAG Chatbot


  • Vector database: Qdrant (self-hosted or cloud)
  • Embeddings: OpenAI text-embedding-3-large
  • Chunking: 512 tokens, 50 overlap, semantic splitter
  • Search: Hybrid (vector + BM25)
  • Integration: Frontend with ai-chat skill
See `examples/qdrant-python/` for complete implementation.

2. Semantic Search Engine


  • Vector database: Qdrant or Pinecone
  • Embeddings: Voyage AI voyage-3 (best quality)
  • Chunking: Content-type specific (see chunking-patterns.md)
  • Search: Hybrid with re-ranking
  • Filtering: Pre-filter by metadata (date, category, etc.)

3. Code Search


  • Vector database: Qdrant
  • Embeddings: OpenAI text-embedding-3-large
  • Chunking: AST-based (function/class boundaries)
  • Metadata: Language, file path, imports
  • Search: Hybrid with language filtering
See `examples/qdrant-python/` for code-specific implementation.

Integration with Other Skills


Frontend Skills


  • ai-chat: Vector DB powers RAG pipeline behind chat interface
  • search-filter: Replace keyword search with semantic search
  • data-viz: Visualize embedding spaces, similarity scores

Backend Skills


  • databases-relational: Hybrid approach using pgvector extension
  • api-patterns: Expose semantic search via REST/GraphQL
  • observability: Monitor embedding quality and retrieval metrics

Multi-Language Support


Python (Primary)


  • Client: `qdrant-client`
  • Framework: LangChain, LlamaIndex
  • See: `examples/qdrant-python/`

Rust


  • Client: `qdrant-client` (1,549 code snippets in Context7)
  • Framework: Raw Rust for performance-critical systems
  • See: `examples/rust-axum-vector/`

TypeScript


  • Client: `@qdrant/js-client-rest`
  • Framework: LangChain.js, integration with Next.js
  • See: `examples/typescript-rag/`

Go


  • Client: `qdrant-go`
  • Use case: High-performance microservices

Troubleshooting


Poor Retrieval Quality


  1. Check chunking strategy (too large/small?)
  2. Verify metadata filtering (too restrictive?)
  3. Try hybrid search instead of vector-only
  4. Implement re-ranking stage
  5. Evaluate with RAGAS metrics

Slow Performance


  1. Use HNSW index (not Flat)
  2. Pre-filter with metadata before vector search
  3. Reduce vector dimensions (Matryoshka shortening)
  4. Batch operations (insertions, searches)
  5. Consider GPU acceleration (Milvus)

High Costs


  1. Switch to text-embedding-3-small
  2. Implement semantic caching
  3. Reduce chunk overlap
  4. Use self-hosted embeddings (nomic, bge-m3)
  5. Batch embedding generation

Qdrant Context7 Documentation


Primary resource:
/llmstxt/qdrant_tech_llms-full_txt
  • Trust score: High
  • Code snippets: 10,154
  • Quality score: 83.1
Access via Context7:
resolve-library-id({ libraryName: "Qdrant" })
get-library-docs({
  context7CompatibleLibraryID: "/llmstxt/qdrant_tech_llms-full_txt",
  topic: "hybrid search collections python",
  mode: "code"
})

Additional Resources


Reference Documentation


  • `references/qdrant.md` - Comprehensive Qdrant guide
  • `references/pgvector.md` - PostgreSQL pgvector extension
  • `references/milvus.md` - Milvus/Zilliz for billion-scale
  • `references/embedding-strategies.md` - Embedding model comparison
  • `references/chunking-patterns.md` - Advanced chunking techniques

Code Examples


  • `examples/qdrant-python/` - FastAPI + Qdrant RAG pipeline
  • `examples/pgvector-prisma/` - PostgreSQL + Prisma integration
  • `examples/typescript-rag/` - TypeScript RAG with Hono

Automation Scripts


  • `scripts/generate_embeddings.py` - Batch embedding generation
  • `scripts/benchmark_similarity.py` - Performance benchmarking
  • `scripts/evaluate_rag.py` - RAGAS-based evaluation

Next Steps:
  1. Choose vector database based on scale and infrastructure
  2. Select embedding model based on quality vs. cost trade-off
  3. Implement chunking strategy for the content type
  4. Set up hybrid search for production quality
  5. Evaluate with RAGAS metrics
  6. Optimize for performance and cost