START: Choosing a Vector Database
EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│ └─ pgvector (<10M vectors, tight budget)
│ See: references/pgvector.md
│
└─ No existing vector database?
│
├─ OPERATIONAL PREFERENCE?
│ │
│ ├─ Zero-ops managed only
│ │ └─ Pinecone (fully managed, excellent DX)
│ │ See: references/pinecone.md
│ │
│ └─ Flexible (self-hosted or managed)
│ │
│ ├─ SCALE: <100M vectors + complex filtering ⭐
│ │ └─ Qdrant (RECOMMENDED)
│ │ • Best metadata filtering
│ │ • Built-in hybrid search (BM25 + Vector)
│ │ • Self-host: Docker/K8s
│ │ • Managed: Qdrant Cloud
│ │ See: references/qdrant.md
│ │
│ ├─ SCALE: >100M vectors + GPU acceleration
│ │ └─ Milvus / Zilliz Cloud
│ │ See: references/milvus.md
│ │
│ ├─ Embedded / No server
│ │ └─ LanceDB (serverless, edge deployment)
│ │
│ └─ Local prototyping
│ └─ Chroma (simple API, in-memory)

START: Choosing an Embedding Model
REQUIREMENTS?
├─ Best quality (cost no object)
│ └─ Voyage AI voyage-3 (1024d)
│ • 9.74% better than OpenAI on MTEB
│ • ~$0.12/1M tokens
│ See: references/embedding-strategies.md
│
├─ Enterprise reliability
│ └─ OpenAI text-embedding-3-large (3072d)
│ • Industry standard
│ • ~$0.13/1M tokens
│ • Matryoshka shortening: reduce to 256/512/1024d
│
├─ Cost-optimized
│ └─ OpenAI text-embedding-3-small (1536d)
│ • ~$0.02/1M tokens (6x cheaper)
│ • 90-95% of large model performance
│
├─ Multilingual (100+ languages)
│ └─ Cohere embed-v3 (1024d)
│ • ~$0.10/1M tokens
│
└─ Self-hosted / Privacy-critical
├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
└─ Long docs: jina-embeddings-v2 (768d, 8K context)
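A note on Matryoshka shortening: embedding models trained with Matryoshka representation learning (OpenAI's text-embedding-3 family among them) can be truncated to a prefix of their dimensions and re-normalized with modest quality loss. A minimal sketch (function name is illustrative; the 4-d vector stands in for a real 3072-d embedding):

```python
import math

def shorten_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

full = [3.0, 4.0, 0.0, 0.0]        # stand-in for a full-size embedding
print(shorten_embedding(full, 2))  # [0.6, 0.8]
```

Cutting 3072d to 1024d shrinks index memory roughly 3x, which is often the difference between fitting in RAM and not.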
See: references/chunking-patterns.md

User Query: "OAuth refresh token implementation"
│
┌──────┴──────┐
│ │
Vector Search Keyword Search
(Semantic) (BM25)
│ │
Top 20 docs Top 20 docs
│ │
└──────┬──────┘
│
Reciprocal Rank Fusion
(Merge + Re-rank)
│
Final Top 5 Results

See: references/hybrid-search.md
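The fusion step above fits in a few lines: Reciprocal Rank Fusion scores each document by summing 1/(k + rank) over the result lists it appears in (k = 60 is the conventional constant). A minimal sketch with illustrative doc ids:

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Merge ranked lists of doc ids; each doc scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["doc_a", "doc_b", "doc_c"]   # top semantic matches
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # top BM25 matches
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# doc_b wins: it sits near the top of both lists
```

Because RRF only uses ranks, it needs no score normalization between the vector and BM25 sides.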

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
For complete examples, see `examples/qdrant-python/`.

import { QdrantClient } from '@qdrant/js-client-rest';
const client = new QdrantClient({ url: 'http://localhost:6333' });
// Create collection
await client.createCollection('documents', {
vectors: { size: 1024, distance: 'Cosine' }
});
// Insert documents
await client.upsert('documents', {
points: chunks.map((chunk, idx) => ({
id: idx,
vector: chunk.embedding,
payload: {
text: chunk.text,
source: chunk.source
}
}))
});
// Search
const results = await client.search('documents', {
vector: queryEmbedding,
limit: 5,
filter: {
must: [
{ key: 'source', match: { value: 'docs/api.md' } }
]
}
});

See: examples/typescript-rag/

1. INGESTION
├─ Document Loading (PDF, web, code, Office)
├─ Text Extraction & Cleaning
├─ Chunking (semantic, recursive, code-aware)
└─ Embedding Generation (batch, rate-limited)
2. INDEXING
├─ Vector Store Insertion (batch upsert)
├─ Index Configuration (HNSW, distance metric)
└─ Keyword Index (BM25 for hybrid search)
3. RETRIEVAL (Query Time)
├─ Query Processing (expansion, embedding)
├─ Hybrid Search (vector + keyword)
├─ Filtering & Post-Processing (metadata, MMR)
└─ Re-Ranking (cross-encoder, LLM-based)
4. GENERATION
├─ Context Construction (format chunks, citations)
├─ Prompt Engineering (system + context + query)
├─ LLM Inference (streaming, temperature tuning)
└─ Response Post-Processing (citations, validation)
5. EVALUATION (Production Critical)
├─ Retrieval Metrics (precision, recall, relevancy)
├─ Generation Metrics (faithfulness, correctness)
└─ System Metrics (latency, cost, satisfaction)
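As one concrete piece of stage 1, the "recursive" chunking strategy can be sketched as follows (separator order and max_len are illustrative defaults; production splitters also track overlap and token counts):

```python
def recursive_chunk(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator that works, recursing into long pieces."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                piece = part if not current else current + sep + part
                if len(piece) <= max_len:
                    current = piece          # greedily pack parts together
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # recurse in case any packed piece is still too long
            return [c for chunk in chunks for c in recursive_chunk(chunk, max_len, separators)]
    # no separator found: fall back to a hard character split
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Splitting on paragraphs first, then sentences, keeps semantically related text in the same chunk far more often than a fixed-width split.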

metadata = {
# SOURCE TRACKING
"source": "docs/api-reference.md",
"source_type": "documentation", # code, docs, logs, chat
"last_updated": "2025-12-01T12:00:00Z",
# HIERARCHICAL CONTEXT
"section": "Authentication",
"subsection": "OAuth 2.1",
"heading_hierarchy": ["API Reference", "Authentication", "OAuth 2.1"],
# CONTENT CLASSIFICATION
"content_type": "code_example", # prose, code, table, list
"programming_language": "python",
# FILTERING DIMENSIONS
"product_version": "v2.0",
"audience": "enterprise", # free, pro, enterprise
# RETRIEVAL HINTS
"chunk_index": 3,
"total_chunks": 12,
"has_code": True
}
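Fields like product_version and audience exist to be filtered on at query time. A small helper (hypothetical, emitting the Qdrant REST filter shape used in the search example above) might look like:

```python
def metadata_filter(**conditions):
    """Build a Qdrant-style payload filter (REST shape) from equality conditions."""
    return {"must": [{"key": key, "match": {"value": value}}
                     for key, value in conditions.items()]}

# e.g. restrict retrieval to enterprise docs for the current product version
f = metadata_filter(product_version="v2.0", audience="enterprise")
```

Equality-only filters cover most cases; range conditions (e.g. on last_updated) need the store's native range syntax instead.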

from ragas import evaluate
from ragas.metrics import (
faithfulness, # Answer grounded in context
answer_relevancy, # Answer addresses query
context_recall, # Retrieved docs cover ground truth
context_precision # Retrieved docs are relevant
)

Client libraries and examples:
- qdrant-client (Python): examples/qdrant-python/
- qdrant-client (Rust): examples/rust-axum-vector/
- @qdrant/js-client-rest (TypeScript): examples/typescript-rag/
- qdrant-go (Go)

resolve-library-id({ libraryName: "Qdrant" })
get-library-docs({
context7CompatibleLibraryID: "/llmstxt/qdrant_tech_llms-full_txt",
topic: "hybrid search collections python",
mode: "code"
})

References:
- references/qdrant.md
- references/pgvector.md
- references/milvus.md
- references/embedding-strategies.md
- references/chunking-patterns.md

Examples:
- examples/qdrant-python/
- examples/pgvector-prisma/
- examples/typescript-rag/

Scripts:
- scripts/generate_embeddings.py
- scripts/benchmark_similarity.py
- scripts/evaluate_rag.py