using-vector-databases
Vector Databases for AI Applications
When to Use This Skill
Use this skill when implementing:
- RAG (Retrieval-Augmented Generation) systems for AI chatbots
- Semantic search capabilities (meaning-based, not just keyword)
- Recommendation systems based on similarity
- Multi-modal AI (unified search across text, images, audio)
- Document similarity and deduplication
- Question answering over private knowledge bases
Quick Decision Framework
1. Vector Database Selection
START: Choosing a Vector Database
EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│ └─ pgvector (<10M vectors, tight budget)
│ See: references/pgvector.md
│
└─ No existing vector database?
│
├─ OPERATIONAL PREFERENCE?
│ │
│ ├─ Zero-ops managed only
│ │ └─ Pinecone (fully managed, excellent DX)
│ │ See: references/pinecone.md
│ │
│ └─ Flexible (self-hosted or managed)
│ │
│ ├─ SCALE: <100M vectors + complex filtering ⭐
│ │ └─ Qdrant (RECOMMENDED)
│ │ • Best metadata filtering
│ │ • Built-in hybrid search (BM25 + Vector)
│ │ • Self-host: Docker/K8s
│ │ • Managed: Qdrant Cloud
│ │ See: references/qdrant.md
│ │
│ ├─ SCALE: >100M vectors + GPU acceleration
│ │ └─ Milvus / Zilliz Cloud
│ │ See: references/milvus.md
│ │
│ ├─ Embedded / No server
│ │ └─ LanceDB (serverless, edge deployment)
│ │
│ └─ Local prototyping
│ └─ Chroma (simple API, in-memory)
2. Embedding Model Selection
REQUIREMENTS?
├─ Best quality (cost no object)
│ └─ Voyage AI voyage-3 (1024d)
│ • 9.74% better than OpenAI on MTEB
│ • ~$0.12/1M tokens
│ See: references/embedding-strategies.md
│
├─ Enterprise reliability
│ └─ OpenAI text-embedding-3-large (3072d)
│ • Industry standard
│ • ~$0.13/1M tokens
│ • Matryoshka shortening: reduce to 256/512/1024d
│
├─ Cost-optimized
│ └─ OpenAI text-embedding-3-small (1536d)
│ • ~$0.02/1M tokens (6x cheaper)
│ • 90-95% of large model performance
│
├─ Multilingual (100+ languages)
│ └─ Cohere embed-v3 (1024d)
│ • ~$0.10/1M tokens
│
└─ Self-hosted / Privacy-critical
├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
└─ Long docs: jina-embeddings-v2 (768d, 8K context)
Core Concepts
Document Chunking Strategy
Recommended defaults for most RAG systems:
- Chunk size: 512 tokens (not characters)
- Overlap: 50 tokens (10% overlap)
Why these numbers?
- 512 tokens balances context vs. precision
- Too small (128-256): Fragments concepts, loses context
- Too large (1024-2048): Dilutes relevance, wastes LLM tokens
- 50 token overlap ensures sentences aren't split mid-context
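The defaults above amount to a sliding window over a token list. A minimal sketch (assuming tokens are already produced by your embedding model's tokenizer, e.g. tiktoken, rather than the whitespace split a toy example might use):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Slide a window of `size` tokens forward by `size - overlap` each step,
    so consecutive chunks share `overlap` tokens of context."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

With the 512/50 defaults, a 1,000-token document yields chunks starting at tokens 0, 462, and 924, each pair sharing 50 tokens.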
See `references/chunking-patterns.md` for advanced strategies by content type.
Hybrid Search (Vector + Keyword)
Hybrid Search = Vector Similarity + BM25 Keyword Matching
User Query: "OAuth refresh token implementation"
│
┌──────┴──────┐
│ │
Vector Search Keyword Search
(Semantic) (BM25)
│ │
Top 20 docs Top 20 docs
│ │
└──────┬──────┘
│
Reciprocal Rank Fusion
(Merge + Re-rank)
│
Final Top 5 Results
Why hybrid matters:
- Vector captures semantic meaning ("OAuth refresh" ≈ "token renewal")
- Keyword ensures exact matches ("refresh_token" literal)
- Combined provides best retrieval quality
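The Reciprocal Rank Fusion step in the diagram is only a few lines. A sketch that merges any number of ranked ID lists, using the conventional RRF constant k=60:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc IDs: each doc scores sum(1 / (k + rank))
    across the lists it appears in; higher combined score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic top results
keyword_hits = ["doc_b", "doc_c", "doc_d"]  # BM25 top results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents appearing in both lists (doc_b, doc_c) outrank those in only one, which is exactly the behavior that makes hybrid search robust.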
See `references/hybrid-search.md` for implementation details.
Getting Started
Python + Qdrant Example
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# 1. Initialize client
client = QdrantClient("localhost", port=6333)

# 2. Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
)

# 3. Insert documents with embeddings
points = [
    PointStruct(
        id=idx,
        vector=embedding,  # From OpenAI/Voyage/etc
        payload={
            "text": chunk_text,
            "source": "docs/api.md",
            "section": "Authentication"
        }
    )
    for idx, (embedding, chunk_text) in enumerate(chunks)
]
client.upsert(collection_name="documents", points=points)

# 4. Search with metadata filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter={
        "must": [
            {"key": "section", "match": {"value": "Authentication"}}
        ]
    }
)
```

For complete examples, see `examples/qdrant-python/`.
TypeScript + Qdrant Example
```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: { size: 1024, distance: 'Cosine' }
});

// Insert documents
await client.upsert('documents', {
  points: chunks.map((chunk, idx) => ({
    id: idx,
    vector: chunk.embedding,
    payload: {
      text: chunk.text,
      source: chunk.source
    }
  }))
});

// Search
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      { key: 'source', match: { value: 'docs/api.md' } }
    ]
  }
});
```

For complete examples, see `examples/typescript-rag/`.
RAG Pipeline Architecture
Complete Pipeline Components
1. INGESTION
├─ Document Loading (PDF, web, code, Office)
├─ Text Extraction & Cleaning
├─ Chunking (semantic, recursive, code-aware)
└─ Embedding Generation (batch, rate-limited)
2. INDEXING
├─ Vector Store Insertion (batch upsert)
├─ Index Configuration (HNSW, distance metric)
└─ Keyword Index (BM25 for hybrid search)
3. RETRIEVAL (Query Time)
├─ Query Processing (expansion, embedding)
├─ Hybrid Search (vector + keyword)
├─ Filtering & Post-Processing (metadata, MMR)
└─ Re-Ranking (cross-encoder, LLM-based)
4. GENERATION
├─ Context Construction (format chunks, citations)
├─ Prompt Engineering (system + context + query)
├─ LLM Inference (streaming, temperature tuning)
└─ Response Post-Processing (citations, validation)
5. EVALUATION (Production Critical)
├─ Retrieval Metrics (precision, recall, relevancy)
├─ Generation Metrics (faithfulness, correctness)
└─ System Metrics (latency, cost, satisfaction)
Essential Metadata for Production RAG
Critical for filtering and relevance:
```python
metadata = {
    # SOURCE TRACKING
    "source": "docs/api-reference.md",
    "source_type": "documentation",  # code, docs, logs, chat
    "last_updated": "2025-12-01T12:00:00Z",

    # HIERARCHICAL CONTEXT
    "section": "Authentication",
    "subsection": "OAuth 2.1",
    "heading_hierarchy": ["API Reference", "Authentication", "OAuth 2.1"],

    # CONTENT CLASSIFICATION
    "content_type": "code_example",  # prose, code, table, list
    "programming_language": "python",

    # FILTERING DIMENSIONS
    "product_version": "v2.0",
    "audience": "enterprise",  # free, pro, enterprise

    # RETRIEVAL HINTS
    "chunk_index": 3,
    "total_chunks": 12,
    "has_code": True
}
```

Why metadata matters:
- Enables filtering BEFORE vector search (reduces search space)
- Improves relevance through targeted retrieval
- Supports multi-tenant systems (filter by user/org)
- Enables versioned documentation (filter by product version)
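To make "filtering BEFORE vector search" concrete, here is a toy brute-force version of what the engine does internally. The helper name and shapes are illustrative (numpy only); a real vector database applies the filter inside its index rather than scanning:

```python
import numpy as np

def filtered_search(query_vec, vectors, payloads, limit=5, **conditions):
    """Drop every point whose payload fails a condition, then cosine-score
    only the survivors. Returns (index, score) pairs, best first."""
    keep = [i for i, p in enumerate(payloads)
            if all(p.get(k) == v for k, v in conditions.items())]
    if not keep:
        return []
    sub = np.asarray([vectors[i] for i in keep], dtype=np.float32)
    sub /= np.linalg.norm(sub, axis=1, keepdims=True)
    q = np.asarray(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    scores = sub @ q
    order = np.argsort(-scores)[:limit]
    return [(keep[i], float(scores[i])) for i in order]
```

Filtering first means the similarity computation (and, in a real index, graph traversal) touches only the tenant's or version's subset.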
Evaluation with RAGAS
Use `scripts/evaluate_rag.py` for automated evaluation:
```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,       # Answer grounded in context
    answer_relevancy,   # Answer addresses query
    context_recall,     # Retrieved docs cover ground truth
    context_precision   # Retrieved docs are relevant
)

# Test dataset
test_data = {
    "question": ["How do I refresh OAuth tokens?"],
    "answer": ["Use /token with refresh_token grant..."],
    "contexts": [["OAuth refresh documentation..."]],
    "ground_truth": ["POST to /token with grant_type=refresh_token"]
}

# Evaluate
results = evaluate(Dataset.from_dict(test_data), metrics=[
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision
])
```

Production targets:
- faithfulness: >0.90 (minimal hallucination)
- answer_relevancy: >0.85 (addresses user query)
- context_recall: >0.80 (sufficient context retrieved)
- context_precision: >0.75 (minimal noise)
Performance Optimization
Embedding Generation
- Batch processing: 100-500 chunks per batch
- Caching: Cache embeddings by content hash
- Rate limiting: Respect API provider limits (exponential backoff)
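A minimal sketch of batched, cache-aware embedding. The `embed_fn` callable and `cache` dict are stand-ins for your provider API call and cache store; hash-keyed caching means re-ingesting unchanged documents costs nothing:

```python
import hashlib

def embed_with_cache(texts, embed_fn, cache, batch_size=100):
    """Embed only cache misses, in batches, keyed by content hash."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    misses = [(k, t) for k, t in zip(keys, texts) if k not in cache]
    # De-duplicate misses so identical chunks are embedded once
    misses = list(dict(misses).items())
    for i in range(0, len(misses), batch_size):
        batch = misses[i:i + batch_size]
        for (key, _), vec in zip(batch, embed_fn([t for _, t in batch])):
            cache[key] = vec
    return [cache[k] for k in keys]
```

In production the dict would be Redis or a database table, and `embed_fn` would wrap the provider call with exponential backoff.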
Vector Search
- Index type: HNSW (Hierarchical Navigable Small World) for most cases
- Distance metric: Cosine for normalized embeddings
- Pre-filtering: Apply metadata filters before vector search
- Result diversity: Use MMR (Maximal Marginal Relevance) to reduce redundancy
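MMR can be sketched in a few lines over unit-normalized vectors: greedily pick the candidate that best trades off query relevance against redundancy with already-selected results (`lam` near 1 favors relevance, near 0 favors diversity):

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    """Greedy Maximal Marginal Relevance: repeatedly pick the candidate
    maximizing lam * sim(query) - (1 - lam) * max sim(selected)."""
    doc_vecs = np.asarray(doc_vecs, dtype=np.float32)
    sims = doc_vecs @ np.asarray(query_vec, dtype=np.float32)
    selected = [int(np.argmax(sims))]
    candidates = [i for i in range(len(doc_vecs)) if i != selected[0]]
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(float(doc_vecs[i] @ doc_vecs[j]) for j in selected)
            return lam * float(sims[i]) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, a near-duplicate of an already-selected document is skipped in favor of a less similar but novel one.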
Cost Optimization
- Embedding model: Consider text-embedding-3-small for budget constraints
- Dimension reduction: Use Matryoshka shortening (3072d → 1024d)
- Caching: Implement semantic caching for repeated queries
- Batch operations: Group insertions/updates for efficiency
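Shortening works because Matryoshka-trained embeddings (including OpenAI's text-embedding-3 models, via the API's `dimensions` parameter) concentrate information in the leading dimensions. Done client-side, it is just truncate-and-renormalize:

```python
import numpy as np

def shorten_embedding(vec, dim=1024):
    """Keep the first `dim` dimensions and re-normalize so cosine/dot
    similarity still behaves as expected."""
    v = np.asarray(vec, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)
```

Storing 1024d instead of 3072d cuts vector storage and index memory by roughly 3x, usually at a small retrieval-quality cost.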
Common Workflows
1. Building a RAG Chatbot
- Vector database: Qdrant (self-hosted or cloud)
- Embeddings: OpenAI text-embedding-3-large
- Chunking: 512 tokens, 50 overlap, semantic splitter
- Search: Hybrid (vector + BM25)
- Integration: Frontend with ai-chat skill
See `examples/qdrant-python/` for a complete implementation.
2. Semantic Search Engine
- Vector database: Qdrant or Pinecone
- Embeddings: Voyage AI voyage-3 (best quality)
- Chunking: Content-type specific (see chunking-patterns.md)
- Search: Hybrid with re-ranking
- Filtering: Pre-filter by metadata (date, category, etc.)
3. Code Search
- Vector database: Qdrant
- Embeddings: OpenAI text-embedding-3-large
- Chunking: AST-based (function/class boundaries)
- Metadata: Language, file path, imports
- Search: Hybrid with language filtering
See `examples/qdrant-python/` for a code-specific implementation.
Integration with Other Skills
Frontend Skills
- ai-chat: Vector DB powers RAG pipeline behind chat interface
- search-filter: Replace keyword search with semantic search
- data-viz: Visualize embedding spaces, similarity scores
Backend Skills
- databases-relational: Hybrid approach using pgvector extension
- api-patterns: Expose semantic search via REST/GraphQL
- observability: Monitor embedding quality and retrieval metrics
Multi-Language Support
Python (Primary)
- Client: `qdrant-client`
- Framework: LangChain, LlamaIndex
- See: `examples/qdrant-python/`
Rust
- Client: `qdrant-client` (1,549 code snippets in Context7)
- Framework: Raw Rust for performance-critical systems
- See: `examples/rust-axum-vector/`
TypeScript
- Client: `@qdrant/js-client-rest`
- Framework: LangChain.js, integration with Next.js
- See: `examples/typescript-rag/`
Go
- Client: `qdrant-go`
- Use case: High-performance microservices
Troubleshooting
Poor Retrieval Quality
- Check chunking strategy (too large/small?)
- Verify metadata filtering (too restrictive?)
- Try hybrid search instead of vector-only
- Implement re-ranking stage
- Evaluate with RAGAS metrics
Slow Performance
- Use HNSW index (not Flat)
- Pre-filter with metadata before vector search
- Reduce vector dimensions (Matryoshka shortening)
- Batch operations (insertions, searches)
- Consider GPU acceleration (Milvus)
High Costs
- Switch to text-embedding-3-small
- Implement semantic caching
- Reduce chunk overlap
- Use self-hosted embeddings (nomic, bge-m3)
- Batch embedding generation
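A semantic cache can be sketched as a nearest-neighbor lookup over past query embeddings. A toy in-memory version (the 0.95 default threshold is an assumption to tune against your traffic):

```python
import numpy as np

class SemanticCache:
    """Return a stored answer when a new query's embedding is close enough
    (cosine similarity) to one answered before; otherwise miss."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.vectors = []
        self.answers = []

    def _normalize(self, vec):
        v = np.asarray(vec, dtype=np.float32)
        return v / np.linalg.norm(v)

    def get(self, query_vec):
        if not self.vectors:
            return None
        sims = np.asarray(self.vectors) @ self._normalize(query_vec)
        best = int(np.argmax(sims))
        return self.answers[best] if sims[best] >= self.threshold else None

    def put(self, query_vec, answer):
        self.vectors.append(self._normalize(query_vec))
        self.answers.append(answer)
```

Paraphrased repeats of common questions then skip both retrieval and LLM inference entirely; at scale the lookup itself would live in the vector database.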
Qdrant Context7 Documentation
Primary resource: `/llmstxt/qdrant_tech_llms-full_txt`
- Trust score: High
- Code snippets: 10,154
- Quality score: 83.1
Access via Context7:
resolve-library-id({ libraryName: "Qdrant" })
get-library-docs({
  context7CompatibleLibraryID: "/llmstxt/qdrant_tech_llms-full_txt",
  topic: "hybrid search collections python",
  mode: "code"
})
Additional Resources
Reference Documentation
- `references/qdrant.md` - Comprehensive Qdrant guide
- `references/pgvector.md` - PostgreSQL pgvector extension
- `references/milvus.md` - Milvus/Zilliz for billion-scale deployments
- `references/embedding-strategies.md` - Embedding model comparison
- `references/chunking-patterns.md` - Advanced chunking techniques
Code Examples
- `examples/qdrant-python/` - FastAPI + Qdrant RAG pipeline
- `examples/pgvector-prisma/` - PostgreSQL + Prisma integration
- `examples/typescript-rag/` - TypeScript RAG with Hono
Automation Scripts
- `scripts/generate_embeddings.py` - Batch embedding generation
- `scripts/benchmark_similarity.py` - Performance benchmarking
- `scripts/evaluate_rag.py` - RAGAS-based evaluation
Next Steps:
- Choose vector database based on scale and infrastructure
- Select embedding model based on quality vs. cost trade-off
- Implement chunking strategy for the content type
- Set up hybrid search for production quality
- Evaluate with RAGAS metrics
- Optimize for performance and cost