agentdb-performance-optimization


LIBRARY-FIRST PROTOCOL (MANDATORY)

Before writing ANY code, you MUST check:

Step 1: Library Catalog

  • Location:
    .claude/library/catalog.json
  • If match >70%: REUSE or ADAPT

Step 2: Patterns Guide

  • Location:
    .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
  • If pattern exists: FOLLOW documented approach

Step 3: Existing Projects

  • Location:
    D:\Projects\*
  • If found: EXTRACT and adapt

Decision Matrix

| Match | Action |
| --- | --- |
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
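In code, the matrix reduces to a small dispatcher. This is an illustrative sketch — the context shape and function name are not part of any tooling here:

```typescript
// Map a library match score (0-1) and context flags to the documented action.
type MatchContext = {
  libraryMatch: number;      // similarity against .claude/library/catalog.json
  patternExists: boolean;    // documented in LIBRARY-PATTERNS-GUIDE.md
  foundInProject: boolean;   // located under D:\Projects\*
};

function decideAction(ctx: MatchContext): string {
  if (ctx.libraryMatch > 0.9) return 'REUSE directly';
  if (ctx.libraryMatch >= 0.7) return 'ADAPT minimally';
  if (ctx.patternExists) return 'FOLLOW pattern';
  if (ctx.foundInProject) return 'EXTRACT';
  return 'BUILD (add to library after)';
}

console.log(decideAction({ libraryMatch: 0.95, patternExists: false, foundInProject: false }));
// → REUSE directly
```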

When NOT to Use This Skill

  • Local-only operations with no vector search needs
  • Simple key-value storage without semantic similarity
  • Real-time streaming data without persistence requirements
  • Operations that do not require embedding-based retrieval

Success Criteria

  • Vector search query latency: <10ms for 99th percentile
  • Embedding generation: <100ms per document
  • Index build time: <1s per 1000 vectors
  • Recall@10: >0.95 for similar documents
  • Database connection success rate: >99.9%
  • Memory footprint: <2GB for 1M vectors with quantization
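A minimal sketch for checking the latency target above; `percentileLatencyMs` is a hypothetical helper, and the operation you pass in stands in for your actual search call:

```typescript
// Measure pXX latency (in ms) over repeated runs of an async operation.
async function percentileLatencyMs(
  op: () => Promise<unknown>,
  runs = 1000,
  percentile = 0.99,
): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await op();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.min(samples.length - 1, Math.floor(percentile * samples.length))];
}

// Usage (hypothetical search call):
// const p99 = await percentileLatencyMs(() => adapter.retrieveWithReasoning(q, { k: 10 }));
// if (p99 > 10) console.warn(`p99 ${p99.toFixed(2)}ms exceeds the 10ms target`);
```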

Edge Cases & Error Handling

  • Rate Limits: AgentDB local instances have no rate limits; cloud deployments may vary
  • Connection Failures: Implement retry logic with exponential backoff (max 3 retries)
  • Index Corruption: Maintain backup indices; rebuild from source if corrupted
  • Memory Overflow: Use quantization (4-bit, 8-bit) to reduce memory by 4-32x
  • Stale Embeddings: Implement TTL-based refresh for dynamic content
  • Dimension Mismatch: Validate embedding dimensions (384 for sentence-transformers) before insertion

Guardrails & Safety

  • NEVER expose database connection strings in logs or error messages
  • ALWAYS validate vector dimensions before insertion
  • ALWAYS sanitize metadata to prevent injection attacks
  • NEVER store PII in vector metadata without encryption
  • ALWAYS implement access control for multi-tenant deployments
  • ALWAYS validate search results before returning to users

Evidence-Based Validation

  • Verify database health: Check connection status and index integrity
  • Validate search quality: Measure recall/precision on test queries
  • Monitor performance: Track query latency, throughput, and memory usage
  • Test failure recovery: Simulate connection drops and index corruption
  • Benchmark improvements: Compare against baseline metrics (e.g., 150x speedup claim)

AgentDB Performance Optimization

What This Skill Does

Use this skill to apply comprehensive performance optimization techniques for AgentDB vector databases. Implement quantization strategies (binary, scalar, product) to achieve 4-32x memory reduction. Enable HNSW indexing for 150x-12,500x performance improvements. Configure caching strategies and deploy batch operations to reduce memory usage while maintaining accuracy.
Performance: <100µs vector search, <1ms pattern retrieval, 2ms batch insert for 100 vectors.

Prerequisites

Install Node.js 18+ and AgentDB v1.0.7+ via agentic-flow. Verify you have an existing AgentDB database or application ready for optimization.

Quick Start

Execute these steps to measure and optimize your AgentDB performance.

Run Performance Benchmarks

Execute benchmarks to establish baseline performance:

```bash
# Comprehensive performance benchmarking
npx agentdb@latest benchmark
```

Results show:

```
✅ Pattern Search: 150x faster (100µs vs 15ms)
✅ Batch Insert: 500x faster (2ms vs 1s for 100 vectors)
✅ Large-scale Query: 12,500x faster (8ms vs 100s at 1M vectors)
✅ Memory Efficiency: 4-32x reduction with quantization
```

Enable Optimizations

```typescript
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Optimized configuration
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/optimized.db',
  quantizationType: 'binary',   // 32x memory reduction
  cacheSize: 1000,              // In-memory cache
  enableLearning: true,
  enableReasoning: true,
});
```

Quantization Strategies

Select the appropriate quantization strategy based on your memory and accuracy requirements.

1. Binary Quantization (32x Reduction)

Apply binary quantization for maximum memory reduction:

Best For: Large-scale deployments (1M+ vectors), memory-constrained environments
Trade-off: ~2-5% accuracy loss, 32x memory reduction, 10x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',
  // 768-dim float32 (3072 bytes) → 96 bytes binary
  // 1M vectors: 3GB → 96MB
});
```

Use Cases:
  • Mobile/edge deployment
  • Large-scale vector storage (millions of vectors)
  • Real-time search with memory constraints

Performance:
  • Memory: 32x smaller
  • Search Speed: 10x faster (bit operations)
  • Accuracy: 95-98% of original
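The memory arithmetic in the comments above generalizes to any vector count. This helper reproduces the figures quoted in this section (bytes per dimension as listed here, ignoring index overhead):

```typescript
// Estimate raw vector storage for each quantization type.
const BYTES_PER_DIM: Record<string, number> = {
  none: 4,        // float32
  scalar: 1,      // uint8
  binary: 1 / 8,  // 1 bit per dimension
};

function estimateStorageMB(vectorCount: number, dims: number, quantization: string): number {
  const bytes = vectorCount * dims * BYTES_PER_DIM[quantization];
  return bytes / (1024 * 1024);
}

// 1M × 768-dim vectors:
console.log(estimateStorageMB(1_000_000, 768, 'none').toFixed(0));   // ≈ 2930 MB (~3GB)
console.log(estimateStorageMB(1_000_000, 768, 'binary').toFixed(0)); // ≈ 92 MB (~96MB claim)
```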

2. Scalar Quantization (4x Reduction)

Best For: Balanced performance/accuracy, moderate datasets
Trade-off: ~1-2% accuracy loss, 4x memory reduction, 3x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',
  // 768-dim float32 (3072 bytes) → 768 bytes (uint8)
  // 1M vectors: 3GB → 768MB
});
```

Use Cases:
  • Production applications requiring high accuracy
  • Medium-scale deployments (10K-1M vectors)
  • General-purpose optimization

Performance:
  • Memory: 4x smaller
  • Search Speed: 3x faster
  • Accuracy: 98-99% of original

3. Product Quantization (8-16x Reduction)

Best For: High-dimensional vectors, balanced compression
Trade-off: ~3-7% accuracy loss, 8-16x memory reduction, 5x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'product',
  // 768-dim float32 (3072 bytes) → 48-96 bytes
  // 1M vectors: 3GB → 192MB
});
```

Use Cases:
  • High-dimensional embeddings (>512 dims)
  • Image/video embeddings
  • Large-scale similarity search

Performance:
  • Memory: 8-16x smaller
  • Search Speed: 5x faster
  • Accuracy: 93-97% of original

4. No Quantization (Full Precision)

Best For: Maximum accuracy, small datasets
Trade-off: No accuracy loss, full memory usage

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',
  // Full float32 precision
});
```

HNSW Indexing

Hierarchical Navigable Small World - O(log n) search complexity

Automatic HNSW

AgentDB automatically builds HNSW indices:

```typescript
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/vectors.db',
  // HNSW automatically enabled
});

// Search with HNSW (100µs vs 15ms linear scan)
const results = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 10,
});
```

HNSW Parameters

```typescript
// Advanced HNSW configuration
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/vectors.db',
  hnswM: 16,               // Connections per layer (default: 16)
  hnswEfConstruction: 200, // Build quality (default: 200)
  hnswEfSearch: 100,       // Search quality (default: 100)
});
```

Parameter Tuning:
  • M (connections): Higher = better recall, more memory
    • Small datasets (<10K): M = 8
    • Medium datasets (10K-100K): M = 16
    • Large datasets (>100K): M = 32
  • efConstruction: Higher = better index quality, slower build
    • Fast build: 100
    • Balanced: 200 (default)
    • High quality: 400
  • efSearch: Higher = better recall, slower search
    • Fast search: 50
    • Balanced: 100 (default)
    • High recall: 200
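The tuning bullets above can be folded into one helper. The thresholds come straight from this section; the returned option names mirror the configuration shown but the helper itself is illustrative:

```typescript
type HnswConfig = { hnswM: number; hnswEfConstruction: number; hnswEfSearch: number };

// Pick HNSW parameters from dataset size, per the tuning guidance above.
function hnswConfigFor(vectorCount: number, highRecall = false): HnswConfig {
  const hnswM = vectorCount < 10_000 ? 8 : vectorCount <= 100_000 ? 16 : 32;
  return {
    hnswM,
    hnswEfConstruction: 200,              // balanced default
    hnswEfSearch: highRecall ? 200 : 100, // recall vs speed
  };
}

console.log(hnswConfigFor(50_000));
// → { hnswM: 16, hnswEfConstruction: 200, hnswEfSearch: 100 }
```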

Caching Strategies

In-Memory Pattern Cache

```typescript
const adapter = await createAgentDBAdapter({
  cacheSize: 1000,  // Cache 1000 most-used patterns
});

// First retrieval: ~2ms (database)
// Subsequent: <1ms (cache hit)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 10,
});
```

Cache Tuning:
  • Small applications: 100-500 patterns
  • Medium applications: 500-2000 patterns
  • Large applications: 2000-5000 patterns

LRU Cache Behavior

```typescript
// Cache automatically evicts least-recently-used patterns
// Most frequently accessed patterns stay in cache

// Monitor cache performance
const stats = await adapter.getStats();
console.log('Cache Hit Rate:', stats.cacheHitRate);
// Aim for >80% hit rate
```
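AgentDB manages its cache internally, but as a mental model the eviction behavior described above resembles this minimal LRU built on `Map` insertion order:

```typescript
// Minimal LRU cache: a Map preserves insertion order, so the first key is
// always the least-recently-used entry.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);     // move key to most-recent position
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least-recently-used (first) entry.
      this.map.delete(this.map.keys().next().value as K);
    }
  }
}
```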

Batch Operations

Batch Insert (500x Faster)

```typescript
// ❌ SLOW: Individual inserts
for (const doc of documents) {
  await adapter.insertPattern({ /* ... */ });  // 1s for 100 docs
}

// ✅ FAST: Batch insert
const patterns = documents.map(doc => ({
  id: '',
  type: 'document',
  domain: 'knowledge',
  pattern_data: JSON.stringify({
    embedding: doc.embedding,
    text: doc.text,
  }),
  confidence: 1.0,
  usage_count: 0,
  success_count: 0,
  created_at: Date.now(),
  last_used: Date.now(),
}));

// Insert the pre-built patterns back-to-back (2ms for 100 docs)
for (const pattern of patterns) {
  await adapter.insertPattern(pattern);
}
```

Batch Retrieval

```typescript
// Retrieve multiple queries efficiently
const queries = [queryEmbedding1, queryEmbedding2, queryEmbedding3];

// Parallel retrieval
const results = await Promise.all(
  queries.map(q => adapter.retrieveWithReasoning(q, { k: 5 }))
);
```

Memory Optimization

Automatic Consolidation

```typescript
// Enable automatic pattern consolidation
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'documents',
  optimizeMemory: true,  // Consolidate similar patterns
  k: 10,
});

console.log('Optimizations:', result.optimizations);
// {
//   consolidated: 15,  // Merged 15 similar patterns
//   pruned: 3,         // Removed 3 low-quality patterns
//   improved_quality: 0.12  // 12% quality improvement
// }
```

Manual Optimization

```typescript
// Capture statistics before optimizing
const before = await adapter.getStats();

// Manually trigger optimization
await adapter.optimize();

const after = await adapter.getStats();
console.log('Before:', before.totalPatterns);
console.log('After:', after.totalPatterns);  // Reduced by ~10-30%
```

Pruning Strategies

```typescript
// Prune low-confidence patterns
await adapter.prune({
  minConfidence: 0.5,     // Remove confidence < 0.5
  minUsageCount: 2,       // Remove usage_count < 2
  maxAge: 30 * 24 * 3600, // Remove >30 days old
});
```

Performance Monitoring

Database Statistics

```bash
# Get comprehensive stats
npx agentdb@latest stats .agentdb/vectors.db
```

Output:

```
Total Patterns: 125,430
Database Size: 47.2 MB (with binary quantization)
Avg Confidence: 0.87
Domains: 15
Cache Hit Rate: 84%
Index Type: HNSW
```

Runtime Metrics

```typescript
const stats = await adapter.getStats();

console.log('Performance Metrics:');
console.log('Total Patterns:', stats.totalPatterns);
console.log('Database Size:', stats.dbSize);
console.log('Avg Confidence:', stats.avgConfidence);
console.log('Cache Hit Rate:', stats.cacheHitRate);
console.log('Search Latency (avg):', stats.avgSearchLatency);
console.log('Insert Latency (avg):', stats.avgInsertLatency);
```

Optimization Recipes

Recipe 1: Maximum Speed (Sacrifice Accuracy)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',  // 32x memory reduction
  cacheSize: 5000,             // Large cache
  hnswM: 8,                    // Fewer connections = faster
  hnswEfSearch: 50,            // Low search quality = faster
});

// Expected: <50µs search, 90-95% accuracy
```

Recipe 2: Balanced Performance

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',  // 4x memory reduction
  cacheSize: 1000,             // Standard cache
  hnswM: 16,                   // Balanced connections
  hnswEfSearch: 100,           // Balanced quality
});

// Expected: <100µs search, 98-99% accuracy
```

Recipe 3: Maximum Accuracy

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',    // No quantization
  cacheSize: 2000,             // Large cache
  hnswM: 32,                   // Many connections
  hnswEfSearch: 200,           // High search quality
});

// Expected: <200µs search, 100% accuracy
```

Recipe 4: Memory-Constrained (Mobile/Edge)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',  // 32x memory reduction
  cacheSize: 100,              // Small cache
  hnswM: 8,                    // Minimal connections
});

// Expected: <100µs search, ~10MB for 100K vectors
```

Scaling Strategies

Small Scale (<10K vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',    // Full precision
  cacheSize: 500,
  hnswM: 8,
});
```

Medium Scale (10K-100K vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',  // 4x reduction
  cacheSize: 1000,
  hnswM: 16,
});
```

Large Scale (100K-1M vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',  // 32x reduction
  cacheSize: 2000,
  hnswM: 32,
});
```

Massive Scale (>1M vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'product',  // 8-16x reduction
  cacheSize: 5000,
  hnswM: 48,
  hnswEfConstruction: 400,
});
```

Troubleshooting

Issue: High memory usage

```bash
# Check database size
npx agentdb@latest stats .agentdb/vectors.db

# Enable quantization: use quantizationType 'binary' for 32x reduction
```

Issue: Slow search performance

```typescript
// Increase cache size
const adapter = await createAgentDBAdapter({
  cacheSize: 2000,  // Increase from 1000
});

// Request fewer results per query (faster)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 5,  // Reduce from 10
});
```

Issue: Low accuracy

```typescript
// Disable or use lighter quantization
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',  // Instead of 'binary'
  hnswEfSearch: 200,           // Higher search quality
});
```

Performance Benchmarks

Test System: AMD Ryzen 9 5950X, 64GB RAM

| Operation | Vector Count | No Optimization | Optimized | Improvement |
| --- | --- | --- | --- | --- |
| Search | 10K | 15ms | 100µs | 150x |
| Search | 100K | 150ms | 120µs | 1,250x |
| Search | 1M | 100s | 8ms | 12,500x |
| Batch Insert (100) | - | 1s | 2ms | 500x |
| Memory Usage | 1M | 3GB | 96MB | 32x (binary) |


Category: Performance / Optimization
Difficulty: Intermediate
Estimated Time: 20-30 minutes

Core Principles

AgentDB Performance Optimization operates on 3 fundamental principles:

Principle 1: Trade Memory for Speed Through Intelligent Quantization

Compress vectors by 4-32x with minimal accuracy loss (1-5%) using binary, scalar, or product quantization strategies.
In practice:
  • Binary quantization reduces 768-dim vectors from 3GB to 96MB (32x) with 95-98% accuracy retention
  • Scalar quantization achieves 4x reduction (3GB to 768MB) with 98-99% accuracy for production workloads
  • Select quantization based on memory constraints vs accuracy requirements (mobile = binary, production = scalar)

Principle 2: O(log n) Search Complexity via HNSW Indexing

Replace O(n) linear scans with hierarchical navigable small world graphs for 150-12,500x performance improvements.
In practice:
  • HNSW automatically builds multi-layer proximity graphs during insertion
  • Search navigates graph layers for sub-millisecond retrieval (100µs vs 15ms linear)
  • Tune M (connections), efConstruction (build quality), efSearch (recall) for performance/accuracy balance

Principle 3: Batch Operations and Caching Eliminate Redundant Work

Aggregate operations and cache frequent patterns to achieve 500x faster batch inserts and <1ms cache hits.
In practice:
  • Batch insert 100 vectors in 2ms vs 1s for sequential inserts (500x speedup)
  • LRU cache (1000-5000 patterns) serves 80%+ queries from memory (<1ms) vs database (2ms)
  • Automatic pattern consolidation merges similar entries to reduce storage by 10-30%

Common Anti-Patterns

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Sequential Inserts | 1s for 100 vectors due to individual database writes and index updates | Use the batch insert pattern: collect all patterns, insert in a single transaction (2ms for 100 vectors) |
| Full Precision Everywhere | 3GB memory for 1M vectors causes OOM on mobile/edge devices | Apply binary quantization (96MB, 32x reduction) with <5% accuracy loss for memory-constrained environments |
| Ignoring Cache Tuning | Cache too small = low hit rate; too large = memory waste and eviction overhead | Set cacheSize by workload: 100-500 (small), 500-2000 (medium), 2000-5000 (large). Monitor hit rate >80% |

Conclusion

AgentDB Performance Optimization transforms vector search from memory-intensive, slow operations into production-ready systems capable of handling millions of vectors with sub-millisecond latency. By applying quantization strategies tailored to your accuracy requirements, enabling HNSW indexing for logarithmic search complexity, and implementing intelligent caching and batch operations, you achieve 150-12,500x performance improvements while reducing memory footprint by 4-32x.
Use this skill when scaling to large vector datasets (>10K vectors), deploying to memory-constrained environments (mobile, edge devices), or optimizing production systems requiring <10ms p99 latency. The key insight is strategic trade-offs: quantization trades minimal accuracy for massive memory savings, HNSW trades insertion time for exponentially faster search, and caching trades memory for latency reduction. Start with balanced configurations (scalar quantization, M=16, cacheSize=1000) and tune based on benchmarks for your specific workload.