agentdb-performance-optimization


LIBRARY-FIRST PROTOCOL (MANDATORY)

Before writing ANY code, you MUST check:

Step 1: Library Catalog

  • Location:
    .claude/library/catalog.json
  • If match >70%: REUSE or ADAPT

Step 2: Patterns Guide

  • Location:
    .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
  • If pattern exists: FOLLOW documented approach

Step 3: Existing Projects

  • Location:
    D:\Projects\*
  • If found: EXTRACT and adapt

Decision Matrix

| Match | Action |
| --- | --- |
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
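In code, the matrix reduces to a small dispatcher. This is an illustrative sketch — the context shape and function name are not part of any tooling here:

```typescript
// Map a library match score (0-1) and context flags to the documented action.
type MatchContext = {
  libraryMatch: number;      // similarity against .claude/library/catalog.json
  patternExists: boolean;    // documented in LIBRARY-PATTERNS-GUIDE.md
  foundInProject: boolean;   // located under D:\Projects\*
};

function decideAction(ctx: MatchContext): string {
  if (ctx.libraryMatch > 0.9) return 'REUSE directly';
  if (ctx.libraryMatch >= 0.7) return 'ADAPT minimally';
  if (ctx.patternExists) return 'FOLLOW pattern';
  if (ctx.foundInProject) return 'EXTRACT';
  return 'BUILD (add to library after)';
}

console.log(decideAction({ libraryMatch: 0.95, patternExists: false, foundInProject: false }));
// → REUSE directly
```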

When NOT to Use This Skill

  • Local-only operations with no vector search needs
  • Simple key-value storage without semantic similarity
  • Real-time streaming data without persistence requirements
  • Operations that do not require embedding-based retrieval

Success Criteria

  • Vector search query latency: <10ms for 99th percentile
  • Embedding generation: <100ms per document
  • Index build time: <1s per 1000 vectors
  • Recall@10: >0.95 for similar documents
  • Database connection success rate: >99.9%
  • Memory footprint: <2GB for 1M vectors with quantization
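A minimal sketch for checking the latency target above; `percentileLatencyMs` is a hypothetical helper, and the operation you pass in stands in for your actual search call:

```typescript
// Measure pXX latency (in ms) over repeated runs of an async operation.
async function percentileLatencyMs(
  op: () => Promise<unknown>,
  runs = 1000,
  percentile = 0.99,
): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await op();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.min(samples.length - 1, Math.floor(percentile * samples.length))];
}

// Usage (hypothetical search call):
// const p99 = await percentileLatencyMs(() => adapter.retrieveWithReasoning(q, { k: 10 }));
// if (p99 > 10) console.warn(`p99 ${p99.toFixed(2)}ms exceeds the 10ms target`);
```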

Edge Cases & Error Handling

  • Rate Limits: AgentDB local instances have no rate limits; cloud deployments may vary
  • Connection Failures: Implement retry logic with exponential backoff (max 3 retries)
  • Index Corruption: Maintain backup indices; rebuild from source if corrupted
  • Memory Overflow: Use quantization (4-bit, 8-bit) to reduce memory by 4-32x
  • Stale Embeddings: Implement TTL-based refresh for dynamic content
  • Dimension Mismatch: Validate embedding dimensions (384 for sentence-transformers) before insertion

Guardrails & Safety

  • NEVER expose database connection strings in logs or error messages
  • ALWAYS validate vector dimensions before insertion
  • ALWAYS sanitize metadata to prevent injection attacks
  • NEVER store PII in vector metadata without encryption
  • ALWAYS implement access control for multi-tenant deployments
  • ALWAYS validate search results before returning to users

Evidence-Based Validation

  • Verify database health: Check connection status and index integrity
  • Validate search quality: Measure recall/precision on test queries
  • Monitor performance: Track query latency, throughput, and memory usage
  • Test failure recovery: Simulate connection drops and index corruption
  • Benchmark improvements: Compare against baseline metrics (e.g., 150x speedup claim)

AgentDB Performance Optimization

What This Skill Does

Use this skill to apply comprehensive performance optimization techniques for AgentDB vector databases. Implement quantization strategies (binary, scalar, product) to achieve 4-32x memory reduction. Enable HNSW indexing for 150x-12,500x performance improvements. Configure caching strategies and deploy batch operations to reduce memory usage while maintaining accuracy.
Performance: <100µs vector search, <1ms pattern retrieval, 2ms batch insert for 100 vectors.

Prerequisites

Install Node.js 18+ and AgentDB v1.0.7+ via agentic-flow. Verify you have an existing AgentDB database or application ready for optimization.

Quick Start

Execute these steps to measure and optimize your AgentDB performance.

Run Performance Benchmarks

Execute benchmarks to establish baseline performance:

```bash
# Comprehensive performance benchmarking
npx agentdb@latest benchmark
```

Results show:

```
✅ Pattern Search: 150x faster (100µs vs 15ms)
✅ Batch Insert: 500x faster (2ms vs 1s for 100 vectors)
✅ Large-scale Query: 12,500x faster (8ms vs 100s at 1M vectors)
✅ Memory Efficiency: 4-32x reduction with quantization
```

Enable Optimizations

```typescript
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Optimized configuration
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/optimized.db',
  quantizationType: 'binary',   // 32x memory reduction
  cacheSize: 1000,              // In-memory cache
  enableLearning: true,
  enableReasoning: true,
});
```

Quantization Strategies

Select the appropriate quantization strategy based on your memory and accuracy requirements.

1. Binary Quantization (32x Reduction)

Apply binary quantization for maximum memory reduction:

Best For: Large-scale deployments (1M+ vectors), memory-constrained environments
Trade-off: ~2-5% accuracy loss, 32x memory reduction, 10x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',
  // 768-dim float32 (3072 bytes) → 96 bytes binary
  // 1M vectors: 3GB → 96MB
});
```

Use Cases:
  • Mobile/edge deployment
  • Large-scale vector storage (millions of vectors)
  • Real-time search with memory constraints

Performance:
  • Memory: 32x smaller
  • Search Speed: 10x faster (bit operations)
  • Accuracy: 95-98% of original
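The memory arithmetic in the comments above generalizes to any vector count. This helper reproduces the figures quoted in this section (bytes per dimension as listed here, ignoring index overhead):

```typescript
// Estimate raw vector storage for each quantization type.
const BYTES_PER_DIM: Record<string, number> = {
  none: 4,        // float32
  scalar: 1,      // uint8
  binary: 1 / 8,  // 1 bit per dimension
};

function estimateStorageMB(vectorCount: number, dims: number, quantization: string): number {
  const bytes = vectorCount * dims * BYTES_PER_DIM[quantization];
  return bytes / (1024 * 1024);
}

// 1M × 768-dim vectors:
console.log(estimateStorageMB(1_000_000, 768, 'none').toFixed(0));   // ≈ 2930 MB (~3GB)
console.log(estimateStorageMB(1_000_000, 768, 'binary').toFixed(0)); // ≈ 92 MB (~96MB claim)
```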

2. Scalar Quantization (4x Reduction)

Best For: Balanced performance/accuracy, moderate datasets
Trade-off: ~1-2% accuracy loss, 4x memory reduction, 3x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',
  // 768-dim float32 (3072 bytes) → 768 bytes (uint8)
  // 1M vectors: 3GB → 768MB
});
```

Use Cases:
  • Production applications requiring high accuracy
  • Medium-scale deployments (10K-1M vectors)
  • General-purpose optimization

Performance:
  • Memory: 4x smaller
  • Search Speed: 3x faster
  • Accuracy: 98-99% of original

3. Product Quantization (8-16x Reduction)

Best For: High-dimensional vectors, balanced compression
Trade-off: ~3-7% accuracy loss, 8-16x memory reduction, 5x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'product',
  // 768-dim float32 (3072 bytes) → 48-96 bytes
  // 1M vectors: 3GB → 192MB
});
```

Use Cases:
  • High-dimensional embeddings (>512 dims)
  • Image/video embeddings
  • Large-scale similarity search

Performance:
  • Memory: 8-16x smaller
  • Search Speed: 5x faster
  • Accuracy: 93-97% of original

4. No Quantization (Full Precision)

Best For: Maximum accuracy, small datasets
Trade-off: No accuracy loss, full memory usage

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',
  // Full float32 precision
});
```

HNSW Indexing

Hierarchical Navigable Small World - O(log n) search complexity

Automatic HNSW

AgentDB automatically builds HNSW indices:

```typescript
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/vectors.db',
  // HNSW automatically enabled
});

// Search with HNSW (100µs vs 15ms linear scan)
const results = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 10,
});
```

HNSW Parameters

```typescript
// Advanced HNSW configuration
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/vectors.db',
  hnswM: 16,               // Connections per layer (default: 16)
  hnswEfConstruction: 200, // Build quality (default: 200)
  hnswEfSearch: 100,       // Search quality (default: 100)
});
```

Parameter Tuning:
  • M (connections): Higher = better recall, more memory
    • Small datasets (<10K): M = 8
    • Medium datasets (10K-100K): M = 16
    • Large datasets (>100K): M = 32
  • efConstruction: Higher = better index quality, slower build
    • Fast build: 100
    • Balanced: 200 (default)
    • High quality: 400
  • efSearch: Higher = better recall, slower search
    • Fast search: 50
    • Balanced: 100 (default)
    • High recall: 200
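The tuning bullets above can be folded into one helper. The thresholds come straight from this section; the returned option names mirror the configuration shown but the helper itself is illustrative:

```typescript
type HnswConfig = { hnswM: number; hnswEfConstruction: number; hnswEfSearch: number };

// Pick HNSW parameters from dataset size, per the tuning guidance above.
function hnswConfigFor(vectorCount: number, highRecall = false): HnswConfig {
  const hnswM = vectorCount < 10_000 ? 8 : vectorCount <= 100_000 ? 16 : 32;
  return {
    hnswM,
    hnswEfConstruction: 200,              // balanced default
    hnswEfSearch: highRecall ? 200 : 100, // recall vs speed
  };
}

console.log(hnswConfigFor(50_000));
// → { hnswM: 16, hnswEfConstruction: 200, hnswEfSearch: 100 }
```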

Caching Strategies

In-Memory Pattern Cache

```typescript
const adapter = await createAgentDBAdapter({
  cacheSize: 1000,  // Cache 1000 most-used patterns
});

// First retrieval: ~2ms (database)
// Subsequent: <1ms (cache hit)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 10,
});
```

Cache Tuning:
  • Small applications: 100-500 patterns
  • Medium applications: 500-2000 patterns
  • Large applications: 2000-5000 patterns

LRU Cache Behavior

```typescript
// Cache automatically evicts least-recently-used patterns
// Most frequently accessed patterns stay in cache

// Monitor cache performance
const stats = await adapter.getStats();
console.log('Cache Hit Rate:', stats.cacheHitRate);
// Aim for >80% hit rate
```
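AgentDB manages its cache internally, but as a mental model the eviction behavior described above resembles this minimal LRU built on `Map` insertion order:

```typescript
// Minimal LRU cache: a Map preserves insertion order, so the first key is
// always the least-recently-used entry.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);     // move key to most-recent position
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least-recently-used (first) entry.
      this.map.delete(this.map.keys().next().value as K);
    }
  }
}
```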

Batch Operations

Batch Insert (500x Faster)

```typescript
// ❌ SLOW: Individual inserts
for (const doc of documents) {
  await adapter.insertPattern({ /* ... */ });  // 1s for 100 docs
}

// ✅ FAST: Batch insert
const patterns = documents.map(doc => ({
  id: '',
  type: 'document',
  domain: 'knowledge',
  pattern_data: JSON.stringify({
    embedding: doc.embedding,
    text: doc.text,
  }),
  confidence: 1.0,
  usage_count: 0,
  success_count: 0,
  created_at: Date.now(),
  last_used: Date.now(),
}));

// Insert the pre-built patterns back-to-back (2ms for 100 docs)
for (const pattern of patterns) {
  await adapter.insertPattern(pattern);
}
```

Batch Retrieval

```typescript
// Retrieve multiple queries efficiently
const queries = [queryEmbedding1, queryEmbedding2, queryEmbedding3];

// Parallel retrieval
const results = await Promise.all(
  queries.map(q => adapter.retrieveWithReasoning(q, { k: 5 }))
);
```

Memory Optimization

Automatic Consolidation

```typescript
// Enable automatic pattern consolidation
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'documents',
  optimizeMemory: true,  // Consolidate similar patterns
  k: 10,
});

console.log('Optimizations:', result.optimizations);
// {
//   consolidated: 15,  // Merged 15 similar patterns
//   pruned: 3,         // Removed 3 low-quality patterns
//   improved_quality: 0.12  // 12% quality improvement
// }
```

Manual Optimization

```typescript
// Capture statistics before optimizing
const before = await adapter.getStats();

// Manually trigger optimization
await adapter.optimize();

const after = await adapter.getStats();
console.log('Before:', before.totalPatterns);
console.log('After:', after.totalPatterns);  // Reduced by ~10-30%
```

Pruning Strategies

```typescript
// Prune low-confidence patterns
await adapter.prune({
  minConfidence: 0.5,     // Remove confidence < 0.5
  minUsageCount: 2,       // Remove usage_count < 2
  maxAge: 30 * 24 * 3600, // Remove >30 days old
});
```

Performance Monitoring

Database Statistics

```bash
# Get comprehensive stats
npx agentdb@latest stats .agentdb/vectors.db
```

Output:

```
Total Patterns: 125,430
Database Size: 47.2 MB (with binary quantization)
Avg Confidence: 0.87
Domains: 15
Cache Hit Rate: 84%
Index Type: HNSW
```

Runtime Metrics

```typescript
const stats = await adapter.getStats();

console.log('Performance Metrics:');
console.log('Total Patterns:', stats.totalPatterns);
console.log('Database Size:', stats.dbSize);
console.log('Avg Confidence:', stats.avgConfidence);
console.log('Cache Hit Rate:', stats.cacheHitRate);
console.log('Search Latency (avg):', stats.avgSearchLatency);
console.log('Insert Latency (avg):', stats.avgInsertLatency);
```

Optimization Recipes

Recipe 1: Maximum Speed (Sacrifice Accuracy)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',  // 32x memory reduction
  cacheSize: 5000,             // Large cache
  hnswM: 8,                    // Fewer connections = faster
  hnswEfSearch: 50,            // Low search quality = faster
});

// Expected: <50µs search, 90-95% accuracy
```

Recipe 2: Balanced Performance

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',  // 4x memory reduction
  cacheSize: 1000,             // Standard cache
  hnswM: 16,                   // Balanced connections
  hnswEfSearch: 100,           // Balanced quality
});

// Expected: <100µs search, 98-99% accuracy
```

Recipe 3: Maximum Accuracy

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',    // No quantization
  cacheSize: 2000,             // Large cache
  hnswM: 32,                   // Many connections
  hnswEfSearch: 200,           // High search quality
});

// Expected: <200µs search, 100% accuracy
```

Recipe 4: Memory-Constrained (Mobile/Edge)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',  // 32x memory reduction
  cacheSize: 100,              // Small cache
  hnswM: 8,                    // Minimal connections
});

// Expected: <100µs search, ~10MB for 100K vectors
```

Scaling Strategies

Small Scale (<10K vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',    // Full precision
  cacheSize: 500,
  hnswM: 8,
});
```

Medium Scale (10K-100K vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',  // 4x reduction
  cacheSize: 1000,
  hnswM: 16,
});
```

Large Scale (100K-1M vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',  // 32x reduction
  cacheSize: 2000,
  hnswM: 32,
});
```

Massive Scale (>1M vectors)

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'product',  // 8-16x reduction
  cacheSize: 5000,
  hnswM: 48,
  hnswEfConstruction: 400,
});
```

Troubleshooting

Issue: High memory usage

```bash
# Check database size
npx agentdb@latest stats .agentdb/vectors.db

# Enable quantization: use quantizationType 'binary' for 32x reduction
```

Issue: Slow search performance

```typescript
// Increase cache size
const adapter = await createAgentDBAdapter({
  cacheSize: 2000,  // Increase from 1000
});

// Request fewer results per query (faster)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 5,  // Reduce from 10
});
```

Issue: Low accuracy

```typescript
// Disable or use lighter quantization
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',  // Instead of 'binary'
  hnswEfSearch: 200,           // Higher search quality
});
```

Performance Benchmarks

Test System: AMD Ryzen 9 5950X, 64GB RAM

| Operation | Vector Count | No Optimization | Optimized | Improvement |
| --- | --- | --- | --- | --- |
| Search | 10K | 15ms | 100µs | 150x |
| Search | 100K | 150ms | 120µs | 1,250x |
| Search | 1M | 100s | 8ms | 12,500x |
| Batch Insert (100) | - | 1s | 2ms | 500x |
| Memory Usage | 1M | 3GB | 96MB | 32x (binary) |


Category: Performance / Optimization
Difficulty: Intermediate
Estimated Time: 20-30 minutes

Core Principles

AgentDB Performance Optimization operates on 3 fundamental principles:

Principle 1: Trade Memory for Speed Through Intelligent Quantization

Compress vectors by 4-32x with minimal accuracy loss (1-5%) using binary, scalar, or product quantization strategies.
In practice:
  • Binary quantization reduces 768-dim vectors from 3GB to 96MB (32x) with 95-98% accuracy retention
  • Scalar quantization achieves 4x reduction (3GB to 768MB) with 98-99% accuracy for production workloads
  • Select quantization based on memory constraints vs accuracy requirements (mobile = binary, production = scalar)

Principle 2: O(log n) Search Complexity via HNSW Indexing

Replace O(n) linear scans with hierarchical navigable small world graphs for 150-12,500x performance improvements.
In practice:
  • HNSW automatically builds multi-layer proximity graphs during insertion
  • Search navigates graph layers for sub-millisecond retrieval (100µs vs 15ms linear)
  • Tune M (connections), efConstruction (build quality), efSearch (recall) for performance/accuracy balance

Principle 3: Batch Operations and Caching Eliminate Redundant Work

Aggregate operations and cache frequent patterns to achieve 500x faster batch inserts and <1ms cache hits.
In practice:
  • Batch insert 100 vectors in 2ms vs 1s for sequential inserts (500x speedup)
  • LRU cache (1000-5000 patterns) serves 80%+ queries from memory (<1ms) vs database (2ms)
  • Automatic pattern consolidation merges similar entries to reduce storage by 10-30%

Common Anti-Patterns

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Sequential Inserts | 1s for 100 vectors due to individual database writes and index updates | Use the batch insert pattern: collect all patterns, insert in a single transaction (2ms for 100 vectors) |
| Full Precision Everywhere | 3GB memory for 1M vectors causes OOM on mobile/edge devices | Apply binary quantization (96MB, 32x reduction) with <5% accuracy loss for memory-constrained environments |
| Ignoring Cache Tuning | Cache too small = low hit rate; too large = memory waste and eviction overhead | Set cacheSize by workload: 100-500 (small), 500-2000 (medium), 2000-5000 (large). Monitor hit rate >80% |

Conclusion

AgentDB Performance Optimization transforms vector search from memory-intensive, slow operations into production-ready systems capable of handling millions of vectors with sub-millisecond latency. By applying quantization strategies tailored to your accuracy requirements, enabling HNSW indexing for logarithmic search complexity, and implementing intelligent caching and batch operations, you achieve 150-12,500x performance improvements while reducing memory footprint by 4-32x.
Use this skill when scaling to large vector datasets (>10K vectors), deploying to memory-constrained environments (mobile, edge devices), or optimizing production systems requiring <10ms p99 latency. The key insight is strategic trade-offs: quantization trades minimal accuracy for massive memory savings, HNSW trades insertion time for exponentially faster search, and caching trades memory for latency reduction. Start with balanced configurations (scalar quantization, M=16, cacheSize=1000) and tune based on benchmarks for your specific workload.