agentdb-performance-optimization
LIBRARY-FIRST PROTOCOL (MANDATORY)
Before writing ANY code, you MUST check:
Step 1: Library Catalog
- Location: .claude/library/catalog.json
- If match >70%: REUSE or ADAPT
Step 2: Patterns Guide
- Location: .claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md
- If pattern exists: FOLLOW documented approach
Step 3: Existing Projects
- Location: D:\Projects\*
- If found: EXTRACT and adapt
Decision Matrix
| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
When NOT to Use This Skill
- Local-only operations with no vector search needs
- Simple key-value storage without semantic similarity
- Real-time streaming data without persistence requirements
- Operations that do not require embedding-based retrieval
Success Criteria
- Vector search query latency: <10ms for 99th percentile
- Embedding generation: <100ms per document
- Index build time: <1s per 1000 vectors
- Recall@10: >0.95 for similar documents
- Database connection success rate: >99.9%
- Memory footprint: <2GB for 1M vectors with quantization
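The latency targets above can be sanity-checked with a small percentile helper. This is an illustrative sketch: the `percentile` function is not part of AgentDB, and the sampling loop over `adapter.retrieveWithReasoning` is shown only as a comment because it requires a live database.

```typescript
// Compute a latency percentile from sampled query timings.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// In a real run, collect samples like:
// const t0 = performance.now();
// await adapter.retrieveWithReasoning(queryEmbedding, { k: 10 });
// samples.push(performance.now() - t0);

const samples = [0.8, 1.2, 0.9, 1.1, 9.4, 1.0, 0.7, 1.3, 1.05, 0.95];
console.log('p99 latency (ms):', percentile(samples, 99)); // 9.4
```

Compare the reported p99 against the <10ms target before and after enabling optimizations.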
Edge Cases & Error Handling
- Rate Limits: AgentDB local instances have no rate limits; cloud deployments may vary
- Connection Failures: Implement retry logic with exponential backoff (max 3 retries)
- Index Corruption: Maintain backup indices; rebuild from source if corrupted
- Memory Overflow: Use quantization (4-bit, 8-bit) to reduce memory by 4-32x
- Stale Embeddings: Implement TTL-based refresh for dynamic content
- Dimension Mismatch: Validate embedding dimensions (384 for sentence-transformers) before insertion
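Two of these policies (retry with exponential backoff, dimension validation) can be sketched generically. The helper names below are illustrative, not AgentDB API; `withRetry` wraps any fallible async call, such as opening a connection.

```typescript
const EXPECTED_DIM = 384; // e.g. sentence-transformers embeddings

// Reject embeddings with the wrong dimensionality before insertion.
function validateDimensions(embedding: number[]): void {
  if (embedding.length !== EXPECTED_DIM) {
    throw new Error(`Dimension mismatch: got ${embedding.length}, expected ${EXPECTED_DIM}`);
  }
}

// Retry a fallible async operation with exponential backoff (max 3 retries).
async function withRetry<T>(op: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      const delayMs = 100 * 2 ** attempt; // 100ms, 200ms, 400ms
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```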
Guardrails & Safety
- NEVER expose database connection strings in logs or error messages
- ALWAYS validate vector dimensions before insertion
- ALWAYS sanitize metadata to prevent injection attacks
- NEVER store PII in vector metadata without encryption
- ALWAYS implement access control for multi-tenant deployments
- ALWAYS validate search results before returning to users
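A minimal metadata sanitizer for the "sanitize metadata" guardrail might look like this. The key pattern, length cap, and value allow-list are assumptions for illustration, not an AgentDB requirement; adapt them to your schema.

```typescript
const SAFE_KEY = /^[a-zA-Z_][a-zA-Z0-9_]{0,63}$/;

// Keep only well-formed keys and primitive values; strip control
// characters from strings and cap their length.
function sanitizeMetadata(meta: Record<string, unknown>): Record<string, string | number | boolean> {
  const clean: Record<string, string | number | boolean> = {};
  for (const [key, value] of Object.entries(meta)) {
    if (!SAFE_KEY.test(key)) continue; // drop suspicious keys
    if (typeof value === 'string') {
      clean[key] = value.replace(/[\u0000-\u001f]/g, '').slice(0, 1024);
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      clean[key] = value;
    }
    // Nested objects/arrays are dropped; serialize them explicitly if needed.
  }
  return clean;
}
```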
Evidence-Based Validation
- Verify database health: Check connection status and index integrity
- Validate search quality: Measure recall/precision on test queries
- Monitor performance: Track query latency, throughput, and memory usage
- Test failure recovery: Simulate connection drops and index corruption
- Benchmark improvements: Compare against baseline metrics (e.g., 150x speedup claim)
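The recall measurement above reduces to a few lines: given the ids returned for a test query and the set of known-relevant ids, recall@k is the fraction of relevant ids found in the top k. The helper is illustrative, not part of AgentDB.

```typescript
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 1;
  const topK = retrieved.slice(0, k);
  const hits = topK.filter(id => relevant.has(id)).length;
  return hits / relevant.size;
}

const relevant = new Set(['a', 'b', 'c']);
console.log(recallAtK(['a', 'x', 'c', 'y', 'b'], relevant, 5)); // 1
console.log(recallAtK(['a', 'x', 'c'], relevant, 3)); // 2 of 3 relevant found
```

Average recall@10 over a held-out query set and compare against the >0.95 success criterion.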
AgentDB Performance Optimization
What This Skill Does
Use this skill to apply comprehensive performance optimization techniques for AgentDB vector databases. Implement quantization strategies (binary, scalar, product) to achieve 4-32x memory reduction. Enable HNSW indexing for 150x-12,500x performance improvements. Configure caching strategies and deploy batch operations to reduce memory usage while maintaining accuracy.
Performance: <100µs vector search, <1ms pattern retrieval, 2ms batch insert for 100 vectors.
Prerequisites
Install Node.js 18+ and AgentDB v1.0.7+ via agentic-flow. Verify you have an existing AgentDB database or application ready for optimization.
Quick Start
Execute these steps to measure and optimize your AgentDB performance.
Run Performance Benchmarks
Execute benchmarks to establish baseline performance:

```bash
# Comprehensive performance benchmarking
npx agentdb@latest benchmark
```

Results show:

```
✅ Pattern Search: 150x faster (100µs vs 15ms)
✅ Batch Insert: 500x faster (2ms vs 1s for 100 vectors)
✅ Large-scale Query: 12,500x faster (8ms vs 100s at 1M vectors)
✅ Memory Efficiency: 4-32x reduction with quantization
```

Enable Optimizations
```typescript
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Optimized configuration
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/optimized.db',
  quantizationType: 'binary', // 32x memory reduction
  cacheSize: 1000,            // In-memory cache
  enableLearning: true,
  enableReasoning: true,
});
```

Quantization Strategies
Select the appropriate quantization strategy based on your memory and accuracy requirements.
1. Binary Quantization (32x Reduction)
Apply binary quantization for maximum memory reduction:

Best For: Large-scale deployments (1M+ vectors), memory-constrained environments
Trade-off: ~2-5% accuracy loss, 32x memory reduction, 10x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary',
  // 768-dim float32 (3072 bytes) → 96 bytes binary
  // 1M vectors: 3GB → 96MB
});
```

Use Cases:
- Mobile/edge deployment
- Large-scale vector storage (millions of vectors)
- Real-time search with memory constraints

Performance:
- Memory: 32x smaller
- Search Speed: 10x faster (bit operations)
- Accuracy: 95-98% of original
2. Scalar Quantization (4x Reduction)
Best For: Balanced performance/accuracy, moderate datasets
Trade-off: ~1-2% accuracy loss, 4x memory reduction, 3x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar',
  // 768-dim float32 (3072 bytes) → 768 bytes (uint8)
  // 1M vectors: 3GB → 768MB
});
```

Use Cases:
- Production applications requiring high accuracy
- Medium-scale deployments (10K-1M vectors)
- General-purpose optimization

Performance:
- Memory: 4x smaller
- Search Speed: 3x faster
- Accuracy: 98-99% of original
3. Product Quantization (8-16x Reduction)
Best For: High-dimensional vectors, balanced compression
Trade-off: ~3-7% accuracy loss, 8-16x memory reduction, 5x faster

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'product',
  // 768-dim float32 (3072 bytes) → 48-96 bytes
  // 1M vectors: 3GB → 192MB
});
```

Use Cases:
- High-dimensional embeddings (>512 dims)
- Image/video embeddings
- Large-scale similarity search

Performance:
- Memory: 8-16x smaller
- Search Speed: 5x faster
- Accuracy: 93-97% of original
4. No Quantization (Full Precision)
Best For: Maximum accuracy, small datasets
Trade-off: No accuracy loss, full memory usage

```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none',
  // Full float32 precision
});
```

HNSW Indexing
Hierarchical Navigable Small World - O(log n) search complexity
Automatic HNSW
AgentDB automatically builds HNSW indices:

```typescript
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/vectors.db',
  // HNSW automatically enabled
});

// Search with HNSW (100µs vs 15ms linear scan)
const results = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 10,
});
```

HNSW Parameters
```typescript
// Advanced HNSW configuration
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/vectors.db',
  hnswM: 16,               // Connections per layer (default: 16)
  hnswEfConstruction: 200, // Build quality (default: 200)
  hnswEfSearch: 100,       // Search quality (default: 100)
});
```

Parameter Tuning:
- M (connections): Higher = better recall, more memory
  - Small datasets (<10K): M = 8
  - Medium datasets (10K-100K): M = 16
  - Large datasets (>100K): M = 32
- efConstruction: Higher = better index quality, slower build
  - Fast build: 100
  - Balanced: 200 (default)
  - High quality: 400
- efSearch: Higher = better recall, slower search
  - Fast search: 50
  - Balanced: 100 (default)
  - High recall: 200
Caching Strategies
In-Memory Pattern Cache
```typescript
const adapter = await createAgentDBAdapter({
  cacheSize: 1000, // Cache 1000 most-used patterns
});

// First retrieval: ~2ms (database)
// Subsequent: <1ms (cache hit)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 10,
});
```

Cache Tuning:
- Small applications: 100-500 patterns
- Medium applications: 500-2000 patterns
- Large applications: 2000-5000 patterns
LRU Cache Behavior
```typescript
// The cache automatically evicts least-recently-used patterns;
// the most frequently accessed patterns stay in cache.

// Monitor cache performance
const stats = await adapter.getStats();
console.log('Cache Hit Rate:', stats.cacheHitRate);
// Aim for >80% hit rate
```

Batch Operations
Batch Insert (500x Faster)
```typescript
// ❌ SLOW: Individual awaited inserts
for (const doc of documents) {
  await adapter.insertPattern({ /* ... */ }); // 1s for 100 docs
}

// ✅ FAST: Build all patterns first, then insert concurrently
const patterns = documents.map(doc => ({
  id: '',
  type: 'document',
  domain: 'knowledge',
  pattern_data: JSON.stringify({
    embedding: doc.embedding,
    text: doc.text,
  }),
  confidence: 1.0,
  usage_count: 0,
  success_count: 0,
  created_at: Date.now(),
  last_used: Date.now(),
}));

// Insert all at once (2ms for 100 docs)
await Promise.all(patterns.map(pattern => adapter.insertPattern(pattern)));
```

Batch Retrieval
```typescript
// Retrieve multiple queries efficiently
const queries = [queryEmbedding1, queryEmbedding2, queryEmbedding3];

// Parallel retrieval
const results = await Promise.all(
  queries.map(q => adapter.retrieveWithReasoning(q, { k: 5 }))
);
```

Memory Optimization
Automatic Consolidation
```typescript
// Enable automatic pattern consolidation
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'documents',
  optimizeMemory: true, // Consolidate similar patterns
  k: 10,
});

console.log('Optimizations:', result.optimizations);
// {
//   consolidated: 15,       // Merged 15 similar patterns
//   pruned: 3,              // Removed 3 low-quality patterns
//   improved_quality: 0.12  // 12% quality improvement
// }
```

Manual Optimization
```typescript
// Capture statistics before optimizing
const before = await adapter.getStats();

// Manually trigger optimization
await adapter.optimize();

const after = await adapter.getStats();
console.log('Before:', before.totalPatterns);
console.log('After:', after.totalPatterns); // Reduced by ~10-30%
```

Pruning Strategies
```typescript
// Prune low-confidence patterns
await adapter.prune({
  minConfidence: 0.5,     // Remove confidence < 0.5
  minUsageCount: 2,       // Remove usage_count < 2
  maxAge: 30 * 24 * 3600, // Remove patterns older than 30 days
});
```

Performance Monitoring
Database Statistics
```bash
# Get comprehensive stats
npx agentdb@latest stats .agentdb/vectors.db
```

Output:

```
Total Patterns: 125,430
Database Size: 47.2 MB (with binary quantization)
Avg Confidence: 0.87
Domains: 15
Cache Hit Rate: 84%
Index Type: HNSW
```

Runtime Metrics
```typescript
const stats = await adapter.getStats();

console.log('Performance Metrics:');
console.log('Total Patterns:', stats.totalPatterns);
console.log('Database Size:', stats.dbSize);
console.log('Avg Confidence:', stats.avgConfidence);
console.log('Cache Hit Rate:', stats.cacheHitRate);
console.log('Search Latency (avg):', stats.avgSearchLatency);
console.log('Insert Latency (avg):', stats.avgInsertLatency);
```

Optimization Recipes
Recipe 1: Maximum Speed (Sacrifice Accuracy)
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary', // 32x memory reduction
  cacheSize: 5000,            // Large cache
  hnswM: 8,                   // Fewer connections = faster
  hnswEfSearch: 50,           // Lower search quality = faster
});
// Expected: <50µs search, 90-95% accuracy
```

Recipe 2: Balanced Performance
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar', // 4x memory reduction
  cacheSize: 1000,            // Standard cache
  hnswM: 16,                  // Balanced connections
  hnswEfSearch: 100,          // Balanced quality
});
// Expected: <100µs search, 98-99% accuracy
```

Recipe 3: Maximum Accuracy
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none', // No quantization
  cacheSize: 2000,          // Large cache
  hnswM: 32,                // Many connections
  hnswEfSearch: 200,        // High search quality
});
// Expected: <200µs search, 100% accuracy
```

Recipe 4: Memory-Constrained (Mobile/Edge)
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary', // 32x memory reduction
  cacheSize: 100,             // Small cache
  hnswM: 8,                   // Minimal connections
});
// Expected: <100µs search, ~10MB for 100K vectors
```

Scaling Strategies
Small Scale (<10K vectors)
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'none', // Full precision
  cacheSize: 500,
  hnswM: 8,
});
```

Medium Scale (10K-100K vectors)
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar', // 4x reduction
  cacheSize: 1000,
  hnswM: 16,
});
```

Large Scale (100K-1M vectors)
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'binary', // 32x reduction
  cacheSize: 2000,
  hnswM: 32,
});
```

Massive Scale (>1M vectors)
```typescript
const adapter = await createAgentDBAdapter({
  quantizationType: 'product', // 8-16x reduction
  cacheSize: 5000,
  hnswM: 48,
  hnswEfConstruction: 400,
});
```

Troubleshooting
Issue: High memory usage
```bash
# Check database size
npx agentdb@latest stats .agentdb/vectors.db

# Enable quantization: use 'binary' for 32x reduction
```

Issue: Slow search performance
```typescript
// Increase cache size
const adapter = await createAgentDBAdapter({
  cacheSize: 2000, // Increase from 1000
});

// Retrieve fewer results (faster)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  k: 5, // Reduce from 10
});
```

Issue: Low accuracy
```typescript
// Disable or use lighter quantization
const adapter = await createAgentDBAdapter({
  quantizationType: 'scalar', // Instead of 'binary'
  hnswEfSearch: 200,          // Higher search quality
});
```

Performance Benchmarks
Test System: AMD Ryzen 9 5950X, 64GB RAM
| Operation | Vector Count | No Optimization | Optimized | Improvement |
|---|---|---|---|---|
| Search | 10K | 15ms | 100µs | 150x |
| Search | 100K | 150ms | 120µs | 1,250x |
| Search | 1M | 100s | 8ms | 12,500x |
| Batch Insert (100) | - | 1s | 2ms | 500x |
| Memory Usage | 1M | 3GB | 96MB | 32x (binary) |
Learn More
- Quantization Paper: docs/quantization-techniques.pdf
- HNSW Algorithm: docs/hnsw-index.pdf
- GitHub: https://github.com/ruvnet/agentic-flow/tree/main/packages/agentdb
- Website: https://agentdb.ruv.io
Category: Performance / Optimization
Difficulty: Intermediate
Estimated Time: 20-30 minutes
Core Principles
AgentDB Performance Optimization operates on three fundamental principles:
Principle 1: Trade Memory for Speed Through Intelligent Quantization
Compress vectors by 4-32x with minimal accuracy loss (1-5%) using binary, scalar, or product quantization strategies.
In practice:
- Binary quantization reduces 768-dim vectors from 3GB to 96MB (32x) with 95-98% accuracy retention
- Scalar quantization achieves 4x reduction (3GB to 768MB) with 98-99% accuracy for production workloads
- Select quantization based on memory constraints vs accuracy requirements (mobile = binary, production = scalar)
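The figures in these bullets follow directly from float32 using 4 bytes per dimension. A quick sanity-check sketch (the helper is illustrative; the document's 3GB/768MB/96MB figures round units loosely):

```typescript
// Memory footprint for `vectors` embeddings of `dims` dimensions
// stored at `bitsPerDim` bits per dimension.
function memoryBytes(vectors: number, dims: number, bitsPerDim: number): number {
  return vectors * dims * (bitsPerDim / 8);
}

const M = 1_000_000;
console.log('float32:', memoryBytes(M, 768, 32)); // 3,072,000,000 bytes (~"3GB")
console.log('scalar uint8:', memoryBytes(M, 768, 8)); // 768,000,000 bytes ("768MB")
console.log('binary:', memoryBytes(M, 768, 1)); // 96,000,000 bytes ("96MB")
```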
Principle 2: O(log n) Search Complexity via HNSW Indexing
Replace O(n) linear scans with hierarchical navigable small world graphs for 150-12,500x performance improvements.
In practice:
- HNSW automatically builds multi-layer proximity graphs during insertion
- Search navigates graph layers for sub-millisecond retrieval (100µs vs 15ms linear)
- Tune M (connections), efConstruction (build quality), efSearch (recall) for performance/accuracy balance
Principle 3: Batch Operations and Caching Eliminate Redundant Work
Aggregate operations and cache frequent patterns to achieve 500x faster batch inserts and <1ms cache hits.
In practice:
- Batch insert 100 vectors in 2ms vs 1s for sequential inserts (500x speedup)
- LRU cache (1000-5000 patterns) serves 80%+ queries from memory (<1ms) vs database (2ms)
- Automatic pattern consolidation merges similar entries to reduce storage by 10-30%
Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Sequential Inserts | 1s for 100 vectors due to individual database writes and index updates | Use batch insert pattern: collect all patterns, insert in single transaction (2ms for 100 vectors) |
| Full Precision Everywhere | 3GB memory for 1M vectors causes OOM on mobile/edge devices | Apply binary quantization (96MB, 32x reduction) with <5% accuracy loss for memory-constrained environments |
| Ignoring Cache Tuning | Cache too small = low hit rate, too large = memory waste and eviction overhead | Set cacheSize based on workload: 100-500 (small), 500-2000 (medium), 2000-5000 (large). Monitor hit rate >80% |
Conclusion
AgentDB Performance Optimization transforms vector search from memory-intensive, slow operations into production-ready systems capable of handling millions of vectors with sub-millisecond latency. By applying quantization strategies tailored to your accuracy requirements, enabling HNSW indexing for logarithmic search complexity, and implementing intelligent caching and batch operations, you achieve 150-12,500x performance improvements while reducing memory footprint by 4-32x.
Use this skill when scaling to large vector datasets (>10K vectors), deploying to memory-constrained environments (mobile, edge devices), or optimizing production systems requiring <10ms p99 latency. The key insight is strategic trade-offs: quantization trades minimal accuracy for massive memory savings, HNSW trades insertion time for exponentially faster search, and caching trades memory for latency reduction. Start with balanced configurations (scalar quantization, M=16, cacheSize=1000) and tune based on benchmarks for your specific workload.