semantic-caching
Semantic Caching
Cache LLM responses by semantic similarity.
Redis 8 Note: Redis 8+ includes the Search, JSON, TimeSeries, and Bloom modules built-in. No separate Redis Stack installation is required. Use redis:8 in Docker or any Redis 8+ deployment.
Cache Hierarchy
Request → L1 (Exact) → L2 (Semantic) → L3 (Prompt) → L4 (LLM)

| Level | Match | Latency | Cost saved |
|---|---|---|---|
| L1 | Exact | ~1ms | 100% |
| L2 | Semantic | ~10ms | 100% |
| L3 | Prompt | ~2s | 90% |
| L4 | LLM | ~3s | 0% (full cost) |
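
To see what the hierarchy means for a workload, the per-level latency and savings figures can be blended by the share of requests each level ends up serving. The sketch below takes the per-level numbers from the hierarchy; the traffic shares are hypothetical placeholders, and it ignores the time spent on lookups that miss before a request reaches a lower level.

```python
# Per-level figures from the hierarchy:
# (latency in seconds, fraction of the full LLM cost still paid)
LEVELS = {
    "L1": (0.001, 0.0),   # exact match, 100% saved
    "L2": (0.010, 0.0),   # semantic match, 100% saved
    "L3": (2.0, 0.10),    # prompt cache, 90% saved
    "L4": (3.0, 1.0),     # full LLM call
}


def blended(shares: dict[str, float]) -> tuple[float, float]:
    """Return (expected latency in seconds, expected cost fraction)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    latency = sum(shares[level] * LEVELS[level][0] for level in LEVELS)
    cost = sum(shares[level] * LEVELS[level][1] for level in LEVELS)
    return latency, cost


# Hypothetical traffic split; measure your own before relying on it
latency, cost = blended({"L1": 0.10, "L2": 0.20, "L3": 0.40, "L4": 0.30})
print(f"expected latency ~{latency:.2f}s, cost ~{cost:.0%} of uncached")
```

With this hypothetical split the blend works out to roughly 1.70s per request and 34% of the uncached cost.
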
Redis Semantic Cache
```python
import json
import time

import numpy as np
from redis import Redis
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

# embed_text (async) and hash_content are application helpers defined elsewhere

class SemanticCacheService:
    def __init__(self, redis_url: str, threshold: float = 0.92):
        self.client = Redis.from_url(redis_url)
        # Attach to an index that already exists; "llm_cache" is the name used in the FT.HYBRID example
        self.index = SearchIndex.from_existing("llm_cache", redis_url=redis_url)
        self.threshold = threshold

    async def get(self, content: str, agent_type: str) -> dict | None:
        embedding = await embed_text(content[:2000])
        query = VectorQuery(
            vector=embedding,
            vector_field_name="embedding",
            return_fields=["response"],
            filter_expression=f"@agent_type:{{{agent_type}}}",
            num_results=1,
        )
        results = self.index.query(query)
        if results:
            # Cosine distance is 1 - similarity, so lower means closer
            distance = float(results[0].get("vector_distance", 1.0))
            if distance <= (1 - self.threshold):
                return json.loads(results[0]["response"])
        return None

    async def set(self, content: str, response: dict, agent_type: str):
        embedding = await embed_text(content[:2000])
        key = f"cache:{agent_type}:{hash_content(content)}"
        self.client.hset(key, mapping={
            "agent_type": agent_type,
            # Hash fields cannot hold a list, so store the vector as float32 bytes
            "embedding": np.asarray(embedding, dtype=np.float32).tobytes(),
            "response": json.dumps(response),
            "created_at": time.time(),
        })
        self.client.expire(key, 86400)  # 24h TTL
```
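
The service queries a search index that it never creates. As a rough sketch, an index definition covering the fields written by `set` could look like the dictionary below. The index name `llm_cache` is borrowed from the FT.HYBRID example, and the 1536 dimensions assume text-embedding-3-small; both are assumptions to adjust for your setup, as is the choice of HNSW.

```python
# Hypothetical index definition matching the hash fields the cache service writes.
# Fields not listed here (such as "response") stay in the hash as plain stored values.
CACHE_INDEX_SCHEMA = {
    "index": {
        "name": "llm_cache",       # assumed index name
        "prefix": "cache:",        # matches keys like cache:{agent_type}:{hash}
        "storage_type": "hash",
    },
    "fields": [
        {"name": "agent_type", "type": "tag"},      # enables @agent_type:{...} filters
        {"name": "created_at", "type": "numeric"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "dims": 1536,                  # assumed embedding size
                "distance_metric": "cosine",   # so vector_distance = 1 - similarity
                "algorithm": "hnsw",
                "datatype": "float32",
            },
        },
    ],
}
```

With RedisVL installed, a dictionary of this shape can be passed to `SearchIndex.from_dict(...)` and created once at startup; check the field options against the RedisVL version you run.
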
Similarity Thresholds
| Threshold | Distance | Use Case |
|---|---|---|
| 0.98-1.00 | 0.00-0.02 | Nearly identical |
| 0.95-0.98 | 0.02-0.05 | Very similar |
| 0.92-0.95 | 0.05-0.08 | Similar (default) |
| 0.85-0.92 | 0.08-0.15 | Moderately similar |
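
The Threshold and Distance columns are two views of the same number when the index uses cosine distance: distance = 1 - similarity. The following is a minimal sketch of that conversion in plain Python; the function names are illustrative and not part of any library, and the similarity helper assumes non-zero vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def is_cache_hit(similarity: float, threshold: float = 0.92) -> bool:
    """Apply the same test as the cache service: distance <= 1 - threshold."""
    distance = 1.0 - similarity
    return distance <= 1.0 - threshold


print(is_cache_hit(0.95))  # inside the default 0.92 threshold
print(is_cache_hit(0.90))  # outside it
```

Raising the threshold shrinks the accepted distance band, which trades hit rate for fewer false positives.
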
Multi-Level Lookup
```python
async def get_llm_response(query: str, agent_type: str) -> dict:
    # L1: Exact match (in-memory LRU)
    # Include agent_type in the key so agents never share exact-match entries
    cache_key = hash_content(f"{agent_type}:{query}")
    if cache_key in lru_cache:
        return lru_cache[cache_key]

    # L2: Semantic similarity (Redis)
    similar = await semantic_cache.get(query, agent_type)
    if similar is not None:
        lru_cache[cache_key] = similar  # Promote to L1
        return similar

    # L3/L4: LLM call with prompt caching
    response = await llm.generate(query)

    # Store in caches
    await semantic_cache.set(query, response, agent_type)
    lru_cache[cache_key] = response
    return response
```
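
The lookup function relies on a `hash_content` helper and an `lru_cache` object that are not shown. One possible shape for both, using only the standard library, is sketched below. The class name `LRUCache`, the SHA-256 choice, and the 1000-entry limit are assumptions for illustration rather than the original implementation.

```python
import hashlib
from collections import OrderedDict


def hash_content(content: str) -> str:
    """Stable hex digest used as an exact-match cache key."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class LRUCache:
    """Bounded mapping that evicts the least recently used entry when full."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._data: OrderedDict[str, dict] = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._data

    def __getitem__(self, key: str) -> dict:
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def __setitem__(self, key: str, value: dict) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry


lru_cache = LRUCache(max_entries=1000)
```

The class supports exactly the three operations the lookup uses (`in`, read by key, write by key), so it can stand in for `lru_cache` without changing the function.
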
Redis 8.4+ Hybrid Search (FT.HYBRID)
Redis 8.4 introduces native hybrid search combining semantic (vector) and exact (keyword) matching in a single query. This is ideal for caches that need both similarity and metadata filtering.
```python
# Redis 8.4 native hybrid search
# embedding_bytes is the query vector serialized as float32 bytes
result = redis.execute_command(
    "FT.HYBRID", "llm_cache",
    "SEARCH", f"@agent_type:{{{agent_type}}}",
    "VSIM", "@embedding", "$query_vec",
    "KNN", "2", "K", "5",
    "COMBINE", "RRF", "2", "CONSTANT", "60",  # count covers the 2 tokens that follow
    "PARAMS", "2", "query_vec", embedding_bytes,
)
```

**Hybrid Search Benefits:**
- Single query for keyword + vector matching
- RRF (Reciprocal Rank Fusion) merges the keyword and vector rankings by rank position, so the two score scales do not need to be normalized
- Better results than sequential filtering
- BM25STD is now the default scorer for keyword matching
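
To make the RRF step concrete, here is a small sketch of the standard Reciprocal Rank Fusion formula, where each document scores the sum of 1 / (constant + rank) over the rankings it appears in. It illustrates the general technique with the constant of 60 used in the command; it is not a claim about how Redis implements the fusion internally, and the document ids are made up.

```python
def rrf_fuse(rankings: list[list[str]], constant: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists; each appearance adds 1 / (constant + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the keyword and vector sides
keyword_hits = ["cache:a", "cache:b", "cache:c"]
vector_hits = ["cache:b", "cache:d", "cache:a"]

for doc_id, score in rrf_fuse([keyword_hits, vector_hits]):
    print(doc_id, round(score, 5))
```

Entries that appear in both lists (`cache:b`, `cache:a`) outrank entries found by only one side, which is the behavior that makes RRF useful for combining keyword and vector results.
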
**When to Use Hybrid:**
- Filtering by metadata (agent_type, tenant, category) + semantic similarity
- Multi-tenant caches where exact tenant match is required
- Combining keyword search with vector similarity

Key Decisions
| Decision | Recommendation |
|---|---|
| Threshold | Start at 0.92, tune based on hit rate |
| TTL | 24h for production |
| Embedding | text-embedding-3-small (fast) |
| L1 size | 1000-10000 entries |
| Scorer | BM25STD (Redis 8+ default) |
| Hybrid | Use FT.HYBRID for metadata + vector queries |
Common Mistakes
- Threshold too low (false positives)
- No cache warming (cold start)
- Missing metadata filters
- Not promoting L2 hits to L1
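
For the cold-start item, one way to warm the cache is to write known question and answer pairs before traffic arrives. The sketch below assumes a cache object with the same async `set(content, response, agent_type)` signature as `SemanticCacheService`. `warm_cache`, `InMemoryCache`, and the seed data are hypothetical names used only to keep the example self-contained.

```python
import asyncio


async def warm_cache(cache, seed_pairs, agent_type: str) -> int:
    """Pre-populate the cache and return the number of entries written."""
    written = 0
    for question, response in seed_pairs:
        await cache.set(question, response, agent_type)
        written += 1
    return written


class InMemoryCache:
    """Stand-in with the same async set() signature as the Redis-backed service."""

    def __init__(self):
        self.entries = {}

    async def set(self, content: str, response: dict, agent_type: str):
        self.entries[(agent_type, content)] = response


seed = [("How do I reset my password?", {"answer": "Use the reset link."})]
stub = InMemoryCache()
count = asyncio.run(warm_cache(stub, seed, "support"))
print(f"warmed {count} entries")
```

In a real deployment the stub would be replaced by the Redis-backed service and the seed list by frequent queries taken from logs.
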
Related Skills
- prompt-caching - Provider-native caching
- embeddings - Vector generation
- cache-cost-tracking - Langfuse integration
Capability Details
redis-vector-cache
Keywords: redis, vector, embedding, similarity, cache
Solves:
- Cache LLM responses by semantic similarity
- Reduce API costs with smart caching
- Implement multi-level cache hierarchy
similarity-threshold
Keywords: threshold, similarity, tuning, cosine
Solves:
- Set appropriate similarity threshold
- Balance hit rate vs accuracy
- Tune cache performance
orchestkit-integration
Keywords: orchestkit, integration, roi, cost-savings
Solves:
- Integrate caching with OrchestKit
- Calculate ROI for caching
- Production implementation guide
cache-service
Keywords: service, implementation, template, production
Solves:
- Production cache service template
- Complete implementation example
- Redis integration code
hybrid-search
Keywords: hybrid, ft.hybrid, bm25, rrf, keyword, metadata, filter
Solves:
- Combine semantic and keyword search
- Filter cache by metadata with vector similarity
- Use Redis 8.4 FT.HYBRID command
- BM25STD scoring for keyword matching