
Semantic Caching


Cache LLM responses by semantic similarity.
Redis 8 Note: Redis 8+ includes the Search, JSON, TimeSeries, and Bloom modules built in; no separate Redis Stack installation is required. Use `redis:8` in Docker or any Redis 8+ deployment.

Cache Hierarchy

缓存层级

```
Request → L1 (Exact) → L2 (Semantic) → L3 (Prompt) → L4 (LLM)
           ~1ms         ~10ms           ~2s          ~3s
         100% save    100% save       90% save    Full cost
```

Redis Semantic Cache


```python
import json
import struct
import time

from redis import Redis
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

# embed_text() and hash_content() are application-provided helpers.

class SemanticCacheService:
    def __init__(self, redis_url: str, threshold: float = 0.92):
        self.client = Redis.from_url(redis_url)
        self.threshold = threshold
        # Attach to a pre-created index over the cache:* hash keys
        self.index = SearchIndex.from_existing("llm_cache", redis_url=redis_url)

    async def get(self, content: str, agent_type: str) -> dict | None:
        embedding = await embed_text(content[:2000])

        query = VectorQuery(
            vector=embedding,
            vector_field_name="embedding",
            return_fields=["response", "vector_distance"],
            filter_expression=f"@agent_type:{{{agent_type}}}",
            num_results=1,
        )

        results = self.index.query(query)

        if results:
            distance = float(results[0].get("vector_distance", 1.0))
            # Cosine similarity = 1 - distance, so a hit needs
            # distance <= 1 - threshold
            if distance <= (1 - self.threshold):
                return json.loads(results[0]["response"])

        return None

    async def set(self, content: str, response: dict, agent_type: str):
        embedding = await embed_text(content[:2000])
        key = f"cache:{agent_type}:{hash_content(content)}"

        self.client.hset(key, mapping={
            "agent_type": agent_type,
            # Vector fields must be stored as raw float32 bytes for Redis search
            "embedding": struct.pack(f"{len(embedding)}f", *embedding),
            "response": json.dumps(response),
            "created_at": time.time(),
        })
        self.client.expire(key, 86400)  # 24h TTL
```
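The service above queries a pre-created index named `llm_cache`. A minimal sketch of what that schema might look like as a RedisVL schema dict (the field names mirror the hash fields the service writes; the index name and 1536 dims for text-embedding-3-small are assumptions):

```python
# Hypothetical schema for the "llm_cache" index assumed above.
# "prefix" matches the cache:{agent_type}:{hash} key pattern;
# "dims" must match your embedding model (1536 for text-embedding-3-small).
schema = {
    "index": {"name": "llm_cache", "prefix": "cache:"},
    "fields": [
        {"name": "agent_type", "type": "tag"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "dims": 1536,
                "algorithm": "hnsw",
                "distance_metric": "cosine",
                "datatype": "float32",
            },
        },
    ],
}
# With RedisVL this would be created once at deploy time, e.g.:
# SearchIndex.from_dict(schema, redis_url=redis_url).create()
```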

Similarity Thresholds


| Threshold | Distance | Use Case |
|-----------|----------|----------|
| 0.98-1.00 | 0.00-0.02 | Nearly identical |
| 0.95-0.98 | 0.02-0.05 | Very similar |
| 0.92-0.95 | 0.05-0.08 | Similar (default) |
| 0.85-0.92 | 0.08-0.15 | Moderately similar |

Multi-Level Lookup


```python
# Assumes module-level helpers: lru_cache (bounded in-memory map),
# semantic_cache (SemanticCacheService), llm (LLM client), hash_content().
async def get_llm_response(query: str, agent_type: str) -> dict:
    # L1: Exact match (in-memory LRU)
    cache_key = hash_content(query)
    if cache_key in lru_cache:
        return lru_cache[cache_key]

    # L2: Semantic similarity (Redis)
    similar = await semantic_cache.get(query, agent_type)
    if similar:
        lru_cache[cache_key] = similar  # Promote to L1
        return similar

    # L3/L4: LLM call with prompt caching
    response = await llm.generate(query)

    # Store in caches
    await semantic_cache.set(query, response, agent_type)
    lru_cache[cache_key] = response

    return response
```
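The `lru_cache` referenced above is assumed to be a bounded in-process map with least-recently-used eviction. One way to sketch it with only the standard library (the class name and default size are assumptions, not part of the original):

```python
from collections import OrderedDict

class BoundedLRU:
    """Minimal L1 cache: recently used entries stay, oldest are evicted."""

    def __init__(self, max_entries: int = 1000):
        self._data: OrderedDict = OrderedDict()
        self.max_entries = max_entries

    def __contains__(self, key) -> bool:
        return key in self._data

    def __getitem__(self, key):
        self._data.move_to_end(key)  # reads refresh recency
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

lru_cache = BoundedLRU(max_entries=2)
lru_cache["a"] = 1
lru_cache["b"] = 2
_ = lru_cache["a"]       # touch "a" so "b" becomes the eviction candidate
lru_cache["c"] = 3       # evicts "b"
print("b" in lru_cache)  # False
print("a" in lru_cache)  # True
```

Because L2 hits are written back through `__setitem__`, promotion to L1 automatically refreshes recency.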

Redis 8.4+ Hybrid Search (FT.HYBRID)


Redis 8.4 introduces native hybrid search combining semantic (vector) and exact (keyword) matching in a single query. This is ideal for caches that need both similarity and metadata filtering.
```python
# Redis 8.4 native hybrid search (FT.HYBRID)
result = redis.execute_command(
    "FT.HYBRID", "llm_cache",
    "SEARCH", f"@agent_type:{{{agent_type}}}",
    "VSIM", "@embedding", "$query_vec",
    "KNN", "2", "K", "5",
    "COMBINE", "RRF", "4", "CONSTANT", "60",
    "PARAMS", "2", "query_vec", embedding_bytes,
)
```

**Hybrid Search Benefits:**
- Single query for keyword + vector matching
- RRF (Reciprocal Rank Fusion) combines scores intelligently
- Better results than sequential filtering
- BM25STD is now the default scorer for keyword matching

**When to Use Hybrid:**
- Filtering by metadata (agent_type, tenant, category) + semantic similarity
- Multi-tenant caches where exact tenant match is required
- Combining keyword search with vector similarity
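RRF needs only each document's rank in each result list, not its raw scores. A minimal sketch of the fusion that the `COMBINE RRF` clause performs (the function name and example doc IDs are illustrative; the constant 60 matches the command above):

```python
def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents appearing near the top of multiple lists rank highest
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc2", "doc1", "doc4"]  # e.g. BM25STD ranking
vector_hits = ["doc1", "doc3", "doc2"]   # e.g. KNN ranking
print(rrf_fuse([keyword_hits, vector_hits]))
# → ['doc1', 'doc2', 'doc3', 'doc4']
```

doc1 wins because it ranks high in both lists, even though it tops neither; that is the "combines scores intelligently" behavior noted above.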

Key Decisions


| Decision | Recommendation |
|----------|----------------|
| Threshold | Start at 0.92, tune based on hit rate |
| TTL | 24h for production |
| Embedding | text-embedding-3-small (fast) |
| L1 size | 1000-10000 entries |
| Scorer | BM25STD (Redis 8+ default) |
| Hybrid | Use FT.HYBRID for metadata + vector queries |

Common Mistakes


  • Threshold too low (false positives)
  • No cache warming (cold start)
  • Missing metadata filters
  • Not promoting L2 hits to L1

Related Skills


  • prompt-caching
    - Provider-native caching
  • embeddings
    - Vector generation
  • cache-cost-tracking
    - Langfuse integration

Capability Details


redis-vector-cache


Keywords: redis, vector, embedding, similarity, cache
Solves:
  • Cache LLM responses by semantic similarity
  • Reduce API costs with smart caching
  • Implement multi-level cache hierarchy

similarity-threshold


Keywords: threshold, similarity, tuning, cosine
Solves:
  • Set appropriate similarity threshold
  • Balance hit rate vs accuracy
  • Tune cache performance

orchestkit-integration


Keywords: orchestkit, integration, roi, cost-savings
Solves:
  • Integrate caching with OrchestKit
  • Calculate ROI for caching
  • Production implementation guide

cache-service


Keywords: service, implementation, template, production
Solves:
  • Production cache service template
  • Complete implementation example
  • Redis integration code

hybrid-search


Keywords: hybrid, ft.hybrid, bm25, rrf, keyword, metadata, filter
Solves:
  • Combine semantic and keyword search
  • Filter cache by metadata with vector similarity
  • Use Redis 8.4 FT.HYBRID command
  • BM25STD scoring for keyword matching