Redis LangCache guidance for semantic caching of LLM responses on Redis Cloud — calling search/set via the SDK or REST API, tuning the similarity threshold, separating caches per task type, and filtering with custom attributes. Use when caching LLM completions or RAG answers to cut API cost and latency, building a cache-aside layer in front of OpenAI / Anthropic / etc., tuning hit rate vs precision, or splitting one app's LLM workloads into multiple LangCache caches.
    npx skill4agent add redis/agent-skills redis-semantic-cache

LangCache is currently in preview on Redis Cloud. Features and behavior may change.
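The two core operations are `search` and `set`. With the Python SDK:

    import os

    from langcache import LangCache

    # Connection details are read from the environment
    lang_cache = LangCache(
        server_url=f"https://{os.getenv('HOST')}",
        cache_id=os.getenv("CACHE_ID"),
        api_key=os.getenv("API_KEY"),
    )

    # Check the cache before calling the LLM
    result = lang_cache.search(prompt="What is Redis?", similarity_threshold=0.9)
    if result:
        response = result[0]["response"]
    else:
        # Cache miss: call the model, then store the answer for next time.
        # `llm` is a placeholder for your own LLM client.
        response = llm.generate("What is Redis?")
        lang_cache.set(prompt="What is Redis?", response=response)

The corresponding REST API endpoints are `POST /v1/caches/{cacheId}/entries/search` (search) and `POST /v1/caches/{cacheId}/entries` (set).

| Threshold | Behavior | Use when |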
|---|---|---|
| 0.95+ | Near-exact match required | Customer-facing answers where wrong responses are costly |
| 0.9 | Balanced default | Most workloads — start here |
| 0.8 | Loose semantic match | Internal tools, exploratory queries, FAQ deduplication |
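
One way to choose among the thresholds in the table above is to replay a small labeled sample offline and compare hit rate against precision at each candidate value. The following is a minimal sketch, not part of LangCache: the `Sample` type, the `sweep` helper, and the similarity scores are hypothetical, and it assumes a cached entry counts as a hit when its similarity is at or above the threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    similarity: float   # hypothetical score between a new prompt and its nearest cached prompt
    same_answer: bool   # True if serving the cached response would have been correct


def sweep(samples: List[Sample], thresholds: List[float]) -> Dict[float, Tuple[float, Optional[float]]]:
    """Return {threshold: (hit_rate, precision)} for each candidate threshold."""
    report = {}
    for t in thresholds:
        # Assumption: a hit is any sample whose similarity meets the threshold
        hits = [s for s in samples if s.similarity >= t]
        hit_rate = len(hits) / len(samples)
        # Precision is undefined when nothing hits, so report None
        precision = sum(s.same_answer for s in hits) / len(hits) if hits else None
        report[t] = (hit_rate, precision)
    return report


# Made-up evaluation data for illustration only
samples = [
    Sample(0.97, True),
    Sample(0.93, True),
    Sample(0.91, True),
    Sample(0.86, False),
    Sample(0.83, True),
    Sample(0.70, False),
]

report = sweep(samples, [0.8, 0.9, 0.95])
for t, (hit_rate, precision) in report.items():
    print(f"threshold={t:.2f} hit_rate={hit_rate:.2f} precision={precision}")
```

On this toy data, lowering the threshold raises the hit rate but lets one incorrect match through, which is the trade-off the table describes. Real numbers depend entirely on your own prompts.
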

The threshold is set per call:

    # Stricter: fewer false positives
    result = lang_cache.search(prompt="What is Redis?", similarity_threshold=0.95)

    # Looser: higher hit rate
    result = lang_cache.search(prompt="What is Redis?", similarity_threshold=0.8)

To separate caches per task type, create one client per cache:

    support_cache = LangCache(server_url=..., cache_id="support-cache-id", api_key=...)
    code_cache = LangCache(server_url=..., cache_id="code-cache-id", api_key=...)

Custom attributes such as `{"category": "database"}` can be attached to entries and used to filter search results.
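
The per-task split can be combined with the search-then-set flow in a small router. This is a sketch under stated assumptions: `TaskCache`, `cached_generate`, the task names, and the per-task thresholds are hypothetical. `InMemoryCache` is a stand-in that only mimics the `search`/`set` call shape used in the earlier example (exact-match lookup, no semantic matching), so the block runs without a LangCache account.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class InMemoryCache:
    """Hypothetical stand-in with the same search/set call shape as the SDK example.

    It matches identical prompts only; a real LangCache client matches semantically.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def search(self, prompt: str, similarity_threshold: float = 0.9) -> List[dict]:
        # The threshold is accepted for call compatibility but ignored here
        if prompt in self._entries:
            return [{"response": self._entries[prompt]}]
        return []

    def set(self, prompt: str, response: str) -> None:
        self._entries[prompt] = response


@dataclass
class TaskCache:
    client: Any        # anything exposing search(...) and set(...)
    threshold: float   # similarity threshold chosen for this task type


def cached_generate(
    task: str,
    prompt: str,
    caches: Dict[str, TaskCache],
    generate: Callable[[str], str],
) -> str:
    """Cache-aside lookup against the cache that belongs to `task`."""
    entry = caches[task]
    result = entry.client.search(prompt=prompt, similarity_threshold=entry.threshold)
    if result:
        return result[0]["response"]
    # Miss: call the model, then store the answer in this task's cache only
    response = generate(prompt)
    entry.client.set(prompt=prompt, response=response)
    return response


caches = {
    "support": TaskCache(InMemoryCache(), threshold=0.95),
    "code": TaskCache(InMemoryCache(), threshold=0.9),
}

calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; records each invocation."""
    calls.append(prompt)
    return f"answer to: {prompt}"


first = cached_generate("support", "What is Redis?", caches, fake_llm)   # miss, calls the model
second = cached_generate("support", "What is Redis?", caches, fake_llm)  # served from the support cache
third = cached_generate("code", "What is Redis?", caches, fake_llm)      # separate cache, so a miss
```

Keeping one client per task type means a support answer is never served for a code question, and each workload can use its own threshold. In real use, `InMemoryCache()` would be replaced by a `LangCache(...)` client for the matching cache.
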