Loading...
Loading...
Found 13 Skills
Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented.
Prompt caching for Claude API to reduce latency by up to 85% and costs by up to 90%. Activate for cache_control, ephemeral caching, cache breakpoints, and performance optimization.
Use whenever the user mentions LLM prompt/prefix cache misses, cached_tokens=0, cache_read_input_tokens/cache_creation_input_tokens, prompt_cache_key, cache_control/cachePoint placement, stable prefixes, tool/schema stability, TTFT/prefill latency, OpenAI/Claude/Bedrock/OpenRouter routing, vLLM/SGLang KV reuse, or LLM cost/speed regressions on repeated long prompts. Use when reviewing LLM request shape changes: prompt text, message order, request builders, tools, schemas, response_format, provider API surface, model/router settings, agent loop structure, context compaction, or inference deployment. Use for speeding up agents only when prompt-cache stability, TTFT, or cache cost is central. Do not use for generic prompt writing, generic RAG design, token counting, or non-LLM performance.
Use when writing or editing a system prompt for any LLM API or SDK (any code passing a `system=` / `system` role parameter, or a `.txt`/`.md` file holding such a prompt). Applies prompt-engineering and prompt-caching best practices.
AI session compression techniques for managing multi-turn conversations efficiently through summarization, embedding-based retrieval, and intelligent context management.
Build, debug, and optimize Claude API / Anthropic SDK apps. Apps built with this skill should include prompt caching. Also handles migrating existing Claude API code between Claude model versions (4.5 → 4.6, 4.6 → 4.7, retired-model replacements). TRIGGER when: code imports `anthropic`/`@anthropic-ai/sdk`; user asks for the Claude API, Anthropic SDK, or Managed Agents; user adds/modifies/tunes a Claude feature (caching, thinking, compaction, tool use, batch, files, citations, memory) or model (Opus/Sonnet/Haiku) in a file; questions about prompt caching / cache hit rate in an Anthropic SDK project. SKIP: file imports `openai`/other-provider SDK, filename like `*-openai.py`/`*-generic.py`, provider-neutral code, general programming/ML.
Build with Claude Messages API using structured outputs for guaranteed JSON schema validation. Covers prompt caching (90% savings), streaming SSE, tool use, and model deprecations. Prevents 16 documented errors. Use when: building chatbots/agents, troubleshooting rate_limit_error, prompt caching issues, streaming SSE parsing errors, MCP timeout issues, or structured output hallucinations.
Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.
Reduce LLM API and infrastructure costs through model selection, prompt caching, batching, caching, quantization, and self-hosting strategies. Track spend by team and model, set budgets, and implement cost-aware routing.
Audit and optimize OpenClaw API costs. Applies six proven optimizations — model routing, prompt caching, lean context, local heartbeats, rate limits, and workspace trimming — to cut monthly spend by up to 90%. Use when asked to reduce costs, optimize tokens, audit API spend, or configure cost-saving settings.
Use when building an LLM-powered app that needs cost control via model routing, budget tracking, retry, and prompt caching.
Anthropic Messages API (Claude API) for integrations, streaming, prompt caching, tool use, vision. Use for chatbots, assistants, or encountering rate limits, 429 errors.