Search Results: prompt-caching

Found 13 Skills

AI & Machine Learning | sickn33/antigravity-aweso...

prompt-caching

Caching strategies for LLM prompts, including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation). Use when: prompt caching, cache prompt, response cache, cag, cache augmented.

AI & Machine Learning | lobbi-docs/claude

prompt-caching

Prompt caching for Claude API to reduce latency by up to 85% and costs by up to 90%. Activate for cache_control, ephemeral caching, cache breakpoints, and performance optimization.

AI & Machine Learning | sernote/audit-prompt-cach...

audit-prompt-caching

Use whenever the user mentions LLM prompt/prefix cache misses, cached_tokens=0, cache_read_input_tokens/cache_creation_input_tokens, prompt_cache_key, cache_control/cachePoint placement, stable prefixes, tool/schema stability, TTFT/prefill latency, OpenAI/Claude/Bedrock/OpenRouter routing, vLLM/SGLang KV reuse, or LLM cost/speed regressions on repeated long prompts. Use when reviewing LLM request shape changes: prompt text, message order, request builders, tools, schemas, response_format, provider API surface, model/router settings, agent loop structure, context compaction, or inference deployment. Use for speeding up agents only when prompt-cache stability, TTFT, or cache cost is central. Do not use for generic prompt writing, generic RAG design, token counting, or non-LLM performance.

8 scripts | Attention

AI & Machine Learning | lwlee2608/agent-skills

writing-system-prompts

Use when writing or editing a system prompt for any LLM API or SDK (any code passing a `system=` / `system` role parameter, or a `.txt`/`.md` file holding such a prompt). Applies prompt-engineering and prompt-caching best practices.

AI & Machine Learning | bobmatnyc/claude-mpm-skil...

session-compression

AI session compression techniques for managing multi-turn conversations efficiently through summarization, embedding-based retrieval, and intelligent context management.

AI & Machine Learning | guanyang/antigravity-skil...

claude-api

Build, debug, and optimize Claude API / Anthropic SDK apps. Apps built with this skill should include prompt caching. Also handles migrating existing Claude API code between Claude model versions (4.5 → 4.6, 4.6 → 4.7, retired-model replacements). TRIGGER when: code imports `anthropic`/`@anthropic-ai/sdk`; user asks for the Claude API, Anthropic SDK, or Managed Agents; user adds/modifies/tunes a Claude feature (caching, thinking, compaction, tool use, batch, files, citations, memory) or model (Opus/Sonnet/Haiku) in a file; questions about prompt caching / cache hit rate in an Anthropic SDK project. SKIP: file imports `openai`/other-provider SDK, filename like `*-openai.py`/`*-generic.py`, provider-neutral code, general programming/ML.

AI & Machine Learning | jezweb/claude-skills

claude-api

Build with Claude Messages API using structured outputs for guaranteed JSON schema validation. Covers prompt caching (90% savings), streaming SSE, tool use, and model deprecations. Prevents 16 documented errors. Use when: building chatbots/agents, troubleshooting rate_limit_error, prompt caching issues, streaming SSE parsing errors, MCP timeout issues, or structured output hallucinations.

12 scripts | Attention

AI & Machine Learning | affaan-m/everything-claud...

cost-aware-llm-pipeline

Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.

AI & Machine Learning | bagelhole/devops-security...

llm-cost-optimization

Reduce LLM API and infrastructure costs through model selection, prompt caching, batching, caching, quantization, and self-hosting strategies. Track spend by team and model, set budgets, and implement cost-aware routing.

AI & Machine Learning | iammarcin/cc4life

openclaw-cost-optimization

Audit and optimize OpenClaw API costs. Applies six proven optimizations — model routing, prompt caching, lean context, local heartbeats, rate limits, and workspace trimming — to cut monthly spend by up to 90%. Use when asked to reduce costs, optimize tokens, audit API spend, or configure cost-saving settings.

AI & Machine Learning | shimo4228/claude-code-lea...

cost-aware-llm-pipeline

Use when building an LLM-powered app that needs cost control via model routing, budget tracking, retry, and prompt caching.

AI & Machine Learning | secondsky/claude-skills

claude-api

Anthropic Messages API (Claude API) for integrations, streaming, prompt caching, tool use, and vision. Use for chatbots and assistants, or when encountering rate limits and 429 errors.

12 scripts | Attention