Found 8 Skills
Reduce LLM API and infrastructure costs through model selection, prompt caching, batching, response caching, quantization, and self-hosting strategies. Track spend by team and model, set budgets, and implement cost-aware routing.
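As a minimal sketch of the spend-tracking side of this skill (the model names, prices, and team budget below are illustrative assumptions, not the skill's actual implementation):

```python
from collections import defaultdict

# Hypothetical per-1M-token prices in USD; real rates vary by provider and date.
PRICES = {
    "small-model": {"input": 0.25, "output": 1.25},
    "large-model": {"input": 3.00, "output": 15.00},
}

class SpendTracker:
    """Accumulates LLM spend by (team, model) and enforces a per-team budget."""

    def __init__(self, team_budgets: dict[str, float]):
        self.team_budgets = team_budgets
        self.spend = defaultdict(float)  # keyed by (team, model)

    def record(self, team: str, model: str, input_tokens: int, output_tokens: int) -> float:
        price = PRICES[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        self.spend[(team, model)] += cost
        return cost

    def team_total(self, team: str) -> float:
        return sum(c for (t, _), c in self.spend.items() if t == team)

    def over_budget(self, team: str) -> bool:
        return self.team_total(team) >= self.team_budgets.get(team, float("inf"))

tracker = SpendTracker({"search-team": 50.0})
tracker.record("search-team", "large-model", input_tokens=12_000, output_tokens=800)
print(tracker.team_total("search-team"), tracker.over_budget("search-team"))
```

A production tracker would persist spend and pull prices from the provider's current rate card rather than a hardcoded table.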
Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.
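A minimal sketch of two of these patterns together, complexity-based routing plus retry logic; the `complexity` heuristic, thresholds, and model names are all hypothetical:

```python
import time

def complexity(prompt: str) -> float:
    # Cheap heuristic: long prompts or "hard" keywords score higher.
    hard_markers = ("prove", "refactor", "multi-step", "analyze")
    score = min(len(prompt) / 2000, 1.0)
    if any(m in prompt.lower() for m in hard_markers):
        score = max(score, 0.8)
    return score

def route_model(prompt: str) -> str:
    # Send easy prompts to the cheap model, hard ones to the strong model.
    return "large-model" if complexity(prompt) > 0.6 else "small-model"

def call_with_retry(call, attempts: int = 3, base_delay: float = 1.0):
    # Exponential backoff on transient failures; re-raise after the final attempt.
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

print(route_model("Summarize this paragraph."))           # small-model
print(route_model("Analyze and refactor this module."))   # large-model
```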
Reduces LLM costs and improves response times through caching, model selection, batching, and prompt optimization. Provides cost breakdowns, latency hotspots, and configuration recommendations. Use for "cost reduction", "performance optimization", "latency improvement", or "efficiency".
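Batching is the least self-evident of these levers, so here is a micro-batching sketch: collect prompts for a short window (or until a size cap) and send them as one request to amortize per-call overhead. The `MicroBatcher` class and its parameters are hypothetical, not this skill's API:

```python
import time

class MicroBatcher:
    """Groups prompts into batches of up to max_size within a time window."""

    def __init__(self, send_batch, max_size: int = 8, window: float = 0.05):
        self.send_batch = send_batch
        self.max_size = max_size
        self.window = window
        self.pending: list[str] = []
        self.deadline: float | None = None

    def submit(self, prompt: str) -> None:
        if not self.pending:
            self.deadline = time.monotonic() + self.window
        self.pending.append(prompt)
        if len(self.pending) >= self.max_size:
            self.flush()  # size cap reached: send immediately

    def maybe_flush(self) -> None:
        # Call periodically from an event loop to honor the time window.
        if self.pending and time.monotonic() >= (self.deadline or 0):
            self.flush()

    def flush(self) -> None:
        batch, self.pending = self.pending, []
        self.send_batch(batch)

b = MicroBatcher(send_batch=lambda prompts: print(f"sending {len(prompts)} prompts"))
for p in ["a", "b", "c"]:
    b.submit(p)
b.flush()  # sends 3 prompts as one request
```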
Master of LLM Economic Orchestration, specialized in Google GenAI (Gemini 3), Context Caching, and High-Fidelity Token Engineering.
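For concreteness, a context-caching sketch using the google-genai Python SDK as I understand it; exact signatures may differ by SDK version, the model name, file name, and TTL are placeholders, and note that cached contexts must meet the provider's minimum token size:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Cache a large, reused context once; subsequent calls that reference it
# pay reduced input-token rates instead of resending the full text.
cache = client.caches.create(
    model="gemini-2.0-flash-001",  # placeholder; use a model your account supports
    config=types.CreateCachedContentConfig(
        system_instruction="You answer questions about the attached contract.",
        contents=[open("contract.txt").read()],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Reference the cache by name instead of resending the context.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="What is the termination clause?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```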
Audit your Claude Code setup for token waste and context bloat. Use when the user says "audit my context", "check my settings", "why is Claude so slow", "token optimization", "context audit", or runs /context-audit. Starts by running /context to see real overhead, then audits MCP servers, CLAUDE.md rules, skills, settings, and file permissions. Returns a health score with specific fixes.
Tracks cumulative LLM costs across DAG execution and makes real-time decisions to stay within budget. Downgrades models, skips optional nodes, or stops early when cost exceeds thresholds. Use when managing execution budgets, analyzing cost breakdowns, or optimizing model routing for cost. Activate on "cost budget", "too expensive", "reduce cost", "cost optimization", "model downgrade", "budget exceeded". NOT for LLM model selection logic (use llm-router), pricing comparisons across providers, or billing/invoicing.
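A minimal sketch of the downgrade/skip/stop decision logic this entry describes, with hypothetical thresholds and a made-up `BudgetController` class rather than the skill's real interface:

```python
class BudgetController:
    """Tracks cumulative spend and decides how each DAG node should run."""

    def __init__(self, budget: float, downgrade_at: float = 0.7, stop_at: float = 1.0):
        self.budget = budget
        self.spent = 0.0
        self.downgrade_at = downgrade_at
        self.stop_at = stop_at

    def charge(self, cost: float) -> None:
        self.spent += cost

    def decide(self, node: dict) -> str:
        frac = self.spent / self.budget
        if frac >= self.stop_at:
            return "stop"        # budget exhausted: halt execution early
        if node.get("optional") and frac >= self.downgrade_at:
            return "skip"        # shed optional work first
        if frac >= self.downgrade_at:
            return "downgrade"   # run required nodes on a cheaper model
        return "run"

ctl = BudgetController(budget=1.00)
ctl.charge(0.75)  # 75% of budget consumed
print(ctl.decide({"name": "rerank", "optional": True}))       # skip
print(ctl.decide({"name": "synthesize", "optional": False}))  # downgrade
```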
Cost-conscious Claude Code mode. Reduces output tokens 40-70% and overall costs 30-60% by enforcing concise responses, smart model routing, and efficient workflow patterns. Keeps full technical accuracy. Activate with /cost-mode or "enable cost mode". Auto-triggers on mentions of budget, cost, tokens, or spending.
Redis semantic caching for LLM applications. Use when implementing vector similarity caching, optimizing LLM costs through cached responses, or building multi-level cache hierarchies.
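The pattern behind semantic caching, as a self-contained sketch: return a cached response when a new prompt's embedding is close enough to a stored one. The stub `embed` function and the threshold are illustrative; a real deployment would use Redis vector search and a proper embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub embedding for illustration only; substitute a real embedding model.
    vec = np.zeros(256)
    for i, ch in enumerate(text.encode()):
        vec[(ch + i) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / (norm if norm else 1.0)

class SemanticCache:
    """Serves a cached response when a prompt is semantically close to a past one."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []

    def get(self, prompt: str) -> str | None:
        q = embed(prompt)
        for vec, response in self.entries:
            if float(q @ vec) >= self.threshold:  # cosine similarity of unit vectors
                return response
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

cache = SemanticCache(threshold=0.95)
cache.put("What is Redis?", "Redis is an in-memory data store.")
print(cache.get("What is Redis?"))  # cache hit: avoids a second LLM call
```

The threshold trades hit rate against the risk of serving a stale or mismatched answer; multi-level hierarchies typically put an exact-match cache in front of this similarity check.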