Loading...
Loading...
Found 10 Skills
Reduce LLM API and infrastructure costs through model selection, prompt caching, batching, caching, quantization, and self-hosting strategies. Track spend by team and model, set budgets, and implement cost-aware routing.
Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.
Audit your Claude Code setup for token waste and context bloat. Use when the user says "audit my context", "check my settings", "why is Claude so slow", "token optimization", "context audit", or runs /context-audit. Starts by running /context to see real overhead, then audits MCP servers, CLAUDE.md rules, skills, settings, and file permissions. Returns a health score with specific fixes.
Cost-conscious Claude Code mode. Reduces output tokens 40-70% and overall costs 30-60% by enforcing concise responses, smart model routing, and efficient workflow patterns. Keeps full technical accuracy. Activate with /cost-mode or "enable cost mode". Auto-triggers on mentions of budget, cost, tokens, or spending.
Master of LLM Economic Orchestration, specialized in Google GenAI (Gemini 3), Context Caching, and High-Fidelity Token Engineering.
Tracks cumulative LLM costs across DAG execution and makes real-time decisions to stay within budget. Downgrades models, skips optional nodes, or stops early when cost exceeds thresholds. Use when managing execution budgets, analyzing cost breakdowns, or optimizing model routing for cost. Activate on "cost budget", "too expensive", "reduce cost", "cost optimization", "model downgrade", "budget exceeded". NOT for LLM model selection logic (use llm-router), pricing comparisons across providers, or billing/invoicing.
Redis semantic caching for LLM applications. Use when implementing vector similarity caching, optimizing LLM costs through cached responses, or building multi-level cache hierarchies.
Reduces LLM costs and improves response times through caching, model selection, batching, and prompt optimization. Provides cost breakdowns, latency hotspots, and configuration recommendations. Use for "cost reduction", "performance optimization", "latency improvement", or "efficiency".
Track, optimize, and control token consumption across multi-agent systems. Covers budget allocation, real-time monitoring, cost attribution, per-agent limits, and proactive cost optimization for production LLM deployments.
Route low-risk coding tasks to cheaper LLMs while keeping Codex for high-risk decisions, using MCP tools for cost-aware delegation