Search Results: kv-cache

Found 7 Skills

context-optimization

This skill should be used when the user asks to "optimize context", "reduce token costs", "improve context efficiency", "implement KV-cache optimization", "partition context", or mentions context limits, observation masking, context budgeting, or extending effective context capacity. As a core context-engineering skill, it also activates when the user mentions "context engineering" or "context-engineering" in connection with maximizing information density within token constraints.

1 script | Checked

AI & Machine Learning | eyadsibai/ltk

context-optimization

Use when optimizing agent context, reducing token costs, implementing KV-cache optimization, or asking about "context optimization", "token reduction", "context limits", "observation masking", "context budgeting", or "context partitioning".

AI & Machine Learning | charleswiltgen/axiom

axiom-ios-ml

Use when deploying ANY machine learning model on-device, converting models to CoreML, compressing models, or implementing speech-to-text. Covers CoreML conversion, MLTensor, model compression (quantization/palettization/pruning), stateful models, KV-cache, multi-function models, async prediction, SpeechAnalyzer, SpeechTranscriber.

AI & Machine Learning | guanyang/antigravity-skil...

latent-briefing

This skill should be used when the user asks to "share memory between agents", "KV cache compaction for multi-agent", "orchestrator worker context", "latent briefing", "reduce worker tokens", "cross-agent memory without summarization", or discusses Attention Matching compaction, recursive language models with workers, or token explosion in hierarchical agents.

AI & Machine Learning | sickn33/antigravity-aweso...

context-optimization

Apply compaction, masking, and caching strategies.

AI & Machine Learning | shipshitdev/library

context-optimization

Apply optimization techniques to extend effective context capacity. Use when context limits constrain agent performance, when optimizing for cost or latency, or when implementing long-running agent systems.

1 script | Checked

AI & Machine Learning | slowlyc/agent-gpu-skills

sglang-skill

Develop, debug, and optimize SGLang LLM serving engine. Use when the user mentions SGLang, sglang, srt, sgl-kernel, LLM serving, model inference, KV cache, attention backend, FlashInfer, MLA, MoE routing, speculative decoding, disaggregated serving, TP/PP/EP, radix cache, continuous batching, chunked prefill, CUDA graph, model loading, quantization FP8/GPTQ/AWQ, JIT kernel, triton kernel SGLang, or asks about serving LLMs with SGLang.

1 script | Attention