Search Results: llm-performance

Found 7 Skills

AI & Machine Learningjeffallan/claude-skills

prompt-engineer

Use when designing prompts for LLMs, optimizing model performance, building evaluation frameworks, or implementing advanced prompting techniques like chain-of-thought, few-shot learning, or structured outputs.

🇺🇸|EnglishTranslated

AI & Machine Learningvllm-project/vllm-skills

vllm-prefix-cache-bench

This is a skill for benchmarking the efficiency of automatic prefix caching in vLLM using fixed prompts, real-world datasets, or synthetic prefix/suffix patterns. Use when the user asks to benchmark prefix caching hit rate, caching efficiency, or repeated-prompt performance in vLLM.

🇺🇸|EnglishTranslated

AI & Machine Learningluongnv89/skills

ollama-optimizer

Optimize Ollama configuration for maximum performance on the current machine. Use when asked to "optimize Ollama", "configure Ollama", "speed up Ollama", "tune LLM performance", "setup local LLM", "fix Ollama performance", "Ollama running slow", or when users want to maximize inference speed, reduce memory usage, or select appropriate models for their hardware. Analyzes system hardware (GPU, RAM, CPU) and provides tailored recommendations.

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningsernote/audit-prompt-cach...

audit-prompt-caching

Use whenever the user mentions LLM prompt/prefix cache misses, cached_tokens=0, cache_read_input_tokens/cache_creation_input_tokens, prompt_cache_key, cache_control/cachePoint placement, stable prefixes, tool/schema stability, TTFT/prefill latency, OpenAI/Claude/Bedrock/OpenRouter routing, vLLM/SGLang KV reuse, or LLM cost/speed regressions on repeated long prompts. Use when reviewing LLM request shape changes: prompt text, message order, request builders, tools, schemas, response_format, provider API surface, model/router settings, agent loop structure, context compaction, or inference deployment. Use for speeding up agents only when prompt-cache stability, TTFT, or cache cost is central. Do not use for generic prompt writing, generic RAG design, token counting, or non-LLM performance.

🇺🇸|EnglishTranslated

8 scripts/Attention

AI & Machine Learningomidzamani/dspy-skills

dspy-miprov2-optimizer

This skill should be used when the user asks to "optimize a DSPy program", "use MIPROv2", "tune instructions and demos", "get best DSPy performance", "run Bayesian optimization", mentions "state-of-the-art DSPy optimizer", "joint instruction tuning", or needs maximum performance from a DSPy program with substantial training data (200+ examples).

🇺🇸|EnglishTranslated

AI & Machine Learningeyadsibai/ltk

context-degradation

Use when diagnosing agent failures, debugging lost-in-middle issues, understanding context poisoning, or asking about "context degradation", "lost in middle", "context poisoning", "attention patterns", "context clash", "agent performance drops"

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

monkey-patch-kernels-to-transformers

Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to instantiating models. Used when the user requires integrating TileGym kernels into `transformers` models.

🇺🇸|EnglishTranslated