Search Results: llama.cpp

Found 13 Skills

AI & Machine Learningsickn33/antigravity-aweso...

local-llm-expert

Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, llama.cpp, vLLM, and LM Studio. Expert in quantization formats (GGUF, EXL2) and local AI privacy.

🇺🇸|EnglishTranslated

AI & Machine Learningmartinholovsky/claude-ski...

llm-integration

Expert skill for integrating local Large Language Models using llama.cpp and Ollama. Covers secure model loading, inference optimization, prompt handling, and protection against LLM-specific vulnerabilities including prompt injection, model theft, and denial of service attacks.

🇺🇸|EnglishTranslated

AI & Machine Learningeyadsibai/ltk

llm-inference

Use when "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", "CPU inference"

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.

🇺🇸|EnglishTranslated

AI & Machine Learningdpearson2699/swift-ios-sk...

apple-on-device-ai

Use when integrating Foundation Models framework, implementing on-device AI with Apple Intelligence, building tool-calling AI features, working with guided generation schemas, converting models with Core ML and coremltools, or running open-source LLMs on Apple Silicon. Covers Foundation Models (LanguageModelSession, @Generable, @Guide, SystemLanguageModel, structured output, tool calling), Core ML (coremltools, model conversion, quantization, palettization, pruning, Neural Engine, MLTensor), MLX Swift (transformer inference, unified memory), and llama.cpp (GGUF, cross-platform LLM).

🇺🇸|EnglishTranslated

AI & Machine Learninghuggingface/skills

huggingface-local-models

Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.

🇺🇸|EnglishTranslated

AI & Machine Learningaradotso/trending-skills

mac-code-local-ai-agent

Run a free 35B AI coding agent on Apple Silicon Macs using local LLMs via llama.cpp or MLX with web search, shell, and file tools.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

jetson-llm-benchmark

Benchmark Jetson LLM/VLM serving performance across vLLM, llama.cpp, and Ollama with structured JSON output.

🇺🇸|EnglishTranslated

3 scripts/Attention

AI & Machine Learningaradotso/trending-skills

club-3090-llm-serving

Recipes and configs for serving LLMs locally on RTX 3090 GPUs using vLLM, llama.cpp, and SGLang with OpenAI-compatible API

🇺🇸|EnglishTranslated

AI & Machine Learningaradotso/ai-agent-skills

openmonoagent-local-ai-coding-agent

Local-first AI coding agent powered by llama.cpp with zero tokens costs, Docker sandboxing, 20 built-in tools, LSP/Roslyn intelligence, and MCP integration

🇺🇸|EnglishTranslated

AI & Machine Learningwinsorllc/upgraded-carniv...

local-llm-provider

Connect to local LLM endpoints (Ollama, llama.cpp, vLLM) with automatic provider fallback. Use when: (1) you need to run LLM inference locally for privacy/cost, (2) you want to use models not available via cloud APIs, (3) you need offline capability, (4) you want automatic fallback to cloud providers when local fails.

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningnvidia/skills

jetson-inference-mem-tune

Pick the serving stack and per-runtime memory flags (vLLM, SGLang, llama.cpp, TensorRT Edge-LLM) for an LLM/VLM workload on any NVIDIA Jetson.

🇺🇸|EnglishTranslated

1 scripts/Checked