Found 8 Skills
Local LLM operations with Ollama on Apple Silicon, including setup, model pulls, chat launchers, benchmarks, and diagnostics.
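As an illustration, a minimal Python sketch of driving a local Ollama server over its HTTP API (port 11434 is Ollama's default; the model name `llama3.2` is an assumption, any model pulled with `ollama pull` works the same way):

```python
# Minimal sketch: chat with a locally running Ollama server.
# Assumes `ollama serve` is running and a model has been pulled,
# e.g. `ollama pull llama3.2` (model name is an assumption).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```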
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
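For example, a quantized GGUF model can be loaded through the llama-cpp-python bindings; the model path below is a placeholder for any GGUF file:

```python
# Minimal sketch using llama-cpp-python to run a quantized GGUF model
# on Apple Silicon (Metal) or CPU. The model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to Metal/GPU; set 0 for pure CPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in one line."}]
)
print(out["choices"][0]["message"]["content"])
```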
Running and fine-tuning LLMs on Apple Silicon with MLX. Use when working with models locally on Mac, converting Hugging Face models to MLX format, fine-tuning with LoRA/QLoRA on Apple Silicon, or serving models via HTTP API.
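A minimal generation sketch with the mlx-lm package (the repo id is illustrative; any mlx-community model works the same way):

```python
# Minimal sketch: load a pre-quantized MLX model from the Hugging Face
# hub and generate text on Apple Silicon. Repo id is an assumption.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
text = generate(model, tokenizer, prompt="What is LoRA fine-tuning?", max_tokens=200)
print(text)
```

LoRA fine-tuning is exposed separately through the `mlx_lm.lora` command-line entry point.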
Run a free 35B-parameter AI coding agent on Apple Silicon Macs using local LLMs served via llama.cpp or MLX, with web search, shell, and file tools.
Lossless DFlash speculative decoding for MLX on Apple Silicon: 1.7–4x faster LLM inference using block diffusion drafting with target-model verification.
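DFlash's actual API isn't shown here, but the verify-then-accept loop that makes speculative decoding lossless (under greedy decoding) can be sketched with hypothetical `draft_next`/`target_next` callables; block diffusion drafting would replace `draft_next` with a multi-token draft:

```python
# Toy sketch of one lossless speculative-decoding step with greedy
# verification. `draft_next` and `target_next` are hypothetical callables
# mapping a token sequence to the argmax next token. Real implementations
# verify all k draft tokens in a single batched target forward pass;
# here verification is sequential for clarity.
def speculative_step(tokens, draft_next, target_next, k=4):
    # 1. Draft k tokens cheaply with the small model.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Verify with the target: accept the longest matching prefix, then
    #    append the target's own token, so output equals plain target decoding.
    accepted = list(tokens)
    for t in draft:
        want = target_next(accepted)
        if want != t:
            accepted.append(want)  # correct the first mismatch
            break
        accepted.append(t)
    else:
        accepted.append(target_next(accepted))  # all k accepted: bonus token
    return accepted
```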
Generate voice messages using local Qwen3-TTS (offline, Apple Silicon). Convert text to speech with customizable voices, emotions, and speed. Use when the user asks for a voice reply, audio, or TTS.
Run 397B-parameter Mixture-of-Experts LLMs on a MacBook using pure C/Metal with SSD streaming.
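The SSD-streaming idea can be illustrated in a few lines of Python (the skill itself is pure C/Metal; the file layout, shapes, and names below are all hypothetical):

```python
# Toy sketch of SSD streaming for huge MoE models: keep expert weights
# in a file on disk, memory-map it, and materialize only the experts the
# router selects for a given token. Layout and shapes are hypothetical.
import mmap
import numpy as np

N_EXPERTS, D_IN, D_OUT = 64, 1024, 1024
EXPERT_BYTES = D_IN * D_OUT * 2  # one fp16 weight matrix per expert

def open_expert_bank(path):
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def load_expert(bank, idx):
    # Only the pages backing this slice are faulted in from SSD;
    # experts the router never picks never reach RAM.
    off = idx * EXPERT_BYTES
    raw = bank[off:off + EXPERT_BYTES]
    return np.frombuffer(raw, dtype=np.float16).reshape(D_IN, D_OUT)

def moe_forward(bank, x, top_idx):
    # Average the router-selected experts' outputs (gating weights omitted).
    return sum(x @ load_expert(bank, i).astype(np.float32) for i in top_idx) / len(top_idx)
```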
Local vision-language model for image analysis using SmolVLM-2B.
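A minimal sketch of SmolVLM image analysis via the Hugging Face transformers API (the skill's own runtime may differ; the image path is a placeholder):

```python
# Minimal sketch: describe an image with SmolVLM via transformers.
# Uses the "mps" device on Apple Silicon; fall back to "cpu" otherwise.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16  # half precision; use float32 on CPU
).to("mps")

image = Image.open("photo.jpg")  # placeholder input image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```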