Search Results: gemm

Found 12 Skills

AI & Machine Learning | aradotso/trending-skills

gemma-tuner-multimodal

Fine-tune Gemma 4 and 3n models with audio, images, and text on Apple Silicon using PyTorch and Metal Performance Shaders.

AI & Machine Learning | slowlyc/agent-gpu-skills

cutlass-skill

Write, debug, and optimize CUTLASS and CuTeDSL GPU kernels using local source code, examples, and header references. Use when the user mentions CUTLASS, CuTe, CuTeDSL, cute::Layout, cute::Tensor, TiledMMA, TiledCopy, CollectiveMainloop, CollectiveEpilogue, GEMM kernel, grouped GEMM, sparse GEMM, flash attention CUTLASS, blackwell GEMM, hopper GEMM, FP8 GEMM, blockwise scaling, MoE GEMM, StreamK, warp specialization CUTLASS, TMA CUTLASS, or asks about writing high-performance CUDA kernels with CUTLASS/CuTe templates.

1 script/Attention

AI & Machine Learning | google-gemma/gemma-skills

gemma-dev

Trigger this skill when building applications with Gemma or for general knowledge inquiries related to Gemma models (e.g. prompt structure, capabilities). Covers model selection, development workflows, and deployment best practices.

3 scripts/Checked

AI & Machine Learning | nvidia/skills

kernel-cute-writing

Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API) — NOT for Triton, CUDA C++, or conceptual explanations. Trigger only when the user wants to write or implement a kernel, not when asking questions about CuTe DSL concepts or layouts. CuTe DSL uses cute.jit/cute.kernel decorators and cutlass.cute imports. Covers element-wise kernels, GEMM patterns, reductions, memory hierarchy (global/shared/register/TMA), MMA tensor core operations, software pipelining, and framework integration.

3 scripts/Attention

AI & Machine Learning | jezweb/claude-skills

cloudflare-workers-ai

Run LLMs and AI models on Cloudflare's GPU network with Workers AI. Includes Llama 4, Gemma 3, Mistral 3.1, Flux images, BGE embeddings, streaming, and AI Gateway. Handles 2025 breaking changes. Prevents 7 documented errors. Use when: implementing LLM inference, images, RAG, or troubleshooting AI_ERROR, rate limits, max_tokens, BGE pooling, context window, neuron billing, Miniflare AI binding, NSFW filter, num_steps.

5 scripts/Attention

AI & Machine Learning | aradotso/trending-skills

parlor-on-device-ai

On-device, real-time multimodal AI voice and vision assistant powered by Gemma 4 E2B and Kokoro TTS, running entirely locally via FastAPI WebSocket server.

AI & Machine Learning | davila7/claude-code-templ...

implementing-llms-litgpt

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

Tools & Utilities | nicmarti/skills-weaver

treasure-generator

Generates BFRPG treasure by type (A-U): coins, gems, jewelry, and magic items. Uses the official tables with probabilities. Essential after a victorious combat.

AI & Machine Learning | aradotso/trending-skills

pokeclaw-android-ai-agent

PokeClaw (PocketClaw) — on-device Android AI phone agent using Gemma 4 via LiteRT-LM with tool calling, accessibility automation, and optional cloud models.

AI & Machine Learning | aradotso/trending-skills

gemma-gem-browser-ai

Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.

AI & Machine Learning | nvidia/skills

kernel-triton-writing

ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).

3 scripts/Attention

AI & Machine Learning | pepperu96/hyper-mla

cute-dsl-ref

CuTe Python DSL API reference and implementation patterns for NVIDIA GPU kernel programming. Provides execution model, core API table, key constraints, common patterns, and documentation index. Use when: (1) writing or modifying CuTe DSL kernel code, (2) looking up CuTe DSL API syntax, (3) implementing attention/GEMM/MLA patterns in CuTe DSL, (4) understanding CuTe DSL execution model and compilation pipeline, (5) checking what CuTe DSL can and cannot do.
