Search Results: ai-inference

Found 8 Skills

huggingface-lora-space-builder

Build and publish a Gradio demo on Hugging Face Spaces for a user-provided LoRA. Use when someone asks to create, generate, ship, or publish a Space, demo, Gradio app, or playground for a LoRA, including LoRAs for Qwen-Image, Qwen-Image-Edit, LTX-Video, Wan, FLUX, SDXL, or other diffusion base models. Also triggers when someone describes a LoRA they trained or host on the Hub and wants to share it. Covers picking the right base pipeline and `diffusers` inference recipe; designing a UI tailored to the LoRA's task and inputs (Union/multi-task control, edit, video, image, etc.); respecting model-card recommendations (trigger words, steps, guidance, LoRA scale, example inputs); and shipping to ZeroGPU hardware as a private Space by default.

AI & Machine Learning | mckruz/comfyui-expert

comfyui-api

Connect to a running ComfyUI instance, queue workflows, monitor execution, and retrieve results. Supports both online (REST API) and offline (JSON export) modes. Use when executing ComfyUI workflows or checking server status.

AI & Machine Learning | bbuf/sglang-auto-driven-s...

sglang-prod-incident-triage

Replay-first debug flow for SGLang serving problems. Use when a live or recent server shows health-check failures, latency or throughput regressions, queue growth, timeouts, distributed stalls, crash dumps, wrong outputs after deploys, or PD/EP/HiCache issues, and the job is to turn the problem into a replay plus the right next debug tool.

2 scripts | Attention

AI & Machine Learning | jackspace/claudeskillz

cloudflare-workers-ai

Complete knowledge domain for Cloudflare Workers AI: run AI models on serverless GPUs across Cloudflare's global network. Use when implementing AI inference on Workers, running LLMs, generating text or images with AI, configuring Workers AI bindings, implementing AI streaming, using AI Gateway, integrating with embeddings/RAG systems, or encountering "AI_ERROR", rate-limit, model-not-found, token-limit-exceeded, or neurons-exceeded errors. Keywords: workers ai, cloudflare ai, ai bindings, llm workers, @cf/meta/llama, workers ai models, ai inference, cloudflare llm, ai streaming, text generation ai, ai embeddings, image generation ai, workers ai rag, ai gateway, llama workers, flux image generation, stable diffusion workers, vision models ai, ai chat completion, AI_ERROR, rate limit ai, model not found, token limit exceeded, neurons exceeded, ai quota exceeded, streaming failed, model unavailable, workers ai hono, ai gateway workers, vercel ai sdk workers, openai compatible workers, workers ai vectorize

AI & Machine Learning | bbuf/sglang-auto-driven-s...

sglang-torch-profiler-analysis

Compact SGLang torch-profiler triage skill. Use when Codex should inspect an existing `trace.json(.gz)` or profile directory, trigger `sglang.profiler` against a live server, and return one compact report with kernel, overlap-opportunity, and fuse-pattern tables. Single-trace triage is enough for quick diagnosis; mapping+formal two-trace triage gives stronger overlap conclusions.

4 scripts | Checked

AI & Machine Learning | promptingcompany/nv-skill...

nemoclaw-user

User-facing NemoClaw guidance for installing, configuring, operating, securing, monitoring, and troubleshooting NemoClaw sandboxes. Use when users ask about NemoClaw quickstarts, OpenClaw and OpenShell relationships, local inference, remote GPU deployment, sandbox lifecycle, network policy, security posture, agent skills, command reference, or issue triage instructions.

AI & Machine Learning | veniceai/skills

venice-api-overview

High-level map of the Venice.ai API: base URL, authentication modes, endpoint categories, response headers, pricing model, error shape, and versioning. Load this first when starting any Venice integration.

AI & Machine Learning | nvidia/skills

exec-local-compile

Compile TensorRT-LLM on a compute node inside a Docker container. Use this when already on a compute node with GPUs visible.