Search: llm-inference - AI Agent Skills

AI & Machine Learning | skillssh/skills

agent-tools

Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

182.7k

AI & Machine Learning | skillssh/skills

infsh-cli

Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

182.6k

AI & Machine Learning | inference-sh/skills

inference-sh

Run 150+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

922

AI & Machine Learning | inference-sh/skills

skills

Run 150+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

402

AI & Machine Learning | davila7/claude-code-templ...

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

12

Code Quality | yonatangross/orchestkit

performance

Performance optimization patterns covering Core Web Vitals, React render optimization, lazy loading, image optimization, backend profiling, and LLM inference. Use when improving page speed, debugging slow renders, optimizing bundles, reducing image payload, profiling backend, or deploying LLMs efficiently.

8

7 scripts | Attention

AI & Machine Learning | jezweb/claude-skills

cloudflare-workers-ai

Run LLMs and AI models on Cloudflare's GPU network with Workers AI. Includes Llama 4, Gemma 3, Mistral 3.1, Flux images, BGE embeddings, streaming, and AI Gateway. Handles 2025 breaking changes. Prevents 7 documented errors. Use when: implementing LLM inference, images, RAG, or troubleshooting AI_ERROR, rate limits, max_tokens, BGE pooling, context window, neuron billing, Miniflare AI binding, NSFW filter, num_steps.

7

5 scripts | Attention

AI & Machine Learning | scientiacapital/skills

groq-inference

Fast LLM inference with Groq API - chat, vision, audio STT/TTS, tool use. Use when: groq, fast inference, low latency, whisper, PlayAI TTS, Llama, vision API, tool calling, voice agents, real-time AI.

7

1 script | Checked

AI & Machine Learning | databricks/databricks-age...

databricks-model-serving

Manage Databricks Model Serving endpoints via CLI. Use when asked to create, configure, query, or manage model serving endpoints for LLM inference, custom models, or external models.

7

AI & Machine Learning | davila7/claude-code-templ...

nowait-reasoning-optimizer

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

5

1 script | Checked

AI & Machine Learning | hkuds/cli-anything

cli-anything-ollama

Command-line interface for Ollama - Local LLM inference and model management via Ollama REST API. Designed for AI agents and power users who need to manage models, generate text, chat, and create embeddings without a GUI.

5

AI & Machine Learning | team-telnyx/skills

telnyx-ai-inference-python

Access Telnyx LLM inference APIs, embeddings, and AI analytics for call insights and summaries. This skill provides Python SDK examples.

4

Search Results: llm-inference

agent-tools

infsh-cli

inference-sh

skills

serving-llms-vllm

performance

cloudflare-workers-ai

groq-inference

databricks-model-serving

nowait-reasoning-optimizer

cli-anything-ollama

telnyx-ai-inference-python
