Search: llm-inference - AI Agent Skills

AI & Machine Learning | skillssh/skills

agent-tools

Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

182.7k

AI & Machine Learning | skillssh/skills

infsh-cli

Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

182.7k

AI & Machine Learning | inference-sh/skills

inference-sh

Run 150+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

929

AI & Machine Learning | inference-sh/skills

skills

Run 150+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

411

AI & Machine Learning | davila7/claude-code-templ...

nowait-reasoning-optimizer

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

15

1 scripts/Checked

AI & Machine Learning | davila7/claude-code-templ...

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

12

Code Quality | yonatangross/orchestkit

performance

Performance optimization patterns covering Core Web Vitals, React render optimization, lazy loading, image optimization, backend profiling, and LLM inference. Use when improving page speed, debugging slow renders, optimizing bundles, reducing image payload, profiling backend, or deploying LLMs efficiently.

10

7 scripts/Attention

AI & Machine Learning | team-telnyx/skills

telnyx-ai-inference-python

Access Telnyx LLM inference APIs, embeddings, and AI analytics for call insights and summaries. This skill provides Python SDK examples.

10

AI & Machine Learning | aradotso/trending-skills

dflash-mlx-speculative-decoding

Lossless DFlash speculative decoding for MLX on Apple Silicon — 1.7–4x faster LLM inference using block diffusion drafting with target model verification.

10

AI & Machine Learning | parcadei/continuous-claud...

agentica-server

Agentica server + Claude proxy setup - architecture, startup sequence, debugging

9

AI & Machine Learning | jezweb/claude-skills

cloudflare-workers-ai

Run LLMs and AI models on Cloudflare's GPU network with Workers AI. Includes Llama 4, Gemma 3, Mistral 3.1, Flux images, BGE embeddings, streaming, and AI Gateway. Handles 2025 breaking changes. Prevents 7 documented errors. Use when: implementing LLM inference, images, RAG, or troubleshooting AI_ERROR, rate limits, max_tokens, BGE pooling, context window, neuron billing, Miniflare AI binding, NSFW filter, num_steps.

8

5 scripts/Attention

AI & Machine Learning | scientiacapital/skills

groq-inference

Fast LLM inference with Groq API - chat, vision, audio STT/TTS, tool use. Use when: groq, fast inference, low latency, whisper, PlayAI TTS, Llama, vision API, tool calling, voice agents, real-time AI.

8

1 scripts/Checked

