Search Results: openai-compatible

Found 42 Skills

AI & Machine Learningvllm-project/vllm-skills

vllm-bench-serve

Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

nemoclaw-user-configure-inference

Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server, openai compatible endpoint, switch nemoclaw inference model, change inference runtime, nemoclaw additional model, nemoclaw sub-agent model, openclaw sub-agent, agents.list, sessions_spawn, vlm-demo, nemoclaw tool calling, ollama tool calls, vllm tool-call-parser, raw json in tui, nemoclaw inference options, nemoclaw onboarding providers, nemoclaw inference routing.

🇺🇸|EnglishTranslated

AI & Machine Learninghuggingface/skills

huggingface-local-models

Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

🇺🇸|EnglishTranslated

AI & Machine Learningcascade-protocol/agentbox

agentbox

Provision dedicated AI agents on AgentBox via x402 payment ($5 USDC on Solana). Use when creating cloud instances running OpenClaw AI gateways with HTTPS and web terminal. Requires Node.js and a Solana wallet.json with USDC funds. Covers: provisioning new instances, polling status, interacting via OpenAI-compatible chat completions, extending, and listing instances.

🇺🇸|EnglishTranslated

AI & Machine Learningnovitalabs/novita-skills

novita-ai

Novita AI: LLM, Image Generation & Editing, Video Generation, Audio (TTS/ASR), and GPU Cloud. Use this skill whenever the user wants to call Novita AI APIs — chat with LLMs (DeepSeek, Llama, Qwen), generate images (FLUX, Stable Diffusion, Seedream, Hunyuan Image), edit images (remove background, upscale, inpainting, img2img, outpainting, reimagine, merge face, replace background, remove text), generate videos (Kling, Wan, Hunyuan, Minimax Hailuo, Vidu, PixVerse, Seedance), do text-to-speech or speech-to-text (MiniMax TTS, GLM TTS, Fish Audio, ASR, voice cloning), run OpenAI-compatible batch jobs, manage GPU cloud instances and serverless endpoints, or check account balance and billing. Also trigger when the user mentions novita.ai, Novita AI, Novita API key, or wants to use any Novita platform service — even if they just say "generate an image" or "run an LLM" and Novita is available as a provider.

🇺🇸|EnglishTranslated

AI & Machine Learningbrave/brave-search-skills

answers

USE FOR AI-grounded answers via OpenAI-compatible /chat/completions. Two modes: single-search (fast) or deep research (enable_research=true, thorough multi-search). Streaming/blocking. Citations.

🇺🇸|EnglishTranslated

AI & Machine Learningnarcooo/inkos

inkos

Autonomous novel writing CLI agent - use for creative fiction writing, novel generation, style imitation, chapter continuation/import, EPUB export, and AIGC detection. Supports Chinese web novel genres (xuanhuan, xianxia, urban, horror, other) with multi-agent pipeline, two-phase writer (creative + settlement), 33-dimension auditing, token usage analytics, creative brief input, structured logging (JSON Lines), and custom OpenAI-compatible provider support.

🇺🇸|EnglishTranslated

AI & Machine Learningconardli/garden-skills

gpt-image-2

An image generation/editing Skill for GPT Image 2. It can be used in 3 environments: (A) Garden Local Mode: directly generate and save images via OpenAI-compatible APIs; (B) Host-Native Mode: treat this Skill as a prompt engineering guide, and pass the rendered prompt to the image tool built into the host Agent for image generation; (C) Advisor Mode: degrade to a high-quality prompt consultant when the host has no image tools. It covers 18 major categories and over 80 structured templates, including scenarios such as posters, UI, products, infographics, academic figures, technical architecture diagrams, comics, avatars, process boards, storyboards, IP peripherals, and editing workflows.

🇨🇳|ChineseTranslated

4 scripts/Attention

AI & Machine Learningnvidia/skills

rt-vlm

Use this skill when working with the RTVI VLM or RT-VLM microservice API on VSS 3.1. Generate dense captions and alerts for stored video files and live RTSP streams via `/v1/generate_captions_alerts`; upload media via `/v1/files`; add and remove live streams with `/v1/streams/add` and `/v1/streams/delete/{stream_id}`; call OpenAI-compatible `/v1/chat/completions`; consume Kafka caption, incident, and error topics; or debug rtvi-vlm responses. For deployment, read `references/deploy-rt-vlm-service.md` first.

🇺🇸|EnglishTranslated

AI & Machine Learningvm0-ai/vm0-skills

deepseek

DeepSeek AI large language model API via curl. Use this skill for chat completions, reasoning, and code generation with OpenAI-compatible endpoints.

🇺🇸|EnglishTranslated

AI & Machine Learningascend-ai-coding/awesome-...

vllm-ascend

vLLM Ascend plugin for LLM inference serving on Huawei Ascend NPU. Use for offline batch inference, API server deployment, quantization inference (with msmodelslim quantized models), tensor/pipeline parallelism for distributed serving, and OpenAI-compatible API endpoints. Supports Qwen, DeepSeek, GLM, LLaMA models with Ascend-optimized kernels.

🇺🇸|EnglishTranslated

3 scripts/Attention