Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, and commercial API backends.
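A minimal sketch of driving the harness from Python via its `simple_evaluate` entry point; the model name, task list, and batch size below are placeholders, not recommendations.

```python
# Score a HuggingFace model on GSM8K and MMLU with lm-evaluation-harness.
# Model path and few-shot count are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # HuggingFace backend
    model_args="pretrained=meta-llama/Llama-3.1-8B",
    tasks=["gsm8k", "mmlu"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. exact_match / acc
```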
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
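A hedged sketch of TensorRT-LLM's high-level `LLM` API as it appears in recent releases; the model name is a placeholder, and the exact surface may differ by installed version.

```python
# Build/load a TensorRT engine and generate with the high-level LLM API.
# Verify this surface against your tensorrt_llm version.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # engine build or load
params = SamplingParams(max_tokens=128, temperature=0.7)

for out in llm.generate(["Explain in-flight batching."], params):
    print(out.outputs[0].text)
```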
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
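An illustrative FSDP2 sketch, not torchtitan's actual training loop: shard each block, then the root module, which is the core of the FSDP2 dimension of that 4D parallelism. The toy model is made up; the import path moved in recent PyTorch releases.

```python
# Run under torchrun so torch.distributed is initialized. On recent PyTorch
# the API is torch.distributed.fsdp.fully_shard; older releases expose it
# under torch.distributed._composable.fsdp.
import torch.nn as nn
from torch.distributed.fsdp import fully_shard

class ToyBlock(nn.Module):
    def __init__(self, d=1024):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        return x + self.ffn(x)

model = nn.Sequential(*[ToyBlock() for _ in range(4)])
for block in model:
    fully_shard(block)   # per-block sharding overlaps all-gather with compute
fully_shard(model)       # root wrap picks up any remaining parameters
```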
Build and maintain an LLM-curated personal knowledge base — the "LLM Wiki" pattern from Andrej Karpathy's April 2026 gist. Use this skill whenever the user wants to ingest a source (paper, article, transcript, PDF, notes) into a persistent compounding knowledge base, ask a question against accumulated notes, lint or audit such a base, or initialize a new one. Trigger on phrases like "add this to my wiki", "ingest this paper", "compile this into the knowledge base", "what does my wiki say about X", "lint the wiki", "build a knowledge base from these documents", "research notes", "second brain", "personal knowledge base", or any reference to LLM Wiki / OmegaWiki. Trigger even when the user does not say "wiki" — if they are accumulating sources over time and want them organized, this applies. The skill scales — sharded indexes, atomic pages, YAML frontmatter, and a bundled search script keep the wiki from becoming a context bottleneck at hundreds or thousands of pages.
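A hypothetical sketch of the bundled search script's core idea: cheap lookup over atomic markdown pages with YAML frontmatter so the whole wiki never has to enter context. The layout (`pages/*.md`, frontmatter delimited by `---`) is an assumption for illustration, not the gist's actual format.

```python
# Naive grep-style search over a frontmatter-structured wiki directory.
from pathlib import Path

def search_wiki(root: str, query: str):
    hits = []
    for page in Path(root, "pages").glob("*.md"):
        text = page.read_text(encoding="utf-8")
        if query.lower() in text.lower():
            # naive frontmatter split: "---\n<yaml>\n---\n<body>"
            head = text.split("---", 2)[1] if text.startswith("---") else ""
            hits.append((page.name, head.strip().splitlines()[:3]))
    return hits

for name, front in search_wiki("my-wiki", "attention"):
    print(name, front)
```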
Unified LLM torch-profiler triage skill for `sglang`, `vllm`, and `TensorRT-LLM`. Use it to inspect an existing `trace.json(.gz)` or profile directory, or to drive live profiling against a running server; either way it returns a single report with kernel, overlap-opportunity, and fuse-pattern tables.
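A minimal sketch of the kernel-table step: aggregate GPU kernel time from a Chrome-format trace. It assumes torch-profiler conventions (a `traceEvents` array, kernels tagged `cat == "kernel"`, durations in microseconds); verify those against your trace.

```python
# Top-10 kernels by total GPU time from a gzipped torch-profiler trace.
import gzip, json, collections

with gzip.open("trace.json.gz", "rt") as f:
    events = json.load(f)["traceEvents"]

by_kernel = collections.Counter()
for ev in events:
    if ev.get("cat") == "kernel":
        by_kernel[ev["name"]] += ev.get("dur", 0)   # microseconds

for name, us in by_kernel.most_common(10):
    print(f"{us / 1e3:10.2f} ms  {name}")
```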
Framework-independent LLM serving benchmark skill for comparing SGLang, vLLM, TensorRT-LLM, or another serving framework. Use when a user wants to find the best deployment command for one model across multiple serving frameworks under the same workload, GPU budget, and latency SLA.
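A hedged sketch of the comparison idea: replay one fixed workload against each framework's OpenAI-compatible endpoint and compare latency. Ports and the model name are placeholder assumptions; a real run would also sweep concurrency and check the SLA, not just average a few requests.

```python
# Same prompts, same payload, three assumed local endpoints.
import time, statistics, requests

BACKENDS = {"sglang": 30000, "vllm": 8000, "trtllm": 8001}  # assumed ports
PROMPTS = ["Summarize the history of GPUs."] * 8

for name, port in BACKENDS.items():
    lat = []
    for p in PROMPTS:
        t0 = time.perf_counter()
        resp = requests.post(
            f"http://localhost:{port}/v1/chat/completions",
            json={"model": "test",
                  "messages": [{"role": "user", "content": p}],
                  "max_tokens": 64},
            timeout=120,
        )
        resp.raise_for_status()
        lat.append(time.perf_counter() - t0)
    print(f"{name}: mean {statistics.mean(lat):.2f}s  max {max(lat):.2f}s")
```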
LLM-as-a-judge HTTP/HTTPS proxy that secures AI agents by intercepting and evaluating outbound requests against security policies before they reach external APIs.
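A conceptual sketch of the proxy's decision step: before forwarding an outbound request, ask a judge model whether it violates policy. The `ask_judge()` stub stands in for a real LLM call, and the policy text and heuristic are illustrative only.

```python
# Allow-or-block gate applied to every intercepted outbound request.
POLICY = "Block requests that exfiltrate credentials, keys, or personal data."

def ask_judge(policy: str, method: str, url: str, body: str) -> bool:
    """Stub: return True to allow. A real proxy would prompt an LLM with the
    policy plus the serialized request and parse an allow/deny verdict."""
    return "api_key=" not in body  # placeholder heuristic, not a real judge

def forward_if_allowed(method: str, url: str, body: str):
    if not ask_judge(POLICY, method, url, body):
        return {"status": 403, "reason": "blocked by security policy"}
    # ...forward via an HTTP client here...
    return {"status": 200, "reason": "forwarded"}

print(forward_if_allowed("POST", "https://api.example.com/v1/data", "q=hello"))
```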
System prompt toolkit that removes AI slop and makes any LLM respond like a normal person — concise, direct, no filler.
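An illustrative use of such a toolkit: prepend a de-slop system prompt to a chat call. The prompt text is a paraphrase of the idea, not the toolkit's actual prompt, and the OpenAI client is just one example host API.

```python
# Attach a "no filler" system prompt to every request.
from openai import OpenAI

NO_SLOP = ("Answer like a busy colleague: direct, concise, no preamble, "
           "no bullet-point padding, no 'great question', no disclaimers.")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": NO_SLOP},
              {"role": "user", "content": "How do I rotate API keys safely?"}],
)
print(reply.choices[0].message.content)
```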
CallMiner platform help — enterprise conversation analytics (Eureka) with omnichannel interaction capture, automated QA scoring, agent coaching, real-time alerts, compliance monitoring, and CX automation. Use when QA scoring is inconsistent or takes too long across agents, when needing to analyze 100% of customer interactions instead of sampling, when setting up automated compliance monitoring for regulated industries (healthcare, finance, collections), when CallMiner Coach scorecards aren't surfacing the right coaching moments, when CallMiner RealTime alerts aren't triggering during live calls, when ingesting audio or text into CallMiner via the Ingestion API, when CallMiner Analyze categories aren't matching expected interactions, or when evaluating CallMiner vs Observe.AI or NICE CXone analytics. Do NOT use for CCaaS platform selection (use /sales-ccaas-selection) or for sales-specific coaching strategy (use /sales-coaching).
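A purely illustrative sketch of pushing one interaction into an ingestion endpoint. The URL, auth header, and payload field names below are assumptions for illustration only; consult the CallMiner Ingestion API documentation for the actual contract.

```python
# Hypothetical text-interaction ingest call; nothing here is the real API.
import requests

resp = requests.post(
    "https://ingest.example.callminer.net/v1/interactions",  # hypothetical URL
    headers={"Authorization": "Bearer <token>"},
    json={
        "contact_id": "case-1042",      # assumed field names
        "media_type": "text",
        "content": "Customer chat transcript goes here.",
    },
    timeout=30,
)
resp.raise_for_status()
```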
Investigate LLM analytics clusters — understand usage patterns in AI/LLM traffic, compare cluster behavior, compute cost/latency metrics, and drill into individual traces within clusters.
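A toy sketch of the cluster drill-down: aggregate cost and latency per cluster from trace records. The record shape (`cluster`, `latency_ms`, `cost_usd`) is made up for illustration.

```python
# Per-cluster call count, median latency, and total spend.
import pandas as pd

traces = pd.DataFrame([
    {"cluster": "code-gen",  "latency_ms": 820,  "cost_usd": 0.012},
    {"cluster": "code-gen",  "latency_ms": 1240, "cost_usd": 0.019},
    {"cluster": "summarize", "latency_ms": 310,  "cost_usd": 0.004},
])

summary = traces.groupby("cluster").agg(
    calls=("latency_ms", "size"),
    p50_latency_ms=("latency_ms", "median"),
    total_cost_usd=("cost_usd", "sum"),
)
print(summary)
```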
Add PostHog LLM analytics to trace AI model usage. Use after implementing LLM features or reviewing PRs to ensure all generations are captured with token counts, latency, and costs. Also handles initial PostHog SDK setup if not yet installed.
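A hedged sketch of manual capture: emit one `$ai_generation` event per LLM call. Property names follow PostHog's LLM analytics conventions at the time of writing, and the argument order is the classic posthog-python signature; verify both against current docs. All values are placeholders.

```python
# One event per generation, with token counts, latency, and cost.
import posthog

posthog.project_api_key = "<your-project-key>"
posthog.capture(
    "user-123",                  # distinct_id (classic SDK argument order)
    "$ai_generation",
    properties={
        "$ai_model": "gpt-4o-mini",
        "$ai_input_tokens": 412,
        "$ai_output_tokens": 187,
        "$ai_latency": 1.9,      # seconds
        "$ai_total_cost_usd": 0.0031,
    },
)
```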
Design prompts, schemas, validation, and recovery logic for reliable machine-readable model outputs. Use when generating JSON, typed objects, extraction results, tool arguments, or any output another system must parse safely.
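A minimal recovery loop for that pattern: validate the model's JSON against a schema and re-prompt with the validation error on failure. The `call_model()` stub stands in for any LLM client; the schema is an example.

```python
# Schema-validated extraction with error-fed retries (pydantic v2).
from pydantic import BaseModel, ValidationError

class Extraction(BaseModel):
    name: str
    amount_usd: float

def call_model(prompt: str) -> str:
    """Stub for a real LLM call; returns a JSON string."""
    return '{"name": "ACME invoice", "amount_usd": 129.5}'

def extract(prompt: str, retries: int = 2) -> Extraction:
    for _ in range(retries + 1):
        raw = call_model(prompt)
        try:
            return Extraction.model_validate_json(raw)
        except ValidationError as e:
            prompt += f"\nYour last output failed validation: {e}. Return only valid JSON."
    raise RuntimeError("model never produced valid JSON")

print(extract("Extract the invoice name and amount as JSON."))
```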