Found 117 Skills
Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.
This skill should be used when profiling code, optimizing bottlenecks, benchmarking, or when "performance", "profiling", "optimization", or "--perf" are mentioned.
Use when evaluating LLMs, running benchmarks like MMLU/HumanEval/GSM8K, setting up evaluation pipelines, or asking about "NeMo Evaluator", "LLM benchmarking", "model evaluation", "MMLU", "HumanEval", "GSM8K", or "benchmark harnesses".
Calculate engagement rates for creator posts and benchmark them against platform and tier averages. This skill should be used when calculating an influencer's engagement rate, benchmarking creator engagement against industry averages, evaluating whether a creator's engagement is above or below average for their tier, comparing engagement rates across platforms, checking if engagement rates suggest fake followers, auditing a creator's engagement quality before a partnership, analyzing engagement by content type (reels, stories, feed posts, TikTok videos), or assessing engagement trends across a creator's recent posts. For estimating fair market rates based on engagement, see creator-rate-estimator. For full creator vetting beyond engagement, see creator-vetting-scorecard. For scoring niche fit, see niche-fit-scorer.
Create a custom technical indicator using Numba JIT + NumPy. Generates production-grade, O(n) optimized indicator functions with charting and benchmarking.
Build institutional-grade comparable company analyses with operating metrics, valuation multiples, and statistical benchmarking in Excel/spreadsheet format. Perfect for: public company valuation (M&A, investment analysis); benchmarking performance vs. industry peers; pricing IPOs or funding rounds; identifying valuation outliers (over/under-valued); supporting investment committee presentations; creating sector overview reports. Not ideal for: private companies without comparable public peers; highly diversified conglomerates; distressed/bankrupt companies; pre-revenue startups; companies with unique business models.
Use this skill when benchmarking compensation, designing equity plans, building leveling frameworks, or structuring total rewards. Triggers on compensation benchmarking, equity grants, stock options, leveling, pay bands, total rewards, salary ranges, and any task requiring compensation strategy or structure design.
Decompose Return on Equity into component ratios to identify performance drivers. Use for financial analysis, performance benchmarking, and identifying improvement opportunities.
Use this skill when load testing services, benchmarking API performance, planning capacity, or identifying bottlenecks under stress. Triggers on k6, Artillery, JMeter, load testing, stress testing, soak testing, spike testing, performance benchmarks, throughput testing, and any task requiring load or performance testing.
Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis of a specific performance indicator: this skill provides the measurement methodology, while golang-performance provides the optimization patterns.