Search Results: benchmarking

Found 147 Skills

Tools & Utilities | anthropics/knowledge-work...

comp-analysis

Analyze compensation — benchmarking, band placement, and equity modeling. Trigger with "what should we pay a [role]", "is this offer competitive", "model this equity grant", or when uploading comp data to find outliers and retention risks.


Backend Development | eduardo-sl/go-agent-skill...

go-performance-review

Detect performance anti-patterns and apply optimization techniques in Go. Covers allocations, string handling, slice/map preallocation, sync.Pool, benchmarking, and profiling with pprof. Use when checking performance, finding slow code, reducing allocations, profiling, or reviewing hot paths. Trigger examples: "check performance", "find slow code", "reduce allocations", "benchmark this", "profile", "optimize Go code". Do NOT use for concurrency correctness (use go-concurrency-review) or general code style (use go-coding-standards).


Backend Development | tursodatabase/turso

memory-benchmark

How to benchmark and analyze memory usage in Turso using the memory-benchmark crate and dhat heap profiler. Use this skill whenever the user mentions memory usage, memory profiling, allocation tracking, heap analysis, memory regression, memory benchmarking, dhat, or wants to understand where memory is being allocated during SQL workloads. Also use when investigating memory growth in WAL or MVCC mode. IMPORTANT - If you modify the perf/memory crate (add profiles, change CLI flags, change output format, etc.), update this skill document to reflect those changes so it stays accurate for future agents.


Testing & QA | nvidia/skills

rag-perf

Performance benchmarking for a deployed NVIDIA RAG Blueprint server: profiling pass + aiperf load test driven by a single YAML config. Not for accuracy / RAGAS scoring (use rag-eval) or for deploying / repairing services (use rag-blueprint).


AI & Machine Learning | kiterlin/intelligent-dete...

nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.


Tools & Utilities | manutej/luxor-claude-mark...

performance-benchmark-specialist

Performance benchmarking expertise for shell tools, covering benchmark design, statistical analysis (min/max/mean/median/stddev), performance targets (<100ms, >90% hit rate), workspace generation, and comprehensive reporting.


AI & Machine Learning | rysweet/amplihack

eval-recipes-runner

Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.


Code Quality | ynulihao/agentskillos

performance-optimization

Apply systematic performance optimization techniques when writing or reviewing code. Use when optimizing hot paths, reducing latency, improving throughput, fixing performance regressions, or when the user mentions performance, optimization, speed, latency, throughput, profiling, or benchmarking.


Marketing & Growth | sales-skills/sales

sales-plutoba

PlutoBa platform help — AI influencer vetting and creator due diligence across TikTok, Instagram, and YouTube. Covers PlutoBa Score (7-dimension assessment), Deep Assessments (100+ posts, 300+ comments), fake follower detection, audience authenticity, brand safety risk scoring, rate benchmarking, AI-powered creator outreach, creator CRM, and campaign briefs. Use when worried an influencer's followers are fake, need to check if a creator is brand-safe before signing a deal, want to know what to pay an influencer, PlutoBa Score seems too low or too high, creator outreach templates aren't getting responses, unsure which PlutoBa plan fits your needs, or setting up PlutoBa for an agency with multiple brands. Do NOT use for influencer strategy across platforms (use /sales-influencer-marketing) or influencer discovery and search (use /sales-hypeauditor or /sales-modash).


Backend Development | patricio0312rev/skills

sql-query-optimizer

Analyzes and optimizes SQL queries using EXPLAIN plans, index recommendations, query rewrites, and performance benchmarking. Use for "query optimization", "slow queries", "database performance", or "EXPLAIN analysis".


AI & Machine Learning | eyadsibai/ltk

nemo-evaluator

Use when evaluating LLMs, running benchmarks like MMLU/HumanEval/GSM8K, setting up evaluation pipelines, or asking about "NeMo Evaluator", "LLM benchmarking", "model evaluation", "MMLU", "HumanEval", "GSM8K", or "benchmark harnesses".


AI & Machine Learning | vanman2024/ai-dev-marketp...

chunking-strategies

Document chunking implementations and benchmarking tools for RAG pipelines including fixed-size, semantic, recursive, and sentence-based strategies. Use when implementing document processing, optimizing chunk sizes, comparing chunking approaches, benchmarking retrieval performance, or when user mentions chunking, text splitting, document segmentation, RAG optimization, or chunk evaluation.

