Search Results: benchmarking

Found 147 Skills

Testing & QA | sablier-labs/plugin-marke...

foundry

Write Foundry-based tests and scripts. Trigger phrases - foundry testing, write test, fuzz test, fork test, invariant test, deploy script, gas benchmark, coverage, or when working in tests/ or scripts/ directories.

Testing & QA | jeremylongshore/claude-co...

thread-dump-analyzer

Thread Dump Analyzer - Auto-activating skill for Performance Testing. Triggers on: thread dump analyzer. Part of the Performance Testing skill category.

AI & Machine Learning | libukai/awesome-agent-ski...

skill-creator-pro

Create new skills, modify and improve existing skills, and measure skill performance. Enhanced version with quick commands. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy. Triggers on phrases like "make a skill", "create a new skill", "build a skill for", "improve this skill", "optimize my skill", "test my skill", "turn this into a skill", "skill description optimization", or "help me create a skill".

10 scripts | Attention

Marketing & Growth | asgard-ai-platform/skills

algo-social-engagement

Calculate and benchmark social media engagement rates across platforms and variants. Use this skill when the user needs to compute engagement metrics, compare performance across accounts or posts, or set engagement benchmarks — even if they say 'what is my engagement rate', 'benchmark engagement', or 'social media KPIs'.

AI & Machine Learning | pascalorg/skills

agent-collaboration

Multi-model agent orchestration using specialized agents for planning, coding, research, math/science, visual analysis, and adversarial review. Use when tasks are complex enough to benefit from different models' strengths, when you want adversarial review to catch blind spots, or when coordinating multi-step workflows across agent roles. Triggers on complex projects, multi-step tasks, architecture decisions, or when explicitly requested.

1 script | Checked

AI & Machine Learning | exploreomni/omni-agent-sk...

omni-ai-eval

Evaluate Omni AI query generation accuracy by running test prompts through the Omni CLI, comparing generated query JSON against expected results, and scoring accuracy. Use this skill whenever someone wants to evaluate Omni AI, benchmark Blobby, run regression tests, compare AI output across branches or configurations, test prompt variations, measure AI quality, run A/B tests on model changes, assess impact of context changes, or any variant of "run evals", "test Blobby", "benchmark query generation", "compare AI results", "regression test", "how accurate is the AI", or "measure the impact of my changes".

AI & Machine Learning | garrytan/gstack

benchmark-models

Cross-model benchmark for gstack skills. Runs the same prompt through Claude, GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost, and optionally quality via LLM judge. Answers "which model is actually best for this skill?" with data instead of vibes. Separate from /benchmark, which measures web page performance. Use when: "benchmark models", "compare models", "which model is best for X", "cross-model comparison", "model shootout". (gstack) Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".

AI & Machine Learning | bbuf/sglang-auto-driven-s...

vllm-sota-humanize-loop

Run an autonomous Humanize-governed vLLM SOTA performance loop for one LLM model: first perform the fixed fair vLLM/SGLang/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches vLLM code, optionally uses ncu-report-skill for kernel evidence, and revalidates until vLLM matches or beats the best observed framework under the same workload and SLA.

Marketing & Growth | indranilbanerjee/digital-...

performance-check

Pull live marketing metrics for a performance snapshot: KPIs vs targets, trend comparison, and cross-platform overview. Use when checking current marketing performance, monitoring KPI health, comparing to benchmarks, or getting a quick status update across analytics platforms.

Testing & QA | jeremylongshore/claude-co...

k6-script-generator

K6 Script Generator - Auto-activating skill for Performance Testing. Triggers on: k6 script generator. Part of the Performance Testing skill category.

Testing & QA | novotnyllc/dotnet-artisan

dotnet-testing

Defines .NET test strategy, xUnit v3, integration/E2E, snapshots (Verify), Playwright, benchmarks, and quality gates.

AI & Machine Learning | ruvnet/ruflo

agent-benchmark-suite

Agent skill for benchmark-suite - invoke with $agent-benchmark-suite
