Found 117 Skills
Autonomously optimize an existing AI skill by running it repeatedly against binary evals, mutating one instruction at a time, and keeping only changes that improve pass rate. Based on Karpathy-style autoresearch, but applied to SKILL.md iteration instead of ML training. Use when optimizing a skill, benchmarking prompt quality, building evals for a skill, or running self-improvement loops on reusable agent instructions. Triggers on: skill-autoresearch, optimize this skill, improve this skill, benchmark this skill, eval my skill, run autoresearch on this skill, self-improve skill.
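The keep-only-improvements loop this skill describes can be sketched as a greedy hill climb. This is a minimal illustration with toy eval and mutation stand-ins, not the skill's actual harness:

```python
import random

def optimize_skill(instructions, evals, run_evals, mutate, iterations=20):
    """Greedy one-mutation-at-a-time loop: keep a change only if pass rate improves."""
    best = list(instructions)
    best_rate = run_evals(best, evals)  # fraction of binary evals that pass
    for _ in range(iterations):
        candidate = mutate(list(best))  # mutate exactly one instruction
        rate = run_evals(candidate, evals)
        if rate > best_rate:            # strictly better -> keep the mutation
            best, best_rate = candidate, rate
    return best, best_rate

# Toy stand-ins: an instruction set "passes" an eval if any instruction
# contains that eval's keyword; a mutation appends a word to one instruction.
def toy_run_evals(instrs, evals):
    return sum(any(kw in i for i in instrs) for kw in evals) / len(evals)

def toy_mutate(instrs):
    idx = random.randrange(len(instrs))
    instrs[idx] = instrs[idx] + " clearly"
    return instrs
```

The real skill would replace `run_evals` with actual agent runs against binary evals and `mutate` with a single SKILL.md instruction edit.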
Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.
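The TTFT/TPOT metrics this skill reports can be computed from per-token arrival timestamps. A sketch of the standard definitions, independent of vLLM itself:

```python
def serving_metrics(request_start, token_times):
    """Compute TTFT/TPOT as serving benchmarks typically define them.

    TTFT = first-token arrival - request start
    TPOT = (last-token arrival - first-token arrival) / (n_tokens - 1)
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0  # a single token has no inter-token gap
    return {"ttft": ttft, "tpot": tpot}
```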
Customer feedback, NPS, CSAT, CES, Voice of Customer strategy across platforms — survey design, response rate optimization, closed-loop feedback, text analytics, benchmarking, program governance. Use when NPS scores are stagnant, survey response rates are low, feedback isn't driving action, unsure which CX metric to use, need to design a VoC program, comparing feedback tools (Medallia vs Qualtrics vs SurveyMonkey vs Typeform), or customers feel over-surveyed. Do NOT use for product review collection like Trustpilot or G2 (use /sales-customer-reviews) or in-app message surveys (use /sales-in-app-messaging).
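The NPS metric mentioned above has a standard formula: percentage of promoters (scores 9-10) minus percentage of detractors (scores 0-6), on a -100 to 100 scale. A minimal implementation:

```python
def nps(scores):
    """Net Promoter Score from 0-10 survey responses: %promoters - %detractors."""
    if not scores:
        raise ValueError("no responses")
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100.0 * (promoters - detractors) / len(scores)
```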
[Hyper] Optimize an existing Codex skill through baseline-first experiments, binary evals, optional guards, and one-mutation-at-a-time iteration. Use for skill autoresearch, measured trigger/workflow improvement, self-optimizing a skill, benchmarking skill changes, or resuming skill experiment artifacts.
Finds qualified candidates for a role by searching LinkedIn, Indeed, GitHub, and other professional platforms using Nimble Web Search Agents. Accepts a job description, role title, or freeform request and returns a ranked candidate list with profiles, skills, and contact signals. Use this skill when the user wants to find, source, or recruit candidates for a role. Common triggers: "find candidates for", "source engineers in", "who can I hire for", "find me a [role]", "recruiting for", "talent search", "find a [role] in [city]", "build a candidate list", "sourcing for [role]", "who's available for", "find potential hires". Also triggers on a pasted job description followed by a sourcing request. Do NOT use for job market research or salary benchmarking — use market-finder instead. Do NOT use for researching a single known person — use company-deep-dive or meeting-prep instead.
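The "ranked candidate list" step can be sketched as a skill-overlap score. This is a hypothetical ranking heuristic for illustration; the actual skill delegates search and ranking to Nimble Web Search Agents:

```python
def rank_candidates(candidates, required_skills, nice_to_have=()):
    """Rank candidates by skill overlap; required skills weigh twice optional ones."""
    required = {s.lower() for s in required_skills}
    optional = {s.lower() for s in nice_to_have}
    def score(c):
        skills = {s.lower() for s in c["skills"]}
        return 2 * len(skills & required) + len(skills & optional)
    return sorted(candidates, key=score, reverse=True)
```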
Multi-path parallel product analysis with cross-model test-time compute scaling. Spawns parallel agents (Claude Code agent teams + Codex CLI) to explore the product from multiple perspectives, then synthesizes findings into actionable optimization plans. Can invoke competitors-analysis for competitive benchmarking. Use when "product audit", "self-review", "发布前审查" (pre-release review), "产品分析" (product analysis), "analyze our product", "UX audit", or "信息架构审计" (information architecture audit).
Live Google Search Console analytics — fetches real SEO data (clicks, impressions, CTR, rankings) and delivers actionable insights with CTR benchmarking and opportunity detection. Zero dependencies. Use when the user asks about GSC, Google Search Console, SEO performance, search performance, keywords, rankings, organic traffic, top pages, top queries, "how is my site performing in Google", "check rankings", or "search console report".
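The "CTR benchmarking and opportunity detection" step can be sketched as comparing each query's actual CTR against an expected-CTR-by-position curve. The curve values below are illustrative assumptions, not Google's published figures:

```python
# Rough expected-CTR-by-position curve (illustrative numbers only).
EXPECTED_CTR = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}

def ctr_opportunities(rows, min_impressions=100):
    """Flag (query, impressions, clicks, position) rows whose CTR badly lags the benchmark."""
    flagged = []
    for query, impressions, clicks, position in rows:
        if impressions < min_impressions:
            continue  # too little data to judge
        expected = EXPECTED_CTR.get(round(position), 0.02)
        actual = clicks / impressions
        if actual < 0.5 * expected:  # under half of benchmark -> title/meta rewrite candidate
            flagged.append((query, actual, expected))
    return flagged
```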
Performance and load testing patterns — k6 load tests, Locust stress tests, pytest execution optimization (xdist parallel, plugins), test type classification, and performance benchmarking. Use when writing load tests, optimizing test execution speed, or setting up pytest infrastructure.
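Load-test reports of the kind this skill produces lean on latency percentiles (p95, p99). A minimal nearest-rank percentile, the simple definition many tools use:

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("empty sample")
    ordered = sorted(latencies_ms)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(0, k)]
```

Note that k6 and Locust use their own interpolation variants, so exact values can differ slightly from this sketch.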
Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.
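Constructing the isolated invocation can be sketched as building an argument list. `-p` and `--output-format` are documented claude CLI flags; `--bare` follows this skill's description of CC 2.1.81 and should be treated as an assumption:

```python
def bare_eval_cmd(prompt, model=None, json_output=True):
    """Build an isolated `claude -p --bare` invocation for grading/eval calls."""
    cmd = ["claude", "-p", prompt, "--bare"]
    if model:
        cmd += ["--model", model]
    if json_output:
        cmd += ["--output-format", "json"]
    return cmd
```

The list form is meant to be passed to `subprocess.run(cmd, capture_output=True)` so no shell quoting is needed.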
End-to-end SGLang SOTA performance workflow. Use when a user names an LLM model and wants SGLang to match or beat the best observed vLLM and TensorRT-LLM serving performance: search each framework's best deployment command, benchmark them fairly, profile SGLang if it is slower, identify kernel/overlap/fusion bottlenecks, patch SGLang code, and revalidate with real model runs.
Arrfounder platform help — founder revenue directory by @Folyd (2024) that auto-extracts MRR/ARR + products from Twitter/X bios via AI, lists 1000+ founders on sortable leaderboards (ARR / followers / products / recently added), free Airtable submission with 24-48h manual approval, auto-syncs within hours of bio changes. Social-proof verification only (no Stripe / Lemon Squeezy / Polar API integration) — built for peer discovery and community browsing, not acquisition-grade proof. Use when getting listed on Arrfounder, writing a Twitter/X bio that passes the MRR/ARR extractor, fixing a profile that didn't get approved or stopped updating after a bio edit, deciding Arrfounder vs TrustMRR or StartuPage for verified-revenue display, benchmarking against peers in the $1K-$10M+ ARR tiers, or using Arrfounder as a comp-check tool before pricing a sale or fundraise. Do NOT use for selling/buying a project or cross-marketplace valuation (use /sales-side-project-valuation).
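A bio that "passes the MRR/ARR extractor" needs the revenue figure stated in a machine-recognizable form. Arrfounder's real extractor is AI-based and not public; the regex below is a hypothetical re-implementation showing the kind of pattern a bio should satisfy:

```python
import re

# Hypothetical pattern: "$<number><K|M> MRR" or "... ARR" anywhere in the bio.
REVENUE_RE = re.compile(r"\$([\d.]+)([KkMm])?\s*(MRR|ARR)\b", re.IGNORECASE)

def extract_revenue(bio):
    """Return (annual_usd, label) parsed from a Twitter/X bio, or None."""
    m = REVENUE_RE.search(bio)
    if not m:
        return None
    amount = float(m.group(1))
    unit = (m.group(2) or "").upper()
    amount *= {"K": 1_000, "M": 1_000_000}.get(unit, 1)
    label = m.group(3).upper()
    annual = amount * 12 if label == "MRR" else amount
    return annual, label
```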
Social listening and brand monitoring strategy — monitoring, Boolean queries, sentiment, competitive intel, crisis detection, AI visibility monitoring, LLM brand mentions. Platform comparison (Meltwater, Brandwatch, Talkwalker, Brand24, Sprout Social, Mention, Hootsuite, BrandJet, Influencity), monitoring setup (keywords, sources, alerts), sentiment analysis, competitive benchmarking (share of voice), crisis detection (real-time alerts, escalation), consumer insights, and reporting. Use when you don't know what people are saying about your brand, competitors are getting mentioned more than you, negative sentiment is spiking and you need to understand why, you're missing PR crises until it's too late, you can't tell if your brand shows up in AI/LLM answers, or you need to pick the right social listening tool. Do NOT use for platform-specific config (use /sales-meltwater), influencer discovery (use /sales-influencer-marketing), social media publishing/scheduling, or SEO keyword research (use /sales-semrush).
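The Boolean-query setup step can be sketched as composing include and exclude terms. Most listening tools accept some AND/OR/NOT dialect, but exact syntax varies by platform, so treat this shape as a generic sketch:

```python
def boolean_query(brand, products=(), exclusions=()):
    """Compose a monitoring query: brand OR product names, minus noise terms."""
    include = " OR ".join(f'"{t}"' for t in (brand, *products))
    query = f"({include})"
    for term in exclusions:
        query += f' NOT "{term}"'
    return query
```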