Search Results: benchmarking

Found 117 Skills

sales-plutoba

PlutoBa platform help — AI influencer vetting and creator due diligence across TikTok, Instagram, and YouTube. Covers PlutoBa Score (7-dimension assessment), Deep Assessments (100+ posts, 300+ comments), fake follower detection, audience authenticity, brand safety risk scoring, rate benchmarking, AI-powered creator outreach, creator CRM, and campaign briefs. Use when worried an influencer's followers are fake, need to check if a creator is brand-safe before signing a deal, want to know what to pay an influencer, PlutoBa Score seems too low or too high, creator outreach templates aren't getting responses, unsure which PlutoBa plan fits your needs, or setting up PlutoBa for an agency with multiple brands. Do NOT use for influencer strategy across platforms (use /sales-influencer-marketing) or influencer discovery and search (use /sales-hypeauditor or /sales-modash).

🇺🇸|EnglishTranslated

AI & Machine Learning | vuralserhat86/antigravity...

llm_evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Code Quality | absolutelyskilled/absolut...

performance-engineering

Use this skill when profiling application performance, debugging memory leaks, optimizing latency, benchmarking code, or reducing resource consumption. Triggers on CPU profiling, memory profiling, flame graphs, garbage collection tuning, load testing, P99 latency, throughput optimization, bundle size reduction, and any task requiring performance analysis or optimization.

AI & Machine Learning | erichowens/some_claude_sk...

skill-creator

Guide for creating, improving, benchmarking, and packaging Claude Agent Skills (SKILL.md files). Invoke when users want to create a skill from scratch, improve or test an existing skill, benchmark skill performance with variance analysis, or optimize a skill description for triggering accuracy. Also invoke when users say "turn this into a skill", "make a skill for X", "help me write a SKILL.md", "my skill isn't firing correctly", or want to convert a workflow/conversation into a reusable skill. Invoke proactively when a conversation has produced a repeatable workflow worth capturing. If the user mentions SKILL.md, skill files, skill descriptions, or skill triggering, this skill applies.

10 scripts/Attention

AI & Machine Learning | vanman2024/ai-dev-marketp...

chunking-strategies

Document chunking implementations and benchmarking tools for RAG pipelines including fixed-size, semantic, recursive, and sentence-based strategies. Use when implementing document processing, optimizing chunk sizes, comparing chunking approaches, benchmarking retrieval performance, or when user mentions chunking, text splitting, document segmentation, RAG optimization, or chunk evaluation.

8 scripts/Checked
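
As an illustration of the fixed-size strategy named in the chunking-strategies entry above, here is a minimal sketch. It is not taken from the skill itself: the function name `chunk_fixed`, the character-based sizing, and the `overlap` parameter are all assumptions made for this example.

```python
def chunk_fixed(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into character windows of at most `size` characters.

    Hypothetical helper for illustration only; the listed skill may
    chunk by tokens or use a different interface.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window start advances each time
    return [text[i:i + size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    # Consecutive chunks share `overlap` characters at their boundary.
    print(chunk_fixed("abcdefghij", size=4, overlap=1))
```

A non-zero overlap repeats a little context across chunk boundaries, which can help retrieval when a relevant sentence straddles two chunks, at the cost of storing more text.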

Marketing & Growth | anysiteio/agent-skills

anysite-content-analytics

Track and analyze content performance across Instagram, YouTube, LinkedIn, Twitter/X, and Reddit using anysite MCP server. Measure engagement metrics, analyze post effectiveness, benchmark content strategy, identify top-performing content, and optimize posting strategies. Supports post performance tracking, engagement analysis, content type comparison, and competitive benchmarking. Use when users need to measure content ROI, optimize social strategy, identify viral content patterns, or analyze content engagement across platforms.

Backend Development | tursodatabase/turso

memory-benchmark

How to benchmark and analyze memory usage in Turso using the memory-benchmark crate and dhat heap profiler. Use this skill whenever the user mentions memory usage, memory profiling, allocation tracking, heap analysis, memory regression, memory benchmarking, dhat, or wants to understand where memory is being allocated during SQL workloads. Also use when investigating memory growth in WAL or MVCC mode. IMPORTANT - If you modify the perf/memory crate (add profiles, change CLI flags, change output format, etc.), update this skill document to reflect those changes so it stays accurate for future agents.

AI & Machine Learning | davila7/claude-code-templ...

cirq

Quantum computing framework for building, simulating, optimizing, and executing quantum circuits. Use this skill when working with quantum algorithms, quantum circuit design, quantum simulation (noiseless or noisy), running on quantum hardware (Google, IonQ, AQT, Pasqal), circuit optimization and compilation, noise modeling and characterization, or quantum experiments and benchmarking (VQE, QAOA, QPE, randomized benchmarking).

Frontend Development | bbeierle12/skill-mcp-clau...

performance-at-scale

Spatial indexing and world streaming for Three.js building games with thousands of pieces. Use when optimizing building games, implementing spatial queries, chunk loading, or profiling performance. Includes spatial hash grids, octrees, chunk managers, and benchmarking tools.

4 scripts/Checked

AI & Machine Learning | github/awesome-copilot

eval-driven-dev

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.

AI & Machine Learning | davila7/claude-code-templ...

evaluating-code-models

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.
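
For readers unfamiliar with the pass@k metric mentioned in this entry, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The function name `pass_at_k` is hypothetical, and nothing here is claimed to match how the listed skill computes the metric.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which are correct.

    Illustrative helper only; not taken from the listed skill.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # contains no correct sample. math.comb returns 0 when k > n - c, so the
    # estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 20 samples for one problem, 5 of them correct, reported at k = 1 and 10.
    print(pass_at_k(20, 5, 1), pass_at_k(20, 5, 10))
```

A benchmark score is then typically the mean of this per-problem estimate over all problems in the suite.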

AI & Machine Learning | rysweet/amplihack

model-evaluation-benchmark

Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.
