Search Results: uat

Found 1,942 Skills

Security & Compliance | jeredblu/eval-marketplace

mcp-evaluator

Comprehensive security and privacy evaluation system for MCP (Model Context Protocol) servers. Use when users provide GitHub URLs to MCP servers and request security assessment, privacy evaluation, or ask "is this MCP safe to use." Evaluates security vulnerabilities, privacy risks, code quality, community feedback, and provides actionable recommendations with risk scoring.

🇺🇸 | English | Translated

AI & Machine Learning | awslabs/agent-plugins

model-evaluation

Generates a Jupyter notebook that evaluates a fine-tuned SageMaker model using LLM-as-a-Judge. Use when the user says "evaluate my model", "how did my model perform", "compare models", or after a training job completes. Supports built-in and custom evaluation metrics, evaluation dataset setup, and judge model selection.

🇺🇸 | English | Translated

2 scripts/Checked

AI & Machine Learning | skillcreatorai/ai-agent-s...

browse-and-evaluate

Use when exploring the ai-agent-skills catalog to find, compare, and evaluate skills before installing. Always use --fields to limit output size and --dry-run before committing to an install.

🇺🇸 | English | Translated

AI & Machine Learning | kiterlin/intelligent-dete...

nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

🇺🇸 | English | Translated

Tools & Utilities | sharadchaturveda-coder/ag...

agency-tool-evaluator

Expert technology assessment specialist focused on evaluating, testing, and recommending tools, software, and platforms for business use and productivity optimization.

🇺🇸 | English | Translated

AI & Machine Learning | a-green-hand-jack/ml-rese...

artifact-evaluation-prep

Prepare a research artifact package for conference artifact evaluation, reproducibility review, badges, supplementary material, or post-acceptance artifact release. Use this skill whenever the user needs install instructions, reviewer-facing reproduction commands, Docker or environment checks, data/checkpoint packaging, hardware/runtime estimates, anonymized or public artifact metadata, artifact evaluation forms, or a claim-to-artifact reproducibility audit for ML/AI venues.

🇺🇸 | English | Translated

Data Processing | longbridge/skills

longbridge-industry-valuation

Industry valuation comparison and distribution analysis via Longbridge — cross-peer valuation matrix (PE / PB / PS / dividend yield), industry-percentile ranking, and industry premium / discount for a single stock. Triggers: "行业估值", "行业溢价", "行业折价", "行业对比", "行业百分位", "同行业估值", "板块估值", "行业贵不贵", "行業估值", "行業溢價", "行業折價", "行業對比", "行業百分位", "板塊估值", "industry valuation", "sector valuation", "industry premium", "industry percentile", "peer valuation", "sector PE", "TSLA.US industry valuation", "700.HK sector comparison".

🇺🇸 | English | Translated

Tools & Utilities | longbridge/skills

longbridge-valuation-rank

Industry valuation rank time series for a single stock via Longbridge — tracks how a stock's PE / PB / PS / dividend-yield rank within its sector has changed over time (rank N of total M). Answers "is my stock becoming relatively cheaper or more expensive vs peers?" Complements longbridge-valuation (single-stock percentile history) and longbridge-industry-valuation (current sector snapshot). Triggers: "行业排名变化", "估值排名", "PE排名历史", "行业估值位置", "排名走势", "估值相对同业", "行業排名變化", "估值排名", "PE排名歷史", "行業估值位置", "排名走勢", "valuation rank", "industry rank history", "PE rank trend", "relative valuation rank", "sector ranking over time", "how does AAPL rank in industry PE".

🇺🇸 | English | Translated

AI & Machine Learning | refoundai/lenny-skills

evaluating-new-technology

Help users evaluate emerging technologies. Use when someone is assessing new tools, making build vs buy decisions, evaluating AI vendors, or deciding on technical architecture.

🇺🇸 | English | Translated

Data Processing | fatfingererr/macro-skills

usd-reserve-loss-gold-revaluation

Assuming the US dollar (or another currency) loses its reserve status and gold becomes the sole anchor, derives the "implied gold price that the balance sheet can withstand" by dividing central bank monetary liabilities by gold reserves, and outputs the leverage level, gap, and ranking for each country or currency.

🇨🇳 | Chinese | Translated

2 scripts/Checked
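
The calculation described for usd-reserve-loss-gold-revaluation (monetary liabilities divided by gold reserves) could be sketched roughly as follows. This is a minimal illustration, not the skill's actual scripts: the `CentralBank` type, the function names, the definitions of "leverage" (implied price over spot price) and "gap" (implied price minus spot price), and all figures are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CentralBank:
    # Hypothetical record; field names are illustrative only.
    name: str
    monetary_liabilities: float  # in one common currency unit, e.g. USD
    gold_reserves_oz: float      # troy ounces of gold held


def implied_gold_price(bank: CentralBank) -> float:
    """Gold price at which gold reserves would fully back the liabilities."""
    return bank.monetary_liabilities / bank.gold_reserves_oz


def rank_by_leverage(banks: list[CentralBank], spot_price: float) -> list[dict]:
    """Compute implied price, leverage, and gap, sorted by leverage (highest first).

    Assumed definitions (not taken from the skill itself):
      leverage = implied price / spot price
      gap      = implied price - spot price
    """
    rows = []
    for bank in banks:
        implied = implied_gold_price(bank)
        rows.append({
            "name": bank.name,
            "implied_price": implied,
            "leverage": implied / spot_price,
            "gap": implied - spot_price,
        })
    return sorted(rows, key=lambda row: row["leverage"], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the functions.
    banks = [
        CentralBank("Bank A", monetary_liabilities=1000.0, gold_reserves_oz=10.0),
        CentralBank("Bank B", monetary_liabilities=600.0, gold_reserves_oz=2.0),
    ]
    for row in rank_by_leverage(banks, spot_price=50.0):
        print(row)
```

A higher leverage value here means gold would have to rise further from the spot price to cover that balance sheet, which is one plausible way to read the "ranking" the description mentions.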

AI & Machine Learning | langchain-ai/langchain-sk...

langsmith-evaluator

Use this skill for ANY question about CREATING evaluators. Covers creating custom metrics, LLM as Judge evaluators, code-based evaluators, and uploading evaluation logic to LangSmith. Includes basic usage of evaluators to run evaluations.

🇺🇸 | English | Translated

1 scripts/Attention

AI & Machine Learning | eyadsibai/ltk

nemo-evaluator

Use when evaluating LLMs, running benchmarks like MMLU/HumanEval/GSM8K, setting up evaluation pipelines, or when the user asks about "NeMo Evaluator", "LLM benchmarking", "model evaluation", "MMLU", "HumanEval", "GSM8K", or "benchmark harnesses".

🇺🇸 | English | Translated