Loading...
Loading...
Found 1,746 Skills
Comprehensive evaluation of potential stock investments combining valuation analysis, fundamental research, technical assessment, and clear buy/hold/sell recommendations. Use when the user asks about buying a stock, evaluating investment opportunities, analyzing watchlist candidates, or requests stock recommendations. Provides specific entry prices, position sizing, and conviction ratings.
Alibaba Cloud Governance Center evaluation report skill. Use for querying governance maturity check results, generating structured risk reports, and account compliance analysis. Triggers: "云治理", "成熟度检测", "合规检查", "安全风险", "治理检测", "governance evaluation", "maturity check", "compliance report", "risk report", "governance center".
Evaluates NVIDIA Cosmos Policy on LIBERO and RoboCasa simulation environments. Use when setting up cosmos-policy for robot manipulation evaluation, running headless GPU evaluations with EGL rendering, or profiling inference latency on cluster or local GPU machines.
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise comparison, position bias, evaluation pipelines, or automated quality assessment. Part of the context engineering skill suite — also activates when the user mentions "context engineering" or "context-engineering" in the context of evaluating LLM output quality.
Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
Full browser UAT for web apps — Playwright testing with console/network error capture, accessibility checks, i18n validation, and bug triage. Use when running screen-by-screen UAT or testing specific features in any web or hybrid app (React, Vue, Angular, Ionic, Next.js, etc).
Evaluate options for a specific design decision node and recommend one with explicit trade-offs. Use when the design already exposes a concrete choice such as architecture style, state management approach, auth model, storage pattern, sync strategy, multi-agent coordination model, language or runtime, UI framework, data-layer library, or tooling selection. Trigger when the user needs structured comparison and recommendation for a bounded design decision. Do not use for broad design discovery, full-system decomposition, or final readiness review.
Prepare a research artifact package for conference artifact evaluation, reproducibility review, badges, supplementary material, or post-acceptance artifact release. Use this skill whenever the user needs install instructions, reviewer-facing reproduction commands, Docker or environment checks, data/checkpoint packaging, hardware/runtime estimates, anonymized or public artifact metadata, artifact evaluation forms, or a claim-to-artifact reproducibility audit for ML/AI venues.
Estimate the intrinsic value of a public company using DCF, relative (peer multiple) and sum-of-parts (SOTP) methods, then triangulate to an implied share price with upside/downside versus the current market price. Use this skill whenever the user asks: "what is AAPL worth", "valuation of NVDA", "fair value of TSLA", "intrinsic value", "DCF for MSFT", "build a DCF", "discounted cash flow", "WACC", "terminal value", "implied share price", "upside to fair value", "is X overvalued/undervalued", "relative valuation", "peer comparison valuation", "EV/EBITDA target", "SOTP", "sum of the parts", "how much is [company] worth", "price target from fundamentals", "value this company", or any ticker in the context of computing intrinsic or relative valuation. Default to running ALL three methods (DCF + relative + SOTP-if-applicable) and presenting a blended implied price with a sensitivity table. Do not answer valuation questions from memory — always run the workflow.
LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke when: - Setting up prompt evaluation or regression testing - Integrating LLM testing into CI/CD pipelines - Configuring security testing (red teaming, jailbreaks) - Comparing prompt or model performance - Building evaluation suites for RAG, factuality, or safety Keywords: promptfoo, llm evaluation, prompt testing, red team, CI/CD, regression testing
Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.