All Skills

50,537 skills in total; AI & Machine Learning has 8,483 skills

Showing 12 of 8,483 skills

AI & Machine Learning | datadog-labs/agent-skills

llm-obs-experiment-py-bootstrap

Generates a self-contained Python experiment client that uses the ddtrace.llmobs SDK. Emits either a runnable .py script or a Jupyter .ipynb notebook matching the canonical Datadog reference notebook style. Use when the user says "generate Python experiment", "write an SDK experiment", "create a ddtrace experiment", "Python notebook experiment", "use the LLM Obs SDK", or has `ddtrace` installed and wants idiomatic SDK code.

AI & Machine Learning | datadog-labs/agent-skills

llm-obs-eval-pipeline

End-to-end pipeline from unlabeled ml_app traces to a bootstrapped evaluator suite. Runs trace classification → root cause analysis → eval bootstrap in sequence with user checkpoints. Use when the user says "run the eval pipeline", "go from traces to evals", "bootstrap evals end to end", "classify then RCA then bootstrap", "build an eval set from scratch", or wants a guided walkthrough from production data to evaluator code.

AI & Machine Learning | wind-information-co-ltd/w...

wind-alice

A CLI tool that calls the Wind Alice Agent (A2A protocol, SSE streaming) to execute specified Skills and return the analysis results. Use when the user explicitly requests an action that involves Alice sub-Skills, such as "run a certain Skill with Alice", "generate a research question list for a company", "create a one-page investment memo", or "verify a piece of financial information".

AI & Machine Learning | nvidia/skills

adding-cutile-kernel

Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or implementing a new cuTile operator/kernel in TileGym, or when asking how to register a new cuTile op.

AI & Machine Learning | nvidia/skills

perf-nsight-systems

Nsight Systems (nsys) CLI for system-level timeline profiling. Use when the user wants to run nsys profile, analyze .nsys-rep reports, use nsys stats/analyze/recipe commands, diagnose GPU idle time from timeline traces, or profile distributed training with NCCL overlap analysis. NOT for kernel-level metrics like SOL%, occupancy, or roofline (use perf-nsight-compute-analysis for ncu). NOT for writing or generating kernels. NOT for applying optimizations like CUDA Graphs.

AI & Machine Learning | nvidia/skills

video-search

Search video archives using natural language: find events, objects, actions, and people across recorded video using fusion search (Cosmos Embed1 semantic search + CV attribute search). Use when asked to search for something in video, find actions and events, locate objects and people, or query video archives. For these types of questions, default to this top-level fusion search unless the user specifies otherwise. Requires the search profile to be deployed.

AI & Machine Learning | nvidia/skills

ad-accuracy-debug

Debug AutoDeploy accuracy regressions vs a reference score (PyTorch backend or published baseline). Use when an AutoDeploy model's eval score is significantly below the reference and the root cause is unknown.

AI & Machine Learning | nvidia/skills

cutile-python

Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tasks.

AI & Machine Learning | nvidia/skills

multi-node-slurm

Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures. Covers srun-native vs uv run torch.distributed approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.

AI & Machine Learning | nvidia/skills

config-conventions

Configuration conventions for NeMo-RL. YAML is the single source of truth for defaults. Covers TypedDict usage, exemplar YAML updates, and forbidden default patterns.

AI & Machine Learning | nvidia/skills

exec-local-compile

Compile TensorRT-LLM on a compute node inside a Docker container. Use this when already on a compute node with GPUs visible.

AI & Machine Learning | nvidia/skills

perf-host-analysis

Analyze host/CPU overhead in TensorRT-LLM inference from nsys traces. Detect whether host overhead is the bottleneck using GPU idle ratio, host prep exposed ratio, and per-phase evidence. For regressions, isolate forward steps via allreduce/NVTX patterns, compare host operation breakdowns across versions, and identify scheduling or request-management overhead. Supports optional inter-kernel gap, eager-vs-graph, pattern mapping, and multi-rank straggler drill-down. Use standalone or within perf-analysis. Triggers: host overhead, inter-step gap, scheduling overhead, forward step isolation, nsys iteration analysis, NVTX breakdown, request management overhead, GPU idle, host bottleneck, host prep exposed, inter-kernel gap, bubble analysis, graph coverage, eager kernel, rank imbalance, straggler detection.
