Search Results: autodeploy

Found 6 Skills

ad-graph-dump

Enable and interpret TensorRT-LLM AutoDeploy FX graph text dumps via AD_DUMP_GRAPHS_DIR. Use when you need before/after graphs per transform, to locate subgraphs, or to confirm a rewrite ran. Paths and behavior are grounded in tensorrt_llm/_torch/auto_deploy (GraphWriter, BaseTransform). Complements ad-add-fusion-transformation.

AI & Machine Learning | nvidia/skills

ad-layer-visualizer

Visualize a specific transformer decoder layer from an AutoDeploy FX graph text dump as a hierarchical DOT/PNG diagram. Optionally annotate nodes with actual GPU kernel names and durations from an nsys trace. Use when the user wants to visualize, inspect, or debug a layer in an AutoDeploy model graph dump. Triggers on: "visualize layer", "show layer", "graph of layer", "layer visualization", "dump graph layer". Assumes graph dumps already exist in a directory (produced by AD_DUMP_GRAPHS_DIR).

2 scripts | Checked

AI & Machine Learning | nvidia/skills

ad-conf-check

Check whether AutoDeploy YAML configs were actually applied by analyzing server logs and optionally graph dumps (AD_DUMP_GRAPHS_DIR). Use when the user wants to verify config application, debug config issues, or check if AutoDeploy transforms (piecewise CUDA graph, multi-stream, sharding, fusion, etc.) were applied or fell back. Triggers on: "check config", "verify config", "ad-conf-check", "were my configs applied", "config not working", "check if piecewise is enabled", "check log for config", or any request to compare AD YAML settings against runtime behavior.

1 script | Checked

AI & Machine Learning | nvidia/skills

ad-accuracy-debug

Debug AutoDeploy accuracy regressions vs a reference score (PyTorch backend or published baseline). Use when an AutoDeploy model's eval score is significantly below the reference and the root cause is unknown.

AI & Machine Learning | nvidia/skills

ad-model-onboard

Translates a Hugging Face model into a prefill-only AutoDeploy custom model using reference custom ops, then validates it with hierarchical equivalence tests.

AI & Machine Learning | nvidia/skills

ad-add-fusion-transformation

Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion transforms under transform/library/ in a TensorRT-LLM checkout. Prefer existing kernels and custom ops; use Triton only when no viable existing-kernel path exists. Use ad-graph-dump for AD_DUMP_GRAPHS_DIR workflows. Covers TRT-LLM paths, registry, default.yaml registration, graph validation, tests, and a review checklist — without prescribing profiling tools or throughput targets.
