Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion transforms under transform/library/ in a TensorRT-LLM checkout. Prefer existing kernels and custom ops; use Triton only when no viable existing-kernel path exists. Use ad-graph-dump for AD_DUMP_GRAPHS_DIR workflows. Covers TRT-LLM paths, registry, default.yaml registration, graph validation, tests, and a review checklist — without prescribing profiling tools or throughput targets.
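The preference order above can be written as a small decision function: use existing kernels and custom ops first, and reach for Triton only when no viable existing-kernel path exists. This is a minimal, hypothetical sketch. `KernelLookup`, `Recommendation`, and `recommend` are illustrative names and are not part of TensorRT-LLM. The `worth_fusing` flag is an assumption that stands in for whatever judgment decides whether a candidate should be pursued at all.

```python
from enum import Enum


class KernelLookup(Enum):
    # Outcome of searching existing kernels / custom ops for the fused pattern.
    FOUND = "found"          # a viable existing kernel was located
    NOT_FOUND = "not_found"  # nothing suitable exists yet


class Recommendation(Enum):
    USE_EXISTING_KERNEL = "use_existing_kernel"
    NEEDS_TRITON_FALLBACK = "needs_triton_fallback"
    DEFER = "defer"


def recommend(lookup: KernelLookup, worth_fusing: bool) -> Recommendation:
    """Hypothetical helper: prefer an existing kernel; fall back to Triton only if none exists."""
    if not worth_fusing:
        # Assumption: candidates not worth pursuing are deferred regardless of the lookup.
        return Recommendation.DEFER
    if lookup is KernelLookup.FOUND:
        return Recommendation.USE_EXISTING_KERNEL
    return Recommendation.NEEDS_TRITON_FALLBACK


if __name__ == "__main__":
    print(recommend(KernelLookup.FOUND, worth_fusing=True).value)
    print(recommend(KernelLookup.NOT_FOUND, worth_fusing=True).value)
    print(recommend(KernelLookup.NOT_FOUND, worth_fusing=False).value)
```

Because the existing-kernel branch is checked first, a Triton fallback is only ever recommended after the lookup has failed.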
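Install: `npx skill4agent add nvidia/skills ad-add-fusion-transformation`

Skill name: `trtllm-agent-toolkit:ad-add-fusion-transformation` (namespace `trtllm-agent-toolkit:`). Relevant parts of the checkout: `tensorrt_llm/...`, `examples/auto_deploy/...`, `tests/...`, and `README.md`.

Related skills:

| Skill | Use it for |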
|---|---|
| ad-graph-dump | Enabling `AD_DUMP_GRAPHS_DIR` graph-dump workflows. |
| trtllm-codebase-exploration | Mapping existing transforms, custom ops, and search patterns before writing a pass. |
| trtllm-code-contribution | TensorRT-LLM pre-commit, tests, DCO sign-off, and PR expectations. |
| triton-kernel-writing | Implementing a Triton op only after existing-kernel lookup fails. |
| triton-tileir-optimization | Tuning existing Triton kernels for the TileIR backend when that path applies. |
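The candidate assessment template that follows has a fixed plain-text layout. You can fill it from structured data so that every write-up uses the same fields and allowed values. This is a minimal sketch. `CandidateAssessment` and `render` are hypothetical names, and the example values (`example_fusion`, `path/to/op.py::example_op`) are placeholders, not real TensorRT-LLM symbols.

```python
from dataclasses import dataclass, field
from typing import List

_LOOKUPS = ("found", "not_found")
_RECOMMENDATIONS = ("use_existing_kernel", "needs_triton_fallback", "defer")


@dataclass
class CandidateAssessment:
    # Fields mirror the candidate assessment template; the class itself is illustrative.
    short_name: str
    pattern: str
    lookup: str          # "found" or "not_found"
    evidence: str        # path/symbol backing the lookup result
    recommendation: str  # one of _RECOMMENDATIONS
    notes: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Reject values outside the template's allowed sets.
        if self.lookup not in _LOOKUPS:
            raise ValueError(f"unexpected lookup result: {self.lookup}")
        if self.recommendation not in _RECOMMENDATIONS:
            raise ValueError(f"unexpected recommendation: {self.recommendation}")
        lines = [
            f"Candidate: {self.short_name}",
            f"Affected graph pattern: {self.pattern}",
            f"Existing kernel lookup: {self.lookup}",
            f"Evidence: {self.evidence}",
            f"Recommendation: {self.recommendation}",
            "Strengths / weaknesses / risks:",
        ]
        # Keep the "- ..." placeholder when no notes have been recorded yet.
        lines += [f"- {note}" for note in self.notes] or ["- ..."]
        return "\n".join(lines)


if __name__ == "__main__":
    assessment = CandidateAssessment(
        "example_fusion", "add -> mul", "found",
        "path/to/op.py::example_op", "use_existing_kernel", ["fewer kernel launches"],
    )
    print(assessment.render())
```

Validating the lookup and recommendation strings keeps typos such as `notfound` out of the written assessments.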
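Graph dumps for inspection (see `ad-graph-dump`) are written under `AD_DUMP_GRAPHS_DIR`, and new transforms are registered in `default.yaml`. Before writing a pass, record each fusion candidate with its existing-kernel lookup result (`found` or `not_found`) and a recommendation (`use_existing_kernel`, `needs_triton_fallback`, or `defer`):

```text
Candidate: <short-name>
Affected graph pattern: <pattern>
Existing kernel lookup: <found|not_found>
Evidence: <path/symbol>
Recommendation: <use_existing_kernel|needs_triton_fallback|defer>
Strengths / weaknesses / risks:
- ...
```

Candidates marked `defer` do not proceed to a transform or a `default.yaml` entry. The others follow either the `existing_kernel_path` or the `triton_fallback_path`. In both cases the transform log should report `[SUMMARY] matches=...` for the target model. Passes live in `transform/library/`, and the kernels they target are `torch.ops.auto_deploy.*` ops under `custom_ops/`.

Key paths in the TensorRT-LLM checkout:

- `tensorrt_llm/_torch/auto_deploy/transform/library/`
- `tensorrt_llm/_torch/auto_deploy/transform/interface.py`
- `tensorrt_llm/_torch/auto_deploy/config/default.yaml`
- `tensorrt_llm/_torch/auto_deploy/utils/graph_writer.py`
- `tensorrt_llm/_torch/auto_deploy/utils/node_utils.py`
- `tensorrt_llm/_torch/auto_deploy/utils/_graph.py`
- `tensorrt_llm/_torch/auto_deploy/custom_ops/`
- `tests/unittest/auto_deploy/singlegpu/transformations/library/`
- `tests/integration/defs/accuracy/test_llm_api_autodeploy.py`

Add the new pass under `transform/library/` and register it with the transform registry:

```python
from tensorrt_llm._torch.auto_deploy.transform.interface import BaseTransform, TransformRegistry


@TransformRegistry.register("my_transform_key")
class MyTransform(BaseTransform):
    @classmethod
    def get_config_class(cls):
        return MyTransformConfig  # the transform's TransformConfig subclass, defined alongside it
```

Then add the same key under `transforms:` in `tensorrt_llm/_torch/auto_deploy/config/default.yaml`. Use `enabled: false` if the transform should be off by default. Model-specific configs live in `examples/auto_deploy/model_registry/configs/`.

Replacement nodes should call `torch.ops.auto_deploy` ops and preserve the `node.meta` of the nodes they replace. If no existing op fits, the new kernel belongs in `custom_ops/` as a `torch.ops.auto_deploy.*` op, and the pass in `transform/library/` targets it.

Validate the result in two places:

- In the dumped graphs, confirm that the fused op replaced the original pattern.
- In the transform log, confirm that it reports `[SUMMARY] matches=<n>` with a non-zero count rather than being `skipped` or `disabled`.

Add unit tests under `tests/unittest/auto_deploy/singlegpu/transformations/library/`.

Review checklist: the key is present in `default.yaml`, `matches` is non-zero on the target model, and the transform is not `skipped`.

Report the outcome in this format:

```text
Candidate: <name>
Path: <existing_kernel_path|triton_fallback_path|other>
Rationale:
- ...
Graph validation: <pass|fail; files / ops checked>
Summary logs: <matches before / after>
Tests: <what ran>
Open risks:
- ...
```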
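You can fill the "Graph validation" and "Summary logs" fields of the report with small helpers like the ones below. This is a hedged sketch under two assumptions. First, transform log lines are assumed to contain the `[SUMMARY] matches=<n>` fragment followed by an integer. Second, graph dumps under `AD_DUMP_GRAPHS_DIR` are assumed to be text files in which ops appear as `torch.ops.auto_deploy.<name>` or `auto_deploy.<name>`. Check both formats against real output before relying on them. `parse_summary_matches` and `find_auto_deploy_ops` are hypothetical names.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed log fragment, e.g. "... [SUMMARY] matches=3 ..."; verify against real logs.
_SUMMARY_RE = re.compile(r"\[SUMMARY\]\s+matches=(\d+)")
# Assumed op spelling in text dumps, with or without the "torch.ops." prefix.
_AD_OP_RE = re.compile(r"(?:torch\.ops\.)?auto_deploy\.(\w+)")


def parse_summary_matches(log_lines: Iterable[str]) -> List[int]:
    """Return every matches=<n> value found in the given log lines, in order."""
    counts = []
    for line in log_lines:
        m = _SUMMARY_RE.search(line)
        if m:
            counts.append(int(m.group(1)))
    return counts


def find_auto_deploy_ops(dump_dir: str, suffix: str = "") -> Dict[str, int]:
    """Count auto_deploy op names across dump files in dump_dir (file layout is assumed)."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(dump_dir).rglob(f"*{suffix}")):
        if not path.is_file():
            continue  # skip subdirectories yielded by rglob
        for name in _AD_OP_RE.findall(path.read_text(errors="replace")):
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines and dump contents, for illustration only.
    before = parse_summary_matches(["my_transform [SUMMARY] matches=0"])
    after = parse_summary_matches(["my_transform [SUMMARY] matches=4"])
    print(f"matches before / after: {before} / {after}")
    with tempfile.TemporaryDirectory() as d:
        Path(d, "graph.txt").write_text("y = torch.ops.auto_deploy.my_fused_op(x)\n")
        print(find_auto_deploy_ops(d))
```

The op regex keeps only the segment after `auto_deploy.`, so overload suffixes such as `.default` are folded into the base op name when counting.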