Loading...
Loading...
Found 39 Skills
Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.
OCR alternativo a PaddleOCR, excelente en caracteres especiales y múltiples scripts
Refactor PyTorch code to improve maintainability, readability, and adherence to best practices. Identifies and fixes DRY violations, long functions, deep nesting, SRP violations, and opportunities for modular components. Applies PyTorch 2.x patterns including torch.compile optimization, Automatic Mixed Precision (AMP), optimized DataLoader configuration, modular nn.Module design, gradient checkpointing, CUDA memory management, PyTorch Lightning integration, custom Dataset classes, model factory patterns, weight initialization, and reproducibility patterns.