Writing Fast Custom Operations in PyTorch With No Parallel Programming - nvFuser
Director of the PyTorch Team, NVIDIA
PyTorch users should be able to write interesting, novel operations without becoming parallel programming experts. However, frameworks rely on libraries of fast but complex parallel programming implementations. Python-based interfaces provide the flexibility to define novel operations, but until now that flexibility has come at the cost of compute speed. nvFuser can provide the compute speed of expert-written CUDA for novel operations written directly in PyTorch's Python interface. Learn how you can write fast, novel activation and normalization functions in PyTorch today with no parallel programming experience.
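
As a rough illustration of the workflow described above, the sketch below defines a pointwise activation in plain PyTorch and compiles it with TorchScript (`torch.jit.script`). It rests on some assumptions. The abstract does not say which entry point the talk uses, so TorchScript is a guess. Whether nvFuser is the backend that fuses the operations depends on the PyTorch version and on running on a CUDA device. The function name `novel_activation` and the Mish-style formula are illustrative choices, not taken from the talk.

```python
import torch
import torch.nn.functional as F


def novel_activation(x: torch.Tensor) -> torch.Tensor:
    # Illustrative Mish-style activation built only from ordinary
    # PyTorch pointwise ops; no CUDA code is written by hand.
    return x * torch.tanh(F.softplus(x))


# Compile the Python function with TorchScript. On a CUDA device, and in a
# PyTorch build where a GPU fuser such as nvFuser is enabled (assumption),
# the chain of pointwise ops may be fused into a single generated kernel.
scripted_activation = torch.jit.script(novel_activation)

# Fall back to the CPU so the sketch still runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(256, 256, device=device)

eager_out = novel_activation(x)        # reference result in eager mode
scripted_out = scripted_activation(x)  # result from the scripted function

# The compiled version should agree numerically with eager mode.
max_abs_diff = (eager_out - scripted_out).abs().max().item()
print(f"max abs difference: {max_abs_diff:.2e}")
```

The point of the design is that the function body stays ordinary Python. Any speedup would come from the compiler backend rather than from changes to the user's code.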