On the Performance of GPU-Accelerated Meshfree Solvers in Fortran, C++, Python, and Julia
BITS Pilani - Hyderabad Campus
We present the development of GPU-accelerated meshfree solvers based on the Least Squares Kinetic Upwind Method for inviscid compressible flows. The GPU solvers are written in both traditional (Fortran 90 and C++) and modern (Python and Julia) programming languages. To assess their computational efficiency and relative performance, we run benchmark calculations and compare the rate of data processing (RDP) values of the GPU codes on several levels of point distributions. Here, RDP is a cost metric, defined as the wall clock time in seconds per iteration per point, so a lower RDP indicates a faster code. We will present a detailed investigation of the important global kernels to analyze overall performance. Apart from RDP, we will analyze other performance metrics such as utilization of streaming multiprocessors and device memory, occupancy, and arithmetic intensity. Finally, we will assess the performance of the solvers on V100 and A100 GPUs.
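As a minimal sketch of the RDP definition given above, the metric can be computed from a measured wall clock time, the number of iterations, and the number of points. The function name `rate_of_data_processing` and the benchmark numbers below are hypothetical and chosen only for illustration; they are not results from the solvers.

```python
def rate_of_data_processing(wall_time_s, n_iterations, n_points):
    """Return RDP: wall clock seconds per iteration per point (lower is better)."""
    if n_iterations <= 0 or n_points <= 0:
        raise ValueError("n_iterations and n_points must be positive")
    return wall_time_s / (n_iterations * n_points)


# Hypothetical run: 1000 iterations on 2 million points taking 20 s in total.
rdp = rate_of_data_processing(wall_time_s=20.0, n_iterations=1000, n_points=2_000_000)
print(f"RDP = {rdp:.2e} s per iteration per point")
```

Because RDP is normalized by both the iteration count and the point count, values measured on different levels of point distribution can be compared directly.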