VDI: GRID vPC tested on a server with 2x Intel Xeon Gold 6148 (20c, 2.4 GHz); GRID vPC (using 1B profiles) with 4x T4 GPUs supports 64 user VMs; VMware ESXi 6.7; NVIDIA vGPU Software (410.91/412.16); Windows 10 (1803); 2 vCPUs; 4 GB RAM; Resolution: 1920x1080; Single Monitor; VMware Horizon 7.6. User experience was measured using an NVIDIA internal benchmarking tool that measured remoted frames while running office productivity applications such as Microsoft PowerPoint, Word, Excel, Chrome, PDF viewing, and video playback.
Machine Learning: CPU nodes (61 GB of memory, 8 vCPUs, 64-bit platform), Apache Spark; 200 GB CSV dataset; data preparation includes joins and variable transformations. GPU Server Config: Dual-Socket Xeon E5-2698 v4@3.6GHz, 20 T4 GPUs on 5 nodes, each with 4x T4 GPUs. All runs use an InfiniBand network. CPU data for the XGBoost and data conversion steps are estimated from measured data for 20 CPU nodes, with execution time reduced by 60% to normalize for training on a smaller data set on T4.
Training: Workload: ResNet-50; CPU Server: Dual-Socket Xeon E5-2698 v4@3.6GHz, 512 GB System Memory; GPU Server: Dual-Socket Xeon E5-2698 v4@3.6GHz with 4x T4 GPUs; Framework: MXNet v1.4; Mixed Precision; CUDA 10.1.105; NCCL 2.4.3; cuDNN 7.5.0.56; cuBLAS 10.1.105; DALI 0.7.0; NVIDIA Driver: 410.104; Batch size: T4: 208 | Dual-Socket CPU Server: 256
Inference: Workload: VGG-19; CPU Server: Dual-Socket Xeon Gold 6140@2.30GHz using OpenVINO Toolkit; GPU Server: 4x T4 GPUs; TensorRT 5.1 GA; CUDA 10.1.105; NCCL 2.4.3; cuDNN 7.5.0.56; cuBLAS 10.1.105; NVIDIA Driver: 410.104; Batch size: 128
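
The arithmetic behind two of the notes above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the tests: the 16 GB T4 framebuffer and the 1 GB per 1B profile are taken from general product specifications rather than from these notes, the function names are hypothetical, and the 100-second input is a placeholder, not a measured result. The "reduced by 60%" step is read here as multiplying the measured time by 0.40.

```python
# Illustrative arithmetic only. The constants are assumptions, not values
# stated in the test notes.
T4_FRAMEBUFFER_GB = 16  # assumed framebuffer size of one T4 GPU
PROFILE_1B_GB = 1       # assumed framebuffer allocated per 1B vGPU profile


def vms_per_server(gpus: int,
                   gpu_gb: int = T4_FRAMEBUFFER_GB,
                   profile_gb: int = PROFILE_1B_GB) -> int:
    """User VMs supported when each VM takes one vGPU profile."""
    return gpus * (gpu_gb // profile_gb)


def normalized_cpu_time(measured_seconds: float,
                        reduction: float = 0.60) -> float:
    """Scale a measured CPU time down by the given fractional reduction."""
    return measured_seconds * (1.0 - reduction)


if __name__ == "__main__":
    # 4 GPUs x 16 profiles per GPU, matching the VDI note's 64 user VMs.
    print(vms_per_server(4))
    # Placeholder input; not a figure from the benchmark.
    print(normalized_cpu_time(100.0))
```

Under these assumptions, four T4 GPUs with 1B profiles give 4 x 16 = 64 VMs, which agrees with the VDI note.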