We'll discuss WarpDrive a high-speed, scalable, multi-GPU implementation for hashing billions of key-value pairs. Hash maps are among the most versatile data structures because of their compact data layout and expected constant time complexity for insertion and querying. CUDA-enabled GPUs can speedup hashing by virtue of their fast video memory featuring almost one terabytes per second bandwidth in comparison to state-of-the-art CPUs. However, the size of hash maps supported by single-GPU hashing implementations is restricted by the limited amount of available video RAM. We propose a novel subwarp/coalesced group-based probing scheme featuring coalesced memory access over consecutive memory regions in order to mitigate the high latency of irregular access patterns. Our implementation achieves around 1.3 billion insertions per second in single-GPU mode for a load factor of 0.95, clearly outperforming other implementations. We'll also present transparent scaling to multiple GPUs within the same node with over 4.5 billion operations per second for high load factors on four Tesla P100 GPUs connected by NVLink technology. WarpDrive is freely available at https://github.com/sleeepyjack/warpdrive.