Graph coloring assigns colors to the vertices of a graph such that no two adjacent vertices get the same color. It's a key building block in many applications. Using a minimal number of colors is often only part of the problem. The solution also needs to be computed quickly. Several parallel implementations exist, but they might suffer from low parallelism depending on the input graph. Our approach increases the parallelism without affecting the coloring quality, and we'll describe an efficient CUDA implementation thereof. On 18 test graphs, our technique yields an average of 3.4x more parallelism. Our CUDA code running on a Titan V is 2.9x faster on average and uses as few or fewer colors as the best GPU codes from the literature.