Automatic Generation of 1D Recursive Filter Code for GPUs
Texas State University
Learn how to automatically generate 1D recursive filter code for GPUs using PLR, a domain-specific compiler that requires only the filter coefficients as input and emits high-performance CUDA code. In digital filters, later result values depend on earlier result values, which makes them challenging to compute in parallel. We'll present the new work- and space-efficient algorithm that PLR uses to implement digital filters and other linear recurrences, and we'll explain how PLR automatically parallelizes and optimizes the GPU code. Our evaluation shows that, for single-stage IIR filters on large inputs, the generated code reaches the throughput of a memory copy. This throughput cannot be surpassed, since any filter must at least read its input and write its output. On other digital filters, the automatically parallelized code outperforms the fastest prior implementations.
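To make the dependence problem concrete, here is a minimal Python sketch of a single-stage IIR filter, y[i] = x[i] + a * y[i-1]. It is an illustration only, not PLR's algorithm or its generated CUDA code. The function names, the single coefficient `a`, and the `chunk` parameter are assumptions chosen for this example. It shows one generic way the serial dependence can be broken. Because the recurrence is linear, each chunk can be filtered independently and then corrected with a single carry value from the preceding chunk.

```python
def iir_serial(x, a):
    """Reference filter: y[i] = x[i] + a * y[i-1], with y[-1] = 0."""
    y, prev = [], 0.0
    for v in x:
        prev = v + a * prev  # each output needs the previous output
        y.append(prev)
    return y


def iir_chunked(x, a, chunk=4):
    """Same filter, computed chunk by chunk (hypothetical illustration).

    Phase 1 filters every chunk as if its incoming carry were zero. These
    computations do not depend on each other, so they could run in parallel.
    Phase 2 passes one carry value from chunk to chunk and adds the
    correction carry * a**(k+1) to the k-th element of each chunk.
    """
    # Phase 1: independent local filtering of each chunk
    local = [iir_serial(x[s:s + chunk], a) for s in range(0, len(x), chunk)]

    # Phase 2: carry propagation plus per-element correction
    y, carry = [], 0.0
    for part in local:
        fixed = [v + carry * a ** (k + 1) for k, v in enumerate(part)]
        y.extend(fixed)
        carry = fixed[-1]  # last output of this chunk feeds the next one
    return y
```

Both functions return the same values up to floating-point rounding. Only phase 2 of the chunked version is sequential, and it handles one carry per chunk rather than one value per element. This sketch performs that phase in a plain loop and makes no attempt at the work or space efficiency discussed in the talk.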