Accelerating Compact Fractals with Tensor Core GPUs
This work presents a GPU thread mapping approach that enables fast parallel stencil-like computations on discrete fractals using their compact representation. The underlying idea is to employ two GPU tensor-core accelerated thread maps, λ(ω) and ν(ω), which act as threadspace-to-dataspace and dataspace-to-threadspace functions, respectively. By combining these maps, threads can access compact space and interact with their neighbors. The cost of the maps is 𝒪(log log n) time, where n is the side length of an n × n embedding of the fractal in its expanded form. The technique works on any fractal that belongs to the Non-overlapping-Bounding-Boxes (NBB) class of discrete fractals, and can be extended to three dimensions as well. Results on an A100 GPU, using the Sierpinski Triangle as a case study, show a speedup of up to ∼11× and a memory usage reduction of 234× with respect to a Bounding Box approach. These results show that the proposed compact approach can allow the scientific community to tackle larger problems that previously did not fit in GPU memory, and to run them faster than with a bounding box approach.
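To make the idea of mapping between compact and expanded space concrete, the following is a minimal sequential sketch in Python for the Sierpinski Triangle. It is not the tensor-core formulation of λ(ω) and ν(ω) described above: it uses a plain digit-by-digit loop that costs 𝒪(log n) per lookup, and it assumes one particular layout (a level-r triangle with 3^r cells, embedded in a 2^r × 2^r grid, where a cell (x, y) belongs to the fractal when every set bit of x is also set in y). The function names `compact_to_expanded`, `expanded_to_compact`, `in_fractal`, and `neighbor` are hypothetical and chosen only for illustration.

```python
def compact_to_expanded(i, r):
    """Map a compact index i in [0, 3**r) to (x, y) in the 2**r x 2**r embedding.

    Assumed layout: the k-th base-3 digit of i selects one of three
    sub-boxes at scale 2**k: 0 -> top, 1 -> bottom-left, 2 -> bottom-right.
    """
    x = y = 0
    for k in range(r):
        i, d = divmod(i, 3)          # d is the k-th base-3 digit
        y |= int(d >= 1) << k        # digits 1 and 2 move down
        x |= int(d == 2) << k        # digit 2 also moves right
    return x, y


def in_fractal(x, y, r):
    """True if (x, y) is inside the embedding and is a cell of the fractal."""
    n = 1 << r
    return 0 <= x < n and 0 <= y < n and (x & ~y) == 0


def expanded_to_compact(x, y, r):
    """Inverse map: (x, y) on the fractal -> compact index in [0, 3**r)."""
    i = 0
    for k in reversed(range(r)):
        d = ((y >> k) & 1) + ((x >> k) & 1)   # recover the k-th base-3 digit
        i = i * 3 + d
    return i


def neighbor(i, dx, dy, r):
    """Compact index of the cell at offset (dx, dy) from cell i, or None
    if that position is outside the embedding or is an empty (non-fractal) cell."""
    x, y = compact_to_expanded(i, r)
    nx, ny = x + dx, y + dy
    if not in_fractal(nx, ny, r):
        return None
    return expanded_to_compact(nx, ny, r)
```

A stencil step over the compact array would call `neighbor` for each offset and skip the `None` results. Under this layout the compact array holds 3^r values instead of the 4^r cells of the bounding box, so the memory saving grows as (4/3)^r with the fractal level r.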