GPU Implementation and Optimization of a Flexible MAP Decoder for Synchronization Correction
In this paper we present an optimized parallel implementation of a flexible MAP decoder for synchronization error correcting codes, supporting a very wide range of code sizes and channel conditions. On mid-range GPUs we demonstrate decoding speedups of more than two orders of magnitude over a CPU implementation of the same optimized algorithm, and more than an order of magnitude over our earlier GPU implementation. The main challenge is to maintain high parallelization efficiency across a wide range of code sizes, channel conditions, and execution hardware. We address this with a dynamic strategy for choosing parallel execution parameters at run-time. We also present a variant that trades some decoding speed for a significantly reduced memory requirement, with no loss in the decoder's error correction performance. The increased throughput of our implementation and its ability to work with less memory allow us to analyse larger codes and poorer channel conditions, and make practical use of such codes more feasible.