A Digital Hardware Fast Algorithm and FPGA-based Prototype for a Novel 16-point Approximate DCT for Image Compression Applications
The discrete cosine transform (DCT) is the key step in many image and video coding standards. The 8-point DCT is an important special case, and several low-complexity approximations of it have been widely investigated. However, the 16-point DCT offers energy compaction advantages. This paper therefore presents a new 16-point DCT approximation with null multiplicative complexity. The proposed transform matrix is orthogonal and contains only zeros and ones. The proposed transform outperforms the well-known Walsh-Hadamard transform and the current state-of-the-art 16-point approximation. A fast algorithm for the proposed transform is also introduced. This fast algorithm is experimentally validated using hardware implementations that are physically realized and verified on a 40 nm CMOS Xilinx Virtex-6 XC6VLX240T FPGA chip at a maximum clock rate of 342 MHz. Rapid prototypes on FPGA for an 8-bit input word size show an improvement in compressed image quality of up to 1-2 dB, at the cost of only eight adders, compared to the state-of-the-art 16-point DCT approximation algorithm in the literature [S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy. A novel transform for image compression. In Proceedings of the 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2010].
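To illustrate what null multiplicative complexity means in practice, the following minimal sketch computes the 16-point Walsh-Hadamard transform, the baseline named above, using only additions and subtractions. It is not the proposed transform, whose matrix and fast algorithm are not given in this abstract. The function name `fwht16`, the natural (Hadamard) output ordering, and the unnormalized scaling are assumptions made for illustration only.

```python
def fwht16(x):
    """Unnormalized 16-point fast Walsh-Hadamard transform (natural order).

    Illustrative baseline only; NOT the transform proposed in the paper.
    Returns the transformed list and the number of additions/subtractions used.
    """
    if len(x) != 16:
        raise ValueError("expected exactly 16 samples")
    y = list(x)
    additions = 0
    h = 1
    while h < 16:                      # log2(16) = 4 butterfly stages
        for start in range(0, 16, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i] = a + b           # butterfly: sum
                y[i + h] = a - b       # butterfly: difference
                additions += 2
        h *= 2
    return y, additions


if __name__ == "__main__":
    coeffs, adds = fwht16([1] * 16)
    print(coeffs)  # a constant input puts all energy in the first coefficient
    print(adds)    # 64 additions/subtractions, no multiplications
```

A constant input concentrates all of its energy in the first coefficient, and the whole transform costs 4 stages of 16 additions or subtractions each. Hardware realizations of multiplierless transforms are built from adder networks of this kind, which is why adder count is the cost metric quoted in the abstract.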