Patch DCT vs LeNet
This paper compares the performance of a neural network (NN) that takes as input the DCT (Discrete Cosine Transform) of an image patch with that of LeNet for classifying MNIST handwritten digits. The basis functions underlying the DCT bear a passing resemblance to some of the learned basis functions of the Visual Transformer but are an order of magnitude faster to apply.
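As a rough illustration of what "the DCT of an image patch" could look like as a feature extractor, the sketch below applies a 2-D DCT to each patch of a 28x28 image and flattens the coefficients into a vector that a small NN could consume. The patch size of 4, the use of non-overlapping patches, the orthonormal DCT-II variant, and the names `dct_matrix` and `patch_dct` are all assumptions made for this example; the abstract does not specify any of them.

```python
import numpy as np


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix (rows are basis functions)."""
    k = np.arange(n)[:, None]  # frequency index, one per row
    i = np.arange(n)[None, :]  # sample index, one per column
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so the matrix is orthonormal
    return m


def patch_dct(image, patch=4):
    """Flattened 2-D DCT coefficients of non-overlapping patches (hypothetical helper)."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    d = dct_matrix(patch)
    # Rearrange into a grid of patches with shape (h/p, w/p, p, p).
    blocks = image.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    # Separable 2-D DCT of every patch: D @ X @ D^T, broadcast over the grid.
    coeffs = d @ blocks @ d.T
    return coeffs.reshape(-1)


# Usage: a random array stands in for one 28x28 MNIST digit.
rng = np.random.default_rng(0)
features = patch_dct(rng.random((28, 28)), patch=4)
print(features.shape)
```

Because the basis is fixed, the transform is a pair of small matrix products per patch with no learned parameters, which is in keeping with the abstract's point that the DCT basis is cheap to apply compared with learned basis functions.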