Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations
Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models relies on back-propagation through time, which entails unfolding the network over many time steps and makes credit assignment considerably more challenging. Furthermore, back-propagation itself does not permit non-differentiable activation functions and is inherently sequential, making parallelization of training very difficult. In this work, we propose the Parallel Temporal Neural Coding Network (P-TNCN), a biologically inspired model trained by the learning algorithm known as Local Representation Alignment, which aims to resolve the difficulties that plague recurrent networks trained by back-propagation through time. Most notably, this architecture requires neither unrolling over time nor the derivatives of its internal activation functions. We compare the P-TNCN to other online alternatives to back-propagation through time (which tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization, and show that it outperforms them on sequence benchmarks such as Bouncing MNIST, a new benchmark we call Bouncing NotMNIST, and Penn Treebank. In some instances, our approach even outperforms full back-propagation through time and variants such as sparse attentive back-tracking. Significantly, the hidden unit correction phase of the P-TNCN allows it to adapt to new datasets even when its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior knowledge when the model faces a sequence of tasks. We present results demonstrating the P-TNCN's ability to conduct zero-shot adaptation and continual sequence modeling.
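To make the learning dynamics described above more concrete (a forward prediction, a hidden unit correction phase driven by the prediction error, and purely local weight updates with no unrolling and no activation-function derivatives), the snippet below is a minimal sketch in the spirit of the approach. The layer sizes, the error-projection matrix E, the modulation factor beta, and the exact update rules are illustrative assumptions of this sketch, not the paper's full P-TNCN / Local Representation Alignment procedure.

```python
import numpy as np

# Minimal sketch of an online, local update in the spirit of P-TNCN / Local
# Representation Alignment. All names, dimensions, and update rules here are
# illustrative assumptions, not the paper's exact algorithm.

rng = np.random.default_rng(0)
n_in, n_hid = 32, 64
W_in  = 0.1 * rng.standard_normal((n_hid, n_in))   # input -> hidden
W_rec = 0.1 * rng.standard_normal((n_hid, n_hid))  # hidden -> hidden (recurrent)
W_out = 0.1 * rng.standard_normal((n_in, n_hid))   # hidden -> prediction of next input
E     = 0.1 * rng.standard_normal((n_hid, n_in))   # projects the error back to the hidden layer

beta, lr = 0.1, 0.01  # correction strength and learning rate (assumed values)

def step(x_t, x_next, h):
    """One online step: predict, measure the error, correct h locally, update weights."""
    # Forward pass at time t only -- no unrolling over past steps.
    h_pred = np.tanh(W_in @ x_t + W_rec @ h)   # the activation derivative is never used
    x_pred = W_out @ h_pred                    # prediction of the next observation

    # Mismatch between the prediction and the actual next input.
    e = x_pred - x_next

    # Hidden unit correction phase: nudge the representation to reduce the error.
    h_target = h_pred - beta * (E @ e)

    # Local, Hebbian-like updates driven by representation differences.
    d_h = h_pred - h_target
    W_out[...] -= lr * np.outer(e, h_pred)
    W_rec[...] -= lr * np.outer(d_h, h)
    W_in[...]  -= lr * np.outer(d_h, x_t)

    return h_target, float(np.mean(e ** 2))

# Toy usage on a random sequence.
seq = rng.standard_normal((100, n_in))
h = np.zeros(n_hid)
for t in range(len(seq) - 1):
    h, mse = step(seq[t], seq[t + 1], h)
```

Note that each step uses only quantities available at the current time: no gradient is propagated backwards through time or through the nonlinearity, which is what allows steps (and, in the full model, layers) to be processed without unrolling.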