Closing the Accuracy Gap in an Event-Based Visual Recognition Task
Mobile and embedded applications require neural-network-based pattern recognition systems to perform well under a tight computational budget. In contrast to the commonly used synchronous, frame-based vision systems and CNNs, asynchronous spiking neural networks driven by event-based visual input respond with low latency to sparse, salient features in the input, leading to high efficiency at runtime. The discrete nature of event-based data streams, however, makes direct training of asynchronous neural networks challenging. This paper studies asynchronous spiking neural networks obtained by conversion from a conventional CNN trained on frame-based data. As an example, we consider a CNN trained to steer a robot to follow a moving target. We identify possible pitfalls of the conversion and demonstrate how the proposed solutions bring the classification accuracy of the asynchronous network to within 3% of the performance of the original synchronous CNN, while requiring 12x fewer computations. Although demonstrated on a simple task, this work is an important step towards low-power, fast, embedded neural-network-based vision solutions for robotic applications.
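To make the conversion idea concrete, below is a minimal sketch (not the authors' exact pipeline) of the standard rate-based view of CNN-to-SNN conversion: a trained ReLU unit is reused as an integrate-and-fire (IF) spiking unit, and its spike rate over time approximates the original analog activation. The toy layer, the data-based weight normalization, and all names below are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of rate-based CNN-to-SNN conversion (assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" fully connected layer (stands in for one CNN layer).
W = rng.normal(0.0, 0.5, size=(8, 16))   # weights: 16 inputs -> 8 units
b = rng.normal(0.0, 0.1, size=8)         # biases
x = rng.uniform(0.0, 1.0, size=16)       # one input sample (e.g. pixel intensities)

# Analog reference: the activation the synchronous CNN would produce.
analog = np.maximum(W @ x + b, 0.0)

# Data-based weight normalization: scale so the largest activation is ~1,
# keeping IF firing rates in a representable range (a common conversion
# step; the exact normalization scheme here is a simplifying assumption).
scale = analog.max() + 1e-9
W_n, b_n = W / scale, b / scale

# Simulate IF neurons: integrate the normalized input current each
# timestep, emit a spike when the membrane potential crosses threshold,
# then subtract the threshold ("reset by subtraction").
T = 200                 # number of simulation timesteps
v_th = 1.0              # firing threshold
v = np.zeros(8)         # membrane potentials
spike_counts = np.zeros(8)

for _ in range(T):
    v += W_n @ x + b_n          # constant input current each step
    fired = v >= v_th
    spike_counts += fired
    v[fired] -= v_th            # soft reset preserves residual charge

# Firing rate (spikes per timestep), rescaled back to the analog range,
# approximates the original ReLU activations; units driven below zero
# never fire, matching ReLU's clipping at zero.
rate_estimate = (spike_counts / T) * scale
print("analog :", np.round(analog, 3))
print("snn est:", np.round(rate_estimate, 3))
```

The residual gap between `rate_estimate` and `analog` shrinks as the number of timesteps grows; the conversion pitfalls studied in the paper concern exactly these kinds of approximation errors accumulating across layers.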