Improving American Sign Language Recognition with Synthetic Data
There is a need for real-time communication between deaf and hearing people without the aid of an interpreter. Machine translation (MT) between sign and spoken languages is a multimodal task: because sign language is a visual language, it requires the automatic recognition and translation of video. In this paper, we present the research we have been carrying out to build an automated sign language recognizer (ASLR), the core component of an MT system between American Sign Language (ASL) and English. Developing an ASLR is challenging because sufficient quantities of annotated ASL-English parallel corpora are not available for training, development, and testing. This paper describes our exploration of a range of techniques for automatically generating synthetic data from existing datasets to improve ASLR accuracy. We experimented with several algorithms and varying amounts of synthetic data, and evaluated their effectiveness. We demonstrate that automatically creating valid synthetic training data through simple image manipulation of ASL video recordings improves the performance of the ASLR task.
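The abstract does not specify which image manipulations were used, so the following is only a minimal sketch of what per-video augmentation of sign recordings could look like, assuming small rotations, translations, and brightness shifts applied in Python with OpenCV; the function name `augment_frames` and all parameter ranges are hypothetical. Applying the same random transform to every frame of a clip keeps the signer's motion temporally consistent, and horizontal flipping is deliberately avoided because mirroring swaps the dominant signing hand, which can change the meaning of a sign.

```python
import numpy as np
import cv2


def augment_frames(frames, rng, max_angle=5.0, max_shift=0.05, max_gain=0.15):
    """Create one synthetic copy of a sign video via simple image manipulation.

    frames: list of HxWx3 uint8 arrays (decoded video frames)
    rng:    numpy random Generator, so augmentations are reproducible

    A single random rotation/translation/brightness perturbation is sampled
    per video and applied to every frame, preserving temporal consistency.
    All ranges are illustrative assumptions, not the paper's settings.
    """
    h, w = frames[0].shape[:2]
    angle = rng.uniform(-max_angle, max_angle)      # small rotation (degrees)
    tx = rng.uniform(-max_shift, max_shift) * w     # horizontal shift (pixels)
    ty = rng.uniform(-max_shift, max_shift) * h     # vertical shift (pixels)
    gain = 1.0 + rng.uniform(-max_gain, max_gain)   # brightness factor

    # One affine matrix reused for all frames: rotation about the image
    # centre plus a translation (cv2.getRotationMatrix2D returns 2x3).
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[0, 2] += tx
    M[1, 2] += ty

    out = []
    for f in frames:
        g = cv2.warpAffine(f, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
        g = np.clip(g.astype(np.float32) * gain, 0, 255).astype(np.uint8)
        out.append(g)
    return out
```

In a sketch like this, each call with a fresh random state yields a new synthetic training example, so a dataset can be expanded by any chosen factor while the gloss/English annotations of the original clip are reused unchanged.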