SDST: Successive Decoding for Speech-to-text Translation
End-to-end speech-to-text translation (ST), which directly translates source-language speech into target-language text, has recently attracted intensive attention. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose SDST, an integral framework with Successive Decoding for the end-to-end Speech-to-text Translation task. The method is verified on two mainstream datasets. Experiments show that our proposed method improves on the previous state-of-the-art methods by large margins.