Meeting Transcription Using Virtual Microphone Arrays

05/03/2019
by Takuya Yoshioka, et al.

We describe a system that generates speaker-annotated transcripts of meetings by using a virtual microphone array, a set of spatially distributed asynchronous recording devices such as laptops and mobile phones. The system is composed of continuous audio stream alignment, blind beamforming, speech recognition, speaker diarization using prior speaker information, and system combination. With seven input audio streams, our system achieves a word error rate (WER) of 22.3% [...] the non-overlapping speech segments. The speaker-attributed WER (SAWER) is 26.7%. [...] 20.3% [...] The presented system achieves a 13.6% [...] duration contains more than one speaker. The contribution of each component to the overall performance is also investigated.
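
For readers unfamiliar with the metric, the following is a minimal sketch of how a plain word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the authors' scoring tool, the function name `word_error_rate` is hypothetical, and it does not cover the speaker-attributed variant (SAWER), whose exact scoring rules are not given in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level edit distance divided by reference length.

    Illustrative only; whitespace tokenization is an assumption, and no
    text normalization (casing, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

A WER reported as a percentage, as in the abstract, is this ratio multiplied by 100.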
