EEG-Derived Voice Signature for Attended Speaker Detection
Objective: Conventional EEG-based auditory attention detection (AAD) works by comparing the time-varying speech stimuli with the elicited EEG signals. However, to obtain reliable correlation values, these methods require a long decision window, which results in high detection latency. Humans have a remarkable ability to recognize and follow a known speaker, regardless of the spoken content. In this paper, we seek to detect the attended speaker among pre-enrolled speakers directly from the elicited EEG signals, thereby avoiding any reliance on the speech stimuli at run-time. To this end, we propose a novel EEG-based attended speaker detection (E-ASD) task. Methods: We encode a speaker's voice as a fixed-dimensional vector, known as a speaker embedding, and project it onto an audio-derived voice signature, which characterizes the speaker's unique voice regardless of the spoken content. We hypothesize that such a voice signature also exists in the listener's brain and can be decoded from the elicited EEG signals; we refer to it as the EEG-derived voice signature. By comparing the audio-derived and EEG-derived voice signatures, we can effectively detect the attended speaker in the listening brain. Results: Experiments show that E-ASD can effectively detect the attended speaker from 0.5 s EEG decision windows, achieving 99.78% AAD accuracy, 99.94% AUC, and 0.27% EER. Conclusion: We conclude that the attended speaker's voice signature can be derived from the EEG signals, enabling detection of the attended speaker in a listening brain. Significance: We present the first proof of concept for detecting the attended speaker from elicited EEG signals in a cocktail-party environment. The successful implementation of E-ASD marks a non-trivial but crucial step towards smart hearing aids.
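To make the decision rule described in the abstract concrete, the following is a minimal sketch of how an EEG-derived voice signature could be matched against the audio-derived voice signatures of pre-enrolled speakers via cosine similarity. It assumes the signatures are already available as fixed-dimensional vectors; the function names, the 192-dimensional signature size, and the random placeholder vectors are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the E-ASD matching step: compare one EEG-derived voice
# signature against each enrolled speaker's audio-derived signature.
# All names and dimensions here are hypothetical placeholders.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two fixed-dimensional signature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def detect_attended_speaker(eeg_signature: np.ndarray,
                            enrolled_signatures: dict) -> str:
    """Return the pre-enrolled speaker whose audio-derived voice signature
    is most similar to the EEG-derived voice signature."""
    scores = {speaker: cosine_similarity(eeg_signature, signature)
              for speaker, signature in enrolled_signatures.items()}
    return max(scores, key=scores.get)


# Toy example: random vectors stand in for real voice signatures.
rng = np.random.default_rng(0)
enrolled = {"speaker_A": rng.standard_normal(192),
            "speaker_B": rng.standard_normal(192)}
# Simulate an EEG-derived signature close to speaker_A's voice signature.
eeg_sig = enrolled["speaker_A"] + 0.1 * rng.standard_normal(192)
print(detect_attended_speaker(eeg_sig, enrolled))  # -> "speaker_A"
```

In this framing, the per-window similarity scores could also be thresholded to produce the AUC and EER figures reported in the Results, since both metrics are computed from a score distribution over matched and mismatched speaker pairs.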