Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection
Traditional speaker change detection in dialogues is typically based on audio input. In some scenarios, however, researchers can only obtain text and do not have access to raw audio signals. Moreover, with the increasing need for deep semantic processing, text-based dialogue understanding is attracting more attention in the community. These factors raise the problem of text-based speaker change detection. In this paper, we formulate the task as a matching problem over the utterances before and after a candidate decision point, and we propose a hierarchical recurrent neural network (RNN) with static sentence-level attention. Our model comprises three main components: a sentence encoder built on a long short-term memory (LSTM)-based RNN, a context encoder built on another LSTM-RNN, and a static sentence-level attention mechanism that allows rich information interaction. Experimental results show that neural networks consistently achieve better performance than feature-based approaches, and that our attention-based model significantly outperforms non-attention neural networks.
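The following is a minimal sketch of the hierarchy described above, not the authors' implementation: it assumes PyTorch and illustrative dimensions and names (e.g. HierarchicalSpeakerChangeDetector, encode_side), and shows a word-level LSTM encoding each utterance, a sentence-level LSTM encoding the context, and a static attention pooling the utterance vectors on each side of the decision point before a matching classifier.

```python
import torch
import torch.nn as nn

class HierarchicalSpeakerChangeDetector(nn.Module):
    """Sketch of a hierarchical LSTM with static sentence-level attention
    for text-based speaker change detection (assumed names/dimensions)."""

    def __init__(self, vocab_size, emb_dim=100, sent_dim=128, ctx_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.sent_rnn = nn.LSTM(emb_dim, sent_dim, batch_first=True)  # sentence encoder
        self.ctx_rnn = nn.LSTM(sent_dim, ctx_dim, batch_first=True)   # context encoder
        self.attn = nn.Linear(ctx_dim, 1)                             # static attention scorer
        self.classifier = nn.Linear(2 * ctx_dim, 2)                   # change vs. no change

    def encode_side(self, utterances):
        # utterances: (batch, n_utts, n_words) token ids for one side of the decision point
        b, n, w = utterances.shape
        words = self.embed(utterances.view(b * n, w))        # (b*n, w, emb_dim)
        _, (h, _) = self.sent_rnn(words)                     # last hidden state per utterance
        sent_vecs = h[-1].view(b, n, -1)                     # (b, n, sent_dim)
        ctx, _ = self.ctx_rnn(sent_vecs)                     # (b, n, ctx_dim)
        weights = torch.softmax(self.attn(ctx), dim=1)       # static sentence-level attention
        return (weights * ctx).sum(dim=1)                    # attention-pooled side vector

    def forward(self, before, after):
        # Match the utterances before and after the candidate change point.
        v_before = self.encode_side(before)
        v_after = self.encode_side(after)
        return self.classifier(torch.cat([v_before, v_after], dim=-1))
```

In this reading, the "matching" formulation amounts to concatenating the attention-pooled representations of the two sides and classifying whether a speaker change occurs at the decision point; the attention is "static" in that the weights are computed from the context-encoder states alone rather than conditioned on a query from the other side.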