A Tree Architecture of LSTM Networks for Sequential Regression with Missing Data
We investigate regression for variable length sequential data containing missing samples and introduce a novel tree architecture based on the Long Short-Term Memory (LSTM) networks. In our architecture, we employ a variable number of LSTM networks, which use only the existing inputs in the sequence, in a tree-like architecture without any statistical assumptions or imputations on the missing data, unlike all the previous approaches. In particular, we incorporate the missingness information by selecting a subset of these LSTM networks based on "presence-pattern" of a certain number of previous inputs. From the mixture of experts perspective, we train different LSTM networks as our experts for various missingness patterns and then combine their outputs to generate the final prediction. We also provide the computational complexity analysis of the proposed architecture, which is in the same order of the complexity of the conventional LSTM architectures for the sequence length. Our method can be readily extended to similar structures such as GRUs, RNNs as remarked in the paper. In the experiments, we achieve significant performance improvements with respect to the state-of-the-art methods for the well-known financial and real life datasets.
READ FULL TEXT