Multidimensional Scaling for Big Data
We present a set of algorithms for Multidimensional Scaling (MDS) designed for large datasets. MDS is a statistical tool for dimensionality reduction that takes as input an n × n distance matrix. When n is large, classical algorithms become computationally infeasible and the MDS configuration cannot be obtained. In this paper we address these problems with three algorithms: Divide and Conquer MDS, Fast MDS and MDS based on Gower interpolation (the first and the last are original proposals). All three are based on partitioning the dataset into small pieces on which classical MDS methods can work. To check the performance of the algorithms and to compare them, we conduct a simulation study. The study indicates that Fast MDS and MDS based on Gower interpolation are appropriate when n is large. Although Divide and Conquer MDS is not as fast as the other two algorithms, it is the method that best captures the variance of the original data.
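The partitioning idea can be illustrated with a minimal Python/NumPy sketch, assuming Euclidean distances. `classical_mds` is classical (Torgerson) MDS: it double-centres the squared distances and eigendecomposes the resulting n × n matrix, which costs O(n³) time and O(n²) memory and is why it breaks down for large n. The hypothetical `mds_gower` runs classical MDS only on a random subset of `l` points and places every remaining point with Gower's add-a-point formula, x = ½ Λ⁻¹ Xᵀ(q − d²), where q holds the squared norms of the subset configuration and d² the new point's squared distances to the subset. The function names and the parameters `k`, `l` and `seed` are illustrative assumptions, not taken from the paper, and the sketch shows the general subset-plus-interpolation idea rather than the authors' exact algorithm.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS of an n x n distance matrix D into k dimensions.

    Returns the n x k configuration X and its k leading eigenvalues.
    """
    n = D.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, J = I - 11^T/n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns the eigenvalues of the symmetric matrix B in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]
    lam = np.clip(eigvals[idx], 0.0, None)  # guard against tiny negative values
    X = eigvecs[:, idx] * np.sqrt(lam)
    return X, lam


def gower_interpolate(X, lam, D_new):
    """Place new points into an existing classical MDS configuration.

    X:     l x k configuration from classical_mds (columns are centred).
    lam:   the k eigenvalues returned with X (so X^T X = diag(lam)).
    D_new: m x l distances from each new point to the l configured points.
    Implements x = 1/2 * Lambda^{-1} X^T (q - d^2), with q_i = ||x_i||^2.
    """
    q = np.sum(X ** 2, axis=1)        # squared norms of configured points, shape (l,)
    rhs = q[None, :] - D_new ** 2     # one row of (q - d^2) per new point, shape (m, l)
    return 0.5 * (rhs @ X) / lam      # divide column j by lam[j], shape (m, k)


def mds_gower(D, k=2, l=50, seed=0):
    """Hypothetical sketch: classical MDS on l random points, Gower for the rest."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    sub, rest = perm[:l], perm[l:]
    # Only an l x l eigenproblem is solved, instead of an n x n one.
    X_sub, lam = classical_mds(D[np.ix_(sub, sub)], k)
    X = np.empty((n, k))
    X[sub] = X_sub
    if rest.size:
        # Each remaining point needs only its l distances to the subset.
        X[rest] = gower_interpolate(X_sub, lam, D[np.ix_(rest, sub)])
    return X


if __name__ == "__main__":
    # Toy data that are exactly two-dimensional, so distances are recoverable.
    rng = np.random.default_rng(1)
    Y = rng.normal(size=(500, 2))
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    X = mds_gower(D, k=2, l=50)
    D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    print(np.max(np.abs(D - D_hat)))
```

Because the toy points lie exactly in two dimensions, the printed maximum error should be close to zero: the interpolated coordinates reproduce the original distances up to a rotation or reflection. For simplicity the sketch passes the full matrix `D`, but `mds_gower` only reads its n × l block, so in a genuinely large setting one would compute just those distances.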