Robust and scalable learning of data manifolds with complex topologies via ElPiGraph

04/20/2018
by   Luca Albergante, et al.
0

We present ElPiGraph, a method for approximating data distributions having non-trivial topological features such as the existence of excluded regions or branching structures. Unlike many existing methods, ElPiGraph is not based on the construction of a k-nearest neighbour graph, a procedure that can perform poorly in the case of multidimensional and noisy data. Instead, ElPiGraph constructs elastic principal graphs in a more robust way by minimizing elastic energy, applying graph grammars and explicitly controlling topological complexity. Using trimmed approximation error function makes ElPiGraph extremely robust to the presence of background noise without decreasing computational performance and allows it to deal with complex cases of manifold learning (for example, ElPiGraph can learn disconnected intersecting manifolds). Thanks to the quasi-quadratic nature of the elastic function, ElPiGraph performs almost as fast as a simple k-means clustering and, therefore, is much more scalable than alternative methods, and can work on large datasets containing millions of high dimensional points on a personal computer. The excellent performance of the method opens the possibility to apply resampling and to approximate complex data structures via principal graph ensembles which can be used to construct consensus principal graphs. ElPiGraph is currently implemented in five programming languages and accompanied by a graphical user interface, which makes it a versatile tool to deal with complex data in various fields from molecular biology, where it can be used to infer pseudo-time trajectories from single-cell RNASeq, to astronomy, where it can be used to approximate complex structures in the distribution of galaxies.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset