Variable screening based on Gaussian Centered L-moments
A key challenge in big data is the identification of important variables. In this paper, we propose methods for discovering variables with non-standard univariate marginal distributions. Conventional moment-based summary statistics are well suited to that purpose, but their sensitivity to outliers can lead to selection driven by a few outliers rather than by distributional shape, such as bimodality. To address this non-robustness, we consider the L-moments. Using these in practice, however, has a limitation: they do not take the value zero at Gaussian distributions, to which the shape of a marginal distribution is most naturally compared. As a remedy, we propose Gaussian Centered L-moments, which share the advantages of the L-moments but are zero at Gaussian distributions. The strength of Gaussian Centered L-moments over conventional moments is shown in both theoretical and practical aspects, including their performance in screening important genes in cancer genetics data.
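To make the limitation concrete, the following is a minimal sketch, not the paper's construction. It computes the standard unbiased sample L-skewness and L-kurtosis and shows that the L-kurtosis of a Gaussian is about 0.1226 rather than zero. The function names and the simple "subtract the Gaussian value" centring are assumptions made for illustration only; the Gaussian Centered L-moments proposed in the paper may be defined differently.

```python
import math
import numpy as np

# Population L-kurtosis of any Gaussian distribution (about 0.1226).
GAUSSIAN_L_KURTOSIS = 30.0 / math.pi * math.atan(math.sqrt(2.0)) - 9.0


def sample_l_moment_ratios(x):
    """Return (L-skewness, L-kurtosis) from unbiased sample L-moments.

    Hypothetical helper: uses the usual probability-weighted-moment
    estimators b0..b3 computed on the sorted sample.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")
    i = np.arange(n, dtype=float)  # zero-based rank of each order statistic
    b0 = x.mean()
    b1 = np.sum(i * x) / (n * (n - 1))
    b2 = np.sum(i * (i - 1) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum(i * (i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2


def naive_gaussian_centered_ratios(x):
    """Illustrative centring (an assumption, not the paper's definition):
    shift L-kurtosis so that a Gaussian sample scores near zero.
    L-skewness is already zero for symmetric distributions."""
    t3, t4 = sample_l_moment_ratios(x)
    return t3, t4 - GAUSSIAN_L_KURTOSIS


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = rng.normal(size=20_000)
    bimodal = np.concatenate([rng.normal(-3, 1, 10_000), rng.normal(3, 1, 10_000)])
    print("Gaussian:", naive_gaussian_centered_ratios(gaussian))
    print("Bimodal: ", naive_gaussian_centered_ratios(bimodal))
```

Under this sketch, a variable whose centred L-kurtosis is far from zero (for example, clearly negative for a well-separated bimodal marginal) would be flagged as non-Gaussian in shape, which is the kind of screening signal the abstract describes.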