Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data
Class prediction is an important application of microarray gene expression data analysis. The high dimensionality of microarray data, where the number of genes (variables) is very large compared to the number of samples (observations), makes many prediction techniques (e.g., logistic regression, discriminant analysis) difficult to apply. An efficient way to solve this problem is to use statistical dimension reduction techniques. The Rasch model (RM), increasingly used in psychology-related applications, provides an appealing framework for handling high-dimensional microarray data. In this paper, we study the potential of RM-based modeling for dimensionality reduction with binarized microarray gene expression data and investigate its prediction accuracy in the context of class prediction using linear discriminant analysis. Two publicly available microarray data sets are used to illustrate a general framework for the approach. The performance of the proposed method is assessed by a re-randomization scheme, with principal component analysis (PCA) as the benchmark method. Our results show that RM-based dimension reduction is as effective as PCA-based dimension reduction. The method is general and can be applied to other high-dimensional data problems.
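As a rough illustration of the two ingredients named above, binarized expression data and the Rasch model, the following Python sketch may help. It is not the authors' implementation: the abstract does not state the binarization rule or the estimation procedure, so the per-gene median cutoff and the function names `binarize_by_gene_median`, `rasch_probability`, and `raw_scores` are assumptions made here for illustration only. The sketch relies on two standard facts about the Rasch model: the probability of a positive response is a logistic function of the difference between a sample parameter `theta` and a gene parameter `beta`, and each sample's raw score (its row sum) is the sufficient statistic for `theta`.

```python
import math
from statistics import median


def binarize_by_gene_median(expr):
    """Turn a samples-by-genes matrix into 0/1 values.

    Assumption (not from the paper): a value is coded 1 when it
    exceeds that gene's median across all samples.
    """
    n_genes = len(expr[0])
    cutoffs = [median(row[j] for row in expr) for j in range(n_genes)]
    return [[1 if row[j] > cutoffs[j] else 0 for j in range(n_genes)]
            for row in expr]


def rasch_probability(theta, beta):
    """Rasch model: P(X = 1) for sample parameter theta, gene parameter beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def raw_scores(binary):
    """Row sums of the binary matrix.

    Under the Rasch model the raw score is the sufficient statistic
    for each sample's latent parameter theta.
    """
    return [sum(row) for row in binary]


# Toy data (hypothetical): 4 samples, 3 genes.
toy_expr = [
    [1.0, 5.0, 2.0],
    [2.0, 3.0, 8.0],
    [3.0, 9.0, 4.0],
    [4.0, 1.0, 6.0],
]
toy_binary = binarize_by_gene_median(toy_expr)
toy_scores = raw_scores(toy_binary)
```

In a full pipeline, estimated sample parameters (rather than raw scores) would typically be passed to a classifier such as linear discriminant analysis; that estimation step is omitted here because the abstract does not specify it.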