Matrix Completion for Survey Data Prediction with Multivariate Missingness
Survey data are the gold standard for estimating finite population parameters, but estimation becomes challenging because of inevitable missingness. In this paper, we develop a new imputation method that uses matrix completion to handle multivariate missingness at random. In contrast to existing imputation schemes, which conduct either row-wise or column-wise imputation, we treat the data matrix as a whole, which allows us to exploit row and column patterns at the same time. We adopt a column-space-decomposition model for the population data matrix, with easy-to-obtain demographic data as covariates and a low-rank structured residual matrix. The proposed method addresses identification and penalized estimation for the sample data matrix. Asymptotic properties are investigated, and a simulation study shows that the doubly robust estimator using the proposed matrix completion for imputation has smaller mean squared error than its competitors. We apply the proposed method to the National Health and Nutrition Examination Survey 2015-2016 Questionnaire Data.
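To make the model structure concrete, the following is a minimal sketch, not the estimator developed in the paper. It assumes the data matrix decomposes as Y = XB + L + noise, where X holds fully observed covariates and L is a low-rank residual orthogonal to the column space of X, and it fits this with a generic soft-impute style iteration (regression on X plus singular value soft-thresholding of the residual). The function name, the tuning parameter `lam`, the simulated data, and the missingness mechanism are all illustrative assumptions; survey weights and the doubly robust estimator are not covered.

```python
import numpy as np


def complete_with_covariates(Y, mask, X, lam=2.0, n_iter=200):
    """Impute Y (n x p) under the assumed model Y = X B + L + noise.

    mask is True where Y is observed; X (n x d) is fully observed.
    Illustrative soft-impute style iteration, not the paper's algorithm.
    """
    P = X @ np.linalg.pinv(X)          # projection onto the column space of X
    fitted = np.zeros(Y.shape)
    for _ in range(n_iter):
        # Fill missing cells with the current fit, keep observed cells.
        filled = np.where(mask, Y, fitted)
        cov_part = P @ filled          # least-squares fit X B
        resid = filled - cov_part      # lies in the orthogonal complement of X
        # Nuclear-norm penalty: soft-threshold the singular values.
        U, s, Vt = np.linalg.svd(resid, full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        fitted = cov_part + (U * s) @ Vt
    return fitted


# Simulated example (all values hypothetical).
rng = np.random.default_rng(0)
n, p, d, rank = 200, 20, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
B = rng.normal(size=(d, p))
P = X @ np.linalg.pinv(X)
L = (np.eye(n) - P) @ rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
Y_full = X @ B + L + 0.1 * rng.normal(size=(n, p))

# Response probability depends only on an observed covariate.
prob_obs = 1.0 / (1.0 + np.exp(-(0.8 + 0.5 * X[:, 1])))
mask = rng.random((n, p)) < prob_obs[:, None]
Y_obs = np.where(mask, Y_full, np.nan)

fitted = complete_with_covariates(Y_obs, mask, X)

# Compare with column-mean imputation on the missing cells only.
miss = ~mask
col_means = np.nanmean(Y_obs, axis=0)
mean_fill = np.broadcast_to(col_means, Y_full.shape)
rmse_mc = np.sqrt(np.mean((fitted[miss] - Y_full[miss]) ** 2))
rmse_mean = np.sqrt(np.mean((mean_fill[miss] - Y_full[miss]) ** 2))
```

In this sketch, `rmse_mc` and `rmse_mean` measure imputation error on the cells that were masked out, so the two approaches can be compared on the same simulated data. In practice `lam` would need to be tuned, for example by holding out a subset of observed entries.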