Integrating multi-source block-wise missing data in model selection

01/12/2019
by Fei Xue, et al.

For multi-source data, blocks of variables from certain sources are likely to be missing. Existing methods for handling missing data do not take the structure of block-wise missing data into account. In this paper, we propose a Multiple Block-wise Imputation (MBI) approach, which incorporates imputations based on both complete and incomplete observations. Specifically, for a given missing-pattern group, the imputations in MBI incorporate more samples from groups with fewer observed variables, in addition to the group with complete observations. We propose to construct estimating equations based on all available information, and to optimally integrate informative estimating functions to achieve efficient estimators. We show that the proposed method achieves estimation and model selection consistency in both fixed-dimensional and high-dimensional settings. Moreover, the proposed estimator is asymptotically more efficient than the estimator based on a single imputation from complete observations only. In addition, the proposed method is not restricted to data that are missing completely at random. Numerical studies and an application to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) confirm that the proposed method outperforms existing variable selection methods under various missing-data mechanisms.
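The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the bookkeeping step it implies: grouping samples by their block-wise missing pattern and, for each incomplete group, listing which other groups (the complete group and groups with fewer observed sources) could serve as donors for imputation. The column-to-source layout, the function names `group_by_pattern` and `imputation_plans`, and the donor rule (a donor must observe every source the target group lacks and share at least one observed source with it) are hypothetical assumptions, not the authors' MBI implementation; fitting the imputation models and the estimating equations is not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing entry
Pattern = FrozenSet[str]         # set of fully observed sources


def source_pattern(row: Row, blocks: Dict[str, List[int]]) -> Pattern:
    """Return the sources whose columns are all observed in this row."""
    return frozenset(
        name for name, cols in blocks.items()
        if all(row[c] is not None for c in cols)
    )


def group_by_pattern(rows: List[Row], blocks: Dict[str, List[int]]) -> Dict[Pattern, List[int]]:
    """Map each missing pattern to the indices of the rows that exhibit it."""
    groups: Dict[Pattern, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[source_pattern(row, blocks)].append(i)
    return dict(groups)


def imputation_plans(groups: Dict[Pattern, List[int]],
                     all_sources: Pattern) -> Dict[Pattern, List[Tuple[Pattern, Pattern]]]:
    """For each incomplete group, list (donor pattern, shared predictor sources).

    Hypothetical donor rule: the donor must observe every source the target
    lacks, and the two groups must share at least one observed source that an
    imputation model could regress on.
    """
    plans: Dict[Pattern, List[Tuple[Pattern, Pattern]]] = {}
    for target in groups:
        missing = all_sources - target
        if not missing:
            continue  # complete group: nothing to impute
        donors = []
        for donor in groups:
            if donor == target or not missing <= donor:
                continue  # donor does not observe all of the target's missing sources
            shared = target & donor
            if shared:
                donors.append((donor, shared))
        plans[target] = donors
    return plans


if __name__ == "__main__":
    # Hypothetical layout: source A -> columns 0-1, B -> column 2, C -> column 3.
    blocks = {"A": [0, 1], "B": [2], "C": [3]}
    rows = [
        [1.0, 2.0, 3.0, 4.0],     # all sources observed
        [1.0, 2.0, 3.0, None],    # C missing
        [1.0, 2.0, None, 4.0],    # B missing
        [None, None, 3.0, 4.0],   # A missing
    ]
    groups = group_by_pattern(rows, blocks)
    for target, donors in imputation_plans(groups, frozenset(blocks)).items():
        print(sorted(target), [(sorted(d), sorted(s)) for d, s in donors])
```

In this toy layout, the group missing source C gets three candidate donors: the complete group (regressing C on A and B) and the two incomplete groups that observe C (regressing on A alone or B alone). This mirrors the abstract's point that groups with fewer observed variables can contribute to imputation alongside the complete group.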

research
08/07/2018

Generalized Integrative Principal Component Analysis for Multi-Type Data with Block-Wise Missing Structure

High-dimensional multi-source data are encountered in many fields. Despi...
research
01/14/2019

Supervised Learning for Multi-Block Incomplete Data

In the supervised high dimensional settings with a large number of varia...
research
10/23/2020

Learning from missing data with the Latent Block Model

Missing data can be informative. Ignoring this information can lead to m...
research
11/29/2018

Accounting for model uncertainty in multiple imputation under complex sampling

Multiple imputation provides an effective way to handle missing data. Wh...
research
02/06/2018

An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond

Missing data are frequently encountered in high-dimensional problems, bu...
research
06/21/2021

A generalized EMS algorithm for model selection with incomplete data

Recently, a so-called E-MS algorithm was developed for model selection i...
research
11/22/2019

ptype: Probabilistic Type Inference

Type inference refers to the task of inferring the data type of a given ...
