Analyzing covariate clustering effects in healthcare cost subgroups: insights and applications for prediction
Healthcare cost prediction is challenging because the covariates are high-dimensional and highly correlated. In addition, cost data are skewed, heavy-tailed, and often multi-modal as a result of unobserved heterogeneity, which complicates matters further. In this study, we propose a novel framework for finite mixture regression models that incorporates covariate clustering methods to better account for the effects of clustered covariates on subgroups of the outcome, enabling a more accurate characterization of the complex distribution of the data. The proposed framework can be formulated as a convex optimization problem with an additional penalty term based on the prior similarity of the covariates. To solve this optimization problem efficiently, we propose a specialized EM-ADMM algorithm that integrates the alternating direction method of multipliers (ADMM) into the iterative process of the expectation-maximization (EM) algorithm. The convergence of the algorithm and the efficiency of the covariate clustering method are verified on simulated data, and the superiority of the approach over traditional regression techniques is demonstrated on two real Chinese medical expenditure datasets. Our empirical results provide valuable insights into the complex network graph of the covariates and can inform business practices, such as the design and pricing of medical insurance products.
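To make the EM-ADMM idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian mixture of linear regressions and a hypothetical fused penalty lam * ||D b||_1, where each row of D takes the difference between two covariates judged similar a priori; the abstract does not specify the actual form of the penalty. The function names, the matrix D, and the parameters lam and rho are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_weighted_fused(X, y, w, D, lam, rho=1.0, n_iter=200):
    """ADMM for: min_b 0.5 * sum_i w_i (y_i - x_i'b)^2 + lam * ||D b||_1.

    Assumed penalty form (hypothetical): D encodes differences between
    coefficients of covariates that are similar a priori.
    """
    p = X.shape[1]
    XtW = X.T * w                                  # p x n, columns scaled by weights
    A = XtW @ X + rho * (D.T @ D) + 1e-8 * np.eye(p)  # tiny ridge for stability
    Xty = XtW @ y
    z = np.zeros(D.shape[0])                       # auxiliary variable z = D b
    u = np.zeros(D.shape[0])                       # scaled dual variable
    b = np.zeros(p)
    for _ in range(n_iter):
        b = np.linalg.solve(A, Xty + rho * D.T @ (z - u))  # b-update (linear solve)
        Db = D @ b
        z = soft_threshold(Db + u, lam / rho)      # z-update (proximal step)
        u = u + Db - z                             # dual update
    return b

def em_admm(X, y, D, K=2, lam=1.0, n_em=50, seed=0):
    """EM for a K-component Gaussian mixture of regressions; each M-step
    regression is solved by the ADMM routine above."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(K), size=n)       # random initial responsibilities
    betas = np.zeros((K, p))
    sigma2 = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_em):
        # M-step: weighted penalized regression, variance, and mixing weight
        for k in range(K):
            w = resp[:, k]
            betas[k] = admm_weighted_fused(X, y, w, D, lam)
            r = y - X @ betas[k]
            sigma2[k] = max((w * r**2).sum() / max(w.sum(), 1e-12), 1e-6)
            pi[k] = w.mean()
        # E-step: posterior component probabilities, computed in log space
        logp = np.empty((n, K))
        for k in range(K):
            r = y - X @ betas[k]
            logp[:, k] = (np.log(pi[k] + 1e-300)
                          - 0.5 * np.log(2 * np.pi * sigma2[k])
                          - 0.5 * r**2 / sigma2[k])
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, betas, sigma2, resp
```

In this sketch, increasing lam pulls the coefficients of covariates linked by D toward a common value within each mixture component, which is one simple way a similarity-based penalty could induce covariate clusters. Setting lam to zero reduces each M-step to ordinary weighted least squares.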