Influenza Modeling Based on Massive Feature Engineering and International Flow Deconvolution
In this article, we focus on the analysis of the potential factors driving the spread of influenza, and possible policies to mitigate the adverse effects of the disease. To be precise, we first invoke discrete Fourier transform (DFT) to conclude a yearly periodic regional structure in the influenza activity, thus safely restricting ourselves to the analysis of the yearly influenza behavior. Then we collect a massive number of possible region-wise indicators contributing to the influenza mortality, such as consumption, immunization, sanitation, water quality, and other indicators from external data, with 1170 dimensions in total. We extract significant features from the high dimensional indicators using a combination of data analysis techniques, including matrix completion, support vector machines (SVM), autoencoders, and principal component analysis (PCA). Furthermore, we model the international flow of migration and trade as a convolution on regional influenza activity, and solve the deconvolution problem as higher-order perturbations to the linear regression, thus separating regional and international factors related to the influenza mortality. Finally, both the original model and the perturbed model are tested on regional examples, as validations of our models. Pertaining to the policy, we make a proposal based on the connectivity data along with the previously extracted significant features to alleviate the impact of influenza, as well as efficiently propagate and carry out the policies. We conclude that environmental features and economic features are of significance to the influenza mortality. The model can be easily adapted to model other types of infectious diseases.
READ FULL TEXT