Homeostasis phenomenon in predictive inference when using a wrong learning model: a tale of random split of data into training and test sets
This note uses a conformal prediction procedure to provide further support for several points discussed by Professor Efron (Efron, 2020) concerning prediction, estimation, and the IID assumption. It aims to convey the following messages: (1) Under the IID assumption (e.g., a random split of the data into training and testing sets), prediction is indeed an easier task than estimation, since prediction enjoys a 'homeostasis property' in this case: even if the model used for learning is completely wrong, the prediction results remain valid. (2) If the IID assumption is violated (e.g., a targeted prediction on specific individuals), the homeostasis property is often disrupted, and the prediction results under a wrong model are usually invalid. (3) Better model estimation typically leads to more accurate prediction in both IID and non-IID cases. Good modeling and estimation practices are important and often crucial for obtaining good prediction results. The discussion also offers one explanation of why deep learning methods work so well in academic exercises (with experiments set up by randomly splitting the entire data set into training and testing sets) yet fail to deliver many 'killer applications' in the real world.
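As an illustration of the homeostasis phenomenon under random splitting, the following minimal sketch (not taken from the paper; a standard split-conformal example with an intentionally misspecified linear working model on nonlinear data, and all names and settings chosen here for illustration) shows the empirical coverage of the prediction intervals staying near the nominal level despite the wrong model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from a nonlinear truth; the working model below is linear (wrong on purpose).
n = 2000
x = rng.uniform(-3, 3, size=n)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(n)

# Random split into training, calibration, and test sets (the IID setting).
idx = rng.permutation(n)
train, calib, test = idx[:800], idx[800:1400], idx[1400:]

# Fit the (misspecified) linear model on the training part only.
beta = np.polyfit(x[train], y[train], deg=1)
predict = lambda z: np.polyval(beta, z)

# Split-conformal calibration of absolute residuals for a 90% prediction band.
alpha = 0.10
resid = np.abs(y[calib] - predict(x[calib]))
k = int(np.ceil((1 - alpha) * (len(calib) + 1)))
q = np.sort(resid)[k - 1]  # conformal quantile

# Coverage on the test set: close to 90% even though the model is wrong (homeostasis).
covered = np.abs(y[test] - predict(x[test])) <= q
print(f"empirical coverage: {covered.mean():.3f} (nominal {1 - alpha:.2f})")
```

If the test points are instead drawn from a shifted distribution or targeted at specific individuals (a non-IID setting), the same calibration step no longer guarantees coverage, which is the disruption of homeostasis described in point (2).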