On the two-dataset problem

11/01/2019

This paper considers the two-dataset problem, in which data are collected from two potentially different populations that share common aspects. The problem arises when data are collected by two different types of researchers or from two different sources. Conclusions may be invalid if knowledge about the data collection process is ignored or if assumptions about that process are wrong. To address this problem, the paper develops statistical models and proposes two prediction errors that can be used to evaluate the underlying data collection process. As a consequence, the heterogeneity or similarity of the two datasets can be discussed in terms of prediction. Two real datasets are used to illustrate the method.
