KDE sampling for imbalanced class distribution
Imbalanced distributions of the response variable are common in data science. One common way to combat class imbalance is to resample the minority class to achieve a more balanced distribution. In this paper, we investigate the performance of a sampling method based on kernel density estimation (KDE). We illustrate how KDE sampling is less prone to overfitting than other standard sampling methods. Numerical experiments show that KDE sampling can outperform other sampling techniques across a range of classifiers and real-life datasets.
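To make the idea concrete, the following is a minimal sketch of KDE-based oversampling, not the implementation used in the paper. It relies on the fact that drawing from a Gaussian KDE is equivalent to picking a training point at random and adding Gaussian noise scaled by the bandwidth. The function name `kde_oversample`, the per-feature (diagonal) bandwidth, and the use of Scott's rule are all assumptions made for illustration.

```python
import random
import statistics


def kde_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic points from a Gaussian KDE fitted to `minority`.

    `minority` is a list of equal-length feature lists (at least two rows).
    Hypothetical helper: bandwidth choice is an assumption, not the paper's.
    """
    rng = random.Random(seed)
    n, d = len(minority), len(minority[0])

    # Scott's rule factor n^(-1/(d+4)), applied to each feature's sample
    # standard deviation (a diagonal-bandwidth simplification).
    factor = n ** (-1.0 / (d + 4))
    bandwidths = [factor * statistics.stdev(col) for col in zip(*minority)]

    synthetic = []
    for _ in range(n_new):
        # Sampling from the KDE: choose a kernel centre uniformly at random,
        # then perturb each feature with Gaussian noise of width h.
        centre = rng.choice(minority)
        synthetic.append([rng.gauss(x, h) for x, h in zip(centre, bandwidths)])
    return synthetic


# Usage: top up a 4-point minority class to match a majority class of 10.
minority = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.3], [1.1, 2.1]]
new_points = kde_oversample(minority, n_new=10 - len(minority))
balanced_minority = minority + new_points
```

Unlike random oversampling, which duplicates existing minority points exactly, the synthetic points here are drawn from a smooth density estimate and so do not coincide with the training points. This is one intuition for why such a method might be less prone to overfitting.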