A Kernel Theory of Modern Data Augmentation

03/16/2018
by Tri Dao, et al.

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding modern data augmentation techniques. We start by showing that for kernel classifiers, data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. We connect this general approximation framework to prior work in invariant kernels, tangent propagation, and robust optimization. Next, we explicitly tackle the compositional aspect of modern data augmentation techniques, proposing a novel model of data augmentation as a Markov process. Under this model, we show that performing k-nearest neighbors with data augmentation is asymptotically equivalent to a kernel classifier. Finally, we illustrate ways in which our theoretical framework can be leveraged to accelerate machine learning workflows in practice, including reducing the amount of computation needed to train on augmented data, and predicting the utility of a transformation prior to training.
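
As a rough illustration of the first-order and second-order components mentioned in the abstract, the sketch below compares three quantities for a single training point under a linear model with logistic loss: the exact augmented loss (averaged over all transformed copies), the loss evaluated at the averaged feature (feature averaging), and that loss plus a variance penalty obtained from a second-order Taylor expansion. Everything concrete here is an assumption made for illustration and is not taken from the paper: the identity feature map, the small additive shifts standing in for transformations, the function names, and the numeric values.

```python
import math


def logistic_loss(z, y):
    """Logistic loss log(1 + exp(-y * z)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * z))


def logistic_curvature(z, y):
    """Second derivative of the logistic loss with respect to the score z."""
    s = 1.0 / (1.0 + math.exp(-y * z))
    return s * (1.0 - s)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def augmented_objectives(w, x, y, shifts):
    """Return (exact, first_order, second_order) losses for one point.

    `shifts` is a hypothetical finite set of additive perturbations that
    plays the role of the augmentation transformations.
    """
    # Features of every transformed copy (identity feature map assumed).
    feats = [[xi + si for xi, si in zip(x, s)] for s in shifts]
    scores = [dot(w, f) for f in feats]

    # Exact augmented loss: average the loss over all transformed copies.
    exact = sum(logistic_loss(z, y) for z in scores) / len(scores)

    # First-order term: loss evaluated at the averaged feature.
    mean_feat = [sum(col) / len(feats) for col in zip(*feats)]
    z_bar = dot(w, mean_feat)
    first_order = logistic_loss(z_bar, y)

    # Second-order term: curvature times the variance of the score,
    # which equals w^T Cov(features) w for a linear model.
    var = sum((z - z_bar) ** 2 for z in scores) / len(scores)
    second_order = first_order + 0.5 * logistic_curvature(z_bar, y) * var

    return exact, first_order, second_order


w = [0.5, -0.25]
x = [1.0, 2.0]
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
exact, first_order, second_order = augmented_objectives(w, x, +1, shifts)
print(exact, first_order, second_order)
```

In this toy setting the feature-averaged loss underestimates the exact augmented loss (as Jensen's inequality requires for a convex loss), and adding the variance term closes most of the gap. This is only a numerical illustration of the kind of approximation the abstract describes, not the paper's derivation.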


research
06/25/2020

Learning Data Augmentation with Online Bilevel Optimization for Image Classification

Data augmentation is a key practice in machine learning for improving ge...
research
04/01/2021

GABO: Graph Augmentations with Bi-level Optimization

Data augmentation refers to a wide range of techniques for improving mod...
research
06/08/2021

What Data Augmentation Do We Need for Deep-Learning-Based Finance?

The main task we consider is portfolio construction in a speculative mar...
research
06/16/2023

SLACK: Stable Learning of Augmentations with Cold-start and KL regularization

Data augmentation is known to improve the generalization capabilities of...
research
03/13/2022

On Data Augmentation in Point Process Models Based on Thinning

Many models for point process data are defined through a thinning proced...
research
05/20/2022

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

Data augmentation plays a key role in modern machine learning pipelines....
research
02/15/2019

Asymptotically exact data augmentation: models, properties and algorithms

Data augmentation, by the introduction of auxiliary variables, has becom...
