Learning Near-optimal Convex Combinations of Basis Models with Generalization Guarantees
The problem of learning an optimal convex combination of basis models has been studied in a number of works, with a focus on theoretical analysis but little investigation of the empirical performance of the approach. In this paper, we present new theoretical insights and empirical results that demonstrate the effectiveness of the approach. Theoretically, we first consider whether convex combinations can be replaced by linear combinations while retaining convergence results similar to those known for learning from a convex hull. We present a negative result showing that the linear hull of very simple basis functions can have unbounded capacity and is thus prone to overfitting; convex hulls, in contrast, remain rich but have bounded capacity. In addition, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we first discuss how a convex combination can be learned greedily with early stopping, and how it can be learned non-greedily when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little effort in hyper-parameter tuning and also seems to adapt to the underlying complexity of the problem.
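To make the greedy scheme concrete, here is a minimal sketch of how a convex combination of pre-trained basis models might be built greedily with early stopping on a validation set. It is not the paper's exact algorithm: the squared-error objective, the grid of mixing weights, the initialization from the single best basis model, and the patience-based stopping rule are all illustrative assumptions.

```python
import numpy as np

def greedy_convex_combination(basis_preds, y, val_preds, y_val,
                              max_rounds=100, patience=5):
    """Greedily build a convex combination of basis model predictions.

    basis_preds: (n_models, n_train) array of basis model outputs on training data.
    val_preds:   (n_models, n_val) array of the same models' outputs on validation data.
    Each round blends one basis model into the current combination with a weight
    alpha in [0, 1]; training stops early when validation error stops improving.
    (Objective, weight grid, and stopping rule are illustrative assumptions.)
    """
    n_models = basis_preds.shape[0]
    weights = np.zeros(n_models)                  # convex weights over basis models
    alphas = np.linspace(0.0, 1.0, 51)            # candidate mixing weights

    # Initialize with the single best basis model so weights start on the simplex.
    j0 = int(np.argmin(np.mean((basis_preds - y) ** 2, axis=1)))
    weights[j0] = 1.0
    f_train = basis_preds[j0].copy()
    f_val = val_preds[j0].copy()
    best_val, rounds_since_best = np.mean((y_val - f_val) ** 2), 0

    for _ in range(max_rounds):
        # Pick the basis model and mixing weight that most reduce training error.
        best = None
        for j in range(n_models):
            for a in alphas:
                blend = (1 - a) * f_train + a * basis_preds[j]
                err = np.mean((y - blend) ** 2)
                if best is None or err < best[0]:
                    best = (err, j, a)
        _, j, a = best

        # Update the combination; existing weights are rescaled by (1 - a) and the
        # selected model receives mass a, so the weights stay on the simplex.
        f_train = (1 - a) * f_train + a * basis_preds[j]
        f_val = (1 - a) * f_val + a * val_preds[j]
        weights *= (1 - a)
        weights[j] += a

        val_err = np.mean((y_val - f_val) ** 2)
        if val_err < best_val - 1e-8:
            best_val, rounds_since_best = val_err, 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= patience:     # early stopping
                break
    return weights
```

In this sketch the update keeps the weights on the probability simplex by construction, which is what distinguishes learning over the convex hull from an unconstrained linear combination of the same basis models.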