Three essays on the use of margins to improve ensemble methods
Ensemble methods, such as bagging (Breiman, 1996), boosting (Freund and Schapire, 1997) and random forests (Breiman, 2001) combine a large number of classifiers through (weighted) voting to produce strong classifiers. To explain the successful performance of ensembles and particularly of boosting, Schapire, Freund, Bartlett and Lee (1998) developed an upper bound on the generalization error of an ensemble based on the margins, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal (sometimes referred to as the "large margins theory"). This result has led many researchers to consider direct optimization of functions of the margins (see, e.g., Grove and Schuurmans, 1998; Breiman, 1999 Mason, Bartlett and Baxter, 2000; and Shen and Li, 2010). In this research, we show that the large margins theory is not sufficient for explaining the performance of AdaBoost. Shen and Li (2010) and Xu and Gray (2012) provide evidence suggesting that generalization error might be reduced by increasing the mean and decreasing the variance of the margins, which we refer to as "squeezing" the margins. For that reason, we also propose several alternative techniques for squeezing the margins and evaluate their effectiveness through simulations with real and synthetic data sets. In addition to the margins being a determinant of the performance of ensembles, we know that AdaBoost, the most common boosting algorithm, can be very sensitive to outliers and noisy data, since it assigns observations that have been misclassified a higher weight in subsequent runs. Therefore, we propose several techniques to identify and potentially delete noisy samples in order to improve its performance.