Three essays on the use of margins to improve ensemble methods

dc.contributor Barrett, Bruce E.
dc.contributor Adams, Benjamin Michael
dc.contributor Sox, Charles R.
dc.contributor Albright, Thomas L.
dc.contributor.advisor Gray, J. Brian
dc.contributor.author Martinez Cid, Waldyn Gerardo
dc.date.accessioned 2017-03-01T16:35:38Z
dc.date.available 2017-03-01T16:35:38Z
dc.date.issued 2012
dc.identifier.other u0015_0000001_0001081
dc.identifier.other MartinezCid_alatus_0004D_11348
dc.identifier.uri https://ir.ua.edu/handle/123456789/1563
dc.description Electronic Thesis or Dissertation
dc.description.abstract Ensemble methods, such as bagging (Breiman, 1996), boosting (Freund and Schapire, 1997) and random forests (Breiman, 2001), combine a large number of classifiers through (weighted) voting to produce strong classifiers. To explain the successful performance of ensembles, and particularly of boosting, Schapire, Freund, Bartlett and Lee (1998) developed an upper bound on the generalization error of an ensemble based on the margins, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal (sometimes referred to as the "large margins theory"). This result has led many researchers to consider direct optimization of functions of the margins (see, e.g., Grove and Schuurmans, 1998; Breiman, 1999; Mason, Bartlett and Baxter, 2000; and Shen and Li, 2010). In this research, we show that the large margins theory is not sufficient for explaining the performance of AdaBoost. Shen and Li (2010) and Xu and Gray (2012) provide evidence suggesting that generalization error might be reduced by increasing the mean and decreasing the variance of the margins, which we refer to as "squeezing" the margins. For that reason, we also propose several alternative techniques for squeezing the margins and evaluate their effectiveness through simulations with real and synthetic data sets. In addition to the margins being a determinant of the performance of ensembles, we know that AdaBoost, the most common boosting algorithm, can be very sensitive to outliers and noisy data, since it assigns a higher weight in subsequent runs to observations that have been misclassified. Therefore, we propose several techniques to identify and potentially delete noisy observations in order to improve its performance.
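dc.description.note The mechanisms named in the abstract can be illustrated with a minimal sketch (not the dissertation's code, and a toy dataset chosen here for illustration): AdaBoost with decision stumps, where misclassified observations receive higher weight in subsequent rounds, followed by the normalized voting margins of Schapire et al. (1998), whose mean and variance the "squeezing" techniques target.

```python
import numpy as np

def stump_predict(x, thresh, sign):
    # Decision stump: h(x) = sign if x > thresh, else -sign.
    return np.where(x > thresh, sign, -sign)

def adaboost(x, y, T=20):
    # Bare-bones AdaBoost (Freund and Schapire, 1997) on 1-D data.
    n = len(x)
    w = np.full(n, 1.0 / n)                  # observation weights
    stumps, alphas = [], []
    for _ in range(T):
        best = None
        for thresh in np.unique(x):          # exhaustive stump search
            for sign in (1, -1):
                pred = stump_predict(x, thresh, sign)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thresh, sign)
        err, thresh, sign = best
        err = max(err, 1e-12)                # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(x, thresh, sign)
        # Misclassified points get more weight in the next round --
        # the source of AdaBoost's sensitivity to noisy observations.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append((thresh, sign))
        alphas.append(alpha)
    return stumps, np.array(alphas)

def margins(x, y, stumps, alphas):
    # Voting margin of observation i:
    #   y_i * sum_t alpha_t h_t(x_i) / sum_t alpha_t, in [-1, 1].
    # Positive iff the weighted vote classifies x_i correctly.
    votes = sum(a * stump_predict(x, t, s)
                for a, (t, s) in zip(alphas, stumps))
    return y * votes / alphas.sum()

# Toy 1-D data with one flipped ("noisy") label at x = 2.
x = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
y = np.array([-1, -1, 1, -1, 1, 1, 1, 1])

stumps, alphas = adaboost(x, y, T=20)
m = margins(x, y, stumps, alphas)
print(f"mean margin: {m.mean():.3f}, min margin: {m.min():.3f}")
```

The margin distribution computed here is exactly the quantity the large margins theory bounds generalization error by; "squeezing" seeks a higher mean and lower variance of `m` than plain AdaBoost produces.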
dc.format.extent 83 p.
dc.format.medium electronic
dc.format.mimetype application/pdf
dc.language English
dc.language.iso en_US
dc.publisher University of Alabama Libraries
dc.relation.ispartof The University of Alabama Electronic Theses and Dissertations
dc.relation.ispartof The University of Alabama Libraries Digital Collections
dc.relation.hasversion born digital
dc.rights All rights reserved by the author unless otherwise indicated.
dc.subject.other Statistics
dc.title Three essays on the use of margins to improve ensemble methods
dc.type thesis
dc.type text
etdms.degree.department University of Alabama. Dept. of Information Systems, Statistics, and Management Science
etdms.degree.discipline Applied Statistics
etdms.degree.grantor The University of Alabama
etdms.degree.level doctoral
etdms.degree.name Ph.D.
