Three essays on the use of margins to improve ensemble methods

dc.contributor: Barrett, Bruce E.
dc.contributor: Adams, Benjamin Michael
dc.contributor: Sox, Charles R.
dc.contributor: Albright, Thomas L.
dc.contributor.advisor: Gray, J. Brian
dc.contributor.author: Martinez Cid, Waldyn Gerardo
dc.contributor.other: University of Alabama Tuscaloosa
dc.date.accessioned: 2017-03-01T16:35:38Z
dc.date.available: 2017-03-01T16:35:38Z
dc.date.issued: 2012
dc.description: Electronic Thesis or Dissertation
dc.description.abstract: Ensemble methods, such as bagging (Breiman, 1996), boosting (Freund and Schapire, 1997) and random forests (Breiman, 2001), combine a large number of classifiers through (weighted) voting to produce strong classifiers. To explain the successful performance of ensembles, and particularly of boosting, Schapire, Freund, Bartlett and Lee (1998) developed an upper bound on the generalization error of an ensemble based on the margins, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal (sometimes referred to as the "large margins theory"). This result has led many researchers to consider direct optimization of functions of the margins (see, e.g., Grove and Schuurmans, 1998; Breiman, 1999; Mason, Bartlett and Baxter, 2000; and Shen and Li, 2010). In this research, we show that the large margins theory is not sufficient for explaining the performance of AdaBoost. Shen and Li (2010) and Xu and Gray (2012) provide evidence suggesting that generalization error might be reduced by increasing the mean and decreasing the variance of the margins, which we refer to as "squeezing" the margins. For that reason, we also propose several alternative techniques for squeezing the margins and evaluate their effectiveness through simulations with real and synthetic data sets. In addition to the margins being a determinant of the performance of ensembles, we know that AdaBoost, the most common boosting algorithm, can be very sensitive to outliers and noisy data, since it assigns a higher weight in subsequent runs to observations that have been misclassified. Therefore, we propose several techniques to identify and potentially delete noisy samples in order to improve its performance.
dc.format.extent: 83 p.
dc.format.medium: electronic
dc.format.mimetype: application/pdf
dc.identifier.other: u0015_0000001_0001081
dc.identifier.other: MartinezCid_alatus_0004D_11348
dc.identifier.uri: https://ir.ua.edu/handle/123456789/1563
dc.language: English
dc.language.iso: en_US
dc.publisher: University of Alabama Libraries
dc.relation.hasversion: born digital
dc.relation.ispartof: The University of Alabama Electronic Theses and Dissertations
dc.relation.ispartof: The University of Alabama Libraries Digital Collections
dc.rights: All rights reserved by the author unless otherwise indicated.
dc.subject: Statistics
dc.title: Three essays on the use of margins to improve ensemble methods
dc.type: thesis
dc.type: text
etdms.degree.department: University of Alabama. Department of Information Systems, Statistics, and Management Science
etdms.degree.discipline: Applied Statistics
etdms.degree.grantor: The University of Alabama
etdms.degree.level: doctoral
etdms.degree.name: Ph.D.
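The abstract describes the margin of a voting ensemble, whose mean and variance the dissertation proposes to "squeeze." As a minimal illustration (not the author's method), for a binary ensemble with votes in {-1, +1}, the margin at an example (x, y) is the normalized weighted vote for the correct class, y * sum_t(alpha_t * h_t(x)) / sum_t(alpha_t); the function and toy numbers below are hypothetical:

```python
import numpy as np

def ensemble_margins(alphas, predictions, y):
    """Margins of a weighted-vote binary ensemble.

    alphas:      (T,) nonnegative voting weights, one per base classifier
    predictions: (T, n) base-classifier votes in {-1, +1}
    y:           (n,) true labels in {-1, +1}
    Returns (n,) margins in [-1, 1]; positive where the ensemble is correct.
    """
    alphas = np.asarray(alphas, dtype=float)
    f = alphas @ np.asarray(predictions) / alphas.sum()  # normalized weighted vote
    return y * f

# Toy example: three base classifiers, two observations.
alphas = [0.5, 0.3, 0.2]
preds = [[+1, -1],
         [+1, +1],
         [-1, -1]]
y = np.array([+1, -1])
m = ensemble_margins(alphas, preds, y)
# "Squeezing" targets the distribution of m: raise its mean, shrink its variance.
print(m.mean(), m.var())
```

Under the large margins theory discussed in the abstract, pushing this distribution rightward (while holding complexity fixed) tightens the generalization bound; the dissertation's point is that this alone does not fully explain AdaBoost's behavior.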

Files

Original bundle
Name: file_1.pdf
Size: 13.22 MB
Format: Adobe Portable Document Format