Online topic modeling for software maintenance using a changeset-based approach

dc.contributor: Gray, Jeff
dc.contributor: Smith, Randy K.
dc.contributor: Atkison, Travis Levestis
dc.contributor.advisor: Kraft, Nicholas A.
dc.contributor.advisor: Carver, Jeffrey C.
dc.contributor.author: Corley, Christopher Scott
dc.contributor.other: University of Alabama Tuscaloosa
dc.date.accessioned: 2018-07-11T16:49:03Z
dc.date.available: 2018-07-11T16:49:03Z
dc.date.issued: 2018
dc.description (en_US): Electronic Thesis or Dissertation
dc.description.abstract (en_US): Topic modeling is a machine learning technique for discovering thematic structure within a corpus. Topic models have been applied to several areas of software engineering, including bug localization, feature location, triaging change requests, and traceability link recovery. Many of these approaches train topic models on a source code snapshot, that is, a revision or state of the code at a particular point in time, such as a versioned release. However, source code evolution leads to model obsolescence and thus to the need to retrain the model from the latest snapshot, incurring the non-trivial computational cost of re-learning the model. This work proposes and investigates an approach that can remedy the obsolescence problem. Conventional wisdom in the software maintenance research community holds that the information used to train a topic model must be the same information that is of interest for retrieval. The primary insight of this work is that a topic model can infer the topics of any information, regardless of the information used to train it. By pairing online topic modeling with mining software repositories, I can remove the need to retrain a model and achieve model persistence. To this end, I propose training topic models on the software repository history in the form of changesets: textual representations of the changes that occur between two source code snapshots. To show the feasibility of this approach, I investigate two popular applications of text retrieval in software maintenance: feature location and developer identification. Feature location is a search activity for locating the source code entity that relates to a feature of interest. Developer identification is similar, but focuses on identifying the developer best suited to work on a feature of interest.
Further, to demonstrate the usability of changeset-based topic models, I investigate whether topic-modeling-based maintenance tasks can be coalesced into using a single model, rather than training a separate model for each task at hand. In sum, this work aims to show that training online topic models on software repositories removes retraining costs while maintaining the accuracy of a traditional snapshot-based topic model across different software maintenance problems.
dc.format.extent: 195 p.
dc.format.medium: electronic
dc.format.mimetype: application/pdf
dc.identifier.other: u0015_0000001_0002925
dc.identifier.other: Corley_alatus_0004D_13461
dc.identifier.uri: http://ir.ua.edu/handle/123456789/3610
dc.language: English
dc.language.iso: en_US
dc.publisher: University of Alabama Libraries
dc.relation.hasversion: born digital
dc.relation.ispartof: The University of Alabama Electronic Theses and Dissertations
dc.relation.ispartof: The University of Alabama Libraries Digital Collections
dc.rights (en_US): All rights reserved by the author unless otherwise indicated.
dc.subject: Computer science
dc.title (en_US): Online topic modeling for software maintenance using a changeset-based approach
dc.type: thesis
dc.type: text
etdms.degree.department: University of Alabama. Department of Computer Science
etdms.degree.discipline: Computer Science
etdms.degree.grantor: The University of Alabama
etdms.degree.level: doctoral
etdms.degree.name: Ph.D.
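The abstract's central idea can be illustrated with a small Python sketch. The model is trained once, on changesets. It is then held fixed and used to infer topics for any text, including a changeset, a snapshot file, or a query. This is not the dissertation's implementation. Several pieces are hypothetical assumptions made for illustration:

- the tokenizer;
- the choice to keep only added and removed lines in a changeset;
- the two hand-written topics in TOPICS, which stand in for what an online LDA model trained on changesets might learn;
- every function name (tokenize, changeset_document, infer_topics, hellinger, rank).

The inference step is a simple EM "folding-in" of topic proportions with the topic-word probabilities fixed. It is a simplified substitute for LDA's variational inference.

```python
import difflib
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase words, splitting snake_case and camelCase identifiers (hypothetical preprocessing)."""
    words = []
    for token in re.findall(r"[A-Za-z]+", text):
        words.extend(p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", token))
    return words


def changeset_document(old_src, new_src):
    """Bag of words from lines added or removed between two snapshots.

    Assumption: unchanged context lines are excluded from the changeset text.
    """
    words = []
    for line in difflib.ndiff(old_src.splitlines(), new_src.splitlines()):
        if line.startswith(("+ ", "- ")):  # skip "  " context and "? " hint lines
            words.extend(tokenize(line[2:]))
    return Counter(words)


def infer_topics(doc, topics, iterations=50):
    """Estimate topic proportions for doc (word -> count) with topics held fixed.

    EM folding-in: a simplified stand-in for LDA's variational inference.
    """
    k = len(topics)
    theta = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for word, count in doc.items():
            weights = [theta[j] * topics[j].get(word, 0.0) for j in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue  # word unknown to the model: ignore it
            for j in range(k):
                expected[j] += count * weights[j] / total
        norm = sum(expected)
        if norm == 0.0:
            break  # no known words: keep the uniform starting point
        theta = [e / norm for e in expected]
    return theta


def hellinger(p, q):
    """Hellinger distance between two topic distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def rank(query_theta, doc_thetas):
    """Feature location: order documents by closeness to the query's topics."""
    return sorted(doc_thetas, key=lambda name: hellinger(query_theta, doc_thetas[name]))


# Hypothetical, hand-written topics standing in for a model trained on changesets.
TOPICS = [
    {"cache": 0.5, "load": 0.3, "return": 0.2},
    {"user": 0.5, "login": 0.3, "password": 0.2},
]

# Infer topics for a changeset between two versions of a file.
old = "def getUser():\n    return db.load()\n"
new = "def getUser():\n    return cache.load()\n"
print(infer_topics(changeset_document(old, new), TOPICS))  # [1.0, 0.0]

# The same fixed model infers topics for snapshot files and ranks them for a query.
snapshot = {
    "cache.py": "def load(key):\n    return cache.get(key)\n",
    "auth.py": "def login(user, password):\n    return check(user, password)\n",
}
doc_thetas = {name: infer_topics(Counter(tokenize(src)), TOPICS) for name, src in snapshot.items()}
query = infer_topics(Counter(tokenize("cache load")), TOPICS)
print(rank(query, doc_thetas))  # ['cache.py', 'auth.py']
```

Inference never modifies TOPICS, so one model can score changesets, files, and queries alike. In an online setting, new changesets would instead be passed to the model's incremental update step (for example, gensim's `LdaModel.update()`), so the model never has to be retrained from a fresh snapshot.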
Files
Original bundle
Name: file_1.pdf
Size: 5.23 MB
Format: Adobe Portable Document Format