Structural information based term weighting in text retrieval for feature location

University of Alabama Libraries

Feature location is a program comprehension activity in which a developer identifies the source code entities that implement a feature of interest. Recent feature location techniques apply text retrieval to corpora built from the text embedded in source code. These techniques are highly configurable, but many of their parameters remain unexplored in the software engineering context. For example, while the natural language processing community has developed several term weighting schemes meant to highlight the importance of certain terms in a particular document, the software engineering community has not yet developed term weighting schemes designed for source code. We therefore propose a new term weighting scheme based on the structural information in source code. We then report the results of an empirical study evaluating how the proposed scheme affects the performance of a feature location technique (FLT) based on latent Dirichlet allocation (LDA). In all, we studied over 400 bugs and features from five open source Java systems. Our key finding is that the accuracy of the LDA-based FLT improves when a structural term weighting scheme is used rather than a uniform term weighting scheme.
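
To make the contrast between structural and uniform term weighting concrete, the following is a minimal sketch rather than the scheme evaluated in the thesis. The abstract does not say which structural locations are distinguished or what weights they receive, so the location names, the multipliers in STRUCTURAL_WEIGHTS, and both function names are assumptions chosen for illustration. Each document stands for one source code entity (here, a single method), and a term's count is scaled by the weight of the location it was extracted from before the counts would be passed to a topic model such as LDA.

```python
from collections import Counter

# Hypothetical multipliers per structural location. These values are
# illustrative assumptions; the abstract does not state the actual weights.
STRUCTURAL_WEIGHTS = {
    "method_name": 4,
    "parameter": 2,
    "comment": 1,
    "body_identifier": 1,
}


def weighted_term_counts(terms_by_location, weights=STRUCTURAL_WEIGHTS):
    """Count the terms of one document, scaling each occurrence by the
    weight of the structural location it came from."""
    counts = Counter()
    for location, terms in terms_by_location.items():
        multiplier = weights.get(location, 1)  # unlisted locations count once
        for term in terms:
            counts[term] += multiplier
    return counts


def uniform_term_counts(terms_by_location):
    """Baseline: every occurrence counts once, whatever its location."""
    return weighted_term_counts(terms_by_location, weights={})


# One method, already split into terms and grouped by location (made-up data).
method_doc = {
    "method_name": ["save", "file"],
    "parameter": ["path"],
    "comment": ["write", "file", "disk"],
    "body_identifier": ["stream", "path"],
}

structural = weighted_term_counts(method_doc)
uniform = uniform_term_counts(method_doc)

for term in sorted(structural):
    print(term, uniform[term], structural[term])
```

In this sketch, terms from the method name dominate the weighted counts, while the uniform baseline treats a term in a comment the same as a term in a method name. Integer multipliers are used so that the result can still be read as a bag-of-words count vector.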

Electronic Thesis or Dissertation
Computer science