Theses and Dissertations - Department of Computer Science
Browsing Theses and Dissertations - Department of Computer Science by Author "Atkison, Travis Levestis"
Now showing 1 - 7 of 7
Item
Application of human error theories in detecting and preventing software requirement errors (University of Alabama Libraries, 2017)
Hu, Wenhua; Carver, Jeffrey C.; University of Alabama Tuscaloosa
Developing correct software requirements is important for overall software quality. Most existing quality improvement approaches focus on detecting and removing faults (i.e., problems recorded in a document) rather than on identifying the underlying errors that produced those faults. Accordingly, developers are likely to make the same errors in the future and to miss other existing faults with the same origins. The Requirement Error Taxonomy (RET) developed by Walia and Carver helps focus the developer's attention on common errors that can occur during requirements engineering. However, because the development of software requirements is a human-centric process, requirements engineers will likely make human errors during the process, which may lead to undetected faults. To bridge this gap, the goals of my dissertation are to: (1) construct a complete Human Error Taxonomy (HET) for the software requirements stage; (2) investigate the usefulness of the HET as a defect detection technique; (3) investigate the effectiveness of the HET as a defect prevention technique; and (4) provide specific defect prevention measures for each error in the HET. To address these goals, the dissertation contains three articles. The first article is a systematic literature review that uses insights from cognitive psychology research on human errors to develop a formal HET that helps software engineers improve software requirements specification (SRS) documents. After building the HET, it is necessary to empirically evaluate its effectiveness. Thus, the second article describes two studies that evaluate the usefulness of the HET for defect detection. Finally, the third article analyzes the usefulness of the HET for defect prevention and provides strategies for preventing specific errors in the SRS.

Item
Combining information retrieval modules and structural information for source code bug localization and feature location (University of Alabama Libraries, 2011)
Shao, Peng; Smith, Randy K.; Kraft, Nicholas A.; University of Alabama Tuscaloosa
Bug localization and feature location in source code are software evolution tasks in which developers use information about a bug or feature present in a software system to locate the source code elements, such as classes or methods, that must be modified either to correct the bug or to implement the feature. Automating bug localization and feature location is necessary due to the size and complexity of modern software systems. Recently, researchers have developed static bug localization and feature location techniques that use information retrieval methods, such as latent semantic indexing (LSI), to model lexical information (e.g., identifiers and comments) from source code. This research presents a new technique, LSICG, which combines LSI, which models lexical information, with call graphs, which model structural information. The output is a list of methods ranked in descending order by their likelihood of requiring modification to correct the bug or implement the feature under consideration. Three case studies, comparing LSI and LSICG at method-level and class-level granularity on 25 features in JavaHMO, 35 bugs in Rhino, and 3 features and 6 bugs in jEdit, demonstrate that the LSICG technique provides improved performance compared to LSI alone.
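To make the combination concrete, here is a minimal Python sketch of the general LSI-plus-call-graph idea, using gensim and networkx. The toy corpus, the call-graph edges, and the blending weight alpha are invented for illustration; this is not the dissertation's actual LSICG implementation.

# Illustrative blend of LSI similarity (lexical) with call-graph
# structure; corpus, edges, and weight are hypothetical.
from gensim import corpora, models, similarities
import networkx as nx

# One "document" of lexical tokens per method (toy data).
methods = {
    "Parser.parse": "parse token stream syntax tree".split(),
    "Parser.error": "report syntax error message".split(),
    "Editor.save":  "write buffer file disk".split(),
}
names = list(methods)
dictionary = corpora.Dictionary(methods.values())
bow = [dictionary.doc2bow(tokens) for tokens in methods.values()]

lsi = models.LsiModel(bow, id2word=dictionary, num_topics=2)
index = similarities.MatrixSimilarity(lsi[bow], num_features=2)

# Static call graph over the same methods (hypothetical edges).
call_graph = nx.DiGraph([("Parser.parse", "Parser.error")])

query = dictionary.doc2bow("syntax error while parsing".split())
lsi_score = dict(zip(names, index[lsi[query]]))

# Blend: each method also receives a fraction of its call-graph
# neighbors' lexical scores (alpha is an illustrative weight).
alpha = 0.25
combined = dict(lsi_score)
for caller, callee in call_graph.edges():
    combined[caller] += alpha * lsi_score[callee]
    combined[callee] += alpha * lsi_score[caller]

# Rank methods in descending order of likelihood of modification.
for name, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {name}")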
Item
Improving intelligent analytics through guidance: analysis and refinement of patterns of use and recommendation methods for data mining and analytics systems (University of Alabama Libraries, 2019)
Pate, Jeremy; Dixon, Brandon; University of Alabama Tuscaloosa
In conjunction with the proliferation of data collection applications, the systems that provide functionality to analyze and mine this resource are also increasing in number and complexity. As part of this growth, understanding how users navigate these systems, and how that navigation influences the extracted information and subsequent decisions, becomes a critical component of their design. A central theme of improving the understanding of user behavior and of the tools that support it is the effort to gain a context-aware view of analytics system optimization. Through distinct but interwoven articles, this research examines the usage patterns characteristic of a specific example of these systems, the construction of an educational support system for new and existing users, and a decision-tree-supported workflow optimization recommender system. These components combine to yield a method for guided intelligent analytics that uses behavior, system knowledge, and workflow optimization to improve the user experience and promote efficient use of systems of this type.

Item
Investigating the effect of corpus construction on latent Dirichlet allocation based feature location (University of Alabama Libraries, 2012)
Biggers, Lauren Rachel; Kraft, Nicholas A.; University of Alabama Tuscaloosa
The software maintenance community has adopted text retrieval techniques to aid program comprehension tasks such as feature location: the process of finding the source code entity or entities that implement a system feature. Latent Dirichlet allocation (LDA) and latent semantic indexing (LSI) are two such text retrieval techniques. However, little work exists to inform the configuration of these techniques for software maintenance tasks. This work investigates the impact of highly configurable preprocessing techniques on LDA-based feature location; these decisions affect the composition and quality of the corpus and thus the accuracy of the text retrieval technique. Source code extraction is based on a researcher's understanding of how source code is used. We decompose source code into three distinct lexicons: identifiers, comments, and literals. Many researchers choose the aggregation of the lexicons, while some choose specific subsets. This work finds that the chosen text source(s) does impact the accuracy of the LDA-based feature location technique (FLT). Conventional wisdom holds that identifier splitting improves the performance of a text-retrieval-based FLT; however, the decision to retain or remove the original identifier is unexplored. This work finds that identifier splitting does impact the accuracy of the LDA-based FLT, but that retaining or removing the original identifier does not have a significant impact. Stop words, words with little semantic value, are often removed from natural language corpora. This work explores the impact of stop word removal on source code corpora. The observations show that few stop word configurations are significantly different from one another; even a null configuration is acceptable. The Porter stemming algorithm is a popular, lightweight, rule-based stemmer often used in software maintenance preprocessing. We investigate the effects of two heavy stemmers, two light stemmers, four blended stemmers, and a null configuration; one light stemmer is morphological and the other stemmers are rule-based. The results indicate that no stemming algorithm significantly affects the performance of the FLT as compared to another. We suggest basing preprocessing decisions on system structure and constraints; such choices can reduce the memory and/or processing time needed for LDA-based feature location.
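The preprocessing decisions this study varies (text sources, identifier splitting with or without the original token, stop word removal, stemming) can be sketched in a few lines of Python. The snippet below is illustrative only: the stop word list and splitting rules are assumptions, and NLTK's Porter stemmer stands in for the family of stemmers the dissertation compares.

# Illustrative corpus-construction choices: identifier splitting (with
# or without the original token), stop word removal, and stemming.
import re
from nltk.stem import PorterStemmer  # one of the rule-based stemmers compared

STOP_WORDS = {"the", "a", "of", "to", "int", "return"}  # assumed list

def split_identifier(identifier):
    # 'parseHtmlFile' -> ['parse', 'html', 'file']; also splits snake_case.
    parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", identifier)
    return [p.lower() for p in parts if p]

def preprocess(tokens, keep_original=True):
    stem = PorterStemmer().stem
    terms = []
    for token in tokens:
        words = split_identifier(token)
        if keep_original and len(words) > 1:
            words.append(token.lower())  # retain the unsplit identifier
        terms.extend(stem(w) for w in words if w not in STOP_WORDS)
    return terms

# Tokens drawn from one method's identifiers (toy input).
print(preprocess(["parseHtmlFile", "buffer_size", "return"]))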
Item
Law enforcement deployment algorithms: historic sharing approaches and results (University of Alabama Libraries, 2017)
Elliott, Terry Beau; Smith, Randy K.; Atkison, Travis Levestis; University of Alabama Tuscaloosa
Law enforcement must be vigilant in using its limited resources to address both criminal activity and traffic safety. Past research has concentrated on determining optimal routes for law enforcement entities under simplifying assumptions. This dissertation presents a taxonomy of law enforcement resource deployments and routing methods in order to provide a stronger foundation for algorithms that consider both criminal activity and traffic safety when optimally allocating available resources. The dissertation also introduces a method for comparing deployment strategies that uses a simulation-based algorithm to cover nodes throughout a county, and it presents the results of several example comparisons.

Item
Mailtrust: attribute-based dynamic encrypted email (University of Alabama Libraries, 2017)
Hudnall, Matthew; Vrbsky, Susan V.; Parrish, Allen Scott; University of Alabama Tuscaloosa
E-mail is generally regarded as an insecure method of electronic communication, for numerous reasons. Most notably, by default it guarantees neither the authentic identity of the sender and intended receiver of a message nor the confidentiality and integrity of the message itself. While these problems can be partially addressed with commonly used technologies involving certificates and e-mail client plugins, current practice is insufficient for high-security applications, such as classified communications among clients of different e-mail systems. The research presented here leverages "Trustmarks" (developed primarily to support efficient single sign-on in a federated environment) to secure e-mail between multiple systems with particularly stringent confidentiality and integrity requirements. Such a system could increase the ability of users at disparate organizations to communicate without fear that sensitive information might be intentionally or accidentally disclosed. Although there are many barriers to adoption, such a system might also eventually reduce the reliance on separate communication networks and systems for classified communications.
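As a rough illustration of the envelope pattern such a system might use (not the dissertation's actual construction), the Python sketch below wraps a per-message content key only for recipients whose asserted attributes satisfy the sender's policy. The satisfies() check is a hypothetical stand-in for real trustmark validation; the recipient directory and attribute sets are invented. It uses the cryptography package.

# Rough envelope-encryption sketch: the per-message content key is
# wrapped only for recipients whose attributes satisfy the policy.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def satisfies(attributes, policy):
    # Stand-in: a real system would verify signed trustmark assertions.
    return policy.issubset(attributes)

def new_key():
    return rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Hypothetical directory: recipient -> (private key, asserted attributes).
recipients = {
    "alice@agency-x.gov": (new_key(), {"cleared", "agency-x"}),
    "bob@example.org":    (new_key(), {"agency-x"}),
}

policy = {"cleared", "agency-x"}
content_key = Fernet.generate_key()
ciphertext = Fernet(content_key).encrypt(b"sensitive message body")

# Wrap the content key per qualifying recipient.
wrapped = {
    addr: key.public_key().encrypt(content_key, OAEP)
    for addr, (key, attrs) in recipients.items()
    if satisfies(attrs, policy)
}
print(sorted(wrapped))  # only alice qualifies

# A qualifying recipient unwraps the key and decrypts the message.
alice_key, _ = recipients["alice@agency-x.gov"]
recovered = alice_key.decrypt(wrapped["alice@agency-x.gov"], OAEP)
print(Fernet(recovered).decrypt(ciphertext))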
Item
Online topic modeling for software maintenance using a changeset-based approach (University of Alabama Libraries, 2018)
Corley, Christopher Scott; Kraft, Nicholas A.; Carver, Jeffrey C.; University of Alabama Tuscaloosa
Topic modeling is a machine learning technique for discovering thematic structure within a corpus. Topic models have been applied to several areas of software engineering, including bug localization, feature location, triaging change requests, and traceability link recovery. Many of these approaches train topic models on a source code snapshot: a revision or state of the code at a particular point in time, such as a versioned release. However, source code evolution leads to model obsolescence and thus to the need to retrain the model from the latest snapshot, incurring the non-trivial computational cost of re-learning the model. This work proposes and investigates an approach that remedies the obsolescence problem. Conventional wisdom in the software maintenance research community holds that the topic model must be trained on the same information that is of interest for retrieval. The primary insight of this work is that topic models can infer the topics of any information, regardless of the information used to train the model. By pairing online topic modeling with mining software repositories, I can remove the need to retrain a model and thereby achieve model persistence. To this end, I suggest training topic models on the software repository history in the form of changesets: textual representations of the changes that occur between two source code snapshots. To show the feasibility of this approach, I investigate two popular applications of text retrieval in software maintenance, feature location and developer identification. Feature location is a search activity for locating the source code entity that relates to a feature of interest. Developer identification is similar, but focuses on identifying the developer most apt to work on a feature of interest. Further, to demonstrate the usability of changeset-based topic models, I investigate whether topic-modeling-based maintenance tasks can be coalesced into a single model, rather than training a separate model for each task at hand. In sum, this work aims to show that training online topic models on software repositories removes retraining costs while maintaining the accuracy of a traditional snapshot-based topic model for different software maintenance problems.
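A minimal sketch of the central mechanism, incrementally updating a topic model as changesets arrive and then inferring topics for unseen documents, might look as follows in Python with gensim. The changeset tokens are toy data, and a real pipeline would also grow the dictionary as new terms appear.

# Illustrative online training on changesets: the model is updated as
# new commits arrive instead of being retrained from a snapshot.
from gensim import corpora, models

# Tokenized diff text of three commits (toy data).
changesets = [
    "add parse method for token stream".split(),
    "fix error message in parser".split(),
    "save buffer to file on exit".split(),
]

dictionary = corpora.Dictionary(changesets)
lda = models.LdaModel(id2word=dictionary, num_topics=2)  # starts untrained

for changeset in changesets:  # in practice: as commits land in the repository
    lda.update([dictionary.doc2bow(changeset)])

# The changeset-trained model can still infer topics for snapshot
# entities, e.g. a method's tokens, when answering feature location queries.
query = dictionary.doc2bow("parser error".split())
print(lda.get_document_topics(query))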