Empirical investigation of causes and effects of code clones
Code clones, also known as software clones, are similar code fragments, mostly formed through code reuse. The literature abounds with ambiguous and vague definitions of code clones. Over the years, researchers have shown increasing interest in code clones, but most of the research lacks empirical validation, and empirical studies are especially scarce in the area of cause and effect. Researchers have often associated code clones with a negative connotation, yet there is little evidence that code clones negatively affect a system. Although the research community unanimously agrees that it is critical to keep track of code clones, the available research lacks substantial work on maintenance-related issues. Maintenance accounts for most of the effort in the software life cycle. It is not yet known exactly how code clones affect the maintenance process, and this dissertation is a step in that direction. Good and bad coding practices together give rise to code clones. Educating developers and assisting them in clone maintenance scenarios can save effort. A primary objective of this dissertation is to investigate developer behavior and ascertain ways to help developers during clone maintenance. Before reaching this goal, a major milestone is understanding the fundamentals of code clones. This dissertation proposes a 'four pillar architecture' in which each pillar (consistent definitions, causes and effects of clones, clone awareness, and clone management) focuses on closely related questions. To answer the questions related to each pillar, this dissertation presents five research studies with their respective empirical methods: systematic literature review, community survey, developer observation, and qualitative interview. The results highlight a degree of ambiguity in the literature and differences of opinion in the research community. The results also show that cloned code requires more effort to maintain and that developers can be assisted through proper training and clone-aware information. This dissertation also proposes a categorization of code clones based on cloning intent, with a classification into harmful and helpful clones.