Efficient Automatic Change Detection in Software Maintenance and Evolutionary Processes

Abstract: Software maintenance is such an integral part of the software evolutionary process that it consumes a large share of the total resources available. Some estimates put the cost of maintenance at up to 100 times the cost of initially developing the software. Software that is not maintained accumulates technical debt, and if that debt is not paid off in time, it will eventually outweigh the value of the software. Software must adapt to changes in its environment and to new or changed requirements. It must also receive corrections for emerging faults and vulnerabilities. Constant maintenance can prepare software to accommodate future changes.

While there may be ample rationale for future changes, the reasons behind historical changes may no longer be accessible. Understanding change in software evolution provides valuable insights into, for example, the quality of a project or aspects of the underlying development process. These insights are worth exploiting, for example, for fault prediction, for managing the composition of the development team, or in effort estimation models. The size of software is a metric often used in such models, yet it is not well-defined. In this thesis, we seek to establish a robust, versatile, and computationally cheap metric that quantifies the size of changes made during maintenance. We operationalize this new metric and exploit it for automated and efficient commit classification.

Our results show that the density of a commit, that is, the ratio between its net- and gross-size, is a metric that can replace other, more expensive metrics in existing classification models. Models using this metric represent the current state of the art in automatic commit classification. The density provides a more fine-grained and detailed insight into the types of maintenance activities in a software project.

Additional properties of commits, such as their relations or intermediate sojourn times, have not previously been exploited to improve the classification of changes. We reason about their potential and suggest and implement dependent mixture models and Bayesian models that exploit joint conditional densities; each of these models has its own trade-offs with regard to computational cost, complexity, and prediction accuracy. Such models can outperform well-established classifiers, such as Gradient Boosting Machines.

All of our empirical evaluations comprise large datasets, software, and experiments, all of which we have published alongside the results as open access. We have reused, extended, and created datasets, and we have released the software packages for change detection and Bayesian modeling used in all of the studies conducted.
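To make the density metric concrete, the following is a minimal, illustrative sketch in Python, not the tooling released with the thesis. It assumes that "gross size" means the total number of changed lines in a commit and that "net size" means the subset of those lines remaining after whitespace-only and comment-only changes are discarded; the helper names and the triviality heuristic are assumptions made for this example.

```python
# Illustrative sketch only: commit density as the ratio of net- to gross-size.
from dataclasses import dataclass

@dataclass
class ChangedLine:
    text: str

def is_trivial(line: ChangedLine) -> bool:
    """Assumed heuristic: whitespace-only and comment-only lines do not count toward net size."""
    stripped = line.text.strip()
    return stripped == "" or stripped.startswith(("//", "#", "/*", "*"))

def commit_density(changed_lines: list[ChangedLine]) -> float:
    """Density = net size / gross size; defined as 0.0 for empty commits."""
    gross = len(changed_lines)
    if gross == 0:
        return 0.0
    net = sum(1 for line in changed_lines if not is_trivial(line))
    return net / gross

# Example: a commit touching four lines, two of which are trivial.
lines = [ChangedLine("int x = 1;"), ChangedLine("   "),
         ChangedLine("// TODO"), ChangedLine("return x;")]
print(commit_density(lines))  # 0.5
```

A commit whose changes consist mostly of whitespace or comments thus receives a low density, whereas a commit consisting largely of substantive code changes approaches a density of 1.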
