Noise Handling For Improving Machine Learning-Based Test Case Selection
Abstract: Background: Continuous integration is a modern software engineering practice that promotes rapid integration and testing of code changes as soon as they are committed to the project repository. One challenge in adopting this practice is the long time required to execute all available test cases for regression testing. The large amounts of data about code changes and executed test cases available in continuous integration systems present an opportunity to design data-driven approaches that can effectively select a subset of test cases for regression testing. Objective: The objective of this thesis is to create a method for selecting the test cases that have the highest probability of revealing faults in the system, given new code changes pushed to the code base. Using historically committed source code and the test cases executed against it, we use textual analysis and machine learning to design a method, called MeBoTs, that learns to select test cases. Method: To address this objective, we carried out two design science research cycles and two controlled experiments. A combination of quantitative and qualitative data collection methods was used, including test execution and code commit data, surveys, and a workshop, to evaluate and improve the effectiveness of MeBoTs in selecting effective test cases.
Results: The main findings of this thesis are that: 1) using an elimination and a relabelling strategy for handling class noise in the data increases the performance of MeBoTs from 25% to 84% (F1-score); 2) eliminating attribute noise from the training data does not improve the predictive performance of a test selection model (the F1-score remains unchanged at 66%); and 3) memory management changes in the source code should be tested with performance, load, soak, stress, volume, and capacity tests, while algorithmic complexity changes should be tested with the same set of tests plus maintainability tests. Conclusion: Our first conclusion is that textual analysis of source code can be effective in test case selection if a class noise handling strategy is applied to curate incorrectly labeled data points in the training data. Second, test orchestrators do not need to handle attribute noise in the data, since doing so does not improve the performance of MeBoTs. Finally, we conclude that the performance of MeBoTs can be improved by building a tool that automatically associates code changes of specific types with their dependent test cases for training.
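The abstract does not spell out how the elimination and relabelling strategies are defined, so the following is only a minimal sketch under stated assumptions: class noise is taken to mean identical feature vectors that carry conflicting labels, elimination drops every such data point, and relabelling assigns the majority label (dropping exact ties). The function names and the toy data are hypothetical and are not taken from MeBoTs.

```python
from collections import Counter, defaultdict


def group_labels(samples):
    """Map each feature vector to the list of labels it appears with."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[tuple(features)].append(label)
    return groups


def eliminate_class_noise(samples):
    """Assumed elimination strategy: drop every data point whose
    feature vector occurs with more than one distinct label."""
    groups = group_labels(samples)
    return [(f, l) for f, l in samples if len(set(groups[tuple(f)])) == 1]


def relabel_class_noise(samples):
    """Assumed relabelling strategy: give conflicting data points the
    majority label of their group; drop groups with an exact tie."""
    groups = group_labels(samples)
    cleaned = []
    for features, _ in samples:
        (top_label, top_count), *rest = Counter(groups[tuple(features)]).most_common()
        if rest and rest[0][1] == top_count:
            continue  # no clear majority, so the point is discarded
        cleaned.append((features, top_label))
    return cleaned


# Toy data: each tuple is (feature vector of a code line, test verdict).
samples = [
    ((1, 0, 2), "fail"),
    ((1, 0, 2), "fail"),
    ((1, 0, 2), "pass"),  # conflicts with the two entries above
    ((0, 3, 1), "pass"),
    ((2, 2, 0), "pass"),
    ((2, 2, 0), "fail"),  # exact tie with the entry above
]

eliminated = eliminate_class_noise(samples)
relabelled = relabel_class_noise(samples)
```

Under these assumptions, elimination keeps less data but never guesses a label, whereas relabelling preserves more training data at the risk of reinforcing a wrong majority.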