Data-Centric AI for Software Performance Engineering - Predicting Workload Dependent and Independent Performance of Software Systems Using Machine Learning Based Approaches

Abstract:
Context: Machine learning (ML) approaches are widely employed in software engineering (SE) tasks. Performance is one of the most critical software quality requirements, and performance prediction aims to estimate the execution time of a software system before it is executed. Prediction models are the backbone of performance estimation, and ML is a common choice for building them. Two settings are commonly considered for ML-based performance prediction, workload-dependent and workload-independent, depending on whether the specific usage of the system is fed as input to the ML estimator.
Problem: In the workload-dependent setting, developers usually work out manually how performance behaves with respect to the workload. This process consumes time, effort, and computational resources, since the developer runs the same system under test (e.g., a benchmark) many times, each with different workload values. In the workload-independent setting, predicting the scalar value of execution time from the structure of the source code is challenging because it depends on many factors, including the underlying architecture, the input parameters, and the application's interactions with the operating system. Consequently, works that have attempted to predict absolute execution time for arbitrary applications from source code generally report poor accuracy.
Goal: This thesis presents a machine learning-based approach for predicting execution time from two angles: (a) workload-independent performance and (b) workload-dependent performance.
Solution Approaches and Research Methodologies: To address the workload-dependent problem, we conducted a systematic empirical study across five well-known projects with JMH benchmarks (including RxJava, Log4J2, and the Eclipse Collections framework) and 126 concrete benchmarks, generating a dataset of approximately 1.4 million measurements. To address the poor accuracy in the workload-independent setting, we aim to increase the quality of the data, which in this context is the source code. To that end, we invest in Data-Centric AI. We conduct a systematic literature review and a systematic mapping study of the different approaches to source code representation and the level of information each representation can hold. Based on the findings, we conduct an experimental study to increase the quality of source code representation by establishing a rich hybrid code representation. We then combine this representation with a Graph Neural Network (GNN), an ML approach, to predict the scalar value of the functional test.
Results: Our results show that classical ML approaches can predict the performance value of the benchmarks according to the workload configuration. Moreover, with our proposed method, developers can easily determine the impact of each workload on the performance measurement. In addition, by increasing data quality through Data-Centric AI, we achieve high accuracy in predicting the absolute execution time of software based only on the structure of the source code.
