Data-driven quality management using explainable machine learning and adaptive control limits

Abstract: In industrial applications, the objective of statistical quality management is to achieve quality guarantees through the efficient and effective application of statistical methods. Historically, quality management has been characterized by systematic monitoring of critical quality characteristics, accompanied by manual, experience-based root cause analysis whenever a decline in quality is observed. Machine learning researchers have suggested that recent improvements in digitization, including sensor technology, computational power, and algorithmic developments, should enable more systematic approaches to root cause analysis.

In this thesis, we explore the potential of data-driven approaches to quality management. This exploration is guided by an envisioned end product consisting of an automated data collection and curation system, a predictive and explanatory model trained on historical process and quality data, and an automated alarm system that predicts a decline in quality and suggests worthwhile interventions. The research questions investigated in this thesis concern which statistical methods are relevant for implementing this product, how their reliability can be assessed, and whether there are knowledge gaps that prevent the implementation.

This thesis consists of four papers. In Paper I, we simulated various types of process-like data in order to investigate how several dataset properties affect the choice of methods for quality prediction. These properties include the number of predictors, their distribution and correlation structure, and their relationships with the response.
In Paper II, we reused the simulation method from Paper I to generate multiple types of datasets, and used them to compare local explanation methods by evaluating them against a ground truth.

In Paper III, we outlined a framework for an automated process adjustment system based on a predictive and explanatory model trained on historical data. Given a relative cost between reduced quality and process adjustments, we described a method for searching for a worthwhile adjustment policy, and performed several simulation experiments to demonstrate how such a policy can be evaluated.

In Paper IV, we described three ways to evaluate local explanation methods on real-world data, where no ground truth is available for comparison. Additionally, we described four methods for decorrelation and dimension reduction and discussed their respective tradeoffs. These methods were evaluated on real-world process and quality data from the paint shop of the Volvo Trucks cab factory in Umeå, Sweden.

During the work on this thesis, two significant knowledge gaps were identified. The first is a lack of best practices for data collection and quality control, preprocessing, and model selection. The second is that although there are many promising leads for how to explain the predictions of machine learning models, there is still no generally accepted definition of what constitutes an explanation, and a lack of methods for evaluating the reliability of such explanations.
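The cost tradeoff underlying a worthwhile adjustment policy can be sketched as follows. The sketch assumes a simple threshold policy: adjust the process whenever the predicted failure probability exceeds a threshold, and pick the threshold that minimizes simulated average cost. All costs, distributions, and the assumption that an adjustment fully prevents a failure are illustrative, not the framework of Paper III:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative relative costs (assumed values)
c_adjust = 1.0   # cost of one process adjustment
c_quality = 5.0  # cost of one unit of reduced quality

def policy_cost(threshold, n=10_000):
    """Average cost per unit of a simple threshold policy."""
    # Predicted probability that the next unit falls below spec
    p_fail = rng.uniform(0, 1, size=n)
    adjust = p_fail > threshold
    # Assumption: adjusting prevents the quality failure entirely
    fail = (~adjust) & (rng.uniform(0, 1, size=n) < p_fail)
    return (adjust * c_adjust + fail * c_quality).mean()

# Grid search for a worthwhile threshold
thresholds = np.linspace(0.05, 0.95, 19)
costs = [policy_cost(t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]
```

In this toy setup, adjusting is worthwhile roughly when the expected failure cost exceeds the adjustment cost, i.e. when the predicted failure probability exceeds c_adjust / c_quality; the simulated optimum should land near that value, up to Monte Carlo noise.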