Applied Machine Learning in Steel Process Engineering: Using Supervised Machine Learning Models to Predict the Electrical Energy Consumption of Electric Arc Furnaces

Abstract: The steel industry is in constant need of improving its production processes, partly because of increasing competition and partly because of environmental concerns. One commonly used method for improving these processes is modeling. Models are representations of reality that can be used to study and test new processes and strategies without costly interventions. In recent years, Machine Learning (ML) has emerged as a promising modeling approach for the steel industry. This has partly been driven by the Industry 4.0 development, which highlights ML as one of the key technologies for its realization. However, ML models are often difficult to interpret, which makes it impractical to validate whether a model accurately represents reality. This can lead to a lack of trust in ML models among domain practitioners in the steel industry. Thus, the present work investigates the practical usefulness of ML models in the context of steel process engineering. The application chosen to answer this research question is the prediction of the Electrical Energy (EE) consumption of Electric Arc Furnaces (EAF). The EAF process was chosen because of its widespread use in the steel industry and because the EE consumption is difficult to model accurately using physical modeling. In the existing literature, linear statistical models are commonly used even though the EE consumption is non-linearly dependent on multiple important EAF process variables. In addition, the literature neither investigates the correlations between input variables nor attempts to find the optimal model with respect to model complexity, predictive performance, stability, and generalizability. Furthermore, consistent reporting of predictive performance metrics and interpretation of the non-transparent models are lacking.
These shortcomings motivated the development of a Model Construction methodology and a Model Evaluation methodology that eliminate them by considering both the domain-specific (metallurgical) aspects and the challenges imposed by ML modeling. Applying the developed methodologies to ML models predicting the EE consumption of two disparate EAFs produced several important findings. A high model complexity, governed by a large number of input variables and model coefficients, is not necessary to achieve state-of-the-art predictive performance on test data. This was confirmed both by the extensive number of models produced and by the comparison of the selected models with the models reported in the literature. To improve the predictive performance of the models, the main focus should instead be on data quality improvements. Experts in both process metallurgy and the specific process under study must be involved when developing practically useful ML models. They support both the selection of input variables and the evaluation of the contribution of the input variables to the EE consumption prediction in relation to established physico-chemical laws and experience with the specific EAF under study. In addition, a data cleaning strategy performed by an expert at one of the two EAFs resulted in the best-performing model. The scrap melting process in the EAF is complex and therefore challenging to model accurately using physico-chemical modeling. Using ML modeling, it was demonstrated that a scrap categorization based on the surface-area-to-volume ratio of scrap produced the ML models with the highest predictive performance. This agrees well with the physico-chemical phenomena that govern the melting of scrap: temperature gradients, alloying gradients, stirring velocity, and the freezing effect.
Several practical use cases of ML models were exemplified in the present work, since the Model Evaluation methodology demonstrated the possibility of revealing the true contribution of each input variable to the EE consumption. The most prominent example was the analysis of the contribution of various scrap categories to the EE consumption. Three of these scrap categories were confirmed by the steel plant engineers to be accurately interpreted by the model. However, drawing specific conclusions requires a higher model predictive performance, which can only be realized after significant data quality improvements. Lastly, the developed methodology is not limited to the case used in the present work. It can be used to develop supervised ML models for other processes in the steel industry, which is valuable for the steel industry as it moves forward in the Industry 4.0 development.
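
The idea of attributing a prediction to individual input variables can be illustrated with a minimal sketch. The abstract does not state which technique the Model Evaluation methodology uses; the example below assumes permutation importance, a common model-agnostic choice, and is not presented as the author's method. The data are synthetic, and the variable names (scrap_category_a, scrap_category_b, unrelated_variable), the toy_model function, and all coefficients are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MAE when a single input column is shuffled.

    A larger increase means the model relies more on that input.
    """
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [value] + row[col + 1:]
                      for row, value in zip(X, shuffled)]
            error = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic, hypothetical data: 200 "heats" with three unitless inputs.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
# Hypothetical target: depends on the first two inputs plus small noise.
y = [3.0 * a + 1.0 * b + data_rng.gauss(0, 0.05) for a, b, _ in X]


def toy_model(row):
    """Stand-in for a trained model; it ignores the third input entirely."""
    return 3.0 * row[0] + 1.0 * row[1]


names = ["scrap_category_a", "scrap_category_b", "unrelated_variable"]
importances = permutation_importance(toy_model, X, y)
for name, importance in zip(names, importances):
    print(f"{name}: {importance:.3f}")
```

In this toy setting the input that the stand-in model ignores receives an importance of zero, while the input with the largest coefficient receives the largest importance. On real process data, correlated input variables can distort such rankings, which is one reason the interpretation still needs to be checked against domain expertise.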