Data-driven decision support for product change management: Making explainable classifications of product change requests at Scania using machine learning methods

This is a thesis for a professional degree at advanced level from Umeå University/Department of Mathematics and Mathematical Statistics

Abstract: Decision making is a big part of our day-to-day lives, both personal and professional. Good decision support can make a decision process higher in quality, more efficient and more consistent. In recent years, machine learning has shown outstanding capacity for making complex processes understandable and providing decision support. But what good is this decision support if it is not trusted? Our work tries to improve the usage of machine learning models by making their results more understandable and trustworthy. In this thesis, we investigate the decisions in the Product Development (PD) process at Scania. Two important steps in the PD process are to prioritize a Product Change Request (PCR) and to decide whether it should be realized. Our main objective is to build machine learning models that can be incorporated in this process and help with the decision making. To choose the most suitable model, different machine learning models are trained on historical data. The model with the best performance is chosen and can be used to make predictions on new PCRs. The model that performed best when deciding the priority of a given PCR was Extreme Gradient Boosting (XGB), which achieved an F1 score of 46.6% and an accuracy of 48.0%. However, we found that the data was not suitable for making classifications regarding the priorities. The model that performed best when deciding whether a PCR should be realized was the random forest, which achieved an F1 score of 67.4% and an accuracy of 79.4%. We found that better classifications of whether a PCR should be realized could be made when additional data was added to the model, and we therefore recommend changes to the collection and storage of data. With the additional data from attachments, the random forest achieved an F1 score of 73.5% and an accuracy of 83.8%. We also explain and visualize how the random forest makes its classifications and how each feature of a PCR affects the classification. This is important for improving trust in the decision support provided by the model.
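
The model-selection procedure described in the abstract (train several candidate models on historical data, keep the one with the best score, then inspect how features drive its classifications) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the thesis: the data is synthetic, since the Scania PCR data is not public; scikit-learn's `GradientBoostingClassifier` stands in for XGB; and impurity-based feature importances stand in for whatever explanation method the authors actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical PCR data (assumption: binary target,
# e.g. "realized" vs "not realized", with 8 numeric features).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate models; the names and choices here are illustrative only.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each candidate and record F1 score and accuracy on held-out data.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "f1": f1_score(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
    }

# Keep the candidate with the highest F1 score.
best_name = max(scores, key=lambda n: scores[n]["f1"])
best_model = candidates[best_name]

# Rank features of the random forest by impurity-based importance,
# a simple (and limited) way to see which features drive the classification.
forest = candidates["random_forest"]
importances = sorted(
    enumerate(forest.feature_importances_), key=lambda p: p[1], reverse=True
)

for name, s in scores.items():
    print(f"{name}: F1={s['f1']:.3f}, accuracy={s['accuracy']:.3f}")
print("selected:", best_name)
for idx, weight in importances:
    print(f"feature {idx}: importance {weight:.3f}")
```

Reporting both F1 score and accuracy, as the abstract does, matters when classes are imbalanced: accuracy alone can look good for a model that mostly predicts the majority class.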
