Predicting Short-term Absences of a Railway Crew using Historical Data

Detta är en Master-uppsats från KTH/Matematisk statistik

Sammanfattning: Transportation via train is considered the most environmentally friendly way of traveling and is widely seen as the future of transportation. Canceled and delayed trains worsen customer satisfaction; thus, punctual trains are crucial for railway companies. One reason for canceled and delayed trains is the shortage of employees due to sickness or care of relatives, known as short-term absences. Therefore, it is important for railway companies to have reliable predictions of these. This thesis is in collaboration with SJ, the largest railway company in Sweden which offers trips all over Sweden and some other parts of northern Europe. The thesis predicts short-term absences with data provided by SJ, by using the machine learning methods random forest and extreme gradient boosting (XGBoost). The aim is to investigate if SJ can use machine learning algorithms and statistical analysis in their absence predictions and if it can yield better results than their current absence prediction methodology. Furthermore, the thesis identifies which factors are most important for the predictions. In addition to this, quantile regression is implemented for both methods since overestimating absenteeism could be better for avoiding employee shortage.  Two different datasets are used for two different tasks; one regression task to predict the number of absent employees on each date and one classification task to predict the probability of an absent employee on a specific duty, and then adding the probabilities to achieve the total predicted number of absent employees on each date. Both task formulations yielded good absence prediction results. XGBoost resulted overall in lower errors than random forest, meaning it was a slightly better model to implement for this task. When comparing the results, the performance for the developed models was better than the current predictions at SJ, meaning machine learning models could benefit SJ's prediction work.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)