Time series Forecast of Call volume in Call Centre using Statistical and Machine Learning Methods

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: A time series is a collection of data points gathered at regular intervals. Time series analysis explores temporal correlations and tries to model them in terms of trend and seasonality. One of the most relevant tasks in time series analysis is forecasting future values, which is fundamental in many real-world scenarios. Nowadays, many companies forecast using hand-written models or naive statistical models. Call centers are the front end of an organization, managing the relationship with its customers. A key challenge for call centers remains forecasting the call load and optimizing the schedule. Call load is the number of calls a call center receives. The call load forecast is mostly used to schedule staff. Call centers are interested in the short-term forecast to handle unforeseen events and optimize the staff schedule, and in the long-term forecast to hire staff or assign staff to other tasks. Machine learning has been applied to several fields with excellent results, and time series forecasting problems have recently gained strong interest thanks to the recurrent network named Long Short-Term Memory (LSTM). This thesis explores the capabilities of machine learning in modeling and forecasting call load time series, which are characterized by strong seasonality at both daily and hourly scales. We compare Seasonal Artificial Neural Network (Seasonal ANN, or SANN) and Long Short-Term Memory (LSTM) models with the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, which is one of the most common statistical methods used by call centers. The primary metric used to evaluate the results is the Normalized Mean Squared Error (NMSE); the secondary is the Symmetric Mean Absolute Percentage Error (SMAPE), used to measure the accuracy of the models. We carried out our experiments on three different datasets provided by Teleopti.
Experimental results show SARIMA to be more accurate when forecasting at the daily scale across the three datasets. With a limited number of data points, it performs better than the Seasonal ANN and the LSTM. At the hourly scale, the Seasonal ANN and the LSTM outperform SARIMA, showing robustness across a forecasting horizon of 160 points. Finally, SARIMA showed no correlation between the quality of the model and the number of data points, while both the SANN and the LSTM improve as the number of samples grows.
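
The abstract names NMSE and SMAPE as evaluation metrics but does not state their formulas. The sketch below illustrates one common convention for each, and it is an assumption rather than the thesis's own definition: NMSE is taken as the mean squared error divided by the variance of the observed values, and SMAPE as the mean of |actual - forecast| / ((|actual| + |forecast|) / 2), expressed in percent. The function names and the sample call counts are hypothetical.

```python
from statistics import fmean, pvariance


def nmse(actual, forecast):
    """Mean squared error divided by the variance of the observed values.

    Assumed convention: 0 is a perfect forecast, and 1 matches a forecast
    that always predicts the mean of the observed series.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mse = fmean([(a - f) ** 2 for a, f in zip(actual, forecast)])
    variance = pvariance(actual)
    if variance == 0:
        raise ValueError("NMSE is undefined for a constant observed series")
    return mse / variance


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed convention: a term counts as 0 when both values are 0.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(a - f) / denom)
    return 100 * fmean(terms)


# Hypothetical hourly call counts and a hypothetical forecast.
observed = [100, 120, 80, 100]
predicted = [110, 120, 70, 100]
print(nmse(observed, predicted))   # MSE 50 over variance 200
print(smape(observed, predicted))
```

Under this convention, NMSE compares a model against the trivial forecast that always predicts the series mean, which makes scores comparable across datasets with different call volumes; SMAPE gives a scale-free percentage error.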
