Forecasting post COVID-19: How to improve forecasting models’ performance when training data has been affected by exceptional events like COVID-19 pandemic?

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Almost every company in the world was affected by the COVID-19 crisis and by the government measures taken to slow the spread of the virus. The economic impact of the crisis introduced anomalies, such as abnormal trends and seasonality, into the data that companies collect. Traditional forecasting methods were then called into question for predicting business indicators such as sales in a post-COVID-19 world, because performance measures such as forecast accuracy declined. How can data scientists improve the performance of their forecasting models in a post-COVID-19 world when the training data contains the COVID-19 period, an event never observed before? What methods can be used to overcome this problem? The goal of this project was to provide forecasters with a guideline for dealing with COVID-19 data points. We first dedicated this thesis to data analysis and to finding a clear methodology for understanding and quantifying the impact of the COVID-19 crisis on business indicators. We then compared multiple methods for overcoming the forecasting issues that arise when training datasets are influenced by COVID-19, with the aim of improving forecast accuracy and reducing bias. Each method has its pros and cons. Among the methods that modify the training data, imputation is the easiest and can give very good results. Multiplicative coefficients can also be used and also give good results. Finally, optimal transport was tested as an alternative to the first two methods; it alters the original time series less than imputation does. Among the methods that add external features to the model, a boolean feature is the simplest way to flag a COVID-19 period and works surprisingly well.
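As an illustration of the two simplest approaches named here, the following is a minimal sketch rather than the thesis's implementation. It assumes a seasonal-mean imputation rule (each flagged point is replaced by the mean of unflagged points at the same seasonal position) and a 0/1 indicator column. The function names `impute_seasonal` and `covid_flag`, the toy series, and the choice of a period of 4 are all hypothetical.

```python
from statistics import mean


def impute_seasonal(series, covid_idx, period):
    """Replace flagged points with the mean of unflagged points
    that share the same position in the seasonal cycle.

    Assumption: this seasonal-mean rule is one possible imputation
    scheme, chosen for illustration only.
    """
    flagged = set(covid_idx)
    out = list(series)
    for i in flagged:
        # Donors: same seasonal position, not in the flagged period.
        donors = [
            series[j]
            for j in range(i % period, len(series), period)
            if j not in flagged
        ]
        if donors:  # leave the point unchanged if no donor exists
            out[i] = mean(donors)
    return out


def covid_flag(n, covid_idx):
    """Build a 0/1 feature marking the flagged (COVID-19) positions."""
    flagged = set(covid_idx)
    return [1 if i in flagged else 0 for i in range(n)]


# Toy quarterly series (hypothetical data); positions 8 and 9 are disrupted.
sales = [10, 20, 30, 40, 10, 20, 30, 40, 2, 4, 30, 40]
covid_idx = [8, 9]

imputed = impute_seasonal(sales, covid_idx, period=4)
flag = covid_flag(len(sales), covid_idx)
```

The imputed series would be used to train the model in place of the original, whereas the flag leaves the series untouched and is passed to the model as an extra regressor.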
Adding more complex features that describe the impact of COVID-19 on the time series is challenging: we need to find a feature that describes the phenomenon well, and we need a second model to predict its future values before we can use it in the forecasting model. Adding Google mobility features to the model as external regressors seems to increase forecast accuracy the most, but performance depends on how well their future values can be estimated. The same applies to the stringency index, whose future values are even harder to predict because they reflect government measures. However, the stringency index lets us simulate scenarios under a hypothesis about future government measures: for instance, by setting the stringency index high, we can estimate the impact of COVID-19 on the time series in a worst-case scenario with lockdowns.
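The scenario idea can be sketched with a deliberately simple model. The example below assumes a one-variable linear regression of the indicator on the stringency index, fitted by ordinary least squares; the thesis does not state which model it used, so this is only an illustration. The helper names `fit_line` and `simulate`, the toy data, and the 0 to 100 scale for the index are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def simulate(intercept, slope, stringency_path):
    """Predict the indicator under a hypothesised future stringency path."""
    return [intercept + slope * s for s in stringency_path]


# Hypothetical history: stringency index (assumed 0-100) and sales.
stringency = [0, 20, 80, 60, 10]
sales = [100, 90, 60, 70, 95]

intercept, slope = fit_line(stringency, sales)

# Two hypotheses about future government measures.
lockdown = simulate(intercept, slope, [90, 90])  # worst case: strict measures
relaxed = simulate(intercept, slope, [10, 10])   # mild measures
```

Comparing `lockdown` with `relaxed` gives a rough range for the indicator without having to forecast the index itself, which is the part the abstract identifies as hardest.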
