Time Series Forecasting on Database Storage

This is a Bachelor's thesis from Linnéuniversitetet (Linnaeus University), Department of Computer Science and Media Technology (DM)

Abstract: Time series forecasting has become vital in industries ranging from weather forecasting to business forecasting. Companies need research into database storage solutions in order to optimize resource allocation, improve decision making, and enable predictive maintenance of data storage. With the introduction of Artificial Intelligence (AI) and its branch Machine Learning, time series forecasting has become more powerful and efficient. This project attempts to validate whether time series forecasting can be applied to database storage data to make business predictions. The predictive capabilities of database storage remain largely unexplored, despite the growing necessity of databases, and most database optimization is currently left to humans, which is ultimately slower and more error-prone. The project therefore uses machine learning and time series forecasting to predict the future trend of database storage and show how the trend of the data will change. Examining the pattern of database storage fluctuations gives the respective owners an overview of their storage and in turn lets them make decisions on optimizing the database to prevent critical problems ahead of time. Three distinct approaches are implemented to assess which technique is best suited to the provided dataset: a traditional linear model for forecasting, a Convolutional Neural Network (CNN) to detect local changes in time series data, and a Recurrent Neural Network (RNN) to capture long-term temporal dependencies. Furthermore, two settings (single step and multi step) have been tested in order to measure how accuracy changes from a short prediction horizon to a longer one. The research indicates that the models cannot currently be used in practice.
This is because the mean absolute error is very large. The main purpose of the project was to establish which of the three techniques is best for the particular dataset provided by the company. Across all approaches (Linear, CNN, RNN), performance was superior in the single step setting. In the multi step setting, the linear model suffered the greatest drop in accuracy, with the CNN and RNN performing slightly better. The findings also indicated that the model with local change detection (CNN) performs best on the provided dataset in both single and multi step settings, as evidenced by its having the lowest Mean Absolute Error (MAE). This is because the dataset consists of local data and the models are only trained to check for normal changes. If the research had also checked for seasonality or sequential patterns, an LSTM might have had a better outcome thanks to its capability of capturing those dependencies. The accuracy of single step forecasting using the CNN is good (MAE = 0.25) but must be further explored and improved.
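
The single step and multi step settings, and the MAE used to compare them, can be illustrated with a minimal sketch. This is not the thesis code: the function names, the storage readings, and the persistence forecaster (which simply repeats the last observed value and stands in for the Linear, CNN and RNN models) are all assumptions made for illustration.

```python
def make_windows(series, input_width, horizon):
    """Split a series into (inputs, targets) pairs.

    horizon=1 corresponds to the single step setting,
    horizon>1 to the multi step setting.
    """
    pairs = []
    last_start = len(series) - input_width - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + input_width]
        targets = series[start + input_width:start + input_width + horizon]
        pairs.append((inputs, targets))
    return pairs


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(inputs, horizon):
    """Hypothetical placeholder model: repeat the last observed value."""
    return [inputs[-1]] * horizon


def evaluate(series, input_width, horizon):
    """MAE of the placeholder forecaster over every window of the series."""
    actual, predicted = [], []
    for inputs, targets in make_windows(series, input_width, horizon):
        actual.extend(targets)
        predicted.extend(persistence_forecast(inputs, horizon))
    return mean_absolute_error(actual, predicted)


# Made-up storage readings (illustrative only, not the thesis dataset).
storage = [10.0, 11.0, 13.0, 16.0, 20.0, 25.0]
single_step_mae = evaluate(storage, input_width=3, horizon=1)
multi_step_mae = evaluate(storage, input_width=3, horizon=2)
print(single_step_mae, multi_step_mae)
```

On this toy series the error grows when the horizon is extended from one step to two, which is the kind of accuracy drop the abstract reports. The actual magnitudes depend entirely on the model and the data.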
