Machine Learning for Sparse Time-Series Classification - An Application in Smart Metering
Abstract: Smart meters are measuring devices that collect labeled time series data of utility consumption from sub-meters and can automatically transmit this data between the customer, the utility companies, and other companies that offer services such as consumption monitoring and data cleaning. In some cases the smart meters experience communication errors. One such error occurs when the information about what the utility sub-meters are measuring is lost. This information is important when the utility producers bill customers for their usage. The information has had to be collected manually, which is inefficient in terms of time and money. In this thesis, a method for classifying the meters based on their raw time series data is investigated. The data used in the thesis comes from Metry AB and contains thousands of time series in five different classes. The task is complicated by the fact that the data has a high class imbalance, contains many missing values, and that the time series vary substantially in length. The proposed method is based on partitioning the time series into slices of equal size and training a Deep Neural Network (DNN) and a Bayesian Neural Network (BNN) to classify the slices. A new time series is classified by predicting the class of each of its slices and then applying a voting procedure. The method is justified through a set of assumptions about the underlying stochastic process generating the time series, coupled with an analysis based on the multinomial distribution. The results indicate that the models tend to perform worse on samples from the classes "water" and "hot water", and that the worst performance is on the "hot water" class. Across all classes the models achieve accuracies of around 60%; by excluding the "hot water" class it is possible to achieve accuracies of at least 70% on the data set.
The models perform worse on time series that contain few good-quality slices. When only time series with many good-quality slices are considered, accuracies of 70% are achieved for all classes and above 80% when excluding "hot water". It is concluded that more data is needed to further improve classification performance. A drawback of the method is the increased number of hyper-parameters involved in the extraction of slices. However, the voting method seems promising enough to investigate further on other highly sparse data sets.
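The slice-and-vote procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`extract_slices`, `vote`, `classify_series`), the `max_missing_frac` parameter and its default of 0.2, and the use of the fraction of missing values as the "good quality" criterion are all assumptions made for illustration. The slice classifier is passed in as a plain callable and stands in for the trained DNN or BNN.

```python
from collections import Counter

import numpy as np


def extract_slices(series, slice_len, max_missing_frac=0.2):
    """Split a 1-D series into non-overlapping slices of equal length.

    Trailing values that do not fill a whole slice are dropped. Slices whose
    fraction of missing values (NaN) exceeds max_missing_frac are discarded.
    The quality criterion and its threshold are assumptions for this sketch.
    """
    series = np.asarray(series, dtype=float)
    n_slices = len(series) // slice_len
    slices = series[: n_slices * slice_len].reshape(n_slices, slice_len)
    missing_frac = np.isnan(slices).mean(axis=1)
    return slices[missing_frac <= max_missing_frac]


def vote(slice_labels):
    """Return the most frequent label, or None if there are no labels.

    Ties are resolved in favor of the label that was encountered first.
    """
    if len(slice_labels) == 0:
        return None
    return Counter(slice_labels).most_common(1)[0][0]


def classify_series(series, slice_classifier, slice_len, max_missing_frac=0.2):
    """Classify a whole series by majority vote over its slice predictions.

    slice_classifier is any callable mapping one slice (1-D array) to a label;
    here it is a placeholder for a trained model.
    """
    slices = extract_slices(series, slice_len, max_missing_frac)
    labels = [slice_classifier(s) for s in slices]
    return vote(labels)
```

Because the vote only needs a label per slice, the same wrapper works for any slice-level model. The extra hyper-parameters mentioned in the abstract show up here as `slice_len` and `max_missing_frac`.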