Anomaly Detection in Streaming Time Series Data Using Active Learning and Metalearning

This is a professional degree thesis at advanced level from Lund University/Department of Automatic Control

Author: Jonas Lundgren; [2020]

Keywords: Technology and Engineering

Abstract: This thesis proposes a framework for finding anomalies in streaming data. The framework is not limited to anomaly detection and could be applied to other problems as well. It rests on three main concepts: (i) Active learning, in which a learning algorithm can query a human specialist for labels of instances from an otherwise unlabeled data set so that the model can improve. (ii) Ensembles, which are combinations of models, often weaker ones, where the idea is that the combined result mitigates the errors of the individual models and thus provides better results. (iii) Metalearning, which is the concept of having a second model learn model characteristics for a problem. In this thesis, metalearning is used to weight the ensemble members. The framework is displayed in Figure 0.1. The meta learner takes an instance as input and outputs a weight for each ensemble member according to that member's performance on previous, similar instances. The total output is thus a dynamically weighted ensemble output in which the weighting depends on the input. When a human expert provides label feedback on misclassified instances, only the meta learner is updated, not the entire ensemble, so that the new weights suppress the error. This leverages the fact that different ensemble members have different characteristics, which make them more or less suitable for making predictions on certain instances. The meta learner is a neural network that takes the instance as input and weights the ensemble members according to their capacity to make a prediction for that instance. The loss used to train the neural network has two parts: a supervised part, loss_AAD, which uses the labels provided by a human expert, and a second part, loss_prior, which places a uniform prior on the ensemble members.
When new labels are provided, the meta learner is updated so that none of the labeled instances is misclassified. The framework was tested on the Yahoo Webscope benchmark dataset, which consists of four different types of time series. The proposed framework achieved an AUC of 0.9088, 0.9787, 0.8998 and 0.8123 on the four datasets, which is the second-highest AUC on two of the datasets and the third-highest on the remaining two among the models used for comparison.
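
The input-dependent weighting and the two-part loss described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the form of loss_AAD or loss_prior, so binary cross-entropy stands in for the supervised part and a KL divergence from the uniform distribution stands in for the prior part. The single linear layer with softmax, the assumption that member scores lie in [0, 1], and all names (`ensemble_weights`, `weighted_score`, `uniform_prior_loss`, `combined_loss`, `prior_strength`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_weights(X, W, b):
    # Hypothetical meta learner: one linear layer followed by softmax,
    # giving one weight per ensemble member for every instance.
    return softmax(X @ W + b)

def weighted_score(member_scores, weights):
    # Dynamically weighted ensemble output, one score per instance.
    return (weights * member_scores).sum(axis=1)

def uniform_prior_loss(weights, eps=1e-12):
    # Assumed stand-in for loss_prior: KL(uniform || weights),
    # which is zero when all members are weighted equally.
    m = weights.shape[1]
    kl = (np.log(1.0 / m) - np.log(weights + eps)).sum(axis=1) / m
    return kl.mean()

def combined_loss(X, member_scores, labels, labeled_mask, W, b,
                  prior_strength=1.0, eps=1e-12):
    weights = ensemble_weights(X, W, b)
    scores = weighted_score(member_scores, weights)
    supervised = 0.0
    if labeled_mask.any():
        # Assumed stand-in for loss_AAD: binary cross-entropy on the
        # instances an expert has labeled (1 = anomaly, 0 = normal).
        p = np.clip(scores[labeled_mask], eps, 1.0 - eps)
        y = labels[labeled_mask]
        supervised = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
    return supervised + prior_strength * uniform_prior_loss(weights)

# Two instances, three ensemble members with scores in [0, 1].
X = np.array([[0.0, 1.0], [1.0, 0.0]])
member_scores = np.array([[0.9, 0.6, 0.3], [0.1, 0.2, 0.3]])
W, b = np.zeros((2, 3)), np.zeros(3)  # untrained: uniform weights
labels = np.array([1.0, 0.0])
labeled_mask = np.array([True, True])
loss = combined_loss(X, member_scores, labels, labeled_mask, W, b)
```

In this sketch only `W` and `b` would be updated when new labels arrive; the ensemble members' scores are treated as fixed inputs, mirroring the idea that feedback changes the weighting rather than the ensemble itself.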
