Unsupervised Anomaly Detection on Multi-Process Event Time Series

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Establishing whether observed data are anomalous is an important task that has been widely investigated in the literature. It becomes more complex when combined with high-dimensional representations and multiple sources that independently generate the patterns to be analyzed. The work presented in this master thesis employs a data-driven pipeline to define a recurrent auto-encoder architecture that analyzes, in an unsupervised fashion, high-dimensional event time series generated by multiple and variable processes interacting with a system. Facing this problem, the work investigates whether a single model can be used to analyze patterns produced by different sources. The analysis of log files that record events of interaction between users and the radio network infrastructure serves as a real-world case study. The investigation aims to verify the performance of a single machine learning model applied to learning multiple patterns developed over time by distinct sources. The proposed pipeline handles the complex representation of the data source and the definition and tuning of the anomaly detection model; it relies on no domain-specific knowledge and can thus be adapted to different problem settings. The model has been implemented in four variants, which have been evaluated on both normal and anomalous data, gathered partly from real network cells and partly from the simulation of anomalous behaviours. The empirical results show the applicability of the model to the detection of anomalous sequences and events in the described conditions, with F1-scores reaching above 80%, varying with the specific threshold setting. In addition, a deeper interpretation of the results gives insight into the differences between the variants of the model and thus into their limitations and strengths.
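The dependence of the F1-score on the threshold setting can be illustrated with a minimal sketch. The abstract does not say how anomaly scores are derived, so this example assumes the common auto-encoder practice of flagging a sequence when its reconstruction error exceeds a threshold. The error values, labels, thresholds, and function names below are hypothetical and are not taken from the thesis.

```python
def flag_anomalies(errors, threshold):
    """Flag a sequence as anomalous when its reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the anomalous (True) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-sequence reconstruction errors (made-up numbers)
# and ground-truth labels (True = anomalous).
errors = [0.10, 0.12, 0.09, 0.80, 0.11, 0.95, 0.30, 0.13]
labels = [False, False, False, True, False, True, True, False]

# Evaluate the same error scores under three illustrative thresholds.
scores_by_threshold = {
    thr: f1_score(labels, flag_anomalies(errors, thr))
    for thr in (0.2, 0.5, 0.9)
}

for thr, score in scores_by_threshold.items():
    print(f"threshold={thr:.1f}  F1={score:.2f}")
```

On this toy data, raising the threshold leaves precision unchanged but misses more anomalies, so recall and F1 drop. This is one way a reported score can vary with the threshold setting.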
