A Review of Anomaly Detection Techniques forHeterogeneous Datasets

Detta är en Master-uppsats från KTH/Optimeringslära och systemteori

Sammanfattning: Anomaly detection is a field of study that is closely associated with machine learning and it is the process of finding irregularities in datasets. Developing and maintaining multiple machine learning models for anomaly detection takes time and can be an expensive task. One proposed solution is to combine all datasets and create a single model. This creates a heterogeneous dataset with a wide variation in its distribution, making it difficult to find anomalies in the dataset. The objective of this thesis is then to identify a framework that is suitable for anomaly detection in heterogeneous datasets. A selection of five methods were implemented in this project - 2 supervised learning approaches and 3 unsupervised learning approaches. These models are trained on 3 synthetic datasets that have been designed to be heterogeneous with an imbalance between the classes as anomalies are rare events. The performance of the models are evaluated with the AUC and the F1-score, aswell as observing the Precision-Recall Curve. The results makes it evident that anomaly detection in heterogeneous datasets is a challenging task. The best performing approach was with a random forest model where the class imbalance problem had been solved by generating synthetic samples of the anomaly class by implementing a generative adversarial network.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)