Hybrid Extended Isolation Forest: Anomaly Detection for Bird Alarm

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Viktor Holmér; [2019]

Abstract: The Isolation Forest algorithm is a random-forest-based anomaly detection algorithm that uses isolation to determine how anomalous a data point is. The Hybrid Isolation Forest and Extended Isolation Forest algorithms were developed independently to overcome two separate issues with the Isolation Forest algorithm. By combining these algorithms, the Hybrid Extended Isolation Forest algorithm is proposed and evaluated with the goal of overcoming both issues at once. Bird Alarm is a system developed by Nordicstation for bird watchers. It allows bird watchers to create reports based on observations of birds in nature. By applying the proposed algorithm to Bird Alarm data, administrators can be alerted to erroneous reports. The performance of an algorithm is measured by Receiver Operating Characteristic or Precision-Recall curves. The proposed algorithm is compared to other Isolation Forest based algorithms by measuring the area under these curves for many datasets. Since the Bird Alarm data is unlabelled, anomalies are defined using constructed pseudolabels. To maximize performance, a hyperparameter unique to the Hybrid Isolation Forest and the proposed algorithm is optimized by random search, and the effect of the hyperparameter choice is investigated. An online detector for Bird Alarm is developed to automatically notify administrators of erroneous reports. The results indicate that the proposed algorithm successfully unifies the Hybrid Isolation Forest and the Extended Isolation Forest. However, it could not be conclusively established whether performance improved, since performance is closely tied to the choice of dataset. The proposed algorithm performed better than the other evaluated algorithms on Bird Alarm data, which led to its use in the online detector. The algorithm may be improved by further evaluating it on other datasets or by incorporating known anomalies into the anomaly scoring function. Studying minimal datasets and ensemble sizes might yield insight into the proposed algorithm's performance potential, but this is left for future studies.
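
To make the isolation idea and the area-under-curve measurement in the abstract concrete, here is a minimal sketch in plain Python. It implements only the basic Isolation Forest (axis-parallel random splits), not the Hybrid, Extended, or proposed Hybrid Extended variants, and it is not the thesis's code. The function names, the toy data, and the parameter values (100 trees, subsample size 64) are all assumptions chosen for illustration.

```python
import math
import random

def avg_path_length(n):
    """Expected path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus a correction for unsplit points."""
    if "size" in tree:
        return depth + avg_path_length(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two points")
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi

def anomaly_score(trees, psi, x):
    """Score in (0, 1]; shorter average paths (easier isolation) score higher."""
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / avg_path_length(psi))

def roc_auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ranked pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def demo(seed=0):
    """Toy data (an assumption): one Gaussian cluster plus five distant points."""
    rng = random.Random(seed)
    inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    outliers = [(8.0, 8.0), (-8.0, 8.0), (8.0, -8.0), (-8.0, -8.0), (0.0, 10.0)]
    data = inliers + outliers
    labels = [0] * len(inliers) + [1] * len(outliers)
    trees, psi = fit_forest(data, seed=seed)
    scores = [anomaly_score(trees, psi, x) for x in data]
    return roc_auc(scores, labels)

print(f"ROC AUC on toy data: {demo():.3f}")
```

The labels here play the role that pseudolabels play for unlabelled data: they are used only to score the ranking, never to fit the forest.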
