Real-time unsupervised log event anomaly detection in public transportation

This is a thesis for a professional degree at advanced level from Lund University/Department of Automatic Control

Authors: Felicia Segui; Andreas Timürtas; [2022]

Keywords: Technology and Engineering;

Abstract: Detecting anomalies in log data in real time is useful because it makes it possible to apply logic that corrects the anomalies as they happen. This project presents a method for detecting anomalies in event log data from public transportation buses in real time, without a labeled data set. Initially, each unique bus trip is represented by its event frequencies, a representation that is not suitable for real-time detection. Using a data set assumed to contain only normal data, an autoencoder, a PCA model and a clustering algorithm each label every data point in the frequency domain as normal or anomalous. The labeled data is then split into sequences of events with a rolling window, a representation that is suitable for detecting anomalies in real time. To separate the anomalous event sequences from the normal event sequences that occur during the same bus trip, the event sequences are grouped and counted together with their labels. By comparing the frequency of each event sequence in anomalous trips with the frequency of the corresponding event sequence in normal trips, the sequences that are overrepresented in anomalous trips are detected and receive a final label of normal or anomalous. These labeled sequences are then used in the real-time detector. From the three base labeling models (autoencoder, PCA and clustering algorithm), different combinations of models are created by taking either the union or the intersection of all journeys labeled anomalous. This results in 11 different models, all of which are tested and evaluated. The evaluation calculates the recall, precision and F1-score of experiments performed on a data set of assumed normal journeys with injected simulated anomalies. It is performed at two points in the method: once after the initial labeling and once after the real-time detector.
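The rolling-window step and the frequency comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the `ratio` threshold, the event names and the function names are all assumptions made for illustration, and the abstract does not state how "overrepresented" is actually decided.

```python
from collections import Counter

def rolling_windows(events, size):
    """Return every contiguous window of `size` events as a tuple."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]

def overrepresented_sequences(trips, labels, size=2, ratio=2.0):
    """Flag sequences that are relatively more frequent in anomalous trips.

    `labels[i]` is True when trip i was labeled anomalous by the initial
    (frequency-domain) models. `ratio` is a hypothetical threshold.
    """
    counts = {True: Counter(), False: Counter()}
    for events, is_anomalous in zip(trips, labels):
        counts[is_anomalous].update(rolling_windows(events, size))
    total_anom = sum(counts[True].values()) or 1
    total_norm = sum(counts[False].values()) or 1
    flagged = set()
    for seq, n_anom in counts[True].items():
        freq_anom = n_anom / total_anom
        freq_norm = counts[False][seq] / total_norm
        if freq_anom > ratio * freq_norm:
            flagged.add(seq)
    return flagged

def detect(stream, flagged, size=2):
    """Real-time style check: yield the index at which a flagged window ends."""
    window = []
    for i, event in enumerate(stream):
        window = (window + [event])[-size:]
        if len(window) == size and tuple(window) in flagged:
            yield i

# Toy data with made-up event names; two normal trips and one anomalous trip.
trips = [
    ["open", "close", "depart"],
    ["open", "close", "depart"],
    ["open", "close", "open", "open"],
]
labels = [False, False, True]
flagged = overrepresented_sequences(trips, labels)
hits = list(detect(["open", "close", "open", "open"], flagged))
```

In this toy data the window `("open", "close")` also occurs in the anomalous trip, but it is common in normal trips and therefore stays normal, while `("close", "open")` and `("open", "open")` are flagged.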
The results obtained with this evaluation method show that combining the autoencoder and the clustering algorithm through intersection is the best model combination, based on the F1-score calculated after the real-time detection. This combination achieves a median recall of 0.89 and a median precision of 0.72, which results in an F1-score of 0.79.
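The model count and the reported scores can be sanity-checked with a short sketch. The enumeration below is one reading that happens to yield 11 models (three single models, plus every subset of two or three models combined by union and by intersection); the abstract does not list the combinations, so this breakdown is an assumption, as are the function names.

```python
from itertools import combinations

BASE_MODELS = ("autoencoder", "pca", "clustering")

def model_combinations(base=BASE_MODELS):
    """Enumerate single models plus union/intersection of larger subsets."""
    models = [("single", (name,)) for name in base]
    for k in range(2, len(base) + 1):
        for subset in combinations(base, k):
            models.append(("union", subset))
            models.append(("intersection", subset))
    return models

def combine_labels(anomalous_sets, op):
    """Combine sets of trip ids labeled anomalous by each model."""
    sets = [set(s) for s in anomalous_sets]
    if op == "union":
        return set.union(*sets)
    return set.intersection(*sets)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

n_models = len(model_combinations())
f1_from_medians = f1_score(0.72, 0.89)
```

Under this reading `n_models` is 11, matching the abstract. The harmonic mean of the two reported medians is about 0.796, close to the reported 0.79; a small difference would be expected if the reported F1-score is itself a median over experiments rather than a value computed from the median precision and recall, which the abstract does not specify.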
