Anomaly Detection in User Authentication Logs using Long Short-Term Memories and Word Embeddings

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Mikael Forsmark; [2020]


Abstract: As an increasing amount of sensitive data is stored at various connected services and platforms, the need for robust user authentication mechanisms that maintain the users’ safety and personal integrity online is growing. Meanwhile, overly strict and simplistic user authentication policies may degrade the user experience, which increases the demand for a precise user authentication security tool that flags login events as abnormal with high accuracy. This study investigates whether Long Short-Term Memories (LSTM) and Word Embeddings can be combined to detect abnormal user authentication behavior. Two anomaly detection models are proposed, the first focusing on detecting abnormal login events and the second on detecting abnormal sequences of login events, and both are applied to a user authentication log consisting of 280,063 login attempts. As there are no known anomalies in the user authentication log, the models are trained on the normal login attempt flow, while two types of manipulated abnormal log events are inserted into the test data in order to verify the models’ anomaly detection performance. The trained models first reconstruct the data containing no anomalies, and the resulting reconstruction errors are studied; the reconstruction errors of the test data are then used to find abnormal user authentication events. Finally, the results are compared to baseline models. The results of the two proposed models varied. When detecting abnormal login sequences, the proposed LSTM and Word Embeddings combination showed promising results, with notably good recall values in the detection of one of the anomaly types as a highlight. When instead attempting to detect abnormal log events, the combination showed poor results, where the selection of the reconstruction error-based anomaly threshold was deemed to play a significant part.
The log event attribute combination [User, Country, Login Status] resulted in the best anomaly detection accuracy for both models.
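
The reconstruction-error thresholding described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the mean squared error measure, the percentile-based threshold rule, and the function names `reconstruction_error`, `fit_threshold`, and `flag_anomalies` are all assumptions made for illustration, and the LSTM and Word Embeddings model that would produce the reconstructions is left out entirely.

```python
from statistics import quantiles


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction.

    Assumption: MSE is used here for illustration; the abstract does not
    state which error measure the thesis uses.
    """
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fit_threshold(normal_errors, percentile=99):
    """Choose a threshold as a high percentile of errors on anomaly-free data.

    Assumption: a percentile rule is one common heuristic, not necessarily
    the rule applied in the thesis.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(normal_errors, n=100, method="inclusive")
    return cuts[percentile - 1]


def flag_anomalies(test_errors, threshold):
    """Mark each test event whose reconstruction error exceeds the threshold."""
    return [error > threshold for error in test_errors]


if __name__ == "__main__":
    # Hypothetical errors from reconstructing anomaly-free login events.
    normal_errors = [0.01 * i for i in range(1, 101)]
    threshold = fit_threshold(normal_errors, percentile=99)
    print(flag_anomalies([0.5, 2.0], threshold))
```

A lower percentile flags more events, which raises recall at the cost of more false alarms. This sensitivity is one way to see why the abstract identifies threshold selection as a significant factor.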
