Towards Automated Log Message Embeddings for Anomaly Detection

This is a professional degree thesis at advanced level from Lunds universitet/Institutionen för reglerteknik (Lund University, Department of Automatic Control)

Authors: Adrian Murphy; Daniel Larsson; [2024]

Keywords: Technology and Engineering;

Abstract: Log messages are implemented by developers to record important runtime information about a system. For that reason, system logs can provide insight into the state and health of a system and potentially be used to anticipate and discover errors. Manually inspecting these logs becomes impractical due to the high volume of messages generated by modern systems. Consequently, the research field of machine learning-based log anomaly detection has emerged to automatically identify irregularities. Parsing log messages into a structured, tractable format is a vital step in log anomaly detection. This degree project investigates the application of log message embeddings, a recently proposed log parsing method, for anomaly detection in complex IT systems and measures their resilience to concept drift, where the format of log messages changes over time, in comparison with a traditional parsing approach. Empirical analyses are conducted on two benchmark datasets, revealing that log message embeddings not only achieve anomaly detection results on par with traditional methods but also demonstrate considerable robustness against concept drift. A key focus of this project is on the application of large language models to automate the log embedding pipeline by handling out-of-vocabulary words and extracting synonymous and antonymous word relationships. These capabilities are important for distinguishing log messages that are identical except for one or more synonymous or antonymous word pairs. While large language models show promise in these tasks, experiments highlight the need for further refinement to match the performance achieved through manual operator feedback.
