Supervised Failure Diagnosis of Clustered Logs from Microservice Tests

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Pinpointing the source of a software failure from log files can be a time-consuming process. Automated log analysis tools are meant to streamline such processes and can be used for tasks like failure diagnosis. This thesis evaluates three supervised models for failure diagnosis of clustered log data. The goal is to compare the performance of the models on industry data, as a way to investigate whether the chosen ML techniques are suitable for automated log analysis. A Random Forest, an SVM and an MLP are trained on a dataset of 194 failed executions of tests on microservices, each of which produced a large collection of logs. The models are tuned with random search and compared in terms of precision, recall, F1-score, hold-out accuracy and 5-fold cross-validation accuracy. The hold-out accuracy is calculated as the mean over 50 hold-out data splits, and the cross-validation accuracy is computed separately from a single set of folds. The results show that the Random Forest scores highest in mean hold-out accuracy (90%), compared to the SVM (86%) and the MLP (85%). The mean cross-validation accuracy is highest for the SVM (95%), closely followed by the Random Forest (94%), and lastly the MLP (85%). The precision, recall and F1-score are stable and consistent with the hold-out results, although precision is slightly higher than the other two measures. According to this evaluation, the Random Forest has the highest overall performance on the dataset, considering both the hold-out and cross-validation accuracies as well as the fact that it has the lowest complexity, and thus the shortest training time, of the considered solutions. All in all, the results of the thesis demonstrate that supervised learning is a promising approach to automating log analysis.
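
The two accuracy estimates described in the abstract (a mean over 50 random hold-out splits, and a separate 5-fold cross-validation on a single set of folds) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the data is synthetic (only the count of 194 samples mirrors the abstract), the nearest-centroid classifier is a toy placeholder rather than any of the thesis models, the 20% test fraction is a guess since the abstract does not state the split ratio, and all function names are hypothetical.

```python
import random
from statistics import mean

def make_toy_data(n=194, seed=0):
    """Synthetic stand-in data: 2-D points from two labelled blobs (NOT the thesis data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data

def fit_centroids(train):
    """Toy nearest-centroid 'model', a placeholder for the RF/SVM/MLP in the thesis."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def accuracy(centroids, test):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)

def mean_holdout_accuracy(data, n_splits=50, test_fraction=0.2, seed=0):
    """Mean accuracy over repeated random hold-out splits (test_fraction is assumed)."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_centroids(train), test))
    return mean(scores)

def kfold_accuracy(data, k=5, seed=0):
    """Mean accuracy over k folds drawn from a single shuffle of the data."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint folds covering all samples
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit_centroids(train), folds[i]))
    return mean(scores)

data = make_toy_data()
holdout = mean_holdout_accuracy(data)
cv = kfold_accuracy(data)
print(f"mean hold-out accuracy over 50 splits: {holdout:.2f}")
print(f"5-fold cross-validation accuracy: {cv:.2f}")
```

Because the two estimates average over different partitions of the same 194 samples, they need not agree, which may be one reason the abstract reports different rankings for hold-out and cross-validation accuracy.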
