Application failure predictions from neural networks analyzing telemetry data

This is a Master's thesis from Uppsala University/Department of Information Technology

Authors: Max Rylander; Filip Hultgren; [2021]

Abstract: With the revolution of the internet, new applications have emerged in our daily lives. People depend on services for transportation, banking, and communication. Service availability is crucial to a provider's survival and to competing against other service providers. Achieving good availability is a challenging task. The latest trend is migrating systems to the cloud, which provides numerous methods to prevent downtime, such as auto-scaling, continuous deployment, and continuous monitoring. However, failures can still occur even when these preemptive techniques fulfill their purpose. Monitoring gives insight into the system's actual state, but it is up to the maintainer to interpret that insight. This thesis investigates how machine learning can predict future crashes of Kubernetes pods based on the metrics collected from them. At the start of the project, no data on pod crashes was available, so the solution was to simulate a 10-tier microservice system in a Kubernetes cluster to generate generic data. The project applies two models, a Random Forest model and a Temporal Convolutional Network (TCN) model, with the former acting as a baseline. Both predict whether a failure will occur within a given prediction time window based on 15 minutes of data. The project evaluated three different prediction time windows; judged by the models' accuracy, the five-minute window gave the best foresight. The Random Forest model achieved an accuracy of 73.4 %, while the TCN model achieved 77.7 %. The models' predictions can act as an early alert of an incoming failure, which the system or a maintainer can act upon to improve the system's availability.
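The prediction setup described in the abstract, where each sample is 15 minutes of pod metrics labeled by whether a failure follows within the prediction time window, can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes one metric sample per minute and failures given as minute indices. The helpers `make_windows` and `accuracy` are hypothetical names introduced only for this example.

```python
from typing import List, Sequence, Tuple

# Assumption: one metric sample per minute per pod (the abstract does not
# state the sampling rate). The 15-minute observation window is from the abstract.
INPUT_MINUTES = 15


def make_windows(
    metrics: Sequence[Sequence[float]],
    failure_minutes: Sequence[int],
    horizon: int,
    input_len: int = INPUT_MINUTES,
) -> List[Tuple[List[List[float]], int]]:
    """Hypothetical helper: slice a per-minute metric series into (window, label) pairs.

    metrics[t] is the feature vector observed during minute t.
    A window ending at minute t covers minutes t - input_len + 1 .. t.
    Its label is 1 if any failure occurs in minutes t + 1 .. t + horizon, else 0.
    """
    failures = set(failure_minutes)
    samples = []
    # Stop early enough that the whole prediction window lies inside the series.
    for end in range(input_len - 1, len(metrics) - horizon):
        window = [list(metrics[i]) for i in range(end - input_len + 1, end + 1)]
        label = int(any(end + k in failures for k in range(1, horizon + 1)))
        samples.append((window, label))
    return samples


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Hypothetical helper: fraction of windows whose predicted label is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # 30 minutes of two made-up features, one failure at minute 22, 5-minute horizon.
    series = [[float(t), 0.5] for t in range(30)]
    data = make_windows(series, failure_minutes=[22], horizon=5)
    print([label for _, label in data])
```

With a 5-minute horizon, only the windows ending at minutes 17 to 21 are labeled positive, since those are the windows whose prediction time window contains minute 22. A Random Forest would typically receive each window flattened into a single feature vector, whereas a TCN can consume the window as a time sequence.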