Prediction of Code Lifetime

This is a Master's thesis from Linköping University/Statistics; Linköping University/Faculty of Science and Engineering (Tekniska fakulteten)

Abstract: Several previous studies have used machine learning algorithms to predict how fault-prone a piece of code is. This thesis takes a slightly different approach by attempting to predict how long a piece of code will remain unmodified after being written (its “lifetime”). The approach rests on the hypothesis that frequently modified code is more likely to contain weaknesses, which may make lifetime predictions useful for code evaluation. In this thesis, the predictions are made with machine learning algorithms trained on open source code examples from GitHub. Two algorithms are used: the multilayer perceptron and the support vector machine. A piece of code is described by three groups of features: code contents, code properties obtained from static code analysis, and metadata from the version control system Git. A series of experiments shows that the support vector machine is the best-performing algorithm and that all three feature groups are useful for predicting lifetime. Both the multilayer perceptron and the support vector machine outperform a baseline that always predicts the mean lifetime of the training set. This indicates that lifetime can, to some extent, be predicted from information extracted from the code. However, prediction performance is shown to be highly dataset-dependent, and the prediction errors are large.
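The baseline mentioned in the abstract, a predictor that always outputs the mean lifetime of the training set, can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, mean absolute error is assumed as the error measure (the abstract does not name the metric), and the toy lifetimes are invented for illustration and are not results from the thesis.

```python
from statistics import mean


def fit_mean_baseline(train_lifetimes):
    """Return a predictor that ignores its input features and always
    outputs the mean lifetime observed in the training set."""
    m = mean(train_lifetimes)
    return lambda features: m


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted lifetimes.
    (Assumed metric; the thesis may use a different error measure.)"""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(predict, test_features, test_lifetimes):
    """Apply any predictor (baseline, MLP, SVM, ...) to the test set and
    return its mean absolute error."""
    predictions = [predict(x) for x in test_features]
    return mean_absolute_error(test_lifetimes, predictions)


if __name__ == "__main__":
    # Hypothetical lifetimes in days; the feature vectors are placeholders.
    baseline = fit_mean_baseline([10, 20, 30, 40])  # mean = 25
    mae = evaluate(baseline, [[0.1], [0.9]], [15, 35])
    print(f"baseline MAE: {mae:.1f}")  # prints "baseline MAE: 10.0"
```

Because `evaluate` accepts any callable, a trained multilayer perceptron or support vector machine can be scored the same way; "outperforming the baseline" then simply means achieving a lower error than the mean predictor on the same test set.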
