Ensemble approach to code smell identification : Evaluating ensemble machine learning techniques to identify code smells within a software system

Detta är en Master-uppsats från Jönköping University/JTH, Datateknik och informatik

Sammanfattning: The need for automated methods for identifying refactoring items is prelevent in many software projects today. Symptoms of refactoring needs is the concept of code smells within a software system. Recent studies have used single model machine learning to combat this issue. This study aims to test the possibility of improving machine learning code smell detection using ensemble methods. Therefore identifying the strongest ensemble model in the context of code smells and the relative sensitivity of the strongest perfoming ensemble identified. The ensemble models performance was studied by performing experiments using WekaNose to create datasets of code smells and Weka to train and test the models on the dataset. The datasets created was based on Qualitas Corpus curated java project. Each tested ensemble method was then compared to all the other ensembles, using f-measure, accuracy and AUC ROC scores. The tested ensemble methods were stacking, voting, bagging and boosting. The models to implement the ensemble methods with were models that previous studies had identified as strongest performer for code smell identification. The models where Jrip, J48, Naive Bayes and SMO. The findings showed, that compared to previous studies, bagging J48 improved results by 0.5%. And that the nominally implemented baggin of J48 in Weka follows best practices and the model where impacted negatively. However, due to the complexity of stacking and voting ensembles further work is needed regarding stacking and voting ensemble models in the context of code smell identification.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)