Mapping Java Source Code To Architectural Concerns Through Machine Learning

This is a Bachelor's thesis from Karlstad University/Department of Mathematics and Computer Science (from 2013)

Abstract: The explosive growth of software systems in both size and complexity has led to a recognised need for techniques to combat architectural degradation. Reflexion Modelling is a method commonly used for Software Architectural Consistency Checking (SACC). However, the steps needed to utilise the method involve manual mapping, which can become tedious depending on the system's size. Recently, machine learning has shown promising results, outperforming other approaches. However, neither a comparison of different classifiers nor a comprehensive investigation of how best to pre-process source code has yet been performed. This thesis compares different classifiers, relating their performance to the manual effort needed to train them, and examines how different pre-processing settings affect their accuracy. The study can be divided into two areas: pre-processing, and how large the manual mapping should be to achieve satisfactory performance. Across the three software systems used in this study, the best-performing model overall, MaxEnt, achieved the following average results: accuracy 0.88, weighted precision 0.89 and weighted recall 0.88. SVM performed almost identically to MaxEnt. Furthermore, the results show that Naive-Bayes, the algorithm used in recent related work, performs worse than SVM and MaxEnt. The results showed that the pre-processing that extracts packages and libraries, together with the feature representation method Bag-of-Words, had the best performance. Furthermore, it was found that manual mapping of a minimum of ten files per concern is needed for satisfactory performance. The research results represent a further step towards automating code-to-architecture mappings, as required in reflexion modelling and similar techniques.
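
To make the best-performing pre-processing setting more concrete, the following is a minimal Python sketch of extracting package and import (library) names from a Java file and turning them into Bag-of-Words counts. It is an illustration only, not the thesis's implementation: the regular expression, the function name bag_of_words, the lower-casing and the splitting of qualified names on dots are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed pattern (not from the thesis): matches `package a.b.c;` and
# `import [static] a.b.C;` declarations, including wildcard imports.
DECLARATION = re.compile(
    r"^\s*(?:package|import(?:\s+static)?)\s+([\w.]+(?:\.\*)?)\s*;",
    re.MULTILINE,
)


def bag_of_words(java_source: str) -> Counter:
    """Count the dot-separated tokens of package and import declarations."""
    tokens = []
    for qualified_name in DECLARATION.findall(java_source):
        # Split `org.example.ui` into ["org", "example", "ui"]; drop the
        # `*` of wildcard imports and lower-case for a uniform vocabulary.
        tokens.extend(
            part.lower() for part in qualified_name.split(".") if part != "*"
        )
    return Counter(tokens)


# Hypothetical Java file used only to exercise the function.
example = """
package org.example.ui;

import java.util.List;
import org.example.model.Order;
"""

features = bag_of_words(example)
```

Each source file yields one such count vector, and vectors from the manually mapped files would then serve as training data for a classifier such as MaxEnt, SVM or Naive-Bayes.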
