Machine Learning for Software Bug Categorization

Detta är en Uppsats för yrkesexamina på avancerad nivå från Uppsala universitet/Institutionen för informationsteknologi

Författare: Arman Vatandoust; [2019]

Nyckelord: ;

Sammanfattning: The pursuit of flawless software is often an exhausting task for software developers. Code defects can range from soft issues to hard issues that lead to unforgiving consequences. DICE have their own system which automatically collects these defects which are grouped into buckets, however, this system suffers from the flaw of sometimes incorrectly grouping unrelated issues, and missing apparent duplicates. This time-consuming flaw puts excessive work for software developers and leads to wasted resources in the company. These flaws also impact the data quality of the system's defects tracking datasets which turn into a never-ending vicious circle. In this thesis, we investigate the method of measuring the similarity between reports in order to reduce incorrectly grouped issues and duplicate reports. Prototype models have been built for bug categorization and bucketing using convolutional neural networks. For each report, the prototype is able to provide developers with candidates of related issues with likelihood metric whether the issues are related. The similarity measurement is made in the representation phase of the neural networks, which we call the latent space. We also use Kullback–Leibler divergence in this space in order to get better similarity metrics. The results show important findings and insights for further improvement in the future. In addition to this, we discuss methods and strategies for detecting outliers using Mahalanobis distance in order to prevent incorrectly grouped reports.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)