Detecting diabetes with Machine learning: A study of Naive Bayes and Decision Tree

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Olivia Stigeborn; Frida Wallberg; [2020]


Abstract: Diabetes is a disease that affects a large part of our society and comes with a multitude of health issues. Many of these can be improved by dietary and lifestyle changes if the disease is found early enough. What if it were possible to detect whether someone has diabetes using a few simple measurements? A simple detection method could lead to earlier detection of the disease, giving more people the opportunity to make lifestyle changes that improve their health and thereby reduce the risk of secondary diseases. The objective of this study is to research the use of machine learning in detecting diabetes. Two well-known machine learning algorithms, Decision Tree and Naïve Bayes, were implemented to see which one is preferable for detecting diabetes. The algorithms were compared using their respective confusion matrices, from which values such as accuracy, precision and recall were derived. The study found that Naïve Bayes is the better option for detecting diabetes on the dataset used in this study. Naïve Bayes classifies 80% of cases correctly, while Decision Tree classifies 78.355% correctly. This difference was shown not to be statistically significant, but considering where each algorithm performs well still supports that conclusion. Decision Tree performs better when making negative classifications, so it is better at predicting that a patient does not have diabetes. Naïve Bayes, on the other hand, performs better when predicting that a patient has diabetes, which is arguably more important for the patients' health, since fewer patients who have diabetes are missed. This led to the conclusion that Naïve Bayes is the preferred algorithm.
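
The comparison described in the abstract rests on metrics read off a confusion matrix. As a minimal sketch of how accuracy, precision and recall follow from the four confusion-matrix counts, the Python snippet below may help. The class name `ConfusionMatrix` is hypothetical, and the counts are made up purely for illustration; they are not the thesis's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the thesis)."""
    tp: int  # has diabetes, predicted diabetes
    fp: int  # no diabetes, predicted diabetes
    fn: int  # has diabetes, predicted no diabetes (a missed patient)
    tn: int  # no diabetes, predicted no diabetes

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of raising when a metric is undefined.
        return numerator / denominator if denominator else 0.0

    @property
    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        total = self.tp + self.fp + self.fn + self.tn
        return self._ratio(self.tp + self.tn, total)

    @property
    def precision(self) -> float:
        # Of the patients predicted to have diabetes, how many really do.
        return self._ratio(self.tp, self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Of the patients who have diabetes, how many were found.
        return self._ratio(self.tp, self.tp + self.fn)


# Illustrative counts only (assumption): 200 made-up patients.
example = ConfusionMatrix(tp=40, fp=10, fn=20, tn=130)
print(f"accuracy={example.accuracy:.3f}")
print(f"precision={example.precision:.3f}")
print(f"recall={example.recall:.3f}")
```

Recall is the metric that reflects the abstract's point about missed patients: every false negative is a patient with diabetes who goes undetected, so a classifier with higher recall misses fewer of them even when overall accuracy is similar.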
