Convolutional Neural Networks: Performance on Imbalanced Data

This is a Master's thesis (Magister) from Umeå University/Statistics.

Author: Oscar Sallander; [2021]


Abstract: Imbalanced data is a major problem in machine-learning classification, because predictive performance can suffer when one class occurs more frequently than the others. Imbalanced data sets are very common in medical science, for example: when searching for a rare disease in a population, the healthy proportion can be extremely large compared with the proportion that has the disease. When a model is given only a few example observations of one class and many observations of the other, it tends to be biased towards the majority class. This is a problem whenever the less frequent label is of great importance, or when both labels must be classified correctly. In deep learning and image classification, there is little research on how Convolutional Neural Networks perform on imbalanced data compared with other classifiers. The goal of this thesis is to analyze and compare the performance of Convolutional Neural Networks against the k-Nearest-Neighbor algorithm. Performance is evaluated on a data set that is modified to have increasingly imbalanced classes. The results show that imbalanced data does have a negative effect on the performance of Convolutional Neural Networks for classifying the minority classes, but to a lesser degree than for the k-Nearest-Neighbor algorithm.
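The abstract does not spell out how the data set was made increasingly imbalanced or how minority-class performance was scored. The following is a minimal sketch, assuming one common approach: subsample one class to a fixed fraction of the largest other class, then measure recall on that minority class with a plain k-Nearest-Neighbor classifier. The helper names (`make_imbalanced`, `knn_predict`, `recall`, `blobs`) and the toy two-dimensional Gaussian data are hypothetical illustrations, not the thesis's image data or code, and no Convolutional Neural Network is shown.

```python
import random
from collections import Counter


def make_imbalanced(samples, labels, minority_label, ratio, seed=0):
    """Hypothetical helper: subsample `minority_label` so it has about
    `ratio` times as many examples as the largest other class.
    All other classes are kept intact."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(c for lab, c in counts.items() if lab != minority_label)
    keep = min(counts[minority_label], max(1, round(ratio * majority_size)))
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority_label]
    kept = set(rng.sample(minority_idx, keep))
    idx = [i for i, lab in enumerate(labels) if lab != minority_label or i in kept]
    return [samples[i] for i in idx], [labels[i] for i in idx]


def knn_predict(train_x, train_y, query, k=5):
    """Plain k-nearest-neighbor majority vote (squared Euclidean distance)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def recall(y_true, y_pred, label):
    """Fraction of true `label` examples that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else float("nan")


def blobs(n, center, rng):
    """Hypothetical toy data: n points from a 2-D Gaussian around `center`."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Two overlapping classes, balanced in both training and test sets.
    X = blobs(200, (0.0, 0.0), rng) + blobs(200, (1.5, 1.5), rng)
    y = [0] * 200 + [1] * 200
    X_test = blobs(100, (0.0, 0.0), rng) + blobs(100, (1.5, 1.5), rng)
    y_test = [0] * 100 + [1] * 100
    # Shrink class 1 in the training set step by step; keep the test set balanced.
    for ratio in (1.0, 0.5, 0.1, 0.02):
        Xi, yi = make_imbalanced(X, y, minority_label=1, ratio=ratio)
        preds = [knn_predict(Xi, yi, q, k=5) for q in X_test]
        print(f"ratio={ratio:<5} n_minority={yi.count(1):>3} "
              f"recall(minority)={recall(y_test, preds, 1):.2f}")
```

Recall on the minority class is reported instead of overall accuracy, because with a heavily imbalanced training set a classifier can achieve high accuracy simply by favoring the majority class. Keeping the test set balanced isolates the effect of the training imbalance.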
