Modification of the RusBoost algorithm : A comparison of classifiers on imbalanced data

This is a Master's thesis (Magister, one-year) from Umeå universitet/Statistics

Author: Isak Forslund; [2022]


Abstract: In many situations data is imbalanced, meaning that one class makes up a larger proportion of the observations than the other(s). Standard classifiers often produce undesirable results when the data is imbalanced, and different methods have been developed in an attempt to improve classification under such conditions. Examples are the algorithms AdaBoost, RusBoost, and SmoteBoost, which modify the cost of misclassified observations; the latter two also reduce the class imbalance when training the classifier. This thesis presents a new method, Modified RusBoost, in which the RusBoost algorithm is modified so that observations that are harder to classify correctly are assigned a lower probability of being removed in the under-sampling process. The performance of this method was compared with that of AdaBoost, RusBoost, and SmoteBoost on 20 real imbalanced data sets, and the effect of imbalance on the different classifiers was also investigated. Overall, Modified RusBoost performed better than or comparably to the other methods, indicating that this algorithm can be a good alternative when classifying imbalanced data. The results also showed that an increase in ρ, the ratio of majority to minority observations in a data set, has a negative impact on the performance of the algorithms. However, this negative impact of ρ affects the performance of all methods similarly.
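
The central idea of the abstract, under-sampling in which hard observations are less likely to be removed, can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual weighting scheme, so everything here is an assumption made for illustration: the function names `imbalance_ratio` and `weighted_undersample` are hypothetical, the minority class is assumed to be labelled 1, and the retention probability is simply taken to be proportional to a per-observation boosting weight (larger weight meaning harder to classify).

```python
import numpy as np


def imbalance_ratio(y, minority_label=1):
    """rho = number of majority observations / number of minority observations."""
    y = np.asarray(y)
    n_min = int(np.sum(y == minority_label))
    n_maj = y.size - n_min
    return n_maj / n_min


def weighted_undersample(y, weights, minority_label=1, rng=None):
    """Return indices of a balanced subsample.

    All minority observations are kept. Majority observations are drawn
    without replacement with probability proportional to `weights`, so a
    high-weight (hard) observation is less likely to be removed.
    ASSUMPTION: this proportional rule is illustrative only; it is not
    taken from the thesis.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    weights = np.asarray(weights, dtype=float)

    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)

    # Keep as many majority observations as there are minority ones.
    n_keep = min(len(min_idx), len(maj_idx))
    p = weights[maj_idx] / weights[maj_idx].sum()
    kept = rng.choice(maj_idx, size=n_keep, replace=False, p=p)

    return np.sort(np.concatenate([min_idx, kept]))
```

In a boosting loop, the returned indices would select the training subset for one round, and the weights would be updated from that round's errors before the next draw. Plain random under-sampling, as in RusBoost, corresponds to passing equal weights.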
