Performance comparison of data mining algorithms for imbalanced and high-dimensional data

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Daniel Rubio Adeva; [2023]

Keywords: Data science; neural network; random forest; support vector machine; imbalanced data; average precision; ROC

Abstract: Artificial intelligence techniques, such as artificial neural networks, random forests, and support vector machines, have been used to address a variety of problems across numerous industries. In many cases, however, models must cope with issues such as imbalanced data or high dimensionality. This thesis implements and compares the performance of support vector machines, random forests, and neural networks for new bank account fraud detection, a use case characterized by imbalanced data and high dimensionality. The neural network achieved both the best AUC-ROC (0.889) and the best average precision (0.192). However, the results indicate that the differences in model performance are not statistically significant enough to reject the initial hypothesis that all models perform equally.
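The abstract reports two metrics, AUC-ROC and average precision (AP), that behave very differently on imbalanced data. The following is a minimal pure-Python sketch of both, assuming binary labels (1 = fraud, 0 = legitimate) and real-valued model scores. The function names are hypothetical, and the step-wise AP definition shown here (the sum of precision times recall increment over descending score thresholds) is the one used by scikit-learn's `average_precision_score`; the thesis may have computed AP differently. The key point it illustrates is that a no-skill model scores 0.5 on AUC-ROC but only the positive prevalence on AP, which helps explain how an AUC-ROC of 0.889 and an AP of 0.192 can describe the same model.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random
    negative; tied scores count as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, too slow for big data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum over thresholds of (R_k - R_{k-1}) * P_k, where each
    distinct score (in descending order) defines one threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(order):
        threshold = scores[order[i]]
        # Examples tied at the same score are admitted together as one threshold.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy data (not from the thesis): 2 positives among 10 examples.
    y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(roc_auc(y, s))                      # 0.9375
    print(round(average_precision(y, s), 4))  # 0.8333

    # A no-skill model (constant score) with 25% positives.
    y0, s0 = [1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5]
    print(roc_auc(y0, s0))            # 0.5
    print(average_precision(y0, s0))  # 0.25, i.e. the positive prevalence
```

In a fraud-detection setting where positives are rare, the no-skill AP baseline sits close to zero, so AP is usually the more informative of the two numbers when comparing models.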
