A comparison of machine learning algorithms in their ability to predict pancreatic cancer

This is a Bachelor's thesis from KTH/Computer Science

Authors: Arvid Eriksson; Jonathan Erikson; [2022]

Abstract: Pancreatic cancer is an uncommon but lethal disease that has no obvious biomarkers in its early stages. Machine learning has been used to predict the disease, with limited success. Survey data has been of special interest because of its large size and accessibility. However, only a few machine learning algorithms, especially neural networks, have been successfully applied to this type of data. This study tested and compared several machine learning algorithms in order to find ones that perform as well as, or better than, the algorithms already known to perform well. Health survey data from two survey studies, NHIS and PLCO, were combined into a dataset of 2,216,867 samples with 22 features, including 1,031 patients diagnosed with pancreatic cancer. A logistic regressor, a neural network, a decision tree, and a support vector machine were trained on this dataset using cross-validation and then evaluated on a test partition. Neural networks were found to be superior in most use cases. Logistic regression can, however, perform similarly to neural networks when applied to survey data and can thus be a simpler alternative. The decision tree achieved results similar to the neural network on some metrics but had lower precision. The support vector machine performed worse than the other three algorithms. This could be a result of the inability to train the support vector machine on the whole dataset, since it scales poorly to larger datasets.
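
The dataset figures in the abstract imply a very strong class imbalance, which helps explain why precision is a demanding metric here. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The sample counts come from the abstract; the function names and the 90% sensitivity and specificity values are hypothetical, chosen only for illustration.

```python
# Counts taken from the abstract.
N_SAMPLES = 2_216_867   # total survey respondents in the combined dataset
N_POSITIVE = 1_031      # respondents diagnosed with pancreatic cancer


def prevalence(n_positive: int, n_samples: int) -> float:
    """Fraction of samples that belong to the positive class."""
    return n_positive / n_samples


def expected_precision(sensitivity: float, specificity: float,
                       n_positive: int, n_samples: int) -> float:
    """Precision = TP / (TP + FP) for a classifier with the given
    sensitivity and specificity applied to a dataset of this composition."""
    n_negative = n_samples - n_positive
    true_pos = sensitivity * n_positive
    false_pos = (1.0 - specificity) * n_negative
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    p = prevalence(N_POSITIVE, N_SAMPLES)
    # Hypothetical operating point, NOT a result reported in the thesis.
    prec = expected_precision(0.90, 0.90, N_POSITIVE, N_SAMPLES)
    print(f"prevalence: {p:.6%}")
    print(f"precision at 90% sensitivity / 90% specificity: {prec:.4%}")
```

Under these assumed values the precision comes out well below 1%, because false positives drawn from the very large negative class far outnumber the true positives. This is why a model can look similar to another on some metrics and still fall behind on precision.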
