Interweaving AutoML and Data Science Method

This is a bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Edvard Aldor; Daniel Helle; [2021]


Abstract: The advent of automated machine learning (AutoML) tools promises to democratize machine learning for non-experts. These tools clearly affect how non-experts apply machine learning, but it is less clear how data scientists can benefit from access to AutoML systems. This bachelor's thesis investigates whether one such system, Google Cloud AutoML, can be used to improve on commercial state-of-the-art ensemble models. Specifically, features judged unimportant according to the feature importance graph produced by Google Cloud AutoML Tables were removed from an ensemble model that weights eXtreme Gradient Boosting (XGBoost) against a Random Forest Regressor. Removing these features reduced both the mean absolute error (MAE) and the root mean square error (RMSE) overall. Although the improvement was not statistically significant, the feature selection method showed a trend toward better performance on both error metrics and shows promise as a technique for reducing model complexity by decreasing dimensionality.
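The method combines three steps: dropping low-importance features, blending two regressors with a fixed weight, and scoring the blend with MAE and RMSE. The following is a minimal, stdlib-only Python sketch of that glue. It assumes the XGBoost and Random Forest predictions and the per-feature importance scores exported from AutoML Tables are already available, so the models themselves are not reimplemented. The function names, the importance threshold, the blend weight and the toy data are hypothetical illustrations; the thesis's actual cut-off and weighting are not stated in this abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple


def drop_unimportant_features(
    rows: List[Dict[str, float]],
    importances: Dict[str, float],
    threshold: float,
) -> Tuple[List[Dict[str, float]], List[str]]:
    """Keep only columns whose importance score is at least `threshold`.

    `importances` stands in (hypothetically) for the scores read off the
    AutoML Tables feature importance graph.
    """
    keep = [name for name, score in importances.items() if score >= threshold]
    reduced = [{name: row[name] for name in keep} for row in rows]
    return reduced, keep


def weighted_ensemble(
    pred_a: Sequence[float], pred_b: Sequence[float], weight_a: float
) -> List[float]:
    """Blend two models' predictions: weight_a * a + (1 - weight_a) * b."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Hypothetical toy data: feature "f2" falls below the importance cut-off.
    rows = [{"f1": 1.0, "f2": 0.3, "f3": 7.0}, {"f1": 2.0, "f2": 0.1, "f3": 5.0}]
    importances = {"f1": 0.62, "f2": 0.03, "f3": 0.35}
    reduced, kept = drop_unimportant_features(rows, importances, threshold=0.05)
    print("kept features:", kept)

    # Hypothetical predictions from the two regressors after retraining.
    pred_xgb = [10.0, 12.0, 9.0]
    pred_rf = [11.0, 13.0, 8.0]
    y_true = [10.0, 13.0, 9.0]
    blend = weighted_ensemble(pred_xgb, pred_rf, weight_a=0.5)
    print("MAE:", mae(y_true, blend), "RMSE:", rmse(y_true, blend))
```

In the thesis setting the reduced rows would be used to retrain both regressors before blending; comparing MAE and RMSE with and without the dropped features is what indicates whether the selection helped.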
