Check me out im a 15 year old rapper
Abstract: SoundCloud is the world's third-largest music streaming platform. Despite this, the website is riddled with spam comments that disrupt the user experience. This spam differs considerably in format and content from the email spam that we are used to seeing and that most classic spam filters were developed to detect. On SoundCloud, 90% of spam comments are written by users promoting their own music, and they contain plenty of slang and spelling errors. Our aim is to develop a machine learning model that can predict spam with high accuracy for comments on this platform. We take several steps to process the data, such as removing stopwords and emojis, in order to facilitate analysis. We use Python packages of the Natural Language Processing toolkit to process the data and train our algorithms. We test a Naïve Bayes algorithm and two different Support Vector Machines (RBF and linear kernel) to see which one performs best. Our results show that all three algorithms remain highly effective even when presented with the challenge of detecting spam in this different kind of data. The Support Vector Machine with RBF kernel performs best, predicting 97.3% of comments correctly in our test, but it is slower to train than Naïve Bayes, and the difference in predictive precision is only 2%.
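To make the pipeline described in the abstract concrete, the following is a minimal sketch of the preprocessing step (removing emojis and stopwords) followed by a multinomial Naïve Bayes classifier. It is not the authors' code: it uses only the Python standard library rather than the toolkit named in the abstract, the stopword list, the emoji pattern, the names `preprocess` and `NaiveBayes`, and the toy comments are all hypothetical, and the Support Vector Machines are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "is", "my", "this", "on", "out", "me", "i", "so", "and", "to"}

# Matches characters in common emoji blocks (an approximation, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess(comment):
    """Lowercase, strip emojis, tokenize, and drop stopwords."""
    text = EMOJI_RE.sub(" ", comment.lower())
    tokens = re.findall(r"[a-z0-9']+", text)
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, comments, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for comment, label in zip(comments, labels):
            self.word_counts[label].update(preprocess(comment))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, comment):
        tokens = preprocess(comment)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(doc_count / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy comments for illustration only; not data from the study.
train_comments = [
    "Check out my new track on my profile 🔥",
    "follow me and listen to my mixtape",
    "im a rapper check my page",
    "this drop is amazing",
    "love the bassline so much ❤",
    "great vocals on this one",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(train_comments, train_labels)
```

Calling `model.predict("check out my mixtape")` on this toy model labels the comment as spam, because the self-promotion vocabulary appears only in the spam examples. With six training comments the result says nothing about real-world accuracy; it only shows how the steps fit together.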