Classification of explicit music content using lyrics and music metadata

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: As the volume of online information grows rapidly, the need for more efficient methods to search for and create music collections is greater than ever. In recent work, applying machine learning to automate categorization problems such as genre and mood classification has shown promising results. In this thesis we investigate the problem of classifying explicit music content using machine learning. Data sets containing lyrics and music metadata, vectorization methods, and four algorithms (Support Vector Machine, Random Forest, k-Nearest Neighbor and Multinomial Naive Bayes) are combined to create 32 different configurations. The configurations are then evaluated using precision-recall curves. The investigation shows that the configuration combining the lyric data set, TF-IDF vectorization and the Random Forest algorithm outperforms all other configurations.
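To illustrate the evaluation step mentioned in the abstract, the following is a minimal sketch of how the points of a precision-recall curve can be computed from classifier scores. It is not taken from the thesis: the function name `precision_recall_points`, the label encoding (1 for explicit, 0 for non-explicit) and the example scores are all assumptions made up for illustration.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) triples, one per distinct score.

    y_true: list of 0/1 labels, where 1 marks an explicit track (assumed encoding).
    scores: classifier scores; higher means "more likely explicit".
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("need at least one positive label")

    # Rank tracks from most to least confidently explicit.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    points = []
    true_pos = 0
    false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last track sharing this score,
        # so tied scores are treated as a single threshold.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            precision = true_pos / (true_pos + false_pos)
            recall = true_pos / total_positives
            points.append((score, precision, recall))
    return points


# Hypothetical labels and scores, made up for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
curve = precision_recall_points(labels, scores)
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Each triple corresponds to classifying every track with a score at or above the threshold as explicit. Plotting precision against recall for each configuration would give curves that can be compared in the way the abstract describes, which is useful when explicit tracks are a minority of the data.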
