Categorization of songs using spectral clustering

Detta är en Kandidat-uppsats från KTH/Skolan för teknikvetenskap (SCI)

Författare: Felix Darke; Linus Below Blomkvist; [2021]

Nyckelord: ;

Sammanfattning: A direct consequence of the world becoming more digital is that the amount of available data grows, which presents great opportunities for organizations, researchers and institutions alike.However, this places a huge demand on efficient and understandable algorithms for analyzing vast datasets. This project is centered around using one of these algorithms for identifying groups of songs in a public dataset released by Spotify in 2018. This problem is part of a larger problem class, where one wish to assign data into groups, without the preexisting knowledge of what makes the different groups special, or how many different groups there are. This is typically solved using unsupervised machine learning. The overall goal of this project was to use spectral clustering (a specific algorithm in the unsupervised machine learning family) to assign 50 704 songs from the dataset into different categories, where each category would be made up of similar songs. The algorithm rests upon graph theory, and a large emphasis was placed upon actuallyunderstanding the mathematical foundation and motivation behind the method before the actual implementation, which is reflected in the report. The results achieved through applying spectral clustering were one large group consisting of 40 718 songs in combination with 22 smaller groups, all larger than 100 songs, with an average size of 430 songs. The groups found were not examined in depth, but the analysis done hints that certain groups were clearly different from the data as a whole in terms of the musical features. For instance, one groupwere deemed to be 54% more likely to be acoustic than the dataset as a whole. As a conclusion, the largest cluster was deemed to be an artefact of the fact that when a sample of songs listened to on Spotify is taken, the likelihood of these songs mainly being popular songs would be high. This would explain the homogeneity that resulted in the fact that most songs were assigned into the same group, which also resulted in the limited success of spectral clustering for this specific project.  

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)