Content-based music recommendation system: A comparison of supervised Machine Learning models and music features

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Marine Chemeque Rabel; [2020]


Abstract: As streaming platforms have grown in popularity in recent years and music consumption has increased, music recommendation has become an increasingly relevant issue. Music applications try to improve their recommendation systems in order to offer users the best possible listening experience and keep them on the platform. Two main approaches have emerged for this purpose: collaborative filtering and content-based models. In the former, recommendations are based on similarity computations between users and their musical tastes. The main issue with this method is the cold start problem: the system performs poorly on new items, whether tracks or users. In the latter, information is extracted from the music itself in order to recommend similar music. This thesis implements the second approach. The state of the art of content-based methods shows that many features can be extracted. There are low-level features, which can be temporal (zero crossing rate), spectral (spectral decrease), or perceptual (loudness), and which require knowledge of physics and signal processing. There are mid-level features that can be understood by musical experts (rhythm, pitch, ...). Finally, there are high-level features that anyone can understand (mood, danceability, ...). The models identified in the literature review are equally numerous. Using the two datasets GTZAN and FMA, we first aim to find the best model, focusing only on supervised models and their hyperparameters, in order to achieve relevant recommendations. It is also necessary to determine the best subset of features to characterise the music while avoiding redundant and noisy information. One of the main challenges is to find a way to assess the performance of the system.
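To make the content-based idea concrete, here is a minimal sketch that computes one low-level temporal feature mentioned in the abstract (zero crossing rate) and ranks tracks by cosine similarity between feature vectors. It is illustrative only and is not the thesis's method, which compares supervised models: the function names, the nearest-neighbour ranking, the track names, and the feature values are all assumptions made for this example.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    pairs = list(zip(samples, samples[1:]))
    crossings = sum(1 for a, b in pairs if (a < 0) != (b < 0))
    return crossings / len(pairs)

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend(query, catalogue, k=3):
    """Return the k track names whose features are closest to the query."""
    ranked = sorted(
        catalogue,
        key=lambda name: cosine_similarity(query, catalogue[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical two-dimensional feature vectors (made-up values).
catalogue = {
    "track_a": [1.0, 0.0],
    "track_b": [0.0, 1.0],
    "track_c": [1.0, 1.0],
}
top_two = recommend([1.0, 0.1], catalogue, k=2)
```

Because the ranking uses only features of the audio itself, a newly added track can be recommended as soon as its features are extracted, which is how a content-based system sidesteps the cold start problem for new items.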
