Efficient Music Thumbnailing for Genre Classification

Detta är en Master-uppsats från KTH/Matematisk statistik

Sammanfattning: For music genre classification purposes, the importance of an intelligent and content-based selection of audio samples has been mostly overlooked. One common approach toward representative results is to select samples at predetermined locations. This is done to avoid analysis of the full audio during classification. While methods in music thumbnailing could be used to find representative samples for genre classification, it has not yet been demonstrated. This thesis showed that efficient and genre representative sampling can be performed with a machine learning model (bidirectional RNN with either LSTM or GRU cells). The model was trained using a sub-optimal genre classifier and computationally inexpensive audio features. The genre classifier was used to compute losses for evenly spaced samples in 14000 tracks. The losses were then used as targets during training. Root mean square energy and zero-crossing rate were used as features, computed over relatively large time steps and wide intervals. The proposed framework can be used to give better predictions with trained genre classifiers and most likely also train, or retrain, them for higher classification accuracy at a low computational cost.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)