Noisy recognition of perceptual mid-level features in music

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Simon Mossmyr; [2021]


Abstract: Self-training with noisy student is a consistency-based semi-supervised self-training method that achieved state-of-the-art accuracy on ImageNet image classification upon its release. It applies both data noise and model noise while fitting a model to labelled data together with a large amount of artificially labelled data. In this work, we use self-training with noisy student to fit a VGG-style deep CNN to a dataset of music piece excerpts labelled with perceptual mid-level features, and compare its performance with the benchmark. To this end, we experiment with some common data-warping augmentations and find that pitch shifting, time stretching, and time translation applied to the excerpt spectrograms can improve the model's invariance. We also apply stochastic depth to the VGG-style model, a method that randomly drops entire layers of a model during training, and find that it too can increase model invariance. To our knowledge this is a novel application, since stochastic depth has not previously been used outside the ResNet architecture. Finally, we apply self-training with noisy student with the aforementioned methods as sources of noise and find that it substantially reduces the mean squared error on the test subset, although the overall performance of the model can still be questioned.
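The three spectrogram augmentations named in the abstract can be sketched as simple array operations. This is a minimal illustration, not the thesis's actual implementation: the function names, the nearest-neighbour stretching, and the choice of zero-padding versus circular shifting are assumptions for demonstration; a spectrogram is treated as a 2-D array with frequency bins on axis 0 and time frames on axis 1.

```python
import numpy as np

def pitch_shift(spec, bins):
    """Approximate a pitch shift by moving the spectrogram `bins` rows
    along the frequency axis, zero-padding the vacated rows."""
    shifted = np.zeros_like(spec)
    if bins > 0:
        shifted[bins:] = spec[:-bins]
    elif bins < 0:
        shifted[:bins] = spec[-bins:]
    else:
        shifted = spec.copy()
    return shifted

def time_translate(spec, frames):
    """Circularly shift the spectrogram by `frames` along the time axis."""
    return np.roll(spec, frames, axis=1)

def time_stretch(spec, rate):
    """Stretch (rate < 1) or compress (rate > 1) the time axis by
    nearest-neighbour resampling of the frame indices."""
    n_frames = spec.shape[1]
    idx = (np.arange(int(n_frames / rate)) * rate).astype(int)
    idx = np.clip(idx, 0, n_frames - 1)
    return spec[:, idx]
```

In practice such warps would be applied with random parameters to each training example, so the model sees many slightly perturbed versions of the same excerpt.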
