Polyphonic Music Instrument Detection on Weakly Labelled Data using Sequence Learning Models

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Polyphonic (multiple) music instrument detection is a harder problem than detecting a single or solo instrument in an audio recording. As music is time series data, it can be modelled using sequence learning methods within deep learning. Recently, temporal convolutional networks (TCN) have been shown to outperform conventional recurrent neural networks (RNN) on various sequence modelling tasks. Although deep learning methods have improved significantly, data scarcity becomes a problem when training large-scale models. Weakly labelled data is an alternative, where a clip is annotated for the presence or absence of instruments without specifying the times at which each instrument is sounding. This study investigates how a TCN model compares to a Long Short-Term Memory (LSTM) model when trained on a weakly labelled dataset. The results showed successful training of both models, along with generalisation to a separate dataset. The comparison showed that the TCN performed better than the LSTM, but only marginally. Therefore, the experiments carried out do not allow a firm conclusion that the TCN is a convincingly better choice than the LSTM for instrument detection, but they do show it to be a strong alternative.
