Sequence Models for Speech and Music Detection in Radio Broadcast

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Quentin Lemaire; [2019]


Abstract: Speech and music detection is an important metadata-extraction step for radio broadcasters: it provides accurate time stamps for the audio, including segments where speech and music overlap. The task has important applications such as royalty collection for broadcast audio, which is the use case for this study. The study focuses on deep neural network architectures designed to process sequential data, such as recurrent neural networks and convolutional architectures for sequence learning. Architectures that have not yet been applied to this task are evaluated and compared with a state-of-the-art architecture, the Bidirectional Long Short-Term Memory (BiLSTM) network. In addition, different strategies for taking advantage of both low- and high-quality datasets are evaluated. The study shows that Temporal Convolutional Network (TCN) architectures can outperform the state of the art, and that non-causal TCNs in particular yield a significant improvement in accuracy. The code used for this study has been made available on GitHub.
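The causal/non-causal distinction the abstract highlights comes down to how a TCN's dilated 1-D convolutions are padded: causal padding lets each output frame see only past samples, while non-causal (centred) padding also gives it future context. The following is a minimal NumPy sketch of that building block, not the thesis code; the function name and interface are illustrative assumptions.

```python
import numpy as np

def dilated_conv1d(x, w, dilation, causal=True):
    """Dilated 1-D convolution, the core operation of a Temporal
    Convolutional Network (TCN).

    causal=True  -> pad only on the left, so output i sees x[<= i].
    causal=False -> centred padding, so output i also sees future
                    samples (the non-causal variant the study found
                    more accurate for speech/music detection).
    Illustrative sketch only, not the thesis implementation.
    """
    k = len(w)
    receptive = (k - 1) * dilation       # extra context the kernel spans
    if causal:
        pad = (receptive, 0)             # no look-ahead
    else:
        pad = (receptive // 2, receptive - receptive // 2)
    xp = np.pad(x, pad)                  # zero-pad the sequence
    return np.array([
        sum(w[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0])                 # sum of two taps
print(dilated_conv1d(x, w, dilation=1, causal=True))   # uses x[i-1], x[i]
print(dilated_conv1d(x, w, dilation=1, causal=False))  # uses x[i], x[i+1]
```

With causal padding the filter combines each sample with its past neighbour; with non-causal padding the same filter reaches one step into the future, which is why non-causal TCNs can only be used when a small look-ahead (or offline processing) is acceptable, as it is for broadcast-archive annotation.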
