An exploration of the current state-of-the-art in automatic music transcription - with proposed improvements using machine learning

This is a Master's thesis from Lunds universitet / Matematisk statistik (Lund University, Mathematical Statistics)

Abstract: The research field of automatic music transcription, whose goal is to transcribe a polyphonic music signal into annotated sheet music, has grown considerably during the 21st century. Within this field, the subproblem of fundamental frequency estimation in a piece of music is difficult, e.g., because signals from different instruments playing the same note have dissimilar structures. The problem is further complicated in a polyphonic signal consisting of several simultaneous notes, where the harmonic overtones of the notes interact. To address this and other issues, machine learning techniques have advanced the research in music transcription, which is the main focus of this thesis. This is undertaken by comparing the best performing fundamental frequency estimators of recent years, mainly from the MIREX competitions of 2015-2017. These are recreated and evaluated on a customized test set consisting of MIDI files of various instruments. The evaluation consists both of typical music transcription measures, such as precision, recall and accuracy, and of deeper analysis aimed at finding large-scale structural biases. The evaluation shows that the best performing models are THK1 and CT1 from MIREX 2017, both based on convolutional neural networks (CNNs). This work identifies some structural errors in these methods, pointing out potential for further improvements. In addition, a novel approach of applying complex-valued neural networks to music transcription is examined, by modifying an existing deep complex neural network model. The proposed, improved model finishes in third place in the evaluation, indicating that complex-valued neural networks may advance the research area of music transcription even further.
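As a brief illustration of the evaluation measures mentioned above, the sketch below computes note-level precision, recall, and the accuracy measure commonly used in transcription benchmarks such as MIREX, defined as TP / (TP + FP + FN). The function name and example counts are illustrative assumptions, not taken from the thesis itself.

```python
def transcription_metrics(tp: int, fp: int, fn: int) -> dict:
    """Note-level transcription metrics (MIREX-style).

    tp: correctly detected notes (true positives)
    fp: spurious detections (false positives)
    fn: missed notes (false negatives)
    Accuracy here is the common transcription measure TP / (TP + FP + FN),
    which penalizes both spurious and missed notes.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# Hypothetical example: 80 correct notes, 10 spurious, 20 missed.
metrics = transcription_metrics(80, 10, 20)
print(metrics)  # precision = 80/90, recall = 80/100, accuracy = 80/110
```

Note that accuracy is never larger than either precision or recall, since its denominator counts both error types at once.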
