Improving accuracy of speech recognition for low-resource accents: Testing the performance of fine-tuned Wav2vec2 models on accented Swedish

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Although speech recognition has advanced rapidly in recent years, even the highest-performing models struggle with accented speech. Several methods exist for improving performance on accents, but many are difficult to implement or require large amounts of data, which makes them costly. It is therefore relevant to examine the Wav2vec2 architecture, which has previously performed well with small amounts of labeled data. Starting from a model trained on Swedish, this thesis fine-tunes it on small datasets of three Swedish accents to create both accent-dependent specialized models and an accent-independent general model. The specialized models outperform the original model, and the general model performs approximately as well as each specialized model without sacrificing performance on non-accented Swedish. This means that the Wav2vec2 framework offers a low-cost method of improving speech recognition, which can be used to improve private and public services for a larger part of the population.
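The abstract compares a baseline model, three accent-specialized models and one general model on each accent, but does not name its evaluation metric. Word error rate (WER) is the usual metric for speech recognition, so the sketch below assumes it. It is a minimal, dependency-free illustration of how such a per-accent comparison could be scored. The function names (`word_edit_distance`, `word_error_rate`, `corpus_wer`, `wer_table`) and the model and accent labels are hypothetical and not taken from the thesis.

```python
from typing import Dict, List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    turning the reference into the hypothesis (word-level Levenshtein)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance: edits divided by reference word count."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """WER over a whole test set: total edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference")
    edits = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        edits += word_edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("test set contains no reference words")
    return edits / words


def wer_table(
    references_by_accent: Dict[str, List[str]],
    hypotheses_by_model: Dict[str, Dict[str, List[str]]],
) -> Dict[str, Dict[str, float]]:
    """Score every model on every accent test set.

    hypotheses_by_model[model][accent] holds that model's transcripts
    for the accent's test utterances, in the same order as the references.
    """
    return {
        model: {
            accent: corpus_wer(refs, transcripts[accent])
            for accent, refs in references_by_accent.items()
        }
        for model, transcripts in hypotheses_by_model.items()
    }
```

For example, `word_error_rate("jag heter anna", "jag hete anna")` is one substitution over three reference words, or about 0.33. Because insertions are counted, WER can exceed 1.0, so `word_error_rate("a", "a b c")` returns 2.0. Pooling edits across the whole test set in `corpus_wer`, rather than averaging per-utterance rates, keeps short utterances from dominating the score.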
