LaMOSNet: Latent Mean-Opinion-Score Network for Non-intrusive Speech Quality Assessment : Deep Neural Network for MOS Prediction

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Objective non-intrusive speech quality assessment, which aims to emulate and correlate with human judgement, has received increasing attention over the years. It is a difficult problem for three reasons: data scarcity, noisy human judgement, and a potentially uneven distribution of bias in mean opinion scores (MOS). In this paper, we introduce the Latent Mean-Opinion-Score Network (LaMOSNet), which leverages individual judges' scores to increase the data size, together with new ideas for dealing with both noisy and biased labels. We introduce a methodology called Optimistic Judge Estimation as a way to reduce bias in MOS in a clear way. We also implement stochastic gradient noise and mean teacher, ideas from noisy image classification, to further deal with noisy labels and the uneven distribution of label bias. We achieve competitive results on VCC2018 when modeling MOS, and state-of-the-art results when modeling only listener-dependent scores.
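As a rough illustration of three ideas named in the abstract, the sketch below shows a MOS computed as the mean of individual judges' scores, a mean-teacher weight update written as an exponential moving average, and zero-mean Gaussian noise added to gradients. It is a generic sketch, not the thesis's implementation: the function names, the `decay` and `std` parameters, and the Gaussian form of the gradient noise are all assumptions made for illustration.

```python
import random


def mean_opinion_score(judge_scores):
    """MOS of one utterance: the arithmetic mean of its judges' scores."""
    return sum(judge_scores) / len(judge_scores)


def ema_update(teacher, student, decay=0.99):
    """Mean-teacher step (assumed form): the teacher weights follow an
    exponential moving average of the student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def add_gradient_noise(grads, std, rng):
    """Add zero-mean Gaussian noise to each gradient component
    (an assumed, generic form of stochastic gradient noise)."""
    return [g + rng.gauss(0.0, std) for g in grads]


if __name__ == "__main__":
    rng = random.Random(0)
    print(mean_opinion_score([4, 5, 3]))
    print(ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9))
    print(add_gradient_noise([0.5, -0.5], std=0.01, rng=rng))
```

Training on the individual judge scores rather than only on their mean yields one training target per judge and utterance, which is the sense in which per-judge labels enlarge the data set.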
