Detecting Signal Corruptions in Voice Recordings for Speech Therapy

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: When recording voice samples from a patient in speech therapy the quality of the recording may be affected by different signal corruptions, for example background noise or clipping. The equipment and expertise required to identify small disturbances are not always present at smaller clinics. Therefore, this study investigates possible machine learning algorithms to automatically detect selected corruptions in speech signals, including infrasound and random muting. Five algorithms are analyzed: kernel substitution based Support Vector Machine, Convolutional Neural Network, Long Short-term Memory (LSTM), Gaussian Mixture Model based Hidden Markov Model and Generative Model based Hidden Markov Model. A tool to generate datasets of corrupted recordings is developed to test the algorithms in both single-label and multi-label settings. Mel-frequency Cepstral Coefficients are used as the main features. For each type of corruption different ways to increase the classification accuracy are tested, for example by using a Voice Activity Detector to filter out less relevant parts of the recording, changing the feature parameters, or using an ensemble of classifiers. The experiments show that a machine learning approach is feasible for this problem as a balanced accuracy of at least 75% is reached on all tested corruptions. While the single-label study gave mixed results with no algorithm clearly outperforming the others, in the multi-label case the LSTM in general performs better than other algorithms. Notably it achieves over 95% balanced accuracy on both white noise and infrasound. As the algorithms are trained only on spoken English phrases the usability of this tool in its current state is limited, but the experiments are easily expanded upon with other types of audio recordings, corruptions, features, or classification algorithms. 

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)