Evaluation of Text-Independent and Closed-Set Speaker Identification Systems

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Berk Gedik; [2018]


Abstract: Speaker recognition is the task of identifying the speaker of a given speech recording, and it has a wide range of applications. In this thesis, machine learning models such as the Gaussian Mixture Model (GMM), the k-Nearest Neighbor (k-NN) model, and Support Vector Machines (SVM), together with feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC), are investigated for the speaker recognition task. Combinations of these models and feature extraction methods are evaluated on multiple datasets that vary in the number of speakers and the amount of training data. In this way, the performance of the methods in different settings is analyzed. The results show that GMM and k-NN provide good accuracy and that LPCC performs better than MFCC. The effects of audio recording duration, training data duration, and number of speakers on prediction accuracy are also analyzed.
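
To make the closed-set setting concrete, the following is a minimal sketch of k-NN speaker identification with a majority vote over frames. It is an illustration under stated assumptions, not the thesis's implementation: the speaker names, the three-dimensional synthetic vectors (stand-ins for MFCC or LPCC frames), the value of k, and the helper names `knn_label`, `identify`, and `synthetic_frames` are all hypothetical.

```python
import random
from collections import Counter


def knn_label(frame, train, k):
    """Return the majority speaker among the k training frames nearest to `frame`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(frame, item[0])),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def identify(frames, train, k=3):
    """Closed-set decision: each frame votes, and the most-voted enrolled speaker wins."""
    votes = Counter(knn_label(frame, train, k) for frame in frames)
    return votes.most_common(1)[0][0]


def synthetic_frames(center, n, rng, spread=0.5):
    """Hypothetical stand-in for feature extraction: noisy vectors around a speaker center."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


rng = random.Random(0)

# Assumed toy speakers; real systems would use cepstral features from audio.
centers = {
    "spk_a": [0.0, 0.0, 0.0],
    "spk_b": [3.0, 3.0, 3.0],
    "spk_c": [-3.0, 3.0, 0.0],
}

# Enrollment data: (feature vector, speaker label) pairs.
train = [(f, spk) for spk, c in centers.items() for f in synthetic_frames(c, 30, rng)]

# One test utterance per enrolled speaker.
test_utts = {spk: synthetic_frames(c, 20, rng) for spk, c in centers.items()}

predictions = {spk: identify(frames, train) for spk, frames in test_utts.items()}
accuracy = sum(pred == spk for spk, pred in predictions.items()) / len(predictions)
```

Because the decision is always one of the enrolled labels, the sketch never rejects an utterance as "unknown", which is what distinguishes closed-set from open-set identification. Varying the number of entries in `centers` or the number of frames per speaker mirrors, in miniature, the kind of speaker-count and data-duration comparison the abstract describes.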
