A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model
Abstract: Voice recognition has become an increasingly researched field over the last century, and new techniques to identify speech have been introduced. One part of voice recognition is speaker verification, which is divided into a front-end and a back-end. The first component, the front-end, performs feature extraction: techniques such as Mel-Frequency Cepstrum Coefficients (MFCC) are used to extract the speaker-specific features of a speech signal. MFCC is widely used because it is based on the known variation of the human ear's critical frequency bandwidths. The second component, the back-end, handles speaker modeling. The back-end is based on the Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) methods for enrollment and verification of the specific speaker. In addition, normalization techniques such as Cepstral Mean Subtraction (CMS) and feature warping are used for robustness against noise and distortion. In this paper, we build a speaker verification system, vary the amount of training data for the true speaker model, and evaluate the system's performance. To further investigate the security of a speaker verification system, the two methods (GMM and GMM-UBM) are compared to determine which is more secure depending on the amount of training data available. This research therefore contributes to answering how much data is really necessary for a secure system where the False Positive rate is as close to zero as possible, how the amount of training data affects the False Negative (FN) rate, and how this differs between GMM and GMM-UBM. The results show that increasing the speaker-specific training data increases the performance of the system.
However, adding training data beyond a certain point proved unnecessary, because the performance of the system eventually plateaus; in this case the plateau was reached at around 48 minutes of data. The results also show that the GMM-UBM models trained on 48 to 60 minutes of data outperformed the GMM models.
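The GMM-UBM verification idea summarized above can be sketched in a few lines. The following is a minimal illustration using scikit-learn's `GaussianMixture` on synthetic stand-in features instead of real MFCCs; note that the standard GMM-UBM back-end adapts the speaker model from the UBM via MAP adaptation, whereas here the speaker model is trained by plain maximum likelihood for brevity. All data shapes and parameter values are illustrative assumptions, not taken from the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-ins for MFCC feature matrices (frames x coefficients).
ubm_feats = rng.normal(0.0, 1.0, size=(2000, 13))      # pooled background speakers
speaker_feats = rng.normal(0.5, 1.0, size=(500, 13))   # enrollment data, claimed speaker
test_feats = rng.normal(0.5, 1.0, size=(200, 13))      # utterance to verify

# Train the Universal Background Model on pooled background data.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(ubm_feats)

# Speaker model (trained by maximum likelihood here; MAP adaptation from
# the UBM is the usual GMM-UBM approach).
spk = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
spk.fit(speaker_feats)

# Verification score: average log-likelihood ratio between the speaker
# model and the UBM over all frames of the test utterance.
llr = spk.score(test_feats) - ubm.score(test_feats)
decision = llr > 0.0  # accept if the speaker model explains the data better
print(f"LLR = {llr:.3f}, accept = {decision}")
```

In a real system the accept/reject threshold would be tuned on held-out data to trade off False Positives against False Negatives, which is exactly the trade-off the experiments above measure as the amount of enrollment data varies.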