Automatic Speech Recognition Model for Swedish using Kaldi

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Yihan Wang; [2020]

Keywords: Speech recognition; Kaldi; Mel-frequency cepstral coefficient; Perceptual linear predictive; Speaker adaptive training; Weighted Finite-State Transducers

Abstract: With the development of intelligent technologies, speech recognition has become a hot topic. Many automatic speech recognition (ASR) tools are on the market, but a considerable number of them do not support Swedish because it has relatively few speakers. In this project, a Swedish ASR model based on Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) is built with Kaldi. The aim is to help ICA Banken classify after-sales voice calls. A variety of model configurations are explored, differing in how phonemes are combined (monophone or triphone) and in how features are extracted and processed. Word Error Rate (WER) and Real-Time Factor (RTF) are used as evaluation criteria to compare the recognition accuracy and speed of the models. For large-vocabulary continuous speech recognition, triphone models perform much better than monophone models. Adding feature transformations further improves accuracy. The combination of linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and speaker adaptive training (SAT) gives the best performance in this implementation. Among the feature extraction methods, mel-frequency cepstral coefficients (MFCC) tend to give higher accuracy, while perceptual linear prediction (PLP) tends to improve overall speed.
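The abstract names two evaluation criteria, Word Error Rate and Real-Time Factor, and the Python sketch below illustrates both. It is not the thesis code and not Kaldi's own scoring tools. It assumes whitespace-tokenized transcripts and a decoding time measured separately in seconds. WER is computed as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. RTF is decoding time divided by audio duration. The function names and the Swedish example sentences are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (S + D + I) / N, using word-level Levenshtein distance."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] = minimum edits turning reference[:i-1] into hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[m] / n


def real_time_factor(decoding_seconds: float, audio_seconds: float) -> float:
    """RTF = decoding time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decoding_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical after-sales call transcript and recognizer output.
    ref = "jag vill spärra mitt kort".split()
    hyp = "jag vill spara kort".split()
    print(word_error_rate(ref, hyp))     # 0.4 (1 substitution + 1 deletion over 5 words)
    print(real_time_factor(12.0, 60.0))  # 0.2
```

Lower is better for both metrics. This matches the trade-off described above, where one feature type may lower WER while another lowers RTF.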
