Emotion Recognition in Football Commentator Speech: Is the action intense or not?

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Paul-Gauthier Noé; [2020]


Abstract: To improve the production quality of football broadcasts, Digigram wants to automatically detect the commentator's state of excitement. The aim of this master's thesis is to infer this state from the commentator's speech in order to determine whether he or she is describing an intense action or a calm one. To this end, a simple binary classification problem is defined: a speech segment must be classified as coming from either an intense action or a calm one. The audio waveform is not used directly for classification. Instead, relevant features are extracted, such as the Mel-Frequency Cepstral Coefficients (MFCC), the energy, the pitch and its smoothed version, and a newly introduced feature related to the speaking rate. The Least Absolute Shrinkage and Selection Operator (LASSO) estimator is used to select the features that have the largest linear influence on the class, thereby reducing the number of input features. Least Squares, Naive Bayes, K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers are presented and evaluated. The SVM performs best and is also used in a real-time setting, where the posterior probability of an intense action is plotted. However, more data are needed to go further: with the present dataset, generalisation to other speakers or other recording conditions is not guaranteed.
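The LASSO-based feature selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the data are synthetic, the column indices stand in for unspecified speech features, the regularisation strength `lam=0.3` is an arbitrary assumption, and the choice of which columns carry class information is a property of the toy data only, not a finding from the thesis. The solver is a plain cyclic coordinate descent written with the standard library.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrinks rho towards zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def standardise(X):
    """Scale every column to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # centred target minus Xw, with w = 0
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def make_toy_data(n=400, seed=0):
    """Synthetic stand-in for speech features (hypothetical, not thesis data).

    Labels are +1 ("intense") and -1 ("calm"). Columns 0 and 2 are shifted
    by the label; columns 1, 3 and 4 are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1.0 if i % 2 == 0 else -1.0
        row = [rng.gauss(0.0, 1.0) for _ in range(5)]
        row[0] += 2.0 * label
        row[2] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
w = lasso_coordinate_descent(standardise(X), y, lam=0.3)  # lam is an assumed value
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected feature columns:", selected)
```

Columns whose weight is driven exactly to zero are dropped, and only the remaining columns would be passed on to a classifier. In practice the regularisation strength would be tuned, for example by cross-validation, rather than fixed by hand as it is here.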
