Predicting user ratings of language café conversations based on automatic voice analysis

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Spoken communication between humans generates information in two channels: the primary channel, linked to the syntactic-semantic part of speech (what a person is literally saying), and the secondary channel, which conveys paralinguistic information (tone, emotional state and gestures). This study examines the paralinguistic part of speech, more specifically tone and emotional state. It investigates whether there is a correlation between a participant's speech and that participant's opinion of a language café conversation. The language café conversations are moderated by the social robot platform Furhat, created by Furhat Robotics. The report is written from two perspectives. From a data science perspective, emotions identified in audio files are analysed with machine learning algorithms and mathematical models. Vokaturi, an emotion recognition software package, analyses the audio files and quantifies their emotional attributes. The classification model is based on these attributes and the answers from the language café survey. Speech emotion recognition is also evaluated as a method for gathering customer opinions in a customer feedback loop. The results show an accuracy of 61%, which indicates that some degree of prediction is possible. However, there is no clear correlation between the recorded human voice and the participants' opinions of the conversation. The discussion analyses the difficulties of creating a high-accuracy model with the current data. It also contains a hypothetical analysis of the model as a method for gathering customer data.
