Text analysis for email multi label classification

This is a Master's thesis from Göteborgs universitet/Institutionen för data- och informationsteknik (University of Gothenburg, Department of Computer Science and Engineering)

Abstract: This master's thesis studies a multi-label text classification task on a small dataset of short bilingual (English and Swedish) texts (emails). Specifically, the dataset contains 5800 emails distributed among 107 classes, with the special case that the majority of the emails contain both languages at the same time. To handle this task, different models have been employed: Support Vector Machines (SVM), Gated Recurrent Units (GRU), Convolutional Neural Network (CNN), Quasi-Recurrent Neural Network (QRNN) and Transformers. The experiments demonstrate that, in terms of weighted averaged F1 score, the SVM outperforms the other models with a score of 0.96, followed by the CNN with 0.89 and the QRNN with 0.80.
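
The abstract reports results as a weighted averaged F1 score. As a rough illustration of what that metric usually means in a multi-label setting, the sketch below computes a per-label F1 and averages it weighted by each label's support (the number of true occurrences). This is an assumption about the common definition, not the thesis's own evaluation code; the function name `weighted_f1` and the toy label matrices are hypothetical.

```python
from typing import Sequence


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> float:
    """Support-weighted mean of per-label F1 scores.

    Each row is one email; each column is one label (1 = label assigned).
    A label with no true or predicted positives gets F1 = 0 by convention.
    """
    n_labels = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        support = tp + fn                 # true occurrences of label j
        denom = 2 * tp + fp + fn
        f1 = (2 * tp / denom) if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy example: 4 emails, 3 labels (hypothetical data, not from the thesis).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Because the average is weighted by support, frequent labels dominate the score; with 107 classes over 5800 emails, rarely used labels would contribute little to a figure such as 0.96.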
