Email Classification with Machine Learning and Word Embeddings for Improved Customer Support

This is a thesis for a professional degree at advanced level from Blekinge Tekniska Högskola/Institutionen för datalogi och datorsystemteknik

Abstract: Classifying emails into distinct labels can have a great impact on customer support. By using machine learning to label emails, the system can set up queues containing emails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve on the manually defined rule-based algorithm currently implemented at a large telecom company by using machine learning. The proposed model should achieve a higher F1-score and classification rate than the rule-based algorithm. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce administrative and maintenance work and make the model more flexible. Using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct five experiments to test the performance of several common machine learning algorithms, text representations and word embeddings, and how they work together. In this work a web-based interface was implemented that can classify emails into 33 different labels with an F1-score of 0.91 using a Long Short-Term Memory network. The authors conclude that Long Short-Term Memory networks outperform non-sequential models such as Support Vector Machines and AdaBoost when predicting labels for emails.
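
Since the abstract reports its results as an F1-score over many labels, a minimal sketch of how a multi-class F1-score can be computed may help. The abstract does not say which averaging the authors used, so the macro average below is an assumption, and the label names are hypothetical examples, not the thesis's 33 labels.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list.

    Macro averaging is an assumption here; the abstract does not state
    how the reported F1-score was averaged.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1   # predicted label gained a false positive
            fn[true_label] += 1   # true label was missed
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical queue labels for four emails
y_true = ["billing", "billing", "technical", "sales"]
y_pred = ["billing", "technical", "technical", "sales"]
print(macro_f1(y_true, y_pred))
```

In this toy case one "billing" email is misrouted to "technical", which lowers the F1 of both labels while "sales" stays perfect.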
