Multi-class Sentiment Classification on Twitter using an Emoji Training Heuristic

This is a Bachelor's thesis from KTH/School of Computer Science and Communication (CSC)

Authors: Fredrik Hallsmar; Jonas Palm; [2016]


Abstract: Sentiment analysis on social media is an important part of today's information gathering. Various machine learning techniques have been applied in recent years, and using an emoticon heuristic to annotate training sets automatically has been a popular approach. As emojis become more common in text-based communication, this thesis investigates the feasibility of an emoji training heuristic for multi-class sentiment analysis using a Multinomial Naive Bayes classifier. Training sets of 4,000 to 400,000 tweets were used to train the classifier with various N-gram configurations. The results show that an emoji heuristic performs well compared to emoticon- or hashtag-based heuristics. However, classifier confusion depends heavily on class selection and emoji representations when multi-class sentiment analysis is performed.
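
The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The emoji-to-class mapping, the class names, the tokenizer, the smoothing value and the helper names (`label_by_emoji`, `ngrams`, `train`) are all assumptions made for illustration, since the abstract does not specify them. The sketch labels tweets by the emoji they contain, strips the emoji from the text, extracts word N-grams, and trains a small Multinomial Naive Bayes classifier with Laplace smoothing.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoji-to-class mapping; the thesis's actual classes are not given here.
EMOJI_LABELS = {"😂": "joy", "😢": "sadness", "😡": "anger"}


def label_by_emoji(tweet):
    """Return (label, text with emojis removed), or None unless exactly one class matches."""
    labels = {lab for emo, lab in EMOJI_LABELS.items() if emo in tweet}
    if len(labels) != 1:
        return None  # unlabeled or ambiguous tweets are discarded
    text = tweet
    for emo in EMOJI_LABELS:
        text = text.replace(emo, " ")  # remove the label signal from the features
    return labels.pop(), text


def ngrams(text, n_max=2):
    """Lowercased word n-grams for n = 1..n_max (assumed tokenizer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, lab in zip(docs, labels):
            self.feature_counts[lab].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for lab, n_docs in self.class_counts.items():
            counts = self.feature_counts[lab]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in feats:
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = lab, score
        return best


def train(tweets, n_max=2):
    """Label tweets with the emoji heuristic and fit the classifier."""
    labelled = [r for r in map(label_by_emoji, tweets) if r is not None]
    docs = [ngrams(text, n_max) for _, text in labelled]
    labels = [lab for lab, _ in labelled]
    return MultinomialNB().fit(docs, labels)
```

Calling `train(tweets)` on a list of raw tweet strings returns a fitted model, and `model.predict(ngrams("some text"))` returns the most likely class. Discarding tweets whose emojis map to more than one class is one possible design choice; the abstract notes that class selection and emoji representation strongly affect classifier confusion, so the mapping itself is the part most worth varying.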
