Feature Selection for Sentiment Analysis of Swedish News Article Titles

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: The aim of this study was to elaborate the possibilities of sentiment analyzing Swedish news article titles using machine learning approaches and find how the text is best represented in such conditions. Sentiment analysis has traditionally been conducted by part-of-speech tagging and counting word polarities, which performs well for large domains and in absence of large sets of training data. For narrower domains and previously labeled data, supervised learning can be used. The work of this thesis tested the performance of a convolutional neural network and a Support Vector Machine on different sets of data. The data sets were constructed to represent various language features. This included for example a simple unigram bag-of-words model storing word counts, a bigram bag-of-words model to include the ordering of words and an integer vector summary of the title. The study concluded that each of the tested feature sets gave information about the sentiment to various extents. The neural network approach with all feature sets combined performed better than the two annotators of the study. Despite the limited data set, overfitting did not seem to be a problem when using the features together.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)