Attitydanalys av svenska produktomdömen – behövs språkspecifika verktyg?
Sammanfattning: Sentiment analysis of Swedish data is often performed using English tools and machine. This thesis compares using a neural network trained on Swedish data with a corresponding one trained on English data. Two datasets were used: approximately 200,000 non-neutral Swedish reviews from the company Prisjakt Sverige AB, one of the largest annotated datasets used for Swedish sentiment analysis, and 1,000,000 non-neutral English reviews from Amazon.com. Both networks were evaluated on 11,638 randomly selected reviews, in Swedish and in English machine translation. The test set had the same overrepresentation of positive reviews as the Swedish dataset (84% were positive). The results suggest that English tools can be used with machine translation for sentiment analysis of Swedish reviews, without loss of classification ability. However, the English tool required 33% more training data to achieve maximum performance. Evaluation on the unbalanced test set required extra consideration regarding statistical measures. F1-measure turned out to be reliable only when calculated for the underrepresented class. It then showed a strong correlation with the Matthews correlation coefficient, which has been found to be more reliable. This warrants further investigation into whether the correlation is valid for all different balances, which would simplify comparison between studies.
HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)