Sentiment analysis of arbitrary search results: Identified obstacles, mitigation strategies and effects on sentiment measurement

This is a Bachelor's thesis from Uppsala University, Department of Information Technology

Author: Lukas Nord; [2022]


Abstract: This work analyzes the sentiment of over 2000 Google search results to investigate whether the sentiment of an arbitrary website can be measured accurately using methods usually applied within predefined scopes. While sentiment analysis is commonly performed on data obtained from a predetermined source, such as a single target website, widening the scope to arbitrary search results presents a number of obstacles. These obstacles, and the mitigation strategies used to overcome them, are the focus of this work.

The identified obstacles are grouped as pertaining either to data retrieval or to sentiment analysis. Obstacles regarding data retrieval relate to the difficulty of pinpointing where the relevant information is located in an arbitrary webpage, and of excluding irrelevant information when automatically retrieving the webpage text. The obstacles pertaining to sentiment analysis itself include sarcasm being interpreted literally, and the relatively formal language of arbitrary websites compared to comments or reviews, which are more common subjects for sentiment analysis. By removing HTML tags containing irrelevant text, as well as removing common phrases used in website vernacular, it is possible to increase the accuracy of automatic data retrieval. The effectiveness of these mitigation strategies is measured by comparing the sentiment of manually retrieved text to that of automatically retrieved text, sourced from the same 20 randomly selected webpages.

Across 2260 scraped webpages, the proposed methods were accurate enough to discern subtle differences in the intensity of positive and negative sentiment words. As an example, webpages resulting from queries containing the word “great” were more positive than those from queries containing the word “good”. Both of these sentiment words had a positive effect on mean sentiment, while negative sentiment words were observed to have a negative effect.
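The tag and phrase removal described in the abstract could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis's actual implementation: the set of skipped tags (`SKIPPED_TAGS`), the phrase list (`BOILERPLATE_PHRASES`), and the names `MainTextExtractor` and `extract_text` are all assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser

# Assumption: tags whose text is treated as irrelevant to page sentiment.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}

# Assumption: a few example phrases of "website vernacular" to strip.
BOILERPLATE_PHRASES = [
    "accept all cookies",
    "subscribe to our newsletter",
    "all rights reserved",
]


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any tag in SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        # Note: a skipped tag that is never closed hides all following text.
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page text with skipped tags and listed phrases removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    for phrase in BOILERPLATE_PHRASES:
        text = re.sub(re.escape(phrase), " ", text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

The cleaned string returned by `extract_text` would then be passed to whichever sentiment scorer is in use, and its score compared against the score of manually retrieved text from the same page, as the abstract describes.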
