Detecting Dissimilarity in Discourse on Social Media

Detta är en Uppsats för yrkesexamina på avancerad nivå från Uppsala universitet/Matematiska institutionen

Författare: Mattias Mineur; [2022]

Nyckelord: Natural Language Processing;

Sammanfattning: A lot of interaction between humans take place on social media. Groups and communities are sometimes formed both with and without intention. These interactions generate a large quantity of text data. This project aims to detect dissimilarity in discourse between communities on social media using a distributed approach. A data set of tweets was used to test and evaluate the method. Tweets produced from two communities were extracted from the data set. Two Natural Language Processing techniques were used to vectorise the tweets for each community. Namely LIWC, dictionary based on knowledge acquired from professionals in linguistics and psychology, and BERT, an embedding model which uses machine learning to present words and sentences as a vector of decimal numbers. These vectors were then used as representations of the text to measure the similarity of discourse between the communities. Both distance and similarity were measured. It was concluded that none of the combinations of measure or vectorisation method that was tried could detect a dissimilarity in discourse on social media for the sample data set. 

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)