Machine Learning for Detecting Hate Speech in Low Resource Languages

This is a Master's thesis from the University of Gothenburg / Department of Computer Science and Engineering

Abstract: This work examines the role of both cross-lingual zero-shot learning and data augmentation in detecting hate speech online in low-resource set-ups. For situations where labeled data is scarce, the proposed solutions are to use a language with more resources during training or to create synthetic data points. Cross-lingual zero-shot results suggest that some knowledge transfer is occurring. However, results seem to be greatly influenced by the specific training data set selected. This is further supported by cross-data-set experimentation within the same language, where results were also found to fluctuate with the training data, even without cross-lingual transfer. Meanwhile, data augmentation methods show an improvement, especially for small amounts of data. Furthermore, this work presents a detailed discussion of how the proposed data augmentation techniques affect the data.
