Ensuring Brand Safety by Using Contextual Text Features: A Study of Text Classification with BERT

This is a Master's thesis from Uppsala University / Department of Linguistics and Philology

Author: Lingqing Song; [2023]


Abstract: When advertisements are placed on web pages, the context in which they appear matters. For example, a manufacturer of kitchen knives may not want its advertisement to appear in a news article about a knife-wielding murderer. The purpose of this work is to explore how well pre-trained language models perform on a text classification task: determining whether the content of a given article is brand-safe, that is, suitable for brand advertising. A Norwegian-language news dataset containing 3600 news items was manually labelled with negative topics. Five pre-trained BERT language models were tested: one multilingual BERT and four language models pre-trained specifically on Norwegian. Different training settings and fine-tuning methods were also tested for the two best-performing models. Structurally more complex models, and models trained on larger corpora or with larger vocabularies, performed better on the text classification task during testing. However, smaller models also perform acceptably when better performance must be traded off against the time and processing power required. As far as training and fine-tuning settings are concerned, this work found that for news texts, the initial part of an article, which often contains the most information, is the best part to use as input to BERT. Another contribution of this work is the manual tagging of a Norwegian news dataset with negative topics. This thesis also points to some possible directions for future work, such as experimenting with different label granularity, experimenting with multilingual controlled training, and training with few samples.
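
The finding about using the initial part of an article can be illustrated with a minimal sketch of input selection. This is not the thesis's code: the function name `select_tokens`, the `tail` and `head_tail` strategies, and the use of a plain token list in place of a real BERT tokenizer are assumptions made for illustration. The default limit of 512 reflects the maximum sequence length of standard BERT models.

```python
def select_tokens(tokens, max_len=512, strategy="head"):
    """Keep at most max_len tokens from a list of tokens.

    Hypothetical helper: "head" keeps the start of the article (the
    choice the abstract reports as best for news texts); "tail" and
    "head_tail" are shown only for comparison.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(tokens) <= max_len:
        # Short articles fit as they are.
        return list(tokens)
    if strategy == "head":
        return list(tokens[:max_len])
    if strategy == "tail":
        return list(tokens[-max_len:])
    if strategy == "head_tail":
        # Split the budget between the start and the end of the article.
        head_len = max_len // 2
        tail_len = max_len - head_len
        return list(tokens[:head_len]) + list(tokens[-tail_len:])
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Whitespace splitting stands in for a subword tokenizer here.
    article = "first sentence of the article then many more words follow".split()
    print(select_tokens(article, max_len=4, strategy="head"))
```

In a real pipeline the same effect is usually obtained by letting the tokenizer truncate the encoded input to the model's maximum length, which keeps the beginning of the text by default.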
