Mitigating Unintended Bias in Toxic Comment Detection using Entropy-based Attention Regularization

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: The proliferation of hate speech is a growing challenge for social media platforms, as toxic online comments can have dangerous consequences also in real life. There is a need for tools that can automatically and reliably detect hateful comments, and deep learning models have proven effective in solving this issue. However, these models have been shown to have unintended bias against some categories of people. Specifically, they may classify comments that reference certain frequently attacked identities (such as gay, black, or Muslim) as toxic even if the comments themselves are actually not toxic (e.g. ”I am Muslim”). To address this bias, previous authors introduced an Entropy-based Attention Regularization (EAR) method which, when applied to BERT, has been shown to reduce its unintended bias. In this study, the EAR method was applied not only to BERT, but also to XLNet. The investigation involved the comparison of four models: BERT, BERT+EAR, XLNet, and XLNet+EAR. Several experiments were performed, and the associated code is available on GitHub. The classification performance of these models was measured using the F1-score on a public data set containing comments collected from Wikipedia forums. While their unintended bias was evaluated by employing AUC-based metrics on a synthetic data set consisting of 50 identities grouped into four macro categories: Gender & Sexual orientation, Ethnicity, Religion, and Age & Physical disability. The results of the AUC-based metrics proved that EAR performs well on both BERT and XLNet, successfully reducing their unintended bias towards the 50 identity terms considered in the experiments. Conversely, the F1-score results demonstrated a negative impact of EAR on the classification performance of both BERT and XLNet.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)