Analyzing the Anisotropy Phenomenon in Transformer-based Masked Language Models

This is a Master's thesis from Uppsala University / Department of Linguistics and Philology

Author: Ziyang Luo; [2021]

Keywords: Transformer; BERT; RoBERTa; Anisotropy;

Abstract: In this thesis, we examine the anisotropy phenomenon in the popular masked language models BERT and RoBERTa in detail, and propose a possible explanation for it. First, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable pattern across layers: we find persistent outlier neurons within BERT's and RoBERTa's hidden state vectors that consistently carry the smallest or largest values in those vectors. To investigate the source of this behavior, we introduce a neuron-level analysis method, which reveals that the outliers are closely related to information captured by the positional embeddings. Second, we find that a simple normalization method, whitening, can make the vector space isotropic. Lastly, we demonstrate that "clipping" the outliers or whitening allows the representations to distinguish word senses more accurately, and leads to better sentence embeddings when mean pooling is used.
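The abstract mentions two post-processing steps, clipping the outlier dimensions and whitening, followed by mean pooling into sentence embeddings. The following is a minimal sketch, not the thesis code, of how such steps can be applied to BERT hidden states using the Hugging Face transformers library; the model name, the clipping rule (clamping each dimension to its mean plus or minus k standard deviations), and the threshold k are illustrative assumptions.

    # Sketch only: clip outlier dimensions and whiten BERT hidden states,
    # then mean-pool into sentence embeddings. Thresholds are assumptions.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentences = ["The bank raised interest rates.",
                 "She sat on the river bank."]
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, seq_len, dim)

    mask = batch["attention_mask"].unsqueeze(-1).float()   # ignore padding tokens

    def clip_outliers(vectors, k=3.0):
        # Clamp each dimension to mean +/- k standard deviations,
        # suppressing the few persistently extreme "outlier" neurons.
        mean = vectors.mean(dim=(0, 1))
        std = vectors.std(dim=(0, 1))
        return vectors.clamp(min=mean - k * std, max=mean + k * std)

    def whiten(vectors):
        # Whitening: center the vectors and transform by the inverse square
        # root of the covariance matrix so the space becomes isotropic.
        flat = vectors.reshape(-1, vectors.shape[-1])
        mu = flat.mean(dim=0, keepdim=True)
        cov = torch.cov((flat - mu).T)
        u, s, _ = torch.linalg.svd(cov)
        w = u @ torch.diag(1.0 / torch.sqrt(s + 1e-8))
        return (vectors - mu) @ w

    processed = whiten(clip_outliers(hidden))
    # Mean pooling over non-padding tokens gives one vector per sentence.
    sent_emb = (processed * mask).sum(dim=1) / mask.sum(dim=1)
    print(sent_emb.shape)  # torch.Size([2, 768])

In this sketch the clipping and whitening statistics are computed over the small example batch itself; in practice they would be estimated on a larger corpus of hidden states before being applied.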
