The Influence of M-BERT and Sizes on the Choice of Transfer Languages in Parsing

This is a Master's thesis from Uppsala University/Department of Linguistics and Philology

Author: Yifei Zhang; [2021]


Abstract: In this thesis, we explore how M-BERT and the size of the transfer data influence the choice of transfer language in dependency parsing. To investigate our research questions, we conduct a series of experiments with UUParser on treebanks from Universal Dependencies. The main conclusions and contributions of this study are as follows.

First, we train parsers for a variety of languages in several different scripts, adding M-BERT, a state-of-the-art deep learning model based on the Transformer architecture, to the parsing framework. In general, M-BERT improves results over the randomly initialized embeddings in UUParser.

Second, a common practice in cross-lingual parsing is to choose a source language that is 'close' to the target language, yet 'close' has no agreed definition. We therefore examine how strongly the parsing results correlate with different linguistic distances between the source and target languages, using distances queried from the URIEL database. In zero-shot experiments, parsing performance depends more on inventory, syntactic, and featural distance than on geographic, genetic, and phonological distance. In few-shot prediction, parsing accuracy correlates more strongly with inventory and syntactic distance than with the other distances.

Third, we vary the training size in few-shot experiments with M-BERT to see how the parsing results are affected. Few-shot experiments clearly outperform zero-shot experiments. All parsing scores decrease as the source size is cut, but the drop is not linear.
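The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient the thesis uses, so Pearson's r is an assumption here. The function names (`pearson`, `rank_distance_types`) and every number are hypothetical placeholders chosen for illustration; they are not results from the thesis or values from URIEL.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_distance_types(distances, scores):
    """Sort distance types by the absolute strength of their correlation
    with the parsing scores, strongest first."""
    pairs = [(name, pearson(values, scores)) for name, values in distances.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical placeholder data: one entry per source language for a fixed
# target language. These are NOT values from the thesis or from URIEL.
example_distances = {
    "syntactic": [0.2, 0.4, 0.6, 0.8],
    "geographic": [0.1, 0.9, 0.3, 0.5],
}
example_scores = [70.0, 60.0, 50.0, 40.0]  # made-up parsing scores

for name, r in rank_distance_types(example_distances, example_scores):
    print(f"{name}: r = {r:.3f}")
```

A strongly negative coefficient for a distance type would mean that parsing scores fall as that distance grows, which is the sense in which a 'close' source language would be helpful.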
