Multilingual Dependency Parsing of Uralic Languages : Parsing with zero-shot transfer and cross-lingual models using geographically proximate, genealogically related, and syntactically similar transfer languages

Detta är en Master-uppsats från Uppsala universitet/Institutionen för lingvistik och filologi

Sammanfattning: One way to improve dependency parsing scores for low-resource languages is to make use of existing resources from other closely related or otherwise similar languages. In this paper, we look at eleven Uralic target languages (Estonian, Finnish, Hungarian, Karelian, Livvi, Komi Zyrian, Komi Permyak, Moksha, Erzya, North Sámi, and Skolt Sámi) with treebanks of varying sizes and select transfer languages based on geographical, genealogical, and syntactic distances. We focus primarily on the performance of parser models trained on various combinations of geographically proximate and genealogically related transfer languages, in target-trained, zero-shot, and cross-lingual configurations. We find that models trained on combinations of geographically proximate and genealogically related transfer languages reach the highest LAS in most zero-shot models, while our highest-performing cross-lingual models were trained on genealogically related languages. We also find that cross-lingual models outperform zero-shot transfer models. We then select syntactically similar transfer languages for three target languages, and find a slight improvement in the case of Hungarian. We discuss the results and conclude with suggestions for possible future work.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)