A Survey of Non-Projective Dependencies and a Novel Approach to Projectivization for Parsing

Detta är en Master-uppsats från Uppsala universitet/Institutionen för lingvistik och filologi

Sammanfattning: Non-projective dependencies remain an at large issue in the field of dependency parsing. Regardless of what parsing algorithm is used, researchers run into the issue of computational speed and lower parsing performance on non-projective dependencies than on projective dependencies. Through a better understanding of non-projectivity, we may be able to address both issues. This thesis is aimed to discover what types of non-projective dependencies are prevalent in the three languages English, German, and Czech. Moreover, this thesis is aimed to define and create a linguistically informed projectivization scheme and to find out the extent to which the scheme improves upon the performance of the baseline parser. In order to achieve these aims, the eight most frequently occurring non-projective dependencies in English, German, and Czech were surveyed. This means that the causes of their non-projectivity were discovered, the structures of the non-projective dependencies were analyzed, and generalizations and comparisons between non-projective dependencies were made. After the survey, an attempt to define and create a linguistically informed projectivization scheme was made. The goals were not only to projectivize the non-projective relations but to do so by assigning the closest possible new parent in the sentence to the non-projective child and to minimize the number of projectivization transformations that needed to be made. Although the survey of the non-projective dependencies yielded good results, as we were able to identify that the causes of the more frequently occurring non-projective dependencies in German and Czech were the same and the structures of them the same as well, we reached no solid conclusion on how a linguistically informed projectivization scheme could be defined, as further research is needed. However, the novel projectivization scheme we did come up with managed to marginally outperform the baseline parser in English and German, and moderately outperform the baseline parser in Czech which is the language with the most non-projective dependencies of the group.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)