Identifying Base Noun Phrases by Means of Recurrent Neural Networks: Using Morphological and Dependency Features

This is a Master's thesis from Uppsala University/Department of Linguistics and Philology

Author: Tonghe Wang; [2020]


Abstract: Noun phrases convey key information in communication and are of interest in many NLP tasks. A base NP is defined as the head word of a noun phrase together with its left-hand modifiers. In this thesis, we identify base NPs in Universal Dependencies (UD) treebanks for English and French using a recurrent neural network (RNN) architecture. The data consist of three multi-layered treebanks in which each sentence is annotated in both constituency and dependency formalisms. To build our training data, we find base NPs in the constituency layers and project them onto the dependency layer by labeling the corresponding tokens. As input features, we devised 18 configurations of features available in UD annotation. We trained RNN models with LSTM and GRU cells for different numbers of epochs on each of these feature configurations. Tested on monolingual and bilingual test sets, our models achieved satisfactory token-based F1 scores (92.70% on English, 94.87% on French, and 94.29% on the bilingual test set). The most predictive feature configuration was found to be pos_dep_parent_child_morph, which covers 1) the dependency relations between the current token, its syntactic head, and its leftmost and rightmost syntactic dependents; 2) the PoS tags of these tokens; and 3) the morphological features of the current token.
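The pos_dep_parent_child_morph configuration can be made concrete with a short sketch of per-token feature extraction. The abstract does not say how the features are encoded, so this is a minimal illustration under stated assumptions: tokens are hypothetical dictionaries modeled on the CoNLL-U columns (`id`, `upos`, `deprel`, `head`, `feats`), "leftmost" and "rightmost" are taken as linear order among all dependents of a token, and a placeholder value stands in when a token has no head or no dependents. The function names are illustrative, not taken from the thesis, and the step that embeds these categorical values as input to the LSTM/GRU is omitted.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical token format loosely modeled on CoNLL-U columns:
# id (1-based), upos, deprel, head (0 = root), feats ("_" if empty).
Token = Dict[str, Any]

PAD = "<none>"  # assumed placeholder when there is no head or no dependent


def outer_dependents(sent: List[Token], tok_id: int) -> Tuple[Optional[Token], Optional[Token]]:
    """Return the leftmost and rightmost dependents (by linear position) of a token."""
    children = [t for t in sent if t["head"] == tok_id]
    if not children:
        return None, None
    return min(children, key=lambda t: t["id"]), max(children, key=lambda t: t["id"])


def pos_dep_parent_child_morph(sent: List[Token]) -> List[Dict[str, str]]:
    """Build one feature row per token: PoS tag and dependency relation of the token,
    its head, and its leftmost/rightmost dependents, plus the token's own morphology."""
    by_id = {t["id"]: t for t in sent}
    rows = []
    for tok in sent:
        head = by_id.get(tok["head"])  # None for the root, whose head is 0
        left, right = outer_dependents(sent, tok["id"])
        row: Dict[str, str] = {}
        for name, t in (("self", tok), ("head", head), ("lchild", left), ("rchild", right)):
            row[f"{name}_pos"] = t["upos"] if t is not None else PAD
            row[f"{name}_dep"] = t["deprel"] if t is not None else PAD
        row["self_morph"] = tok.get("feats") or "_"
        rows.append(row)
    return rows


# Usage on a hand-annotated toy sentence: "the big dog barked"
sentence = [
    {"id": 1, "upos": "DET", "deprel": "det", "head": 3, "feats": "Definite=Def|PronType=Art"},
    {"id": 2, "upos": "ADJ", "deprel": "amod", "head": 3, "feats": "Degree=Pos"},
    {"id": 3, "upos": "NOUN", "deprel": "nsubj", "head": 4, "feats": "Number=Sing"},
    {"id": 4, "upos": "VERB", "deprel": "root", "head": 0, "feats": "Mood=Ind|Tense=Past|VerbForm=Fin"},
]
rows = pos_dep_parent_child_morph(sentence)
for row in rows:
    print(row)
```

For the noun *dog*, the row combines its own tag and relation (NOUN, nsubj), those of its head *barked* (VERB, root), those of its leftmost and rightmost dependents *the* (DET, det) and *big* (ADJ, amod), and its morphological features (Number=Sing). The root verb gets the placeholder for its head features.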
