Identifying Base Noun Phrases by Means of Recurrent Neural Networks: Using Morphological and Dependency Features

This is a Master's thesis from Uppsala University, Department of Linguistics and Philology

Author: Tonghe Wang; [2020]

Abstract: Noun phrases convey key information in communication and are of interest in NLP tasks. A base NP is defined as the headword and left-hand side modifiers of a noun phrase. In this thesis, we identify base NPs in Universal Dependencies treebanks in English and French using an RNN architecture.

The data of this thesis consist of three multi-layered treebanks in which each sentence is annotated in both constituency and dependency formalisms. To build our training data, we find base NPs in the constituency layer and project them onto the dependency layer by labeling the corresponding tokens. For input features, we devised 18 configurations of the features available in UD annotation. We train RNN models with LSTM and GRU cells for different numbers of epochs on these feature configurations.

Tested on monolingual and bilingual test sets, our models achieved token-based F1 scores of 92.70% on English, 94.87% on French, and 94.29% on the bilingual test set. The most predictive feature configuration was found to be pos_dep_parent_child_morph, which covers 1) dependency relations between the current token, its syntactic head, and its leftmost and rightmost syntactic dependents; 2) PoS tags of these tokens; and 3) morphological features of the current token.
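The labeling and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the binary inside/outside label scheme, the span format (start index inclusive, end index exclusive), and the function names `label_tokens` and `token_f1` are all assumptions made for illustration, and the thesis may use a different scheme.

```python
from typing import List, Tuple


def label_tokens(n_tokens: int, spans: List[Tuple[int, int]]) -> List[int]:
    """Project base NP spans onto tokens.

    Assumed scheme: 1 for a token inside any span, 0 otherwise.
    Each span is (start, end) with end exclusive.
    """
    labels = [0] * n_tokens
    for start, end in spans:
        for i in range(start, end):
            labels[i] = 1
    return labels


def token_f1(gold: List[int], pred: List[int]) -> float:
    """Token-based F1 for the positive (inside base NP) label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "The quick fox saw a dog"
# Base NPs: "The quick fox" (tokens 0-2) and "a dog" (tokens 4-5).
gold = label_tokens(6, [(0, 3), (4, 6)])
pred = [1, 1, 1, 1, 1, 0]  # made-up model output, for illustration only
score = token_f1(gold, pred)
```

In this example the prediction has four true positives, one false positive ("saw"), and one false negative ("dog"), so precision, recall, and F1 all come out at 0.8.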
