Genre classification using syntactic features

This is a Master's thesis from Uppsala University, Department of Linguistics and Philology

Abstract: This thesis addresses text classification for genre identification using different feature sets, with a focus on syntax-based features. We built our models with traditional machine learning algorithms, namely Naive Bayes, K-nearest neighbour, Support Vector Machine and Random Forest, to predict the literary genre of books. We trained the models on bag-of-words (BOW), bigram, syntactic bigram and emotional features, as well as on combinations of these feature sets. On the test set, the best feature set, BOW combined with bigrams based on syntactic relations between words, improved the F1-score by 2% over the baseline that uses BOW features alone, which indicates that syntactic information has a positive impact on text classification.
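
As an illustration of what "BOW combined with bigrams based on syntactic relations between words" could look like, the following is a minimal sketch in Python. It is not the thesis's implementation: the abstract does not say which parser or feature encoding was used, so the function names, the "w=" and "dep=" feature prefixes, and the hand-written head indices (standing in for the output of a dependency parser) are all assumptions made for this example.

```python
from collections import Counter


def bow_features(tokens):
    """Bag-of-words: count each lowercased token."""
    return Counter(f"w={t.lower()}" for t in tokens)


def syntactic_bigram_features(tokens, heads):
    """Count (head, dependent) word pairs taken from a dependency parse.

    heads[i] is the index of the head of token i, or -1 for the root.
    This encoding is an assumption made for the sketch.
    """
    feats = Counter()
    for dep, head in enumerate(heads):
        if head >= 0:
            feats[f"dep={tokens[head].lower()}>{tokens[dep].lower()}"] += 1
    return feats


def combined_features(tokens, heads):
    """Merge both feature sets into one sparse count vector."""
    return bow_features(tokens) + syntactic_bigram_features(tokens, heads)


# Hypothetical pre-parsed sentence; the head indices are written by hand.
tokens = ["The", "detective", "opened", "the", "door"]
heads = [1, 2, -1, 4, 2]
features = combined_features(tokens, heads)
```

Unlike ordinary bigrams, which pair adjacent words, these pairs follow the parse tree, so "opened" and "door" form a feature even though "the" stands between them. A count dictionary of this kind could then be vectorised and passed to any of the classifiers named in the abstract.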
