The Use of Distributional Semantics in Text Classification Models: Comparative performance analysis of popular word embeddings

This is a Master's thesis from Linköpings universitet/Datorseende

Author: Tobias Norlund; [2016]

Keywords: distributional semantics; text classification; cnn;

Abstract:

In the field of Natural Language Processing, supervised machine learning is commonly used to solve classification tasks such as sentiment analysis and text categorization. The classical way of representing text has been the well-known Bag-of-Words representation. Lately, however, low-dimensional dense word vectors have come to dominate the input to state-of-the-art models. Few studies have made a fair comparison of the models' sensitivity to the text representation, and this thesis tries to fill that gap. We especially seek insight into the impact that various unsupervised pre-trained vectors have on performance. In addition, we take a closer look at the Random Indexing representation and try to optimize it jointly with the classification task. The results show that while low-dimensional pre-trained representations often have computational benefits and have been reported to achieve state-of-the-art performance, they do not necessarily outperform the classical representations in all cases.
