Evaluating a Novel, Scalable Natural Language Processing Heuristic for Determining Semantic Relatedness

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Mattias Bergström; Per Fahlander [2019]


Abstract: Distributional semantics is a recent research field that aims to quantify how close one text is to another in terms of contextual meaning. In this study we propose and evaluate a novel distributional semantics model by measuring how well its predictions agree with a set of 12,227 human opinions. We call this method Refined Semantic Relatedness (RSR); it applies an incrementally improvable word association index together with distributional principles to produce theoretically informed predictions. Using 1,951 preprocessed Wikipedia articles as the basis for its predictions, the model predicted the human opinions with a Pearson correlation of 0.30. Previous literature has claimed that Explicit Semantic Analysis (ESA-Wiki) achieves a corresponding Pearson correlation of 0.72 by utilizing 241,393 preprocessed Wikipedia articles. That is roughly 5.76 times more variance accounted for, although it is also the result of considerably more extensive preprocessing in terms of the number of articles. While the predictive value of RSR turned out relatively low due to the study's limitations, this could be addressed in further research. We believe that this paper can nonetheless contribute some novel ideas to the field.
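The "roughly 5.76 times more variance" figure follows from squaring the two reported Pearson correlations: the coefficient of determination r² gives the share of variance in the human judgments that each model's predictions account for. A minimal sketch of that arithmetic, using only the correlations stated in the abstract:

```python
# Pearson correlations reported in the abstract
r_rsr = 0.30   # RSR vs. 12,227 human opinions
r_esa = 0.72   # ESA-Wiki, as claimed in previous literature

# r^2 (coefficient of determination) = proportion of variance
# in the human similarity judgments explained by each model
var_rsr = r_rsr ** 2   # 0.09
var_esa = r_esa ** 2   # 0.5184

ratio = var_esa / var_rsr
print(f"ESA-Wiki accounts for {ratio:.2f}x the variance accounted for by RSR")
# prints "ESA-Wiki accounts for 5.76x the variance accounted for by RSR"
```

Note that this comparison says nothing about efficiency per article: ESA-Wiki's figure rests on roughly 124 times as many preprocessed Wikipedia articles.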
