A comparison of different methods in their ability to compare semantic similarity between articles and press releases

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: The goal of a press release is to have the information spread as widely as possible. A suitable approach to distribute the information is to target journalists who are likely to distribute the information further. Deciding which journalists to target has traditionally been performed manually without intelligent digital assistance and therefore has been a time consuming task. Machine learning can be used to assist the user by predicting a ranking of journalists based on their most semantically similar written article to the press release. The purpose of this thesis was to compare different methods in their ability to compare semantic similarity between articles and press releases when used for the task of ranking journalists. Three methods were chosen for comparison: (1.) TF-IDF together with cosine similarity, (2.) TF-IDF together with soft-cosine similarity and (3.) sentence mover’s distance (SMD) together with SBERT. Based on the proposed heuristic success metric, both TF-IDF methods outperformed the SMD method. The best performing method was TF-IDF with soft-cosine similarity.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)