Searching and Recommending Texts Related to Climate Change

This is a Master's thesis from Uppsala universitet/Institutionen för informationsteknologi

Author: Karolin Gjöthlén; [2021]


Abstract: This project considers the design of a machine learning system for efficiently searching a database of texts related to climate change. Efficient search and navigation of such a database make it easier to find actionable information, detect trends, or derive other useful information. A key feature of such an information retrieval system is the numerical representation of each text. This project implements and compares three ways to represent a text in a vector space: Bag-of-Words, Term Frequency - Inverse Document Frequency (TF-IDF), and Doc2Vec. Results are reported for two tasks. In the first, the query contains part of a text with a small modification, and the expected result is the text itself. Here all three embeddings outperform a naive (fixed, expert rule-based) retrieval method, and the Bag-of-Words approach turns out to be best in class. In the second, the query is a random string, and the quality of the results is judged by manual comparison. Here the Doc2Vec approach is best in class. If the random queries become abstract-like, the Bag-of-Words approach performs almost as well.
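To make the first task concrete, the following is a minimal sketch of retrieval with Bag-of-Words and TF-IDF vectors ranked by cosine similarity. It is not the thesis's implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity, and the three example documents are all assumptions chosen for illustration. Doc2Vec is left out because it requires a trained model from an external library.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def idf_weights(corpus_tokens):
    # Smoothed inverse document frequency (one common variant, assumed here).
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def bow_vector(tokens):
    # Bag-of-Words: raw term counts as a sparse vector.
    return Counter(tokens)


def tfidf_vector(tokens, idf):
    # TF-IDF: term counts scaled by IDF; terms unseen in the corpus are dropped.
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, scheme="bow"):
    # Return document indices ordered from most to least similar to the query.
    corpus_tokens = [tokenize(d) for d in docs]
    query_tokens = tokenize(query)
    if scheme == "tfidf":
        idf = idf_weights(corpus_tokens)
        doc_vecs = [tfidf_vector(t, idf) for t in corpus_tokens]
        query_vec = tfidf_vector(query_tokens, idf)
    else:
        doc_vecs = [bow_vector(t) for t in corpus_tokens]
        query_vec = bow_vector(query_tokens)
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical mini-corpus; the query is a text with a small modification.
docs = [
    "sea level rise threatens coastal cities",
    "solar and wind power reduce carbon emissions",
    "drought reduces crop yields in southern europe",
]
query = "sea level rise threatens many coastal cities"

print(rank(query, docs, scheme="bow"))
print(rank(query, docs, scheme="tfidf"))
```

In this toy setting both representations place the modified text's source document first, which mirrors the first task only in spirit; the relative performance reported in the abstract depends on the real corpus and evaluation.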
