Changing a user’s search experience byincorporating preferences of metadata

Detta är en Master-uppsats från KTH/Skolan för datavetenskap och kommunikation (CSC)

Sammanfattning: Implicit feedback is usually data that comes from users’ clicks, search queries and text highlights. It exists in abun- dance, but it is riddled with much noise and requires advanced algorithms to properly make good use of it. Several findings suggest that factors such as click-through data and reading time could be used to create user behaviour models in order to predict the users’ information need. This Master’s thesis aims to use click-through data and search queries together with heuristics to create a model that prioritises metadata-fields of the documents in order to predict the information need of a user. Simply put, implicit feedback will be used to improve the precision of a search engine. The Master’s thesis was carried out at Findwise AB - a search engine consultancy firm. Documents from the benchmark dataset INEX were indexed into a search engine. Two different heuristics were proposed that increment the priority of different metadata-fields based on the users’ search queries and clicks. It was assumed that the heuristics would be able to change the listing order of the search results. Evaluations were carried out for the two heuristics and the unmodified search engine was used as the baseline for the experiment. The evaluations were based on simulating a user that searches queries and clicks on documents. The queries and documents, with manually tagged relevance, used in the evaluation came from a data set given by INEX. It was expected that listing order would change in a way that was favourable for the user; the top-ranking results would be documents that truly were in the interest of the user. The evaluations revealed that the behaviour of the heuristics and the baseline have erratic behaviours and metrics never converged to any specific mean-relevance. A statistical test revealed that there is no difference in accuracy between the heuristics and the baseline. These results mean that the proposed heuristics do not improve the precision of the search engine and several factors, such as the indexing of too redundant metadata, could have been responsible for this outcome. 

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)