Dictionary-Based Cross-Language Information Retrieval: Evaluation of Query Effectiveness

This is a Master's thesis from Högskolan i Borås / Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan

Abstract: This thesis discusses the main problems associated with dictionary-based Cross-Language Information Retrieval (CLIR): lexical and translational ambiguity of query terms, the translation of compounds and phrases, and dictionary limitations. The purpose of the study is to investigate how query structure influences the effectiveness of CLIR by comparing the performance of three query types: the original query, the unstructured translated query, and the structured translated query. Query structuring refers to applying the #syn operator to group query terms. The study comprises an experiment performed in the InQuery IR system on the TrecUta database, which contains 550,000 news articles from different American newspapers. 24 topics were used in the experiment. The effectiveness of the three query types is compared at different Document Cut-off Value (DCV) levels, up to a maximum DCV of 100. The measure used is average precision. Relevance is treated as binary: the three relevance degrees 1, 2, and 3 are merged into one. The results show that dictionary-based query translation without structuring significantly decreases retrieval effectiveness, while query structuring through synonym sets proves to be a simple and effective method that reduces the effects of translation ambiguity and improves the performance of CLIR queries. The results indicate that the performance of structured queries can nearly reach the level of the original queries.
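
To make the difference between the unstructured and the structured query concrete, here is a minimal Python sketch. It is an illustration only, not the thesis's actual setup. The toy dictionary, the function names, the wrapping #sum operator, and the convention that relevance degree 0 means "not relevant" are all assumptions introduced for this example; only the #syn operator, the DCV levels, and the merging of degrees 1 to 3 come from the abstract.

```python
from typing import Dict, List

# Hypothetical toy dictionary (Swedish -> English); not taken from the thesis.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "val": ["election", "choice", "whale"],
    "resultat": ["result", "outcome"],
}


def translate(term: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Return all dictionary translations of a term.

    Terms missing from the dictionary are passed through unchanged
    (an assumption of this sketch).
    """
    return dictionary.get(term, [term])


def unstructured_query(terms: List[str], dictionary: Dict[str, List[str]]) -> str:
    """Put every translation of every term into one flat list."""
    words = [w for t in terms for w in translate(t, dictionary)]
    return "#sum(" + " ".join(words) + ")"


def structured_query(terms: List[str], dictionary: Dict[str, List[str]]) -> str:
    """Group the translations of each source term with #syn."""
    groups = []
    for t in terms:
        words = translate(t, dictionary)
        if len(words) > 1:
            groups.append("#syn(" + " ".join(words) + ")")
        else:
            groups.append(words[0])
    return "#sum(" + " ".join(groups) + ")"


def precision_at_dcv(ranked_degrees: List[int], dcv: int) -> float:
    """Precision over the top `dcv` documents of a ranked result list.

    Degrees 1, 2, and 3 are merged into "relevant"; degree 0 is assumed
    to mean "not relevant".
    """
    top = ranked_degrees[:dcv]
    relevant = sum(1 for degree in top if degree >= 1)
    return relevant / dcv


if __name__ == "__main__":
    source_terms = ["val", "resultat"]
    print(unstructured_query(source_terms, TOY_DICTIONARY))
    print(structured_query(source_terms, TOY_DICTIONARY))
    print(precision_at_dcv([3, 0, 1, 0], dcv=4))
```

In the flat query, an ambiguous source word with many translations contributes many terms and can dominate the query. Grouping the alternatives in a synonym set lets them count as a single term, which is the intuition behind the reduced effect of translation ambiguity. Averaging `precision_at_dcv` over all topics at each DCV level would give one curve per query type for comparison.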
