A Study in Describing Complex Words Using Wikipedia's Categorisation System : Adding Descriptive Terms to Increase the Comprehension of Swedish Texts

This is a Master's thesis from Linköping University, Department of Computer and Information Science

Abstract: This thesis offers new input in the field of generating epithets to aid the comprehension of Swedish texts. For whatever reason, a reader might find certain words in a text difficult to understand. For example, they may never have come across the term moussaka before; however, by the simple expedient of assigning an explanatory epithet – in this case, “the dish” moussaka – they can hopefully continue reading uninterrupted. To do this, obscure phrases are identified and extracted based on word class, shallow token features and the Pareto Principle. An algorithm then extracts appropriate epithets for each word using the Wikipedia categorisation system. Although the algorithm developed for the study achieved underwhelming results when extracting obscure phrases, it did prove excellent at assigning appropriate epithets to nouns and proper nouns. With further research, this process can hopefully be utilised as a tool for improving the readability of any text.
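The two steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not say how the Pareto Principle is applied, so the frequency cut-off in `rare_words` is an assumption, and the word-class and shallow token features are left out entirely. The names `rare_words`, `with_epithet`, `TOY_CATEGORIES` and `CATEGORY_EPITHETS` are hypothetical, and the small in-memory dictionaries stand in for a real lookup against Wikipedia's category system.

```python
from collections import Counter

# Hypothetical stand-in for Wikipedia category data (not real API output).
TOY_CATEGORIES = {
    "moussaka": ["Dishes"],
    "Linköping": ["Cities"],
}

# Hypothetical mapping from a category name to a short epithet.
CATEGORY_EPITHETS = {
    "Dishes": "the dish",
    "Cities": "the city",
}


def rare_words(tokens, coverage=0.8):
    """Return word types outside the most frequent set covering `coverage` of all tokens.

    Assumption: the most frequent words that together make up about 80% of
    the tokens are treated as familiar; everything else is a candidate.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    common = set()
    for word, n in counts.most_common():
        if covered / total >= coverage:
            break
        common.add(word)
        covered += n
    return [word for word in counts if word not in common]


def with_epithet(word):
    """Prefix `word` with an epithet from its first known category, if any."""
    for category in TOY_CATEGORIES.get(word, []):
        epithet = CATEGORY_EPITHETS.get(category)
        if epithet:
            return f"{epithet} {word}"
    return word  # no category found: leave the word unchanged


if __name__ == "__main__":
    tokens = ["the"] * 6 + ["cat"] * 3 + ["moussaka"]
    for word in rare_words(tokens):
        print(with_epithet(word))
```

In this toy setup the only word flagged as rare is moussaka, which is then rewritten with the epithet from its category, mirroring the example given in the abstract. Words without a known category are returned unchanged.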
