Factorisation of Latent Variables in Word Space Models: Studying redistribution of weight on latent variables

This is a Bachelor's thesis from KTH/School of Engineering Sciences (SCI)

Authors: David Ödling; Arvid Österlund; [2014]

Abstract: The ultimate goal of any distributional semantic model (DSM) is a scalable and accurate representation of lexical semantics. Recent developments due to Bullinaria & Levy (2012) and Caron (2001) indicate that the accuracy of such models can be improved by redistributing weight on the principal components. However, this method is poorly understood and has rarely been replicated, owing to the computationally expensive dimension reduction and the puzzling nature of the results. This thesis aims to explore the nature of these results. Beginning by reproducing the results in Bullinaria & Levy (2012), we move on to deepen the understanding of these results, quantitatively as well as qualitatively, using various forms of the BLESS test, and juxtapose these with previous results. The main result of this thesis is the verification of the 100% score on the TOEFL test and 91.5% on a paradigmatic version of the BLESS test. Our qualitative tests indicate that the redistribution of weight away from the first principal components differs slightly between word categories, which explains the improvement in the TOEFL and BLESS results. While we do not find any significant relation between word frequencies and weight distribution, we do find an empirical relation for the optimal weight distribution. Based on these results, we suggest a range of further studies to better understand these phenomena.
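
For readers unfamiliar with the technique, the redistribution of weight on the principal components is usually described, following Caron (2001), as raising the singular values of the co-occurrence matrix to a power p before scaling the word vectors, optionally discarding the first few components. The following is a minimal sketch under that reading, assuming a small dense matrix and NumPy. The function name `reweight_components` and its parameters `p`, `skip` and `k` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def reweight_components(cooc, p=0.5, skip=0, k=None):
    """Illustrative sketch: word vectors U * S**p from an SVD of `cooc`.

    cooc : 2-D array, rows are words (assumed dense and small enough for a full SVD)
    p    : exponent applied to the singular values (p=1 is plain SVD scaling,
           p<1 shifts relative weight away from the leading components)
    skip : number of leading components to discard (hypothetical parameter)
    k    : number of components to keep after skipping (None keeps all remaining)
    """
    U, S, _ = np.linalg.svd(cooc, full_matrices=False)
    end = len(S) if k is None else skip + k
    U, S = U[:, skip:end], S[skip:end]
    # Note: a negative p would require strictly positive singular values.
    return U * (S ** p)
```

With `p=1` the rows reproduce the ordinary SVD representation, while `p=0` weights all retained components equally. Which value of `p` and how many components to skip work best is an empirical question of the kind the thesis investigates.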
