Weighting Edit Distance to Improve Spelling Correction in Music Entity Search

Detta är en Master-uppsats från KTH/Skolan för datavetenskap och kommunikation (CSC)

Sammanfattning: This master’s thesis project undertook investigation of whether the extant Damerau- Levenshtein edit distance measurement between two strings could be made more useful for detecting and adjusting misspellings in a search query. The idea was to use the knowledge that many users type their queries using the QWERTY keyboard layout, and weighting the edit distance in a manner that makes it cheaper to correct misspellings caused by confusion of nearer keys. Two different weighting approaches were tested, one with a linear spread from 2/9 to 2 depending on the keyboard distance, and the other had neighbors preferred over non-neighbors (either with half the cost or no cost at all). They were tested against an unweighted baseline as well as inverted versions of themselves (nearer keys more expensive to replace) against a dataset of 1,162,145 searches. No significant improvement in the retrieval of search results were observed when compared to the baseline. However, each of the weightings performed better than its corresponding inversion on a p < 0.05 significance level. This means that while the weighted edit distance did not outperform the baseline, the data still clearly points toward a correlation between the physical position of keys on the keyboard, and what spelling mistakes are made.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)