Data mining on Data sets of Products

This is a Bachelor's thesis from Uppsala universitet/Institutionen för informationsteknologi

Author: Robin Klingfors; [2022]


Abstract: As the amount of available data grows, companies are looking for ways to use it to their advantage. Data mining has become a suitable tool for this task because it can process and analyze large amounts of data. With data mining, and machine learning in particular, one can recognize patterns and classify data. This thesis compares supervised machine learning algorithms in combination with different string distance algorithms. App Shack, an app development company, is looking for a way to predict a commercial product's original attributes in a data set even when the product's id or color differs. Two machine learning algorithms, k-nearest neighbor and decision tree, are compared, each using a distance metric based on either the Levenshtein or the Soundex algorithm. All combinations are evaluated, and the results show that the choice of string-matching algorithm matters more than the choice of machine learning algorithm. The combination of k-nearest neighbor and Levenshtein gave the best result in the given tests.
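
To make the best-performing combination concrete, the following is a minimal Python sketch of a k-nearest-neighbor classifier that uses Levenshtein distance between product names. It is an illustration under stated assumptions, not the implementation from the thesis: the function names, the value k=3, the lowercasing step, and the sample catalog are all hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def knn_predict(query: str, labelled: list[tuple[str, str]], k: int = 3) -> str:
    """Return the majority label among the k names closest to query."""
    nearest = sorted(
        labelled,
        key=lambda item: levenshtein(query.lower(), item[0].lower()),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical catalog: (product name, original product) pairs.
CATALOG = [
    ("Running Shoe X1 Red", "running shoe"),
    ("Running Shoe X1 Blue", "running shoe"),
    ("Running Shoe X2 Red", "running shoe"),
    ("Leather Wallet Brown", "wallet"),
    ("Leather Wallet Black", "wallet"),
]

if __name__ == "__main__":
    # A misspelled name with an unseen color should still map to its product.
    print(knn_predict("Runing Shoe X1 Green", CATALOG))
```

Swapping `levenshtein` for a phonetic comparison such as Soundex would change only the `key` function, which is one way to read the finding that the string-matching choice influences the result more than the classifier does.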
