Product Similarity Matching for Food Retail using Machine Learning

This is a Master's thesis from KTH/Mathematical Statistics

Abstract: This thesis studies product similarity matching for food retail. The goal is to find products that are similar, though not necessarily of the same brand, and that can replace a product that is out of stock or not carried in a specific store. The aim is to examine which machine learning model is best suited to this matching task. The product attributes used to train the models were name, description, nutrients, weight and filters (labels such as organic). Matching was performed pairwise: the similarity between two products was measured by Jaccard distance for text attributes and by relative difference for numeric values. Random Forest, Logistic Regression and Support Vector Machines were tested and compared to a baseline. The baseline computed the Jaccard distance between the product names and classified each pair by applying a threshold to that distance. Results were evaluated by accuracy, F-measure and AUC score. Random Forest performed best on all evaluation metrics, and Logistic Regression, Random Forest and Support Vector Machines all outperformed the baseline.
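
The pairwise measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the whitespace tokenization, the normalization of the relative difference by the larger absolute value, the dictionary keys in pair_features, and the threshold of 0.5 are all assumptions made here for illustration.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance between the token sets of two strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Two empty strings are treated as identical.
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def relative_difference(x: float, y: float) -> float:
    """Relative difference between two numeric values.

    Assumption: normalized by the larger absolute value, so the
    result lies in [0, 1] for values of the same sign.
    """
    denom = max(abs(x), abs(y))
    if denom == 0:
        return 0.0
    return abs(x - y) / denom


def pair_features(p: dict, q: dict) -> list[float]:
    """Feature vector for one product pair (hypothetical keys)."""
    return [
        jaccard_distance(p["name"], q["name"]),
        jaccard_distance(p["description"], q["description"]),
        relative_difference(p["weight"], q["weight"]),
    ]


def baseline_is_similar(name_a: str, name_b: str, threshold: float = 0.5) -> bool:
    """Baseline: a pair is similar if the name distance is at most
    the threshold (0.5 is an illustrative value, not from the thesis)."""
    return jaccard_distance(name_a, name_b) <= threshold
```

Feature vectors of this kind, one per product pair, could then be passed to a classifier such as Random Forest, Logistic Regression or a Support Vector Machine, while the baseline uses only the name distance.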
