Product Matching Using Image Similarity

Detta är en Uppsats för yrkesexamina på avancerad nivå från Uppsala universitet/Institutionen för informationsteknologi

Sammanfattning: PriceRunner is an online shopping comparison company. To maintain up-todate prices, PriceRunner has to process large amounts of data every day. The processing of the data includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product such as: title, description, price and often one image of the product. PriceRunner has previously implemented a textual-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach where the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is an extensive amount of classes and a limited amount of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 toplevel hierarchies at PriceRunner, consisting of 17 product categories. The results varied between each product category. Some categories proved to be less suitable for image-based classification while others excelled. The model handles new classes relatively well without any, or with briefer, retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)