Learning Embeddings for Fashion Images

This is a Master's thesis from Linköpings universitet/Datorseende

Author: Simon Hermansson; [2023]

Keywords: Computer Vision; Machine Learning; Image Retrieval; CLIP; Masked Autoencoders (MAE); Vision Transformers; Image Captioning; Price Prediction; AI for Fashion

Abstract: Today, second-hand clothes and textiles are sorted mostly by hand. This master's thesis investigates methods for automating that process and for improving manual sorting. The methods explored include automatic prediction of price and intended usage for second-hand clothes, as well as different types of image retrieval to aid manual sorting. Two models were examined: CLIP, a multi-modal model, and MAE, a self-supervised model. Quantitatively, the results favored CLIP, which outperformed MAE in both image retrieval and prediction. However, MAE may still be useful for some image retrieval applications, because it returns items that look similar even if they do not share the same attributes. In contrast, CLIP is better at retrieving garments with as many matching attributes as possible. For price prediction, the best model was CLIP: when fine-tuned on the dataset used, it achieved an F1-score of 38.08 over three price categories. For predicting the intended usage (either reusing the garment or exporting it to another country), the best model achieved an F1-score of 59.04.
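The image retrieval described in the abstract can be illustrated with a minimal sketch: each garment image is mapped to an embedding vector, and the items closest to a query embedding are returned. The code below is an illustration only and is not taken from the thesis. It assumes cosine similarity as the ranking measure, and the item names and 3-dimensional vectors are hypothetical stand-ins for real CLIP or MAE embeddings, which have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, k=2):
    """Return the names of the k gallery items most similar to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings (assumption: in practice these would come
# from an image encoder such as CLIP or MAE).
gallery = {
    "red_dress": [0.9, 0.1, 0.0],
    "red_skirt": [0.8, 0.3, 0.1],
    "blue_jeans": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

top = retrieve(query, gallery, k=2)
print(top)
```

In this sketch the choice of encoder only changes which vectors go into `gallery`; the ranking step is the same. That is one way to read the abstract's comparison: both models are queried identically, but their embeddings place different items close together.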
