Click me: thumbnail extraction for fashion videos : An approach for selecting engaging video thumbnails based on clothing identification, sharpness, and contrast.

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Video thumbnails represent and summarize the content of a video. This thesis proposed a thumbnail extraction approach for fashion videos based on the presence of clothing items, sharpness, and contrast, and investigated how the proposed selection method performed in terms of user engagement. User engagement has been studied before; however, the impact of clothing item presence has yet to be investigated. First, a YOLOv7 model was trained on a fashion dataset to identify clothing items. The proposed selection method used the labels predicted by the model to determine which frames contained the maximum number of clothing items. These frames were then filtered by a contrast threshold, and the sharpest of the remaining frames was kept as the proposed thumbnail. Contrast was measured as the standard deviation of the pixel values in each frame, and sharpness was measured with the Laplacian operator. User engagement was investigated by surveying 119 participants on thumbnail preference. Participants were presented with three frames: the thumbnail extracted with the proposed method and two control frames, namely the middle frame of the video and a frame in which the YOLOv7 model had identified only one object. The results show that the proposed thumbnail selection method performed well, receiving 59.75% of the total votes, compared with 17.46% for the middle frame and 22.79% for the single-item frame. The results indicate that the proposed parameters for thumbnail extraction could lead to higher user engagement.
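
The selection steps described in the abstract (keep the frames with the most detected clothing items, drop those below a contrast threshold, then pick the sharpest) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the threshold value, the 4-neighbour Laplacian kernel, and the use of the variance of the Laplacian response as the sharpness score are all assumptions made for illustration. Per-frame item counts are passed in as plain integers that stand in for the YOLOv7 detections, and frames are grayscale images given as lists of rows of at least 3x3 pixels.

```python
from statistics import pstdev, pvariance


def contrast(frame):
    """Contrast as the standard deviation of all pixel intensities."""
    return pstdev([pixel for row in frame for pixel in row])


def sharpness(frame):
    """Sharpness as the variance of the Laplacian response.

    Assumption: a 4-neighbour Laplacian kernel evaluated on interior
    pixels only; the abstract names the operator but not these details.
    """
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def select_thumbnail(frames, item_counts, contrast_threshold):
    """Return the index of the selected frame, or None if none qualifies.

    item_counts[i] is a hypothetical stand-in for the number of clothing
    items a detector found in frames[i].
    """
    # Step 1: keep only the frames with the maximum number of items.
    max_items = max(item_counts)
    candidates = [i for i, n in enumerate(item_counts) if n == max_items]
    # Step 2: filter the candidates by the contrast threshold.
    candidates = [
        i for i in candidates if contrast(frames[i]) >= contrast_threshold
    ]
    if not candidates:
        return None
    # Step 3: keep the sharpest remaining frame.
    return max(candidates, key=lambda i: sharpness(frames[i]))
```

Calling `select_thumbnail(frames, item_counts, contrast_threshold=50)` returns an index into `frames`. The threshold of 50 is an arbitrary illustrative value on a 0 to 255 intensity scale, not one reported in the abstract.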
