A Comparison Between KeyFrame Extraction Methods for Clothing Recognition

This is a professional-degree thesis at advanced level from Uppsala University/Computer Science

Abstract: With video consumption higher than ever, applications and services need smart approaches to improve the experience for their users. Key frames extracted from a video can provide useful information about the entire video and can be used to describe its content more effectively. At present, many key frame extraction (KFE) methods aim at selecting multiple frames from videos that are composed of multiple scenes and come from various contexts. In this study, a proposed key frame extraction method that extracts a single frame for subsequent clothing recognition is implemented and compared against two other methods. The proposed method uses the state-of-the-art object detector YOLO (You Only Look Once) to ensure that the extracted key frames contain people, and is referred to as YKFE (YOLO-based Key Frame Extraction). YKFE is compared against a simple baseline method named MFE (Middle Frame Extraction), which always extracts the middle frame of the video, and against the well-known optical-flow-based method referred to as Wolf KFE, which extracts the frames with the lowest amount of optical flow. The YOLO model is pre-trained and further fine-tuned on a custom dataset. Furthermore, three versions of the YKFE method are developed and compared, each using a different measurement to select the best key frame: the first uses optical flow, the second uses aspect ratio, and the third combines optical flow and aspect ratio. Three proposed metrics, RDO (Rate of Distinguishable Outfits), RSAR (Rate of Successful API Returns), and AET (Average Extraction Time), were then used to evaluate and compare the performance of the methods on two sets of test data containing 100 videos each. The results show that YKFE yields more reliable results but takes significantly more time than both MFE and Wolf KFE.
However, neither MFE nor Wolf KFE considers whether frames contain people, which means that the context in which the methods are used is of significant importance for the rate of successful key frame extractions. Finally, as an experiment, a method named Slim YKFE was developed as a combination of MFE and YKFE, resulting in a substantially reduced extraction time while still maintaining high accuracy.
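
To make the difference between the three selection ideas concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes that each frame has already been reduced to a small record holding a mean optical-flow magnitude and a flag saying whether a person detector (such as YOLO) found a person. The `FrameInfo` class, its fields, and all function names are hypothetical and chosen only for illustration. The aspect-ratio criterion is left out because the abstract does not say how it is computed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FrameInfo:
    """Hypothetical per-frame summary; these fields are assumptions."""
    index: int         # position of the frame in the video
    flow: float        # mean optical-flow magnitude (lower means less motion)
    has_person: bool   # True if a person detector found a person


def middle_frame(frames: Sequence[FrameInfo]) -> int:
    """MFE-style baseline: always pick the middle frame (non-empty input assumed)."""
    return frames[len(frames) // 2].index


def lowest_flow_frame(frames: Sequence[FrameInfo]) -> int:
    """Optical-flow selection: pick the frame with the least motion."""
    return min(frames, key=lambda f: f.flow).index


def person_lowest_flow_frame(frames: Sequence[FrameInfo]) -> Optional[int]:
    """Detector-filtered selection: only frames with a person are candidates."""
    candidates = [f for f in frames if f.has_person]
    if not candidates:
        return None  # no frame contains a person
    return lowest_flow_frame(candidates)


# Toy data: five frames, only frames 2 and 3 contain a person.
frames = [
    FrameInfo(0, 0.9, False),
    FrameInfo(1, 0.1, False),
    FrameInfo(2, 0.5, True),
    FrameInfo(3, 0.3, True),
    FrameInfo(4, 0.7, False),
]

print(middle_frame(frames), lowest_flow_frame(frames), person_lowest_flow_frame(frames))
# prints: 2 1 3
```

In this toy input the pure optical-flow choice lands on frame 1, which contains no person, while the detector-filtered choice lands on frame 3. This illustrates why the context of use matters for methods that do not check for people.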
