One Stage Fine-Grained Classification

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Fine-grained Visual Classification (FGVC) is a rapidly growing field in image classification. However, it is a challenging task because subcategories differ only subtly. Existing approaches tackle this problem by first extracting discriminative regions using part localization, object localization, or Region Proposal Networks (RPN), then applying a Convolutional Neural Network (CNN) or SVM classifier to those regions. In this work, to simplify this complicated pipeline while keeping high accuracy, we take inspiration from the one-stage object detection model YOLO and design a one-stage end-to-end object detector model for FGVC. Specifically, we use YOLOv5 as a baseline model and replace its Path Aggregation Network (PANet) structure with a Weighted Bidirectional Feature Pyramid Network (BiFPN) structure to efficiently fuse information from different resolutions. We conduct experiments on different classification and localization weight ratios to guide the choice of loss weights in different scenarios. We demonstrate the viability of the one-stage detector model YOLO on FGVC, which achieves 87.1% top-1 accuracy on the FGVC dataset CUB-200-2011. Furthermore, we design a more accurate one-stage model that achieves 88.1% accuracy, the highest among existing state-of-the-art localization-based models. Finally, we show that the higher the classification loss weight, the faster the convergence, while slightly increasing the localization loss weight can yield more accurate classification at the cost of slower convergence.
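
To illustrate what "weighted" fusion across resolutions means, the following is a minimal sketch of the fast normalized fusion rule commonly associated with BiFPN: each input feature map receives a learnable scalar weight, the weights are clipped to be non-negative, and the output is the normalized weighted sum. Whether the thesis uses exactly this variant is an assumption; the function name `fast_normalized_fusion`, the `eps` default, and the representation of feature maps as flat lists of floats (already resized to a common resolution) are hypothetical choices made for illustration only.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature maps with non-negative, normalized weights.

    features: list of feature maps, each a flat list of floats of equal
              length (assumed already resized to a common resolution).
    weights:  one scalar per feature map (learnable in a real network).
    eps:      small constant that avoids division by zero.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")

    # ReLU keeps every weight non-negative so the normalization is stable.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps

    size = len(features[0])
    fused = [0.0] * size
    for feature, w in zip(features, clipped):
        if len(feature) != size:
            raise ValueError("all feature maps must have the same size")
        for k, value in enumerate(feature):
            # Each map contributes in proportion to its normalized weight.
            fused[k] += (w / total) * value
    return fused
```

In a trained network the weights would be parameters updated by backpropagation, so the model can learn how much each resolution should contribute to the fused output.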
