SEG-YOLO: Real-Time Instance Segmentation Using YOLOv3 and Fully Convolutional Network

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Zhuoyue Wang; [2019]


Abstract: Computer vision technology has been widely applied to enhance the experience of televised sports broadcasts. Some of these broadcasts require accurate foreground segmentation of outdoor sports scenes, i.e., segmenting potential sports players in order to highlight them. Segmenting a full-HD (1080p) video stream at a high frame rate poses a major challenge for the segmentation model. In this thesis, a deep learning-based framework is proposed for real-time instance segmentation.

Many traditional computer vision algorithms for segmentation are based on background subtraction techniques, which are strongly affected by changes in lighting and by the movement of targets. On the other hand, most modern deep learning-based frameworks, although more robust to scene changes, run too slowly to support real-time use. The proposed model, SEG-YOLO, is an extension of YOLO (You Only Look Once) version 3, one of the state-of-the-art object detection models. The extension is an FCN (Fully Convolutional Network), which is used for semantic segmentation. SEG-YOLO aims to overcome both the speed and the accuracy problems in the specific setting of outdoor sports scenes, while its use can also be generalized to some extent.

SEG-YOLO is an end-to-end model that consists of two neural networks: (a) YOLOv3, which performs object detection to generate instance bounding boxes and also extracts the feature maps that serve as input to part (b); (b) an FCN, which takes the bounding boxes and feature maps as input and outputs segmentation masks of the objects.

For instance segmentation in a specific outdoor sport such as golf, the experiments show that the framework performs excellently in both speed and accuracy and that it is superior to the state-of-the-art model. Moreover, it is shown that it can be used in real-time (30 FPS) TV broadcasts with GPU acceleration.

On the general scenes of the COCO benchmark dataset, its accuracy does not exceed that of the current state of the art, but it still has an advantage in speed.
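The two-stage design summarized in the abstract, where YOLOv3 supplies boxes and feature maps and an FCN turns each box into a mask, can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation. The single-channel feature map, the stride of 8 (one of the three detection strides YOLOv3 uses), the single 3x3 convolution layer, the hand-picked weights and the 0.5 threshold are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical types for this sketch: one channel instead of a real multi-channel tensor.
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
FeatureMap = List[List[float]]           # H x W grid of activations


def crop_feature_map(fmap: FeatureMap, box: Box, stride: int) -> FeatureMap:
    """Project an image-space box onto the feature grid (divide by stride) and crop it."""
    h, w = len(fmap), len(fmap[0])
    x1 = max(0, math.floor(box[0] / stride))
    y1 = max(0, math.floor(box[1] / stride))
    x2 = min(w, math.ceil(box[2] / stride))
    y2 = min(h, math.ceil(box[3] / stride))
    return [row[x1:x2] for row in fmap[y1:y2]]


def conv3x3_same(x: FeatureMap, kernel: List[List[float]], bias: float) -> FeatureMap:
    """3x3 cross-correlation (what deep-learning 'conv' layers compute) with zero padding."""
    if not x or not x[0]:
        return []
    h, w = len(x), len(x[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the crop
                        out[i][j] += kernel[di + 1][dj + 1] * x[ii][jj]
    return out


def fcn_mask_head(crop: FeatureMap, kernel, bias: float, threshold: float = 0.5):
    """Tiny fully convolutional head: conv -> sigmoid -> binary mask, same size as the crop."""
    logits = conv3x3_same(crop, kernel, bias)
    return [[1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0 for v in row] for row in logits]


def segment_instances(fmap: FeatureMap, boxes: List[Box], stride: int, kernel, bias: float):
    """Stage (b): one mask per detected box, computed on the shared feature map."""
    return [fcn_mask_head(crop_feature_map(fmap, b, stride), kernel, bias) for b in boxes]


if __name__ == "__main__":
    # Toy 4x4 feature map for a 32x32 image at stride 8; the "object" is the 2x2 centre.
    fmap = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    # Stage (a) stand-in: boxes that a detector such as YOLOv3 might output.
    boxes = [(8, 8, 24, 24), (0, 0, 16, 16)]
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # hand-picked, untrained weights
    for mask in segment_instances(fmap, boxes, stride=8, kernel=identity, bias=-0.5):
        print(mask)
```

Running it prints `[[1, 1], [1, 1]]` for the centred box and `[[0, 0], [0, 1]]` for the corner box. In this sketch the mask stage reuses the detector's feature map instead of re-processing the image; a trained head would instead use many channels, several learned layers, and possibly a fixed-size resampling of each crop.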
