Pruning a Single-Shot Detector for Faster Inference : A Comparison of Two Pruning Approaches

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: Modern state-of-the-art object detection models are based on convolutional neural networks and can be divided into single-shot detectors and two-stage detectors. Two-stage detectors exhibit impressive detection performance but their complex pipelines make them slow. Single-shot detectors are not as accurate as two-stage detectors, but are faster and can be used for real-time object detection. Despite the fact that single-shot detectors are faster, a large number of calculations are still required to produce a prediction that not many embedded devices are capable of doing in a reasonable time. Therefore, it is natural to ask if single-shot detectors could become faster even. Pruning is a technique to reduce the size of neural networks. The main idea behind network pruning is that some model parameters are redundant and do not contribute to the final output. By removing those redundant parameters, fewer computations are needed to produce predictions, which may lead to a faster inference and since the parameters are redundant, the model accuracy should not be affected. This thesis investigates two approaches for pruning the SSD-MobileNet- V2 single-shot detector. The first approach prunes the single-shot detector by a large portion and retrains the remaining parameters only once. In the other approach, a smaller portion is pruned, but pruning and retraining are done in an iterative fashion, where pruning and retraining constitute one iteration. Beyond comparing two pruning approaches, the thesis also studies the tradeoff between model accuracy and inference speed that pruning induces. The results from the experiments suggest that the iterative pruning approach preserves the accuracy of the original model better than the other approach where pruning and finetuning are performed once. For all four pruning levels that the two approaches are compared iterative pruning yields more accurate results. In addition, an inference evaluation indicates that iterative pruning is a good compression method for SSD-MobileNet-V2, finding models that both are faster and more accurate than the original model. The thesis findings could be used to guide future pruning research on SSD-MobileNet- V2, but also on other single-shot detectors such as RetinaNet and the YOLO models. 

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)