MobilePose: Real-Time 3D Hand Pose Estimation from a Single RGB Image

This is a Master's thesis from Lunds universitet/Matematik LTH

Abstract: Estimating 3D hand poses from RGB images is a challenging task. In this work we construct efficient neural networks to regress sparse 3D skeletons consisting of 21 hand keypoints. Additionally, heatmaps are regressed to locate the keypoints in 2D. The networks can be divided into three parts: feature extraction, heatmap regression and 3D pose regression. To obtain the 3D coordinates relative to the camera, we introduce a method based on the best projection given the predictions. The main focus has been investigating network structures proven to be efficient in other computer vision tasks: EfficientNet, EfficientDet and MobileNetV2. A weighted bi-directional feature pyramid network (BiFPN), inspired by EfficientDet, was added to MobileNetV2, resulting in a new proposed network structure, MobilePose. The computational cost of a network is affected by the input image resolution, and decreasing the resolution lowered the inference time. Images of size 112 × 112 were used to achieve real-time performance, but the best accuracy was obtained with 224 × 224 images, the highest resolution tested. EfficientDet and MobilePose performed best, and similarly, in terms of accuracy on the FreiHAND dataset; comparing inference time on a Samsung S10 mobile device, MobilePose is preferred. MobilePose was further improved by adding complexity to the network, which gave the highest overall accuracy: an average keypoint error of 1.4 cm when the depth of a root keypoint is assumed known, and 5.0 cm with the root depth calculated.
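The abstract does not spell out the "best projection" step, but one common way to realise such a root fit, assuming the predicted root-relative 3D keypoints, the 2D keypoint locations from the heatmaps, and the camera intrinsics are available, is a linear least-squares fit of a global translation. The sketch below illustrates that idea only; the function name fit_root_translation and the exact formulation are not taken from the thesis, whose method may differ in its details.

    import numpy as np

    def fit_root_translation(xyz_rel, uv, K):
        """Least-squares fit of a translation t = (tx, ty, tz) so that projecting
        xyz_rel + t with intrinsics K best matches the predicted 2D keypoints uv.

        xyz_rel : (21, 3) root-relative 3D keypoints (e.g. in metres) -- assumed given
        uv      : (21, 2) predicted 2D keypoints (pixels)             -- assumed given
        K       : (3, 3) camera intrinsic matrix                      -- assumed given
        """
        fx, fy = K[0, 0], K[1, 1]
        cx, cy = K[0, 2], K[1, 2]
        X, Y, Z = xyz_rel[:, 0], xyz_rel[:, 1], xyz_rel[:, 2]
        u, v = uv[:, 0], uv[:, 1]

        n = xyz_rel.shape[0]
        A = np.zeros((2 * n, 3))
        b = np.zeros(2 * n)
        # From u = fx*(X+tx)/(Z+tz) + cx, multiplying out the depth gives the
        # linear equation fx*tx + (cx - u)*tz = (u - cx)*Z - fx*X.
        A[0::2, 0] = fx
        A[0::2, 2] = cx - u
        b[0::2] = (u - cx) * Z - fx * X
        # Analogously for v: fy*ty + (cy - v)*tz = (v - cy)*Z - fy*Y.
        A[1::2, 1] = fy
        A[1::2, 2] = cy - v
        b[1::2] = (v - cy) * Z - fy * Y

        t, *_ = np.linalg.lstsq(A, b, rcond=None)
        # Camera-space keypoints are then xyz_rel + t; t[2] is the recovered root depth.
        return t

Solving all 42 projection equations jointly makes the recovered root depth robust to noise in individual keypoints, which is why a least-squares formulation is a natural fit for "the best projection given the predictions".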
