Monocular Depth Prediction in Deep Neural Networks

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: With the development of artificial neural network (ANN), it has been introduced in more and more computer vision tasks. Convolutional neural networks (CNNs) are widely used in object detection, object tracking, and semantic segmentation, achieving great performance improvement than traditional algorithms. As a classical topic in computer vision, the exploration of applying deep CNNs for depth recovery from monocular images is popular, since the single-view image is more common than stereo image pair and video. However, due to the lack of motion and geometry information, monocular depth estimation is much more difficult. This thesis aims at investigating depth prediction from single images by exploiting state-of-the-art deep CNN models. Two neural networks are studied: the first network uses the idea of a global and local network, and the other one adopts a deeper fully convolutional network by using a pre-trained backbone CNN (ResNet or DenseNet). We compare the performance of the two networks and the result shows that the deeper convolutional neural network with the pre-trained backbone can achieve better performance. The pre-trained model can significantly accelerate the training process. We also find that the amount of training dataset is essential for CNN-based monocular depth prediction.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)