Depth Estimation from Images using Dense Camera-Lidar Correspondences and Deep Learning
Abstract: Depth estimation from 2D images is a fundamental problem in Computer Vision and an increasingly important topic for Autonomous Driving. Much of the research is driven by innovations in Convolutional Neural Networks, which efficiently encode low-level as well as high-level image features and fuse them to find accurate pixel correspondences and learn the scale of objects. Current state-of-the-art deep learning models employ a semi-supervised learning approach, which combines unsupervised and supervised learning. Most of the research community relies on the KITTI datasets for benchmarking results, but training performance is known to be limited by the sparseness of the lidar ground truth as well as by the lack of training data. In this thesis, multiple stereo datasets with increasingly dense depth maps are generated from the corpus of driving data collected at Audi Electronics Venture GmbH. To this end, a methodology is presented to obtain an accurate and dense registration between the camera and lidar sensors. Approaches are also outlined to rectify the stereo image datasets and filter the depth maps. Keeping the architecture fixed, a monocular and a stereo depth estimation network are each trained on these datasets, and their performance is compared to that of other networks reported in the literature. The results are competitive, with the stereo network exceeding state-of-the-art accuracy. More work is needed, however, to establish the influence of increasing depth density on depth estimation performance. The proposed method forms a solid platform for advancing depth estimation research as well as other application areas critical to autonomous driving.