Semantic Segmentation of Historical Document Images Using Recurrent Neural Networks

Detta är en Uppsats för yrkesexamina på avancerad nivå från Blekinge Tekniska Högskola/Institutionen för programvaruteknik

Sammanfattning: Background. This thesis focuses on the task of historical document semantic segmentation with recurrent neural networks. Document semantic segmentation involves the segmentation of a page into different meaningful regions and is an important prerequisite step of automated document analysis and digitisation with optical character recognition. At the time of writing, convolutional neural network based solutions are the state-of-the-art for analyzing document images while the use of recurrent neural networks in document semantic segmentation has not yet been studied. Considering the nature of a recurrent neural network and the recent success of recurrent neural networks in document image binarization, it should be possible to employ a recurrent neural network for document semantic segmentation and further achieve high performance results. Objectives. The main objective of this thesis is to investigate if recurrent neural networks are a viable alternative to convolutional neural networks in document semantic segmentation. By using a combination of a convolutional neural network and a recurrent neural network, another objective is also to determine if the performance of the combination can improve upon the existing case of only using the recurrent neural network. Methods. To investigate the impact of recurrent neural networks in document semantic segmentation, three different recurrent neural network architectures are implemented and trained while their performance are further evaluated with Intersection over Union. Afterwards their segmentation result are compared to a convolutional neural network. By performing pre-processing on training images and multi-class labeling, prediction images are ultimately produced by the employed models. Results. The results from the gathered performance data shows a 2.7% performance difference between the best recurrent neural network model and the convolutional neural network. Notably, it can be observed that this recurrent neural network model has a more consistent performance than the convolutional neural network but comparable performance results overall. For the other recurrent neural network architectures lower performance results are observed which is connected to the complexity of these models. Furthermore, by analyzing the performance results of a model using a combination of a convolutional neural network and a recurrent neural network, it can be noticed that the combination performs significantly better with a 4.9% performance increase compared to the case with only using the recurrent neural network. Conclusions. This thesis concludes that recurrent neural networks are likely a viable alternative to convolutional neural networks in document semantic segmentation but that further investigation is required. Furthermore, by combining a convolutional neural network with a recurrent neural network it is concluded that the performance of a recurrent neural network model is significantly increased.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)