Understanding Structured Documents with a Strong Layout

This is a Master's thesis from KTH/School of Computer Science and Communication (CSC)

Author: Romeyn Marc; [2017]


Abstract: This work focuses on named entity recognition (NER) on documents with a strong layout using deep recurrent neural networks. Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this research. The problem of NER on structured documents is modeled in two ways. First, it is modeled as sequence labeling, where every word or character has to be labeled as belonging to one of the entity classes. Second, it is modeled in a way that is typical for object detection in images: the network outputs bounding boxes around words belonging to the same entity class. To perform this task successfully, not only the words themselves are important but also their locations. Multiple ways of encoding these locations have been investigated; using the position relative to the previous word has proven to be the most effective. Experiments have shown that for sequence labeling it works best to split the documents into multiple smaller sequences of length 200 and process these with two bi-directional stateful LSTM layers. In this model the last hidden state of an LSTM is re-used as the initial state for the next partial sequence of a document. This model achieves an average F1 score over all classes of 94.2%. The models that output bounding boxes do not perform as well as the sequence-labeling models, but they are still promising.
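
The following is a minimal sketch, not the thesis code, of the chunked, stateful sequence-labeling idea described in the abstract: a document is split into partial sequences of 200 tokens, and the final LSTM state of one chunk is re-used as the initial state of the next. Every name, dimension, and the Keras API choice is an assumed placeholder; the thesis uses two bi-directional LSTM layers, simplified here to a single forward LSTM so the state hand-over is easy to follow.

```python
import numpy as np
import tensorflow as tf

CHUNK_LEN = 200   # partial-sequence length mentioned in the abstract
FEAT_DIM = 102    # assumed: word embedding plus relative (dx, dy) position to the previous word
HIDDEN = 128      # assumed LSTM width
NUM_CLASSES = 5   # assumed number of entity classes

# One recurrent layer that returns both the per-token outputs and its final state.
lstm = tf.keras.layers.LSTM(HIDDEN, return_sequences=True, return_state=True)
classify = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")

def label_document(chunks):
    """Label a document given as a list of (1, CHUNK_LEN, FEAT_DIM) chunks."""
    state = None
    per_token_probs = []
    for chunk in chunks:
        seq, h, c = lstm(chunk, initial_state=state)
        state = [h, c]                         # carry the last hidden state to the next chunk
        per_token_probs.append(classify(seq))  # class probabilities for every token
    return tf.concat(per_token_probs, axis=1)

# Toy usage: one document of 600 tokens split into three chunks of 200.
doc = [tf.constant(np.random.rand(1, CHUNK_LEN, FEAT_DIM), dtype=tf.float32)
       for _ in range(3)]
print(label_document(doc).shape)               # (1, 600, NUM_CLASSES)
```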
