Evaluating Hybrid Neural Network Approaches to Multimodal Web Page Classification Based on Textual and Visual Features

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Given the explosive growth of web pages on the Internet over the last decade, automatic classification and categorization of web pages has become an important task. This thesis evaluates whether methods for text and image analysis that had not previously been evaluated for web page classification could improve on the state-of-the-art methods in web page classification. Because no dataset is used as a common benchmark in web page classification, baseline models are implemented to make comparisons possible. The methods implemented are Bidirectional Encoder Representations from Transformers (BERT) for text and EfficientNet B4 for images. The thesis also evaluates methods for combining the knowledge of the two models. It concludes that the proposed methods do improve on the state-of-the-art methods in web page classification: they achieve approximately 92% accuracy, while the baselines achieve approximately 87%. McNemar's test shows that the proposed methods and the baselines differ at a significance level of 0.05. The thesis also concludes that a weighted average of logits could be preferable to a weighted average of probabilities, as the former could be a more robust method, although more research is needed.
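The abstract compares two ways of combining a text model and an image model: a weighted average of probabilities and a weighted average of logits. The minimal sketch below illustrates the difference under stated assumptions. It assumes that each model outputs a vector of unnormalised class logits over the same label set, and it uses a single hypothetical weight `w` for the text model and `1 - w` for the image model; the abstract does not say how the weight is chosen. The function names (`fuse_probabilities`, `fuse_logits`, `argmax`) are illustrative only and are not taken from the thesis.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _check(text_logits: Sequence[float], image_logits: Sequence[float], w: float) -> None:
    # Both models must score the same classes, and w must be a convex weight.
    if len(text_logits) != len(image_logits):
        raise ValueError("both models must output logits for the same classes")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")


def fuse_probabilities(text_logits: Sequence[float], image_logits: Sequence[float], w: float) -> list[float]:
    """Weighted average of probabilities: apply softmax to each model, then average."""
    _check(text_logits, image_logits, w)
    p_text = softmax(text_logits)
    p_image = softmax(image_logits)
    return [w * a + (1.0 - w) * b for a, b in zip(p_text, p_image)]


def fuse_logits(text_logits: Sequence[float], image_logits: Sequence[float], w: float) -> list[float]:
    """Weighted average of logits: average the raw scores, then apply softmax once."""
    _check(text_logits, image_logits, w)
    fused = [w * a + (1.0 - w) * b for a, b in zip(text_logits, image_logits)]
    return softmax(fused)


def argmax(probs: Sequence[float]) -> int:
    """Index of the predicted class."""
    return max(range(len(probs)), key=lambda i: probs[i])


if __name__ == "__main__":
    # Made-up logits for a three-class problem, for illustration only.
    text = [2.0, 0.5, -1.0]
    image = [0.0, 1.5, 0.3]
    print("probability fusion:", fuse_probabilities(text, image, 0.6))
    print("logit fusion:      ", fuse_logits(text, image, 0.6))
```

Both functions return a valid probability distribution, but they are not equivalent. Averaging probabilities is an arithmetic mean of the two distributions, whereas averaging logits and then applying softmax is the same as taking the weighted geometric mean `p_text**w * p_image**(1 - w)` and renormalising it. The logit version therefore lets a model that assigns very low probability to a class pull that class down more strongly.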
