Gating Networks in Learning Machines for Multimodal Data: Decision Fusion on Single Modality Classifiers

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Different architectures of gating networks that aggregate information from multiple modalities, and their suitability for decision fusion, are investigated. The research question, how a gating network for decision fusion in a multimodal classification problem compares to other alternatives, is answered with a quantitative, inductive reasoning approach. This is done by training different machine learning methods on individual modalities and fusing their predictions for the final classification using M-MNIST, a new data set with three modalities (image, audio, and text). The gating networks achieve greater classification accuracy when fusing information from all modalities than when considering only one modality or when no fusion is applied. The potential of the gating network is demonstrated by training it on modalities with different levels of classification accuracy, where it achieves the highest average normalized gain while also scoring the highest validation accuracy of the three fusion methods; the results indicate that the gating network can suppress noise in the data. Moreover, adding an additional weak modality to the gating network improves the classification accuracy, hinting that there might be an incentive to use many weak modalities instead of a few strong ones.
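The abstract does not describe the gating architectures in detail, so the following is only a minimal sketch of one common form of gated decision fusion, not the thesis's implementation. It assumes that each single-modality classifier outputs a class probability vector and that a gate produces one raw score per modality. The scores are turned into weights with a softmax and the fused prediction is the weighted sum of the per-modality probabilities. The function names (`softmax`, `gated_fusion`, `predict`) and all numbers are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(modality_probs, gate_logits):
    """Fuse per-modality class probabilities using softmax gate weights.

    modality_probs: one probability vector per modality, all the same length.
    gate_logits: one raw score per modality. In a trained system these would
        come from a learned gating network; here they are supplied by hand
        (an assumption made for illustration).
    """
    weights = softmax(gate_logits)
    n_classes = len(modality_probs[0])
    fused = [0.0] * n_classes
    for w, probs in zip(weights, modality_probs):
        for c in range(n_classes):
            fused[c] += w * probs[c]
    return fused


def predict(fused):
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up outputs of three single-modality classifiers on a 3-class problem.
image_probs = [0.7, 0.2, 0.1]     # confident modality
audio_probs = [0.4, 0.5, 0.1]     # weaker modality that disagrees
text_probs = [0.34, 0.33, 0.33]   # nearly uninformative (noisy) modality

# Hand-picked gate scores that trust the image most and the text least.
gate_logits = [2.0, 1.0, -1.0]

fused = gated_fusion([image_probs, audio_probs, text_probs], gate_logits)
label = predict(fused)
```

Because the gate weights sum to one, the fused vector is still a probability distribution. Giving a noisy modality a low gate score shrinks its contribution, which is one way to read the noise suppression effect reported in the abstract.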
