Using Bidirectional Encoder Representations from Transformers for Conversational Machine Comprehension

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Bidirectional Encoder Representations from Transformers (BERT) is a recently proposed language representation model, designed to pre-train deep bidirectional representations with the goal of extracting context-sensitive features from an input text [1]. One of the challenging problems in the field of Natural Language Processing is Conversational Machine Comprehension (CMC): given a context passage, a conversational question, and the conversational history, the system should predict the answer span of the question in the context passage. The main challenge in this task is how to effectively encode the conversational history into the prediction of the next answer. In this thesis, we investigate the use of the BERT language model for the CMC task. We propose a new architecture, named BERT-CMC, that uses the BERT model as a base. This architecture includes a new module for encoding the conversational history, inspired by the Transformer-XL model [2]. This module serves as a memory throughout the conversation. The proposed model is trained and evaluated on the Conversational Question Answering dataset (CoQA) [3]. Our hypothesis is that BERT-CMC will effectively learn the underlying context of the conversation, leading to better performance than the baseline model proposed for CoQA. Our evaluation of BERT-CMC on the CoQA dataset shows that the model performs poorly (44.7% F1 score) compared to the CoQA baseline model (66.2% F1 score). In light of model explainability, we also perform a qualitative analysis of the model's behavior on questions involving various linguistic phenomena, e.g. coreference and pragmatic reasoning. Additionally, we motivate the critical design choices made by performing an ablation study of their effect on model performance. The results suggest that fine-tuning the BERT layers boosts model performance. Moreover, increasing the number of extra layers on top of BERT is shown to increase the capacity of the conversational memory.
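To make the described architecture more concrete, the sketch below illustrates one possible way to combine a BERT encoder, a Transformer-XL-style memory of previous turns, and a span-prediction head. This is not the thesis implementation; names such as BertCMCSketch, ConversationalMemory-related fields, and mem_len are hypothetical, and the memory mechanism is only loosely modeled on the segment-level recurrence of Transformer-XL.

```python
# Minimal, illustrative sketch (assumptions, not the thesis code):
# BERT encodes the current passage+question, a cached memory of earlier
# turns is attended to, and a linear head predicts the answer span.
import torch
import torch.nn as nn
from transformers import BertModel


class BertCMCSketch(nn.Module):
    def __init__(self, mem_len=128, hidden_size=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Cross-attention from the current turn to cached hidden states
        # of earlier turns (Transformer-XL-inspired conversational memory).
        self.history_attn = nn.MultiheadAttention(
            hidden_size, num_heads=8, batch_first=True
        )
        self.span_head = nn.Linear(hidden_size, 2)  # start / end logits
        self.mem_len = mem_len
        self.memory = None  # detached hidden states from previous turns

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        if self.memory is not None:
            # Let the current turn read from the conversational memory.
            attended, _ = self.history_attn(hidden, self.memory, self.memory)
            hidden = hidden + attended
        # Update the memory with the newest states, truncated to mem_len.
        new_mem = hidden.detach()
        self.memory = (
            new_mem if self.memory is None
            else torch.cat([self.memory, new_mem], dim=1)
        )[:, -self.mem_len:, :]
        start_logits, end_logits = self.span_head(hidden).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```

In this sketch the memory is detached before caching, so gradients do not flow across conversation turns; whether and how to backpropagate through the history is one of the design choices an ablation study of such a model could examine.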
