Readability Assessment with Pre-Trained Transformer Models: An Investigation with Neural Linguistic Features

This is a Master's thesis from Uppsala universitet/Institutionen för lingvistik och filologi

Abstract: Readability assessment (RA) assigns a score or grade to a document that measures how difficult the document is to read. RA originated in language-education studies, where it was used to classify reading materials for language learners; it was later applied to many other tasks, such as aiding automatic text simplification.

This thesis aims to improve how pretrained Transformers are used for RA. The motivation is the "pipeline" effect of pretrained Transformers (Tenney et al., 2019): lexical, syntactic, and semantic features are best encoded at different layers of a Transformer model. After a preliminary test of a basic RA model resembling previous work, we proposed several methods to enhance performance: using a Transformer layer other than the last, concatenating or mixing the outputs of all layers, and using syntax-augmented Transformer layers. We examined these methods on three datasets: WeeBit, OneStopEnglish, and CommonLit. The improvements showed a clear correlation with dataset characteristics. On the OneStopEnglish and CommonLit datasets, we achieved absolute improvements of 1.2% in F1 score and 0.6% in Pearson's correlation coefficient, respectively. We also show that an …
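The thesis text itself is not shown here, so the following is only a minimal sketch of the "mixing the outputs of all layers" idea: an ELMo-style learned scalar mix over every hidden layer of a pretrained Transformer, followed by a linear readability classifier. The model name bert-base-uncased, the [CLS]-position pooling, and the five-class setup (matching WeeBit's five reading levels) are illustrative assumptions, not details taken from the thesis.

```python
# Hedged sketch: weighted mix of ALL Transformer layers for RA,
# instead of classifying from the last layer alone.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ScalarMixClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(
            model_name, output_hidden_states=True
        )
        # +1 for the embedding layer's output
        n_layers = self.encoder.config.num_hidden_layers + 1
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))  # learned mix weights
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # (n_layers, batch, seq, hidden): outputs of every layer, not just the last
        states = torch.stack(out.hidden_states)
        weights = torch.softmax(self.layer_logits, dim=0).view(-1, 1, 1, 1)
        mixed = (weights * states).sum(dim=0)  # weighted sum over layers
        cls = mixed[:, 0]                      # pool at the [CLS] position
        return self.classifier(cls)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ScalarMixClassifier()
batch = tokenizer(["An example passage to grade."], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # (1, num_classes)
```

Because the mix weights are learned, the classifier can shift probability mass toward whichever layers best encode the lexical or syntactic cues relevant to readability, which is one way to exploit the "pipeline" effect the abstract describes; the single-layer variant corresponds to fixing the weights to a one-hot vector.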
