On the Effectiveness of Handcrafted and Learned Features in Automated Essay Scoring

This is a Master's thesis from Lund University/Computational Biology and Biological Physics - Undergoing reorganisation

Abstract: Automated Essay Scoring (AES) has been an active research task for more than half a century. It began with handcrafted statistical features used as input to linear regression, and is now being improved by the latest advances in machine learning and natural language processing. Most current research represents essays with some form of character or word embeddings rather than statistical features, enabling the models to analyze the text in full and automatically learn what to look for. Handcrafted features have possibly reached their maximum potential, and have been shown to be outperformed by more complex representations of textual data. However, the fundamental differences between handcrafted and learned features have not been properly documented, nor have their strengths and weaknesses been compared. In this paper we compare two kinds of models for automated essay scoring: a Multilayer Perceptron (MLP) using handcrafted features and a standard Convolutional Neural Network (CNN) using word embeddings. Both models are trained and tested, and their strengths and weaknesses are discussed. We show that a simple CNN outperforms the MLP using handcrafted features, but that the MLP remains a viable method for small tasks because it is easier to implement and faster to train. We also provide some tips and suggestions for constructing a CNN for AES, and we discuss a potential downside of the quadratic weighted kappa score, which is sometimes suggested as a validation metric for AES systems.
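For readers unfamiliar with the quadratic weighted kappa score mentioned above, the following is a minimal sketch of its standard definition, not code taken from the thesis. The function name, the integer score range passed as `min_rating` and `max_rating`, and the choice to return 1.0 when the expected disagreement is zero are all assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two lists of integer scores (hypothetical helper).

    Returns 1.0 for perfect agreement, about 0.0 for chance-level
    agreement, and negative values for worse-than-chance agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_cat = max_rating - min_rating + 1
    if n_cat < 2:
        raise ValueError("need at least two score categories")
    n = len(rater_a)

    # Observed counts: observed[i][j] = essays scored i by A and j by B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: grows with the squared score distance.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        # Assumption: both raters gave every essay the same single score.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

As a usage example, `quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)` returns 1.0, since identical score lists produce no weighted disagreement. Because the penalty is quadratic, a prediction that is off by two score points costs four times as much as one that is off by a single point.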
