Machine Learning to predict student performance based on well-being data: a technical and ethical discussion

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: The data provided by educational platforms and digital tools offers new ways of analysing students’ learning strategies. One such digital tool is the well-being platform created by EdAider. It consists of an interface where students can answer questions about their well-being, and a dashboard where teachers and schools can see insights into the well-being of individual students and groups of students. Both students and teachers can follow the development of student well-being on a weekly basis. This thesis project investigates how Machine Learning (ML) can be used alongside Learning Analytics (LA) to understand and improve students’ well-being. Real-world data generated by students at Swedish schools using EdAider’s well-being platform is analysed to generate data insights. In addition, ML methods are implemented to build a model that predicts whether students are at risk of failing based on their well-being data, with the goal of informing data-driven improvements to students’ education. This thesis has three primary goals: 1. Generate data insights to further understand patterns in the student well-being data. 2. Design a classification model using ML methods to predict student performance based on well-being data, and validate the model against actual performance data provided by the schools. 3. Carry out an ethical evaluation of the data analysis and the grade prediction model. The results showed that males report higher well-being on average than females across most well-being factors. The exception is relationships, where females report higher well-being than males. Students identifying as non-binary report considerably lower well-being than males and females across all 8 well-being factors. However, the amount of data for non-binary students was limited. Primary school students report higher well-being than the older secondary school students.
Among the dimensions students reported on, anxiety and depression were the most closely correlated pair. They were followed by engagement and accomplishment, and then by positive emotion and depression. Logistic regression and random forest models were used to build a performance prediction model, which aims to predict whether a student is at risk of performing poorly based on their reported well-being data. The model achieved an accuracy of 80 to 85 percent. Several feature-importance methods were investigated to examine which well-being factors have the most effect on performance: regularization, recursive feature selection, and impurity decrease for random forests. All of these methods consistently identified three features as important: "accomplishment", "depression" and "number of surveys answered". The benefits, risks and ethical value conflicts of the data analysis and prediction model were carefully considered and discussed using a Value Sensitive Design approach. Ethical practices for mitigating these risks are also discussed.
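The prediction and feature-importance steps described above can be sketched with scikit-learn. The sketch assumes the well-being answers have been aggregated into one row per student with a binary "at risk" label. Everything here is a stand-in, not the thesis's implementation:

- The data is synthetic, generated so that accomplishment, depression and the number of surveys answered drive the label.
- The feature names and hyperparameters are assumptions.
- An L1 penalty stands in for "regularization".
- scikit-learn's `RFE` (recursive feature elimination) stands in for the recursive feature selection named in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the thesis's exact feature set is not given here.
FEATURES = ["positive_emotion", "engagement", "relationships",
            "accomplishment", "anxiety", "depression", "surveys_answered"]


def make_synthetic_data(n=600, seed=0):
    """Synthetic stand-in for per-student well-being features (NOT real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))  # already standardised features
    # Assumed for illustration: risk rises with depression and falls with
    # accomplishment and with the number of surveys answered.
    logits = (-1.5 * X[:, 3] + 1.5 * X[:, 5] - 1.5 * X[:, 6]
              + rng.normal(scale=0.5, size=n))
    y = (logits > 0).astype(int)  # 1 = "at risk"
    return X, y


def l1_logreg():
    """Logistic regression with an L1 penalty (hypothetical C value)."""
    return make_pipeline(StandardScaler(),
                         LogisticRegression(penalty="l1", solver="liblinear", C=0.1))


def evaluate_models(X, y):
    """Mean 5-fold cross-validated accuracy for both model families."""
    models = {
        "logreg_l1": l1_logreg(),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    return {name: cross_val_score(m, X, y, cv=5, scoring="accuracy").mean()
            for name, m in models.items()}


def feature_importance(X, y, k=3):
    """Top-k features according to three different importance methods."""
    # 1) L1 regularization: rank by the magnitude of surviving coefficients.
    lr = l1_logreg().fit(X, y)
    coefs = np.abs(lr[-1].coef_[0])
    l1_top = [FEATURES[i] for i in np.argsort(coefs)[::-1][:k]]

    # 2) Recursive feature elimination wrapped around a logistic regression.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
    rfe_top = [f for f, keep in zip(FEATURES, rfe.support_) if keep]

    # 3) Mean decrease in impurity from a random forest.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    rf_top = [FEATURES[i] for i in np.argsort(rf.feature_importances_)[::-1][:k]]
    return {"l1": l1_top, "rfe": rfe_top, "impurity": rf_top}


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print(evaluate_models(X, y))
    print(feature_importance(X, y))
```

Because the synthetic label is built from those three features, the three methods agree here by construction. On real data, it is the agreement between methods with different assumptions that makes a feature ranking more credible. The accuracies printed by this sketch say nothing about the thesis's reported results.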
