Development of a Machine Learning Survival Analysis Pipeline with Explainable AI for Analyzing the Complexity of ED Crowding: Using Real World Data collected from a Swedish Emergency Department

This is a Master's thesis from KTH/Medicinteknik och hälsosystem

Abstract: One of the biggest challenges in healthcare is Emergency Department (ED) crowding, which places heavy constraints on the whole healthcare system and the resources within it and can be the cause of many adverse events. It is a well-known problem on which a lot of research has been done and for which many solutions have been proposed, yet the problem remains unsolved. By analysing Real-World Data (RWD), complex problems like ED crowding could be better understood. Currently, very few applications of survival analysis have been adopted for use with production data in order to analyze the complexity of logistical problems. The aim of this thesis was to apply survival analysis through advanced Machine Learning (ML) models to RWD collected at a Swedish hospital to see how the Length Of Stay (LOS) until admission or discharge was affected by different factors. This was done by formulating crowding in the ED as a survival analysis problem, using the LOS as the time and the decision regarding admission or discharge as the event, in order to unfold the clinical complexity of the system and help impact clinical practice and decision making.

By formulating the research as time-to-event in combination with ML, the complexity and non-linearity of the logistics in the ED are viewed from a time perspective, with the LOS acting as a Key Performance Indicator (KPI). This enables the researcher to look at the problem from a system perspective and shows how different features affect the time it takes for patients to be processed in the ED, highlighting potential problems, and can therefore be useful for improving clinical decision making. Five models, Cox Proportional Hazards (CPH), Random Survival Forests (RSF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB) and DeepSurv, were used and evaluated using the Concordance index (C-index). GB was the best-performing model, with a C-index of 0.7825, showing that the ML models can perform better than the commonly used CPH model.
The models were then explained using SHapley Additive exPlanations (SHAP) values, which showed the importance of the features together with how the different features impacted the LOS. SHAP also showed that the GB model handled the non-linearity of the features better than the CPH model. The five most important features impacting the LOS were whether the patient received a scan at the ED, whether the patient visited an emergency room, age, triage level and the label indicating which type of medical team seems the best fit for the patient. This is clinical information that could be used to reduce crowding through correct decision making. These results show that ML-based survival analysis models can be used for further investigation of the logistical challenges that healthcare faces and could be further used for data analysis with production data in similar cases. The ML survival analysis pipeline can also be used for further analysis and can act as a first step in pinpointing important information in the data that could be interesting for deeper data analysis, making the process more efficient.
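
To make the evaluation metric concrete, the following is a minimal sketch of how a concordance index can be computed for time-to-event data of the kind described in the abstract, with the LOS as the time and the admission or discharge decision as the event. It is not the thesis's implementation, which is not shown here. The function name, the toy values, and the convention that a higher risk score means an earlier decision are all assumptions made for illustration, and pairs with tied times are simply skipped.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell-style C-index (illustrative sketch, not the thesis code).

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter time than patient j. The pair is concordant when the
    model gave patient i the higher risk score. Tied scores count as 0.5.
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored stay cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: LOS in hours and whether a decision was observed.
los_hours = [1.5, 3.0, 4.5, 6.0, 8.0]
decision_observed = [True, True, False, True, True]
# Hypothetical model output: higher score = earlier expected decision.
risk = [0.9, 0.7, 0.8, 0.3, 0.1]

c_index = concordance_index(los_hours, decision_observed, risk)
print(c_index)  # 0.875 (7 of 8 comparable pairs are ordered correctly)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported C-index of 0.7825 for the GB model.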
