Imputation Methods in Dialysis Data

This is a Master's thesis from Lunds universitet/Matematik LTH

Abstract: Imputation of data is the process of filling in missing values in an incomplete data set. Missing data is a common problem in many fields, not least in clinical research. This report aims to evaluate different methods for imputing missing data in the health records of dialysis patients. The imputed data will, in a related project, be used to predict hospitalizations of dialysis patients. The hope is that an imputed data set will give a higher hit rate when predicting the hospitalizations of those patients. Seven imputation methods of varying complexity were considered and compared with the presently used imputation method, which simply uses the latest observed value as the imputed value. The methods were evaluated by how closely their imputed values matched a validation data set, and by whether they improved the prediction of hospitalizations. We found that methods built on within-variable dependencies performed better than methods built on between-variable dependencies. Specifically, time series models using a Kalman filter gave the best results. An improvement in the prediction algorithm could also be seen when using more sophisticated imputation methods instead of the presently used method. When the amount of missing data was increased, the more sophisticated methods still gave good results, in contrast to the present method. All data analyzed in this project came from dialysis patients suffering from end-stage renal disease.
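
To make the contrast between the present method and a Kalman-filter-based method concrete, the following is a minimal sketch in Python, not the thesis's implementation. The abstract does not say which time series model was used, so the local level (random walk plus noise) model below is an assumption, as are the function names `locf` and `kalman_local_level`, the variance parameters `process_var` and `obs_var`, and the example values. Missing values are represented as `None`.

```python
from typing import List, Optional

Series = List[Optional[float]]


def locf(series: Series) -> Series:
    """Present method: carry the latest observed value forward into gaps."""
    filled: Series = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        # Entries before the first observation stay None.
        filled.append(last)
    return filled


def kalman_local_level(series: Series,
                       process_var: float = 1.0,
                       obs_var: float = 1.0) -> Series:
    """Filter-only imputation under an assumed local level model.

    process_var and obs_var are hypothetical, fixed noise variances;
    in practice they would be estimated from the data.
    """
    filled: Series = []
    mean: Optional[float] = None  # filtered estimate of the level
    var = 0.0                     # variance of that estimate
    for value in series:
        if mean is None:
            # Initialise the filter at the first observed value.
            if value is not None:
                mean, var = value, obs_var
            filled.append(value)
            continue
        # Prediction step: the level follows a random walk.
        var += process_var
        if value is None:
            # No measurement: impute with the predicted level.
            filled.append(mean)
        else:
            # Update step: blend prediction and measurement.
            gain = var / (var + obs_var)
            mean = mean + gain * (value - mean)
            var = (1.0 - gain) * var
            filled.append(value)
    return filled


# Hypothetical lab values with two missing measurements.
example: Series = [10.0, 12.0, None, None, 11.0]
locf_result = locf(example)
kalman_result = kalman_local_level(example)
```

In this sketch, `locf` fills both gaps with the raw last observation (12.0), while the filter fills them with its estimate of the underlying level, which weighs the earlier observations against the assumed measurement noise. Observed values are left unchanged by both functions.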
