RNN-based Graph Neural Network for Credit Load Application leveraging Rejected Customer Cases

This is a second-cycle (advanced level) professional degree thesis from Högskolan i Halmstad/Akademin för informationsteknologi.

Abstract: Machine learning plays a vital role in preventing financial losses within the banking industry, yet many state-of-the-art and industry-standard approaches in the field neglect rejected customers and the information they hold for detecting similar risk behavior. This thesis explores the possibility of including this information during training and of utilizing transactional history through an LSTM to improve the detection of defaults.

The model is structured so that an encoder is first trained with or without rejected customers. Virtual distances are then calculated in the embedding space between the accepted customers. These distances are used to create a graph in which each node contains an LSTM network, and a GCN passes messages between connected nodes. The model is validated on two datasets: one public Taiwan dataset and one private Swedish dataset provided by the collaborating company. The Taiwan dataset comprised 8,000 data points with a 50/50 label split; the Swedish dataset comprised 4,644 data points with the same split.

Multiple metrics were used to evaluate the impact of including rejected customers and of using time-series data instead of static features. For the encoder, reconstruction error was used to measure the difference in performance. When creating the edges, the homogeneity of the neighborhoods, and whether a node had a majority of neighbors with the same label as itself, were the determining factors; for the classifier, accuracy, F1-score, and the confusion matrix were used to compare results. The results show that the impact of rejected customers on predictive power is minor. Regarding the effect of using time-series information instead of static features, we saw results comparable to XGBoost on the Taiwan dataset and an improvement in predictive power on the Swedish dataset. The results also show that a well-defined virtual distance is critical to the classifier's performance.
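To make the described pipeline concrete, the following is a minimal sketch in plain PyTorch: a per-customer LSTM encodes the transaction history into a node embedding, a k-nearest-neighbour graph is built from pairwise distances in that embedding space, and one GCN-style layer passes messages between connected nodes. The layer sizes, the choice of Euclidean distance, and k = 5 are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn as nn

    class NodeLSTM(nn.Module):
        """Encodes one customer's transaction history into a node embedding."""
        def __init__(self, n_features, hidden_dim):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)

        def forward(self, x):                      # x: (n_nodes, seq_len, n_features)
            _, (h, _) = self.lstm(x)               # h: (1, n_nodes, hidden_dim)
            return h.squeeze(0)                    # (n_nodes, hidden_dim)

    def knn_graph(embeddings, k=5):
        """Connect each customer to its k nearest neighbours in embedding space."""
        dist = torch.cdist(embeddings, embeddings)          # pairwise "virtual distances"
        dist.fill_diagonal_(float("inf"))
        neighbours = dist.topk(k, largest=False).indices    # (n_nodes, k)
        adj = torch.zeros(len(embeddings), len(embeddings))
        adj.scatter_(1, neighbours, 1.0)
        adj = ((adj + adj.t()) > 0).float()                 # make the graph undirected
        return adj + torch.eye(len(embeddings))             # add self-loops

    class SimpleGCNLayer(nn.Module):
        """One round of message passing with symmetric degree normalisation."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, h, adj):
            deg = adj.sum(dim=1)
            norm = deg.pow(-0.5)
            adj_norm = norm.unsqueeze(1) * adj * norm.unsqueeze(0)
            return torch.relu(self.linear(adj_norm @ h))

    # Toy end-to-end pass: 32 customers, 12 monthly snapshots, 10 features each.
    x = torch.randn(32, 12, 10)
    encoder = NodeLSTM(n_features=10, hidden_dim=16)
    h = encoder(x)                                 # per-node LSTM embeddings
    adj = knn_graph(h.detach(), k=5)               # graph from virtual distances
    gcn = SimpleGCNLayer(16, 2)                    # 2 classes: default / non-default
    logits = gcn(h, adj)

In the thesis, the encoder is trained first (with or without rejected customers) and only the accepted customers are placed in the graph; the sketch above simply shows how the embedding distances, the per-node LSTM, and the message-passing step fit together.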
