Two-Stage Logistic Regression Models for Improved Credit Scoring

This is a Master's thesis from KTH/School of Computer Science and Communication (CSC)

Abstract: This thesis investigated two-stage regularized logistic regression applied to the credit scoring problem. Credit scoring refers to the practice of estimating the probability that a customer will default if given credit. The data was supplied by Klarna AB and contains a larger number of observations than the data sets used in many other research papers on credit scoring. In this thesis, a two-stage regression refers to two regressions run in sequence, where some kind of information from the first regression is used in the second regression to improve the overall performance. In the best-performing models, the first stage was trained on alternative labels, namely payment status at earlier dates than the conventional one. The first-stage predictions were then used either as input to the second stage or to segment its population. This gave a Gini increase of approximately 0.01. Using conventional score cutoffs or distance to a decision boundary to segment the population did not improve performance.
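The "predictions as input" variant described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the data is synthetic, the helper names (`fit_logreg`, `predict`, `gini`, `augment`) are hypothetical, the regularization is a plain L2 penalty fitted by gradient descent, and the Gini coefficient is taken to be the common credit-scoring definition, Gini = 2·AUC − 1.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, l2=0.01, lr=0.5, epochs=200):
    # L2-regularized logistic regression fitted by batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(model, X):
    # Predicted probability of the positive class for each row.
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def gini(y, scores):
    # Gini = 2*AUC - 1, with AUC computed pairwise (ties count as half).
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return 2.0 * wins / (len(pos) * len(neg)) - 1.0

# Synthetic data (assumption): an "early" label observed at an earlier
# date and a "final" default label that can only follow an early one.
random.seed(0)
X, y_early, y_final = [], [], []
for _ in range(600):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    early = int(random.random() < sigmoid(1.5 * x[0] - x[1] - 1.0))
    final = int(early == 1 and random.random() < 0.6)
    X.append(x)
    y_early.append(early)
    y_final.append(final)

X_train, X_test = X[:400], X[400:]

# Stage 1: train on the alternative (early) label.
stage1 = fit_logreg(X_train, y_early[:400])

def augment(rows):
    # Append the stage-1 prediction to each row as an extra feature.
    return [xi + [p] for xi, p in zip(rows, predict(stage1, rows))]

# Stage 2: train on the conventional label, with the stage-1 score as input.
stage2 = fit_logreg(augment(X_train), y_final[:400])

# Single-stage baseline for comparison.
baseline = fit_logreg(X_train, y_final[:400])

gini_two = gini(y_final[400:], predict(stage2, augment(X_test)))
gini_base = gini(y_final[400:], predict(baseline, X_test))
print("two-stage Gini:", round(gini_two, 3), "baseline Gini:", round(gini_base, 3))
```

The sketch only shows the mechanics; on this toy data it is not expected to reproduce the reported Gini increase. The segmentation variant would instead split the population on the stage-1 score and fit a separate second-stage model per segment.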

