Credit Card Fraud Detection by Nearest Neighbor Algorithms

This is a Master's thesis from Göteborgs universitet/Institutionen för matematiska vetenskaper

Abstract: As the usage of internet banking and online purchases has increased dramatically in today’s world, the risk of fraudulent activities and the number of fraud cases are increasing day by day. The most frequent type of bank fraud in recent years is credit card fraud, which leads to huge financial losses on a global level. Credit card fraud happens when an unauthorized person uses another person’s credit card information to make purchases. It is an important and growing problem for banks and individuals all around the world. This thesis applies supervised and unsupervised nearest neighbor algorithms for fraud detection on a Kaggle data set consisting of 284,807 credit card transactions, of which 492 are frauds, with 30 covariates per transaction. The supervised methods are shown to be quite efficient, but require that the user has access to labelled training data in which the fraudulent transactions are known. Unsupervised detection is harder: for example, to find 80% of the frauds, the algorithm flags more than 50 times as many valid transactions as fraud cases. The unsupervised nearest neighbor distance method is compared to methods that use the distance to the center of the data for fraud detection, and to detection algorithms that combine the two methods. The L2 distance, the L2 distance to zero, and the combination of both distances are analyzed for the unsupervised method. The performance of the methods is evaluated using Precision-Recall (PR) curves. The results show that, based on both the area under the curve and the precision at 80% recall, the L2 distance to zero performs slightly better than the L2 distance.
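The two unsupervised scores compared in the abstract, and the precision-at-80%-recall figure used to evaluate them, can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names (`nn_distance_scores`, `distance_to_zero_scores`, `precision_at_recall`) are hypothetical, the data are synthetic rather than the Kaggle set, and the features are assumed to be centered so that zero is the center of the data.

```python
import numpy as np


def nn_distance_scores(X):
    """Score each point by the L2 distance to its nearest other point."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # Clamp tiny negative values caused by floating-point rounding
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0))


def distance_to_zero_scores(X):
    """Score each point by its L2 distance to the origin (assumed center)."""
    return np.linalg.norm(X, axis=1)


def precision_at_recall(scores, labels, recall=0.8):
    """Precision when flagging top-scored points until `recall` is reached."""
    order = np.argsort(-scores)           # highest score first
    hits = np.cumsum(labels[order])       # frauds found among the top k
    needed = int(np.ceil(recall * labels.sum()))
    idx = int(np.searchsorted(hits, needed))  # first k-1 with enough frauds
    return hits[idx] / (idx + 1)


# Synthetic illustration (assumption): 2000 valid points near zero and
# 20 "frauds" shifted away from the center. Not the thesis data.
rng = np.random.default_rng(0)
valid = rng.normal(size=(2000, 5))
fraud = rng.normal(loc=4.0, size=(20, 5))
X = np.vstack([valid, fraud])
y = np.concatenate([np.zeros(2000, dtype=int), np.ones(20, dtype=int)])

for name, score_fn in [("nearest neighbor", nn_distance_scores),
                       ("distance to zero", distance_to_zero_scores)]:
    p = precision_at_recall(score_fn(X), y, recall=0.8)
    print(f"{name}: precision at 80% recall = {p:.3f}")
```

A precision of 1/51 at 80% recall would correspond to the "more than 50 times as many valid transactions" situation described in the abstract. The brute-force pairwise distance matrix is only practical for small samples; for a data set of the size used in the thesis, a tree-based or approximate neighbor search would be needed instead.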
