Machine learning for detecting fraud in an API

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: An Application Programming Interface (API) provides developers with a high-level framework that abstracts the underlying implementation of services. Using an API reduces the time developers spent on implementation, and it encourages collaboration and innovation from third-party developers. Making an API public has a risk: developers might use it inappropriately. Most APIs have a policy that states which behaviors are considered fraudulent. Detecting applications that fraudulently use an API is a challenging problem: it is unfeasible to review all applications that make requests. API providers aim to implement an automatic tool that accurately detects suspicious applications from all the requesting applications. In this thesis, we study the possibility of using machine learning techniques to detect fraud in Web APIs. We experiment with supervised learning methods (random forests and gradient boosting), clustering methods such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and ensemble methods that combine the predictions of supervised learning methods and clustering methods. The dataset available contains data gathered when a developer creates an application and data collected when the application starts making HTTP requests. We derive a meaningful representation from the most important textual fields of the dataset using Sentence-BERT (S-BERT). Furthermore, we experiment with the predictive importance of the S-BERT embeddings. The method that achieves the best performance in the test set is an ensemble method that combines the results from the gradient boosting classifier and DBSCAN. Furthermore, this method performs better when using the S-BERT embeddings of the textual data of the applications, achieving an f1-score of 0.9896.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)