Clustering and Anomaly Detection in Financial Trading Data
Abstract: In this thesis we propose a new form of Variational Autoencoder called the Conditional Latent Space Variational Autoencoder, or CL-VAE. By conditioning on a known label in a dataset, we can decide which points are mapped to which prior distribution. This makes the latent space easier to interpret and separates the classes further. It also somewhat mitigates the tug-of-war between reconstruction loss and KL divergence, because we are not trying to map all the data to one simple prior distribution but instead give every class its own. With this method, we can customize the latent space for a specific task such as clustering or anomaly detection. We can send in any kind of data, numerical or categorical, and the points will be projected onto a more easily understandable structure. This is a big advantage over other dimensionality reduction algorithms such as PCA, which only handle continuous variables. The method is applied to trading data from Handelsbanken Capital Markets, a Swedish investment bank. We show that it can be used to model the trading behavior of the traders at the bank by performing clustering and anomaly detection in the latent space. CL-VAE outperforms the regular VAE on all our metrics and seems to prepare the data for analysis in a straightforward and interpretable manner. We also discuss the issue of unsupervised anomaly detection at length and use a new form of metric for such problems called the EM-MV measure. The result is a system that can be used to model trading behavior and to perform clustering and anomaly detection on the transformed data. We have performed the analysis by conditioning on the traders, but the model is not limited to that label: we can instead condition on counterparties, instruments, portfolios or any other label in the dataset.
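The idea of giving every class its own prior can be illustrated with a minimal sketch of the KL term alone. This is not the implementation used in the thesis: it assumes each class prior is a unit-variance Gaussian with a class-specific mean, that the encoder outputs a diagonal Gaussian, and that the prior means are placed on a circle. The names `kl_to_class_prior` and `prior_means` are hypothetical.

```python
import numpy as np

def kl_to_class_prior(mu, log_var, labels, prior_means):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(prior_means[label], I) ).

    Assumption: every class prior is a unit-variance Gaussian that differs
    only in its mean; the thesis may use a different family of priors.
    """
    target = prior_means[labels]   # (batch, latent_dim): prior mean of each sample's class
    var = np.exp(log_var)
    return 0.5 * np.sum(var + (mu - target) ** 2 - 1.0 - log_var, axis=1)

# Hypothetical setup: 3 classes, 2-D latent space, prior means on a circle of radius 3.
num_classes = 3
angles = 2 * np.pi * np.arange(num_classes) / num_classes
prior_means = 3.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Two encoded points: the first sits exactly on its class prior mean,
# the second sits at the origin, away from its class prior mean.
mu = np.array([[3.0, 0.0], [0.0, 0.0]])
log_var = np.zeros((2, 2))
labels = np.array([0, 1])

kl = kl_to_class_prior(mu, log_var, labels, prior_means)
print(kl)
```

Under these assumptions the penalty is zero for a point whose posterior matches its own class prior and grows with the squared distance to that class's prior mean, so points are pulled toward a class-specific region rather than toward one shared prior.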