A Survey & Implementation of Financial Alarm Classification.

This is a Bachelor's thesis from KTH/School of Computer Science and Communication (CSC)

Authors: Jens Wirén; Farhad Kimanos; [2013]


Abstract: The goal of this thesis is to find, implement and evaluate a suitable machine learning algorithm to classify and predict true and false alerts using labelled data. Alerts are triggered in the Scila Surveillance software when certain parameters are exceeded in a trade, such as too large a volume over too short a time span. Financial market operators are nowadays required by law to perform market surveillance, and given the huge amounts of data accumulated, machine learning techniques in general and supervised learning in particular come as a natural choice. This thesis starts with a survey of existing algorithms and their performance, as well as related work. Support Vector Machines (SVM) are the most widely used and overall best-performing technique, which is why they are chosen for further testing. Next comes a thorough derivation of the SVM classifier, starting with convex optimisation theory and showing how SVMs are mathematically constructed. The SVM implementation uses both grid search and cross-validation. The classifier is threaded as much as possible to allow parallelisation, which drastically reduces computation time. The characteristics of a good classifier are not trivial to define, and several accuracy measures are implemented and tested, showing that balanced accuracy and a combined analysis of positive and negative recall are the most useful. The provided dataset is huge, so a few specific alerts are chosen for the proof-of-concept implementation. These are in turn separated into subsets based on alert-specific subcategories. Several tests are then conducted using a lightly modified Java version of the open-source package libsvm. Results show that it is easy to achieve either a high positive and a low negative recall or vice versa, but finding parameters where both are high is very difficult. For this thesis, the choice of a moderately high recall is likely the most useful one.
SVM is definitely an interesting approach, and other techniques such as neural networks or the incorporation of time-series evaluation might yield even better results, but further investigation is needed.
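
To illustrate why balanced accuracy and the two recalls can be more informative than plain accuracy on imbalanced alert data, here is a minimal Python sketch. The label encoding (+1 for a true alert, -1 for a false alert), the function name and the toy data are assumptions made for illustration only; they are not taken from the thesis or from the Scila dataset.

```python
def recall_and_balanced_accuracy(y_true, y_pred):
    """Return (positive_recall, negative_recall, balanced_accuracy).

    Assumes labels are +1 (true alert) and -1 (false alert); this
    encoding is a hypothetical choice for the example.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # true alerts caught
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # true alerts missed
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)  # false alerts rejected
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # false alerts accepted

    # Recall per class; defined as 0.0 when the class is absent.
    pos_recall = tp / (tp + fn) if (tp + fn) else 0.0
    neg_recall = tn / (tn + fp) if (tn + fp) else 0.0

    # Balanced accuracy is the mean of the two recalls.
    return pos_recall, neg_recall, (pos_recall + neg_recall) / 2


# Toy data (made up): 10 true alerts, 90 false alerts, and a trivial
# classifier that labels every alert as false.
y_true = [1] * 10 + [-1] * 90
y_pred = [-1] * 100

pos, neg, bal = recall_and_balanced_accuracy(y_true, y_pred)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(pos, neg, bal, accuracy)
```

In this toy case the trivial classifier reaches a plain accuracy of 0.9 while its balanced accuracy is only 0.5 and its positive recall is 0, which is the kind of trade-off between positive and negative recall that the abstract describes.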
