Modelling rare events using non-parametric machine learning classifiers - Under what circumstances are support vector machines preferable to conventional parametric classifiers?
Abstract: Rare event modelling is an important topic in quantitative social science research. However, although traditional classifiers based upon general linear models (GLMs) might lead to biased results, the social science community has devoted little attention to methodological studies aimed at alleviating such bias, and even fewer studies have considered the use of machine learning methods to tackle the analytical problems posed by rare events. In this thesis, I compared the classification performance of support vector machines (SVMs), a group of machine learning classification algorithms, with that of GLMs in the presence of imbalanced classes and rare events. The results show that standard SVMs have no better classification performance than traditional GLMs. In addition, standard SVMs tend to have low sensitivity, rendering them inappropriate for rare event modelling. Although cost-sensitive SVMs could lead to more rare events being identified, these methods tend to suffer from overfitting as the events become rarer. Finally, the results of the empirical analysis using the Militarized Interstate Dispute (MID) data imply that the probabilistic outputs produced by Platt scaling are not reliable. For these reasons, the results of this study do not support a wider application of SVMs in rare event modelling.
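The concern about low sensitivity can be illustrated with a minimal sketch. The labels and predictions below are hypothetical and are not taken from the thesis or the MID data; they only show why a classifier can look accurate on rare-event data while identifying no events at all, and how a classifier that trades some accuracy for false alarms can recover more events.

```python
def accuracy(y_true, y_pred):
    """Share of observations whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def sensitivity(y_true, y_pred):
    """Share of actual events (label 1) that were predicted as events."""
    events = sum(1 for t in y_true if t == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / events if events else float("nan")


# Hypothetical data: 1,000 observations with 10 events (a 1% event rate).
y_true = [1] * 10 + [0] * 990

# Hypothetical classifier A: never predicts the rare event.
y_pred_never = [0] * 1000

# Hypothetical classifier B: finds 7 of the 10 events at the price of
# 50 false alarms among the 990 non-events.
y_pred_cost = [1] * 7 + [0] * 3 + [1] * 50 + [0] * 940

acc_never = accuracy(y_true, y_pred_never)
sens_never = sensitivity(y_true, y_pred_never)
acc_cost = accuracy(y_true, y_pred_cost)
sens_cost = sensitivity(y_true, y_pred_cost)

print(f"A: accuracy={acc_never:.3f}, sensitivity={sens_never:.3f}")
print(f"B: accuracy={acc_cost:.3f}, sensitivity={sens_cost:.3f}")
```

In this made-up example, classifier A has the higher accuracy but identifies none of the events, which is why accuracy alone is a poor yardstick when events are rare.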