Predicting Customer Churn in E-commerce Using Statistical Modeling and Feature Importance Analysis : A Comparison of Random Forest and Logistic Regression Approaches

Detta är en Uppsats för yrkesexamina på avancerad nivå från Umeå universitet/Institutionen för matematik och matematisk statistik

Sammanfattning: While operating in online markets offers opportunities for expanded assortment and convenience, it also poses challenges such as increased competition and the need to build personal relationships with customers. Customer retention be- comes crucial in maintaining a successful business, emphasizing the importance of understanding customer behavior. Traditionally, customer behavior analysis has focused on transactional behavior, such as purchase frequency and spending amounts. However, there has been a shift towards non-transactional behavior, driven by the popularity of loyalty programs that reward customers beyond trans- actions and aim to make customers feel appreciated and included, regardless of their spending power. This study is conducted at a global retailer with the aim of enhancing the under- standing of how non-transactional customer behavior influences customer churn. The approach in this study is to understand such behavior by developing a statis- tical model and to analyze statistical approaches of feature importance. Two types of approaches for statistical modeling, each with four variations, are assessed: (1) Random forest; and (2) Logistic regression. Furthermore, three different feature importance methods are considered; (1) Gini importance; (2) Permutation impor- tance and (3) Coefficient importance. The results showed that this approach can be used to analyze customer behavior and gain a better understanding of the driving factors for churn. Furthermore, the results showed that random forest approaches outperform logistic regression. With the definition of churn constructed in this study, the most important factors that affect the probability of churn are the customer’s number of sessions and inter session interval.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)