Defining and predicting fast-selling clothing options

Detta är en Master-uppsats från Linköpings universitet/Statistik och maskininlärning

Sammanfattning: This thesis aims to find a definition of fast-selling clothing options and to find a way to predict them using only a few weeks of sale data as input. The data used for this project contain daily sales and intake quantity for seasonal options, with sale start 2016-2018, provided by the department store chain Åhléns. A definition is found to describe fast-selling clothing options as those having sold a certain percentage of their intake after a fixed number of days. An alternative definition based on cluster affiliation is proven less effective. Two predictive models are tested, the first one being a probabilistic classifier and the second one being a k-nearest neighbor classifier, using the Euclidean distance. The probabilistic model is divided into three steps: transformation, clustering, and classification. The time series are transformed with B-splines to reduce dimensionality, where each time series is represented by a vector with its length and B-spline coefficients. As a tool to improve the quality of the predictions, the B-spline vectors are clustered with a Gaussian mixture model where every cluster is assigned one of the two labels fast-selling or ordinary, thus dividing the clusters into disjoint sets: one containing fast-selling clusters and the other containing ordinary clusters. Lastly, the time series to be predicted are assumed to be Laplace distributed around a B-spline and using the probability distributions provided by the clustering, the posterior probability for each class is used to classify the new observations. In the transformation step, the number of knots for the B-splines are evaluated with cross-validation and the Gaussian mixture models, from the clustering step, are evaluated with the Bayesian information criterion, BIC. The predictive performance of both classifiers is evaluated with accuracy, precision, and recall. The probabilistic model outperforms the k-nearest neighbor model with considerably higher values of accuracy, precision, and recall. The performance of each model is improved by using more data to make the predictions, most prominently with the probabilistic model.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)