ANALYSIS OF BINARY DEPENDENT VARIABLES USING LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION: A REPLICATION STUDY

This is a Master's thesis from Uppsala University/Department of Statistics.

Abstract: The linear probability model (LPM) is commonly used because it is easier to compute and interpret than logit and probit models, even though its estimated probabilities may fall outside the $[0,1]$ interval and a linear functional form makes little sense when modelling probabilities. This paper extends the results of \citeA{Dara} by reviewing their use of the LPM to examine whether alcohol prohibition reduces domestic violence. A regular LPM produced inconclusive estimates, because the prohibition variable was dropped due to collinearity once controls were added. \citeA{Dara} nevertheless reported results, and closer inspection of their regression commands showed that they ran a linear regression, computed the residuals in a post-estimation step, and then used those residuals as the dependent variable; this is why their results differ from those of the regular LPM. Their method still produced unbounded predicted probabilities and heteroscedastic residuals, showing that OLS was inefficient and that a non-linear binary choice model such as logistic regression would be a better option. Logistic regression models the probability of an outcome that can take only two values and was therefore used in this paper. Unlike the LPM, logistic regression uses a non-linear (sigmoid) function, which bounds the predicted probability between 0 and 1. Logistic regression encountered no such complications; logistic regression (or another non-linear model for a dichotomous dependent variable) should therefore have been used for the final analysis, with the LPM reserved for a preliminary stage to obtain quick results.
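The contrast between the two models can be illustrated with a minimal sketch on simulated data. The LPM sets $P(y=1 \mid x) = x'\beta$, whereas logistic regression sets $P(y=1 \mid x) = 1/(1+e^{-x'\beta})$; under the LPM the error variance is $p(1-p)$, which varies with $x$. Everything below is an assumption for illustration, not the thesis data or the procedure of \citeA{Dara}: a single hypothetical regressor `x`, a logistic data-generating process, NumPy only, and hypothetical helper names (`design`, `fit_lpm`, `fit_logit`). The logit is fitted by Newton-Raphson rather than a packaged routine.

```python
import numpy as np


def design(x):
    """Prepend an intercept column to a 1-D regressor (hypothetical helper)."""
    return np.column_stack([np.ones(len(x)), x])


def fit_lpm(x, y):
    """LPM: ordinary least squares of the 0/1 outcome on x."""
    beta, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return beta


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Logistic regression by maximum likelihood via Newton-Raphson."""
    X = design(x)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information = -Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulated data (NOT the thesis data): the true model is logistic.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-3.0, 3.0, size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))
y = rng.binomial(1, p_true).astype(float)

beta_lpm = fit_lpm(x, y)
beta_logit = fit_logit(x, y)

p_lpm = design(x) @ beta_lpm                             # linear index: unbounded
p_logit = 1.0 / (1.0 + np.exp(-design(x) @ beta_logit))  # sigmoid: inside (0, 1)

# Under the LPM, Var(e | x) = p(1 - p): it changes with x (heteroscedasticity)
# and becomes negative wherever the fitted "probability" leaves [0, 1].
var_lpm = p_lpm * (1.0 - p_lpm)

print("LPM fitted range:  ", p_lpm.min(), p_lpm.max())
print("Logit fitted range:", p_logit.min(), p_logit.max())
print("Share of LPM fits outside [0, 1]:", np.mean((p_lpm < 0) | (p_lpm > 1)))
print("Negative implied LPM variances:", int(np.sum(var_lpm < 0)))
```

With a steep true relationship, the OLS line overshoots 1 and undershoots 0 near the ends of the range of `x`, while the sigmoid keeps every fitted value strictly between 0 and 1. In applied work a packaged estimator (for example `statsmodels` `Logit`) would normally replace the hand-written Newton loop, which is used here only to keep the sketch dependency-light.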
