Identification of non-representative responses in surveys through multivariate outlier detection

This is a professional degree thesis at advanced level from the Department of Mathematics

Abstract: For United Minds, large-scale surveys are an important offering to clients, not least the public opinion poll Väljarbarometern. A risk associated with surveys is satisficing: sub-optimal response behaviour that impairs the ability to describe the sampled population correctly from the results. The purpose of this study is to use multivariate outlier detection methods to identify observations assumed to be non-representative of the population. The possibility of categorizing responses generated through satisficing as outliers is investigated. With regard to the character of the Väljarbarometern dataset, three existing algorithms are adapted to detect these outliers. In addition, a number of randomly generated observations are added to the data, and all algorithms correctly label them as outliers. The anomaly scores generated by each algorithm are compared, and the Otey algorithm is concluded to be the most effective for the purpose, above all because it takes correlation between variables into account. A plausible cut-off value for outliers and the separation between non-representative and representative outliers are discussed. The resulting recommendation is to handle observations labelled as outliers through respondent follow-up or, if that is not possible, through downweighting inversely proportional to the anomaly scores.
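
The downweighting recommendation can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the rule below (weight 1 up to the cut-off, then cut-off divided by score) is only one assumed way of making weights inversely proportional to anomaly scores. The function names, the cut-off value and the toy data are hypothetical and not taken from the thesis.

```python
def downweight(scores, cutoff):
    """Assumed rule: full weight up to the cut-off, then cutoff / score.

    Above the cut-off the weight is inversely proportional to the anomaly
    score, and it equals 1.0 exactly at the cut-off, so there is no jump.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    return [1.0 if s <= cutoff else cutoff / s for s in scores]


def weighted_share(answers, weights, option):
    """Weighted proportion of respondents who chose `option`."""
    total = sum(weights)
    chosen = sum(w for a, w in zip(answers, weights) if a == option)
    return chosen / total


# Toy data (hypothetical): four respondents, two of them flagged as outliers.
scores = [1.0, 2.0, 4.0, 8.0]
answers = ["A", "B", "A", "B"]

weights = downweight(scores, cutoff=2.0)
share_a = weighted_share(answers, weights, "A")
```

With these toy values the two respondents above the cut-off keep half and a quarter of their weight, so their answers still count but move the estimated share less than the others do.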
