Confounder Parsing for Text Matching

This is a Master's thesis from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: In observational studies for policy evaluation, matching is used in the service of causal inference to simulate randomization and thus reduce the selection bias that can arise when treatment assignment differs systematically. This is done by balancing the distribution of confounding covariates measured before treatment. Matching on numerical covariates has been done for decades. In recent years, matching on textual covariates has gained popularity. By matching on text data, one can potentially observe confounding information that cannot be observed in tabular data. Furthermore, when combined with numerical data, matching on text data can potentially improve the balance of numerical covariates. However, confounder parsing, defined as the process of removing treatment text from documents so that only confounding text remains, is nontrivial in policy evaluation. This is because policy documents come in the form of PDFs and typically vary considerably in quality and layout. There are many ways to approach confounder parsing, and each approach comes with its own trade-offs. We have investigated whether different confounder parsing methods influence covariate balance differently. We applied our methodology to labor issue policies of the International Monetary Fund and measured the impact of these policies on population health. To ensure the relevance of our inquiry, we also investigated whether text matching improves covariate balance on numerical covariates. We find that the covariate balance of our text matching procedures is relatively unchanged by the different confounder parsing methods. Moreover, text matching within propensity score calipers improves the covariate balance compared to using propensity score matching alone or matching on text covariates alone. Our results demonstrate that text matching can be valuable in establishing causal inferences in the domain of policy evaluation. 
In addition, our results suggest that the flexibility regarding which confounder parsing method researchers can choose among increases
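
To make the idea of text matching within propensity score calipers concrete, the following is a minimal sketch and not the thesis's actual procedure. It assumes a bag-of-words cosine similarity as the text distance, matching with replacement, and hypothetical unit IDs, propensity scores, and document snippets; the function names are likewise illustrative. It also includes a standardized mean difference, one common way to summarize covariate balance.

```python
import math
import statistics
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_within_caliper(treated, controls, caliper):
    """For each treated unit, pick the most text-similar control whose
    propensity score differs by at most `caliper` (with replacement).
    Units with no control inside the caliper are mapped to None."""
    matches = {}
    for t_id, (t_score, t_text) in treated.items():
        candidates = [
            (cosine_similarity(t_text, c_text), c_id)
            for c_id, (c_score, c_text) in controls.items()
            if abs(t_score - c_score) <= caliper
        ]
        matches[t_id] = max(candidates)[1] if candidates else None
    return matches


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation.
    Assumes each group has at least two values and non-zero pooled variance."""
    pooled_sd = math.sqrt(
        (statistics.variance(x_treated) + statistics.variance(x_control)) / 2
    )
    return (statistics.mean(x_treated) - statistics.mean(x_control)) / pooled_sd


# Hypothetical toy data: id -> (propensity score, confounding text).
treated = {"T1": (0.60, "wage cuts and public sector layoffs")}
controls = {
    "C1": (0.62, "fiscal deficit and inflation targets"),
    "C2": (0.58, "public sector wage cuts"),
    "C3": (0.20, "wage cuts and public sector layoffs"),
}

# C3 has identical text but lies outside the caliper, so C2 is chosen.
print(match_within_caliper(treated, controls, caliper=0.05))
```

The caliper first restricts candidates to units with comparable treatment probability, and text similarity then decides among them. This ordering is one possible design; a weighted combination of both distances would be another.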
