Evaluating Membership Inference Attacks on Synthetic Data Generated With Formal Privacy Guarantees

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Elliot Beskow; Erik Lindé; [2023]


Abstract: Synthetic data generation using generative machine learning has been increasingly publicized as a new tool for data anonymization. It promises to offer privacy while maintaining the statistical properties of the original dataset. This study focuses on the risks of synthetic data by examining two main aspects: privacy and utility. In terms of privacy, we consider what information can be inferred about the underlying dataset by accessing the synthetic data. To test this, we launch membership inference attacks, which aim to determine whether a given data point was used in the training of the generative model. We find that synthetic data is at risk of considerable leakage for outlier data points, especially for generative models without formal privacy guarantees. We also find that higher privacy comes at a considerable cost in data utility, i.e. how well the synthetic data reflects the raw dataset. With these findings we reassert the results of previous works. We also present new contributions in the form of evaluating attacks with a cross-validation method, an investigation of the connection between a point's deviation and its susceptibility to attacks, and a greater focus on different generative models compared to previous literature. We conclude that the synthetic data generation methods investigated are subject to a significant trade-off between privacy and utility.
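The membership inference setting the abstract describes can be illustrated with a minimal distance-based attack sketch: score each candidate point by how close it lies to the nearest synthetic record, and predict "member" for points unusually close to the synthetic data. The toy generator (training points plus Gaussian noise), the dataset, and the median threshold below are illustrative assumptions, not the attack or models evaluated in the thesis.

```python
import numpy as np

def mia_scores(targets, synthetic):
    """Score each target point by (negative) distance to its nearest
    synthetic record; a higher score suggests the point was a training
    member. This is a generic nearest-neighbor heuristic, not the
    thesis's specific attack."""
    # Pairwise Euclidean distances via broadcasting: (n_targets, n_synth)
    d = np.linalg.norm(targets[:, None, :] - synthetic[None, :, :], axis=2)
    return -d.min(axis=1)

rng = np.random.default_rng(0)

# Hypothetical training data and a stand-in "generator" that simply
# perturbs training points; a real generative model would go here.
train = rng.normal(size=(50, 3))
synthetic = train + rng.normal(scale=0.1, size=train.shape)
non_members = rng.normal(size=(50, 3))

targets = np.vstack([train, non_members])
labels = np.array([1] * 50 + [0] * 50)  # 1 = member, 0 = non-member

scores = mia_scores(targets, synthetic)
# Naive attack decision: predict "member" above the median score
pred = (scores > np.median(scores)).astype(int)
acc = (pred == labels).mean()
```

Because this toy generator leaks heavily (each synthetic record sits right next to a training point), the attack accuracy `acc` lands well above the 0.5 chance level, mirroring the leakage the thesis reports for models without formal privacy guarantees.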
