Random projections in a distributed environment for privacy-preserved deep learning

This is a Master's thesis from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Over the last decade alone, the field of Deep Learning (DL) has proven useful for increasingly complex Machine Learning tasks and data, a notable milestone being generative models achieving facial synthesis indistinguishable from real faces. With the increased complexity of DL architectures and training data follows a steep increase in the time and hardware resources required for training. These resources are easily accessible via cloud-based platforms, provided the data owner is willing to share its training data. To allow for cloud-sharing of its training data, the Swedish Transport Administration (TRV) is interested in evaluating resource-effective, infrastructure-independent, privacy-preserving obfuscation methods to be applied to data collected in real time on distributed Internet-of-Things (IoT) devices. A fundamental problem in this setting is balancing the trade-off between the privacy and the DL utility of the obfuscated training data.

We identify relevant, statistically measurable privacy metrics achievable via obfuscation and compare two prominent alternatives from the literature: optimization-based methods (OBM) and random projections (RP). OBM achieve privacy via direct optimization towards a metric, preserving utility-crucial patterns in the data, and are typically also evaluated in terms of a DL-based adversary's error in estimating sensitive features. RP project data to a lower dimension via a random matrix, approximately preserving pairwise distances between samples while offering privacy through the difficulty of recovering the original data. The goals of the project centered on evaluating RP against privacy metric results previously attained for OBM, comparing adversarial feature estimation error between OBM and RP, and addressing the possibly infeasible learning task of training on composite multi-device datasets generated using independent projection matrices. The last goal is relevant to TRV in that multiple devices are likely to contribute to the same composite dataset.
Our results complement previous research in that they indicate that both privacy and utility guarantees in a distributed setting vary depending on data type and learning task. These results favor OBM, which theoretically should offer more robust guarantees. Our results and conclusions encourage further experimentation with RP in a distributed setting to better understand how data type and learning task influence the privacy-utility trade-off, with target-distributed data sources being a promising starting point.
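The random-projection obfuscation discussed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's actual pipeline: the dimensions, seed, and variable names are all assumptions, and the Gaussian projection matrix is one common choice for approximately preserving pairwise distances (per the Johnson-Lindenstrauss lemma).

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative sizes (assumption): 200 samples, projected from 64 to 32 dims.
n_samples, orig_dim, proj_dim = 200, 64, 32
X = rng.normal(size=(n_samples, orig_dim))  # stand-in for sensitive device data

# Gaussian random projection matrix, scaled by 1/sqrt(proj_dim) so that
# pairwise distances are approximately preserved after projection.
R = rng.normal(scale=1.0 / np.sqrt(proj_dim), size=(orig_dim, proj_dim))
X_obf = X @ R  # obfuscated data that could be shared with the cloud

# Distances between sample pairs are roughly preserved, which is what keeps
# the projected data useful for downstream DL training.
orig_dist = np.linalg.norm(X[0] - X[1])
proj_dist = np.linalg.norm(X_obf[0] - X_obf[1])

# A second device drawing an *independent* projection matrix maps its data
# into an incompatible subspace -- the source of the composite multi-device
# learning difficulty the thesis investigates.
R_other_device = rng.normal(scale=1.0 / np.sqrt(proj_dim),
                            size=(orig_dim, proj_dim))
```

Privacy here rests on the difficulty of recovering `X` from `X_obf` without knowledge of `R`, while utility rests on the approximate distance preservation checked above.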
