Explainable AI as a Defence Mechanism for Adversarial Examples

This is a Master's thesis from KTH / School of Electrical Engineering and Computer Science (EECS)

Author: Harald Stiff; [2019]


Abstract: Deep learning is the gold standard for image classification tasks. Its introduction brought many impressive improvements in computer vision, outperforming all earlier machine learning models. Despite this success, however, deep neural networks have been shown to be easily fooled by adversarial examples: data modified slightly to cause the networks to make incorrect classifications. This significant weakness has increased doubt in neural networks, and it has been questioned whether they are safe to use in practice. In this thesis we propose a new defence mechanism against adversarial examples that utilizes explainable AI metrics of neural network predictions to filter out adversarial examples prior to model inference. We evaluate the filters against various attacks and models targeting the MNIST, Fashion-MNIST, and CIFAR-10 datasets. The results show that the filters can detect adversarial examples constructed with regular attacks, but that they are not robust against adaptive attacks that specifically exploit the architecture of the defence mechanism.
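
The abstract does not specify which explainability metric or decision rule the filters use, so the following is only a minimal sketch of the general pattern in PyTorch: score each input with a hypothetical gradient-saliency statistic and reject it before inference if the score crosses a threshold. The names saliency_entropy and filter_inputs, the entropy statistic, and the threshold are illustrative assumptions, not the thesis's actual design.

    import torch

    def saliency_entropy(model: torch.nn.Module, x: torch.Tensor) -> float:
        # Entropy of the normalized gradient-saliency map for one input of shape (1, C, H, W).
        # (Assumed metric for illustration; the thesis may use a different explainability measure.)
        x = x.clone().requires_grad_(True)
        logits = model(x)
        logits.max(dim=1).values.sum().backward()       # gradient of the top-class logit w.r.t. the input
        saliency = x.grad.abs().sum(dim=1).flatten()    # aggregate saliency over colour channels
        p = saliency / saliency.sum().clamp_min(1e-12)  # normalize to a probability distribution
        return float(-(p * (p + 1e-12).log()).sum())    # Shannon entropy of the saliency map

    def filter_inputs(model: torch.nn.Module, batch: torch.Tensor, threshold: float) -> torch.Tensor:
        # Keep only inputs whose saliency entropy is below the (assumed) threshold;
        # diffuse, high-entropy saliency is treated here as a possible sign of manipulation.
        keep = [x for x in batch if saliency_entropy(model, x.unsqueeze(0)) < threshold]
        return torch.stack(keep) if keep else torch.empty(0)

In a setup like this the threshold would typically be calibrated on clean validation data, for example chosen so that only a small fraction of benign inputs are rejected before classification.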
