Histogram of Oriented Gradients in a Vision Transformer

This is a Bachelor's thesis from Uppsala University / Division of Visual Information and Interaction

Authors: Jakob Malmsten; Heja Cengiz; David Lood; [2022]

Keywords: vision transformer; vit; histogram of oriented gradients; hog; ai; machine learning; artificial intelligence; MNIST;

Abstract: This study aims to modify the Vision Transformer (ViT) to achieve higher accuracy. ViT is a computer vision model used, among other things, to classify images. Applied to the MNIST data set, ViT achieves an accuracy of approximately 98%. ViT is then modified by incorporating a method called Histogram of Oriented Gradients (HOG) in two different ways. The first approach with HOG gives an accuracy of 98.74% (setup 1) and the second approach gives an accuracy of 96.87% (patch size 4x4 pixels). The results indicate that applying HOG to the entire image yields better accuracy. However, no systematic optimization was carried out, which makes it difficult to draw firm conclusions.
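
To illustrate what a HOG descriptor over 4x4-pixel patches might look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `hog_cell_histogram` and `hog_per_patch`, the 9 orientation bins, the unsigned 0 to 180 degree range, and the absence of block normalization are all assumptions made for illustration, and how the thesis feeds HOG features into ViT is not specified in the abstract.

```python
import numpy as np


def hog_cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Assumption: 9 bins over [0, 180) degrees, no normalization.
    """
    cell = cell.astype(np.float64)
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Fold signed angles into the unsigned range [0, 180) degrees.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(
        angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude
    )
    return hist


def hog_per_patch(image, patch=4, n_bins=9):
    """Compute one orientation histogram per non-overlapping patch.

    Returns an array of shape (num_patches, n_bins), with patches in
    row-major order. Hypothetical helper, for illustration only.
    """
    height, width = image.shape
    rows, cols = height // patch, width // patch
    features = np.empty((rows * cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            block = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            features[r * cols + c] = hog_cell_histogram(block, n_bins)
    return features


if __name__ == "__main__":
    # A 28x28 array stands in for an MNIST digit.
    rng = np.random.default_rng(0)
    digit = rng.random((28, 28))
    print(hog_per_patch(digit).shape)  # one 9-bin histogram per 4x4 patch
```

Under these assumptions, a 28x28 image yields 49 patch descriptors, which is the same count as the number of 4x4... patches would be for a 7x7 grid; calling `hog_cell_histogram` on the full image instead gives a single descriptor for the entire image.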

