Deep Reinforcement Learning: A case study of AlphaZero

This is a Bachelor's thesis from Uppsala University, Department of Information Technology

Author: Fredrik Mattisson; [2021]


Abstract: Using deep neural networks for reinforcement learning has proven very successful, as demonstrated by the AlphaZero algorithm developed by DeepMind in 2018. This algorithm is capable of mastering two-player zero-sum board games entirely by playing against itself. However, a drawback of deep learning in general is the immense computational cost associated with training deep neural networks, and AlphaZero is certainly no exception; an absurd amount of compute power was used by DeepMind to produce their results. This thesis project is a first step towards investigating whether DeepMind's approach to reinforcement learning could be made more computationally efficient. We implement the AlphaZero algorithm in a modular fashion, so as to facilitate experimentation with its constituent parts, and also attempt to better understand what the neural network learns by visualizing it. The thesis explains the algorithm and its theoretical foundations, describes how it was implemented, and presents some preliminary results of training it on the game of Go on a 5 by 5 board. The agent's performance was primarily evaluated against basic Monte Carlo tree search, which yielded a win rate of about 50% when the latter used 5 times as many simulations per move. Although training was only conducted for a short period of time on commodity hardware, the results and empirical analysis indicate that the algorithm managed to learn at least some rudimentary aspects of the game. However, since little further improvement was seen asymptotically in these experiments, the configuration was likely sub-optimal.
