Deep Distributional Temporal Difference Learning for Game Playing

This is a Master's thesis from Lund University/Mathematical Statistics

Author: Frej Berglind; [2019]

Keywords: Reinforcement Learning; Deep Learning; Temporal Difference Learning; Distributional Learning; Game Playing; 5-in-a-row; Artificial Intelligence; Mathematics and Statistics;

Abstract: Temporal difference learning is considered one of the most successful methods in reinforcement learning, and recent developments in deep learning have opened up many new possibilities. In this project, we compare classic scalar temporal difference learning with three new distributional algorithms for playing the game of 5-in-a-row using deep neural networks: distributional temporal difference learning with a constant learning rate, and two distributional temporal difference algorithms with adaptive learning rates. All of these algorithms are applicable to any two-player, deterministic, zero-sum game and can probably be generalized successfully to other settings. All algorithms performed well and developed strong strategies. The algorithms using adaptive learning rates learned more quickly at first, but in the long run they were outperformed by the algorithms using a constant learning rate, which, without any prior knowledge, learned to play the game at a very high level after 200,000 games of self-play.
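The abstract does not spell out the update rules, so the following is only a rough illustration of the three kinds of update it names. It is a minimal tabular sketch under several assumptions that are not taken from the thesis: values are stored from the perspective of the player to move (a negamax convention), game outcomes are restricted to loss/draw/win encoded as -1/0/+1, and a 1/n visit-count schedule stands in for an "adaptive learning rate". The function names and the lookup-table representation are hypothetical; the thesis itself uses deep neural networks, and its adaptive methods may well differ.

```python
from collections import defaultdict

# Assumed outcome encoding, from the perspective of the player to move.
OUTCOMES = (-1.0, 0.0, 1.0)  # loss, draw, win


def scalar_td_update(values, state, next_state, alpha, terminal_reward=None):
    """Hypothetical TD(0) update of a scalar value in a zero-sum game.

    values[s] is the expected outcome for the player to move in s. After a
    move, the opponent is to move in next_state, so the bootstrapped target
    is -values[next_state]. If the move ended the game, the target is the
    final reward for the player who made the move.
    """
    if terminal_reward is not None:
        target = terminal_reward
    else:
        target = -values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


def one_hot(outcome):
    """Distribution that puts all probability mass on a single outcome."""
    return [1.0 if o == outcome else 0.0 for o in OUTCOMES]


def distributional_td_update(dists, state, next_state, alpha, terminal_reward=None):
    """Hypothetical TD(0) update of a (loss, draw, win) distribution.

    In a zero-sum game the opponent's (loss, draw, win) probabilities are
    our (win, draw, loss), so the target is the reversed distribution of
    next_state. For 0 <= alpha <= 1 the update is a convex combination of
    two probability vectors, so the result stays a valid distribution.
    """
    if terminal_reward is not None:
        target = one_hot(terminal_reward)
    else:
        target = list(reversed(dists[next_state]))
    p = dists[state]
    dists[state] = [pi + alpha * (ti - pi) for pi, ti in zip(p, target)]
    return dists[state]


def adaptive_alpha(counts, state):
    """Assumed adaptive schedule: alpha = 1/n, where n counts visits to state.

    With this schedule each value becomes the running mean of its targets.
    """
    counts[state] += 1
    return 1.0 / counts[state]


def expected_value(dist):
    """Mean outcome of a (loss, draw, win) distribution."""
    return sum(o * p for o, p in zip(OUTCOMES, dist))
```

For example, with `values = defaultdict(float)` and `dists = defaultdict(lambda: [1/3, 1/3, 1/3])`, a winning final move from state `"a"` with `alpha=0.5` moves `values["a"]` to 0.5 and `dists["a"]` to roughly `[1/6, 1/6, 2/3]`, whose expected value is also 0.5. Because the distributional update is linear, its mean moves exactly as the scalar value would when both start from the same expected value; the extra information lies in how the probability is split between wins, draws and losses.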
