Asynchronous Advantage Actor-Critic and Flappy Bird

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Games provide ideal environments for assessing reinforcement learning algorithms because of their simple dynamics and their inexpensive testing, compared to real-world environments. Asynchronous Advantage Actor-Critic (A3C), developed by DeepMind, has shown significant improvements in performance over other state-of-the-art algorithms on Atari games. Additionally, the algorithm A3C(lambda), a generalization of A3C, has previously been shown to further improve upon A3C in these environments. In this work, we implement A3C and A3C(lambda) in the Cart-Pole and Flappy Bird environments and evaluate their performance via simulation. The simulations show that A3C effectively masters the Cart-Pole environment, as expected. Flappy Bird has sparse rewards, yet the simulations reveal that A3C overcomes this challenge the majority of the time, achieving a linear increase in learning. Further simulations on Flappy Bird, with the inclusion of an entropy term and with A3C(lambda), display no signs of improvement in performance compared to regular A3C.
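The abstract refers to the advantage actor-critic update with an entropy term. As a minimal illustrative sketch (not the thesis's actual implementation), the per-rollout loss terms of such an update can be written in plain NumPy; the function name `a3c_loss_terms`, the discount `gamma=0.99`, and the entropy weight `beta=0.01` are assumptions chosen for illustration, matching values commonly used with A3C:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the action dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def a3c_loss_terms(logits, values, actions, rewards, bootstrap,
                   gamma=0.99, beta=0.01):
    """Illustrative advantage actor-critic loss terms for one rollout.

    logits:    (T, A) policy logits for each step of the rollout
    values:    (T,)   critic estimates V(s_t)
    actions:   (T,)   actions actually taken
    rewards:   (T,)   rewards received
    bootstrap: scalar V(s_T) used to bootstrap the n-step return
    """
    T = len(rewards)
    # n-step discounted returns, accumulated backwards from the bootstrap value.
    returns = np.zeros(T)
    R = bootstrap
    for t in reversed(range(T)):
        R = rewards[t] + gamma * R
        returns[t] = R
    advantages = returns - values                # A(s_t, a_t) = R_t - V(s_t)
    probs = softmax(logits)
    logp = np.log(probs[np.arange(T), actions])  # log pi(a_t | s_t)
    policy_loss = -(logp * advantages).mean()    # policy-gradient term
    value_loss = (advantages ** 2).mean()        # critic regression term
    entropy = -(probs * np.log(probs)).sum(axis=-1).mean()
    # The entropy bonus (weight beta) encourages exploration; the abstract
    # notes it gave no measurable improvement on Flappy Bird in this work.
    total = policy_loss + 0.5 * value_loss - beta * entropy
    return total, policy_loss, value_loss, entropy
```

For a two-action environment like Flappy Bird (flap / do nothing), uniform logits give an entropy of log 2 per step; the entropy term shrinks toward zero as the policy becomes deterministic.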
