Reinforcement Learning Methods for OpenAI Environments

Detta är en Kandidat-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: Using the powerful methods developed in the fieldof reinforcement learning requires an understanding of theadvantages and drawbacks of different methods as well as theeffects of the different adjustable parameters. This paper high-lights the differences in performance and applicability betweenthree different Q-learning methods: Q-table, deep Q-network anddouble deep Q-network where Q refers to the value assigned toa given state-action pair. The performance of these algorithms isevaluated on the two OpenAI gym environments MountainCar-v0 and CartPole-v0. The implementations are done in Pythonusing the Tensorflow toolkit with Keras. The results show thatthe Q-table was the best to use in the Mountain car environmentbecause it was the easiest to implement and was much fasterto compute, but it was also shown that the network methodsrequired far less training data. No significant difference inperformance was found between the deep Q-network and thedouble deep Q-network. In the end, there is a trade-off betweenthe number of episodes required and the computation time foreach episode. The network parameters were also harder to tunesince much more time was needed to compute and visualize theresult.

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)