Asynchronous Advantage Actor-Critic with Adam Optimization and a Layer Normalized Recurrent Network
Abstract: State-of-the-art deep reinforcement learning models rely on asynchronous training, in which multiple learner agents collectively update a central neural network. In this thesis, one of the most recent asynchronous policy gradient-based reinforcement learning methods, asynchronous advantage actor-critic (A3C), is examined and improved using prior research from the machine learning community. By applying the Adam optimization method and adding a long short-term memory (LSTM) with layer normalization, it is shown that the performance of A3C increases.
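The layer-normalized LSTM mentioned above can be illustrated with a minimal sketch. This is not the thesis's implementation; it is a hypothetical single-timestep LSTM cell in which layer normalization (normalizing each pre-activation vector to zero mean and unit variance, then rescaling with learned gain and bias) is applied to the input-to-hidden and hidden-to-hidden projections and to the cell state. All function and parameter names here are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    # Normalize across the feature dimension, then rescale with
    # learned gain and bias parameters.
    mean = x.mean(-1, keepdims=True)
    std = x.std(-1, keepdims=True)
    return gain * (x - mean) / (std + eps) + bias

def lstm_step_ln(x, h, c, Wx, Wh, b, gains, biases):
    # One LSTM timestep with layer normalization applied to the
    # input projection, the recurrent projection, and the new cell
    # state (hypothetical minimal variant, biases omitted per gate).
    z = (layer_norm(x @ Wx, gains[0], biases[0])
         + layer_norm(h @ Wh, gains[1], biases[1]) + b)
    i, f, o, g = np.split(z, 4, axis=-1)
    # Sigmoid for the input, forget, and output gates.
    i, f, o = (1.0 / (1.0 + np.exp(-i)),
               1.0 / (1.0 + np.exp(-f)),
               1.0 / (1.0 + np.exp(-o)))
    c_new = f * c + i * np.tanh(g)
    h_new = o * np.tanh(layer_norm(c_new, gains[2], biases[2]))
    return h_new, c_new
```

Normalizing the pre-activations keeps their scale stable across timesteps, which is the property that makes recurrent training less sensitive to initialization and learning rate.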