Asynchronous Advantage Actor-Critic with Adam Optimization and a Layer Normalized Recurrent Network

This is a Master's thesis from KTH / Optimization and Systems Theory

Author: Joakim Bergdahl (2017)


Abstract: State-of-the-art deep reinforcement learning models rely on asynchronous training, in which multiple learner agents collectively update a central neural network. This thesis examines one of the most recent asynchronous policy-gradient-based reinforcement learning methods, asynchronous advantage actor-critic (A3C), and improves it using prior research from the machine learning community. By applying the Adam optimization method and adding a long short-term memory (LSTM) network with layer normalization, the performance of A3C is shown to increase.
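The abstract names two additions to A3C: the Adam optimizer and a layer-normalized LSTM. As a minimal, hedged sketch of what these components compute, the NumPy code below shows one bias-corrected Adam parameter update (Kingma & Ba, 2015) and one LSTM step with layer normalization (Ba et al., 2016) applied to the gate pre-activations. The placement of the normalization (a single normalization over the summed pre-activation of all four gates), the gate ordering, and all variable names are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adam_step(theta, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates with bias correction."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize one sample's features to zero mean and unit variance,
    then rescale; unlike batch norm, the statistics do not depend on the
    batch, which makes the method suitable for recurrent networks."""
    return gain * (x - x.mean()) / np.sqrt(x.var() + eps) + bias

def lstm_step(x, h_prev, c_prev, W, U, b, g, beta):
    """One LSTM step with layer-normalized gate pre-activations.

    W: input weights (4H x D), U: recurrent weights (4H x H),
    b: bias (4H,), g/beta: layer-norm gain and bias (4H,).
    Assumed gate order: [input, forget, candidate, output].
    """
    z = layer_norm(W @ x + U @ h_prev + b, g, beta)
    H = h_prev.shape[0]
    i = sigmoid(z[:H])               # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])           # output gate
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

# Illustrative usage with hypothetical sizes (D = 8 inputs, H = 4 hidden units).
rng = np.random.default_rng(0)
D, H = 8, 4
W, U = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H))
b, g, beta = np.zeros(4 * H), np.ones(4 * H), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b, g, beta)
```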
