Application of Deep Q-learning for Vision Control on Atari Environments

This is a Master's thesis from Lund University/Computational Biology and Biological Physics - Undergoing reorganization

Abstract: The success of Reinforcement Learning (RL) has mostly been in artificial domains, with only a few successful real-world applications. One reason is that most real-world domains fail to satisfy a set of assumptions of RL theory. In recent years, a popular way to gauge the performance of RL agents has been a suite of Atari 2600 games. This suite has been used to benchmark progress in building successively more intelligent agents. However, these games do not capture all the challenges that make real-world tasks difficult for RL, such as having to learn and act with incomplete information. This thesis modifies a set of Atari games to include the task of adaptive sensing for RL agents. The games are made partially observable by restricting the visible portion of the screen. The agents are then tasked with learning to control their vision while at the same time learning to play the game. This modification adds one of the extra challenges present in many real-world environments. To solve these new tasks, an algorithm based on a slight modification of Deep Q-learning is proposed, referred to as Myopic Deep Q-Learning (MyDQL). Furthermore, two network architectures for MyDQL are compared: a feed-forward neural network and a recurrent neural network. It is shown that MyDQL can be successfully applied to the modified Atari games. Additionally, it is shown that using a recurrent neural network greatly enhances the performance of the agent on these tasks. Such an agent is able to achieve near-optimal performance on Pong, Breakout, and Space Invaders with only 35% of the screen visible at any given time. It is also shown that an agent with its visibility further reduced to 13% is still able to achieve impressive performance on these games.
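To make the idea of a restricted view concrete, the following is a minimal sketch of how a frame could be masked so that only a movable rectangular window stays visible. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the window shape, how the agent moves it, or how hidden pixels are encoded, so the rectangular window, the zero fill, and the names `mask_frame` and `visible_fraction` are all hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def mask_frame(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Return a copy of `frame` with every pixel outside the window set to 0.

    Assumption: hidden pixels are zero-filled and the window is a rectangle
    no larger than the frame. Neither detail is stated in the abstract.
    """
    rows, cols = len(frame), len(frame[0])
    # Clamp the window position so the window always lies inside the frame.
    top = max(0, min(top, rows - height))
    left = max(0, min(left, cols - width))
    return [
        [
            pixel if top <= r < top + height and left <= c < left + width else 0
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(frame)
    ]


def visible_fraction(frame: Frame, height: int, width: int) -> float:
    """Fraction of the frame's pixels covered by a height x width window."""
    return (height * width) / (len(frame) * len(frame[0]))


# Hypothetical 10 x 10 frame of all-ones pixels viewed through a 5 x 7 window.
frame = [[1] * 10 for _ in range(10)]
# top=8 would push the window past the bottom edge, so it is clamped to top=5.
observed = mask_frame(frame, top=8, left=0, height=5, width=7)
```

In a setup like this, the window position would be part of what the agent controls, so each step combines a game action with a choice of where to look. The 5 x 7 window on a 10 x 10 frame was chosen only so that the visible share matches the 35% figure quoted in the abstract.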
