Real-time System Control with Deep Reinforcement Learning

This is a Bachelor's thesis from KTH/School of Engineering Sciences (SCI)

Authors: Gustav Gybäck; Fredrik Röstlund; [2018]


Abstract: We reproduce the Deep Deterministic Policy Gradient (DDPG) algorithm presented in the paper Continuous Control With Deep Reinforcement Learning to verify its results, and we explain the machine learning framework needed to understand it. DDPG is a model-free, actor-critic algorithm that uses target networks and mini-batch learning from a replay buffer to increase stability. Batch normalisation makes the algorithm versatile and applicable to multiple environments with varying value ranges and physical units. Neural networks serve as function approximators to handle the large state and action spaces. We show that the algorithm can learn and solve multiple environments using the same setup. After proper training, the algorithm produces a real-time decision policy that acts optimally in any state, provided the environment is not too sensitive to noise.
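The two stability mechanisms the abstract names, the replay buffer and the target networks, can be sketched in a few lines. This is an illustrative sketch, not the thesis code: the class and function names are our own, and `tau` denotes the soft-update (Polyak averaging) coefficient used when a target network slowly tracks the learned network.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random mini-batch: breaks temporal correlations between
        # consecutive transitions, which stabilises gradient-based learning
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, source_params, tau=0.001):
    """Move target-network parameters slowly towards the learned network:
    theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_params, source_params)]
```

Because `tau` is small, the target network changes slowly, which keeps the bootstrapped learning targets nearly stationary between updates.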
