Impact of observation noise and reward sparseness on Deep Deterministic Policy Gradient when applied to inverted pendulum stabilization

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Adam Björnberg; Haris Poljo; [2019]


Abstract: Deep Reinforcement Learning (RL) algorithms have been shown to solve complex problems. Deep Deterministic Policy Gradient (DDPG) is a state-of-the-art deep RL algorithm able to handle environments with continuous action spaces. This thesis evaluates how the DDPG algorithm performs, in terms of success rate and resulting policies, as observation noise and reward sparseness are varied in a simple environment. A threshold for how much Gaussian noise can be added to observations before algorithm performance starts to decrease was found to lie between a standard deviation of 0.025 and 0.05. It was also concluded that reward sparseness leads to inconsistent and irreproducible results, showing the importance of a well-designed reward function. Further testing is required to thoroughly evaluate the performance impact when noisy observations and sparse rewards are combined.
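
The abstract does not specify the exact implementation, but the kind of observation-noise perturbation it describes can be sketched as an environment wrapper. The snippet below is a minimal, hypothetical illustration assuming Gymnasium's Pendulum-v1 as the inverted pendulum task; it adds zero-mean Gaussian noise with the standard deviation values mentioned above and is not taken from the thesis itself.

    import numpy as np
    import gymnasium as gym

    class GaussianObservationNoise(gym.ObservationWrapper):
        """Adds zero-mean Gaussian noise to every observation (illustrative only)."""

        def __init__(self, env, noise_std):
            super().__init__(env)
            self.noise_std = noise_std

        def observation(self, obs):
            # Perturb each observation component independently.
            return obs + np.random.normal(0.0, self.noise_std, size=obs.shape)

    # Example: pendulum stabilization with noise at the reported threshold region.
    env = GaussianObservationNoise(gym.make("Pendulum-v1"), noise_std=0.025)
    obs, info = env.reset()

An agent such as DDPG would then be trained on the wrapped environment exactly as on the noiseless one, so the noise level can be swept (e.g. 0.025 vs. 0.05) to locate the performance threshold.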
