Stuck state avoidance through PID estimation training of Q-learning agent

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Reinforcement learning is conceptually based on an agent learning through interaction with its environment. This trial-and-error learning method makes the process prone to situations in which the agent gets stuck in a dead end from which it cannot keep learning. This thesis studies a method to reduce the risk that a wheeled inverted pendulum (WIP) falls over during training by having a Q-learning agent estimate a PID controller before training it on the balance problem. We show that our approach is as stable as a Q-learning agent without estimation training, while the WIP falls over less than half as many times during training. Both agents succeed in balancing the WIP for a full hour in repeated tests.
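The abstract describes the estimation idea only at a high level. As a rough sketch of how such a phase could look, the Python snippet below pre-trains a tabular Q-learning agent by rewarding agreement with a PID reference on a toy pendulum. Everything specific here (the dynamics, discretization, PID gains, and match-the-PID reward) is an illustrative assumption, not the thesis's actual implementation.

```python
import numpy as np

# A minimal sketch, not the thesis's setup: a toy inverted-pendulum plant,
# a hand-tuned PID reference, and a tabular Q-learning agent that is first
# rewarded for matching the PID output ("estimation training").

DT = 0.02                                  # simulation step [s] (assumed)
ACTIONS = np.linspace(-2.0, 2.0, 7)        # discretized torques (assumed)
ANGLE_BINS = np.linspace(-0.5, 0.5, 11)    # rad
RATE_BINS = np.linspace(-2.0, 2.0, 11)     # rad/s
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1         # illustrative hyperparameters

Q = np.zeros((len(ANGLE_BINS) + 1, len(RATE_BINS) + 1, len(ACTIONS)))

def discretize(angle, rate):
    """Map continuous pendulum state to a discrete Q-table index."""
    return np.digitize(angle, ANGLE_BINS), np.digitize(rate, RATE_BINS)

def step(angle, rate, torque):
    """Toy pendulum dynamics: gravity term plus applied torque."""
    rate += (9.81 * np.sin(angle) + torque) * DT
    angle += rate * DT
    fallen = abs(angle) > 0.5              # the dead-end "stuck" state
    return angle, rate, fallen

def pid(angle, rate, integral, kp=40.0, ki=1.0, kd=4.0):
    """Classical PID control signal (gains are illustrative)."""
    return -(kp * angle + ki * integral + kd * rate)

def pretrain_on_pid(episodes=500):
    """Estimation phase: the reward is agreement with the PID reference,
    so the Q-table encodes a fall-avoiding policy before balance training."""
    for _ in range(episodes):
        angle, rate, integral = np.random.uniform(-0.05, 0.05), 0.0, 0.0
        for _ in range(500):
            s = discretize(angle, rate)
            integral += angle * DT
            target = int(np.argmin(np.abs(ACTIONS - pid(angle, rate, integral))))
            a = (np.random.randint(len(ACTIONS)) if np.random.rand() < EPS
                 else int(np.argmax(Q[s])))
            reward = 1.0 if a == target else 0.0   # match-the-PID reward
            angle, rate, fallen = step(angle, rate, ACTIONS[a])
            s2 = discretize(angle, rate)
            Q[s + (a,)] += ALPHA * (reward + GAMMA * np.max(Q[s2]) - Q[s + (a,)])
            if fallen:
                break

pretrain_on_pid()
```

After an estimation phase along these lines, ordinary Q-learning on the actual balance reward would continue from the PID-shaped Q-table, which is how the thesis reports more than halving the number of falls during training.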
