A study of the exploration/exploitation trade-off in reinforcement learning: Applied to autonomous driving

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Ruwaid Louis; David Yu; [2019]


Abstract: A global initiative has been set in motion to reduce the number of traffic accidents, and autonomous driving is one field that contributes to it. This report examines the exploration/exploitation trade-off in reinforcement learning applied to decision making in autonomous driving. The problem was modelled as a Markov Decision Process and solved with Q-learning. Decision making used an exploration-greedy approach. The scenarios consisted of different kinds of intersections and were built using SUMO. The ego vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety (measured in the number of collisions, among other things), in the domain of autonomous driving. Exploration prompted the ego vehicle to pass the scenarios in less time. This led to more collisions and thus decreased safety. In contrast, exploitation preferred deceleration and stopping, which increased safety but also increased passage time and traffic. The conclusion was to exploit previous experience when applying reinforcement learning to decision making in autonomous driving, because safety is the highest priority for both autonomous driving and the global initiative.
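The abstract names Q-learning and a greedy policy with exploration but gives no implementation details. The following is a minimal sketch of how the two pieces usually fit together, assuming a tabular Q-function and an epsilon-greedy action choice. The function names, the two-action encoding (0 = decelerate, 1 = accelerate), and the values of `alpha`, `gamma` and `epsilon` are illustrative assumptions, not taken from the thesis, and no SUMO or TraCI calls are shown.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-valued action.

    q_row is the list of Q-values for the current state (hypothetical layout).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # exploration: uniform random action
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploitation


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    alpha (learning rate) and gamma (discount) are assumed values.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


# Toy table: 2 states x 2 actions (0 = decelerate, 1 = accelerate; assumed encoding).
q = [[0.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)

action = epsilon_greedy(q[1], epsilon=0.0, rng=rng)  # epsilon = 0: pure exploitation
q_update(q, state=0, action=1, reward=1.0, next_state=1)
```

In this sketch, a larger `epsilon` makes the agent try untested actions more often, while `epsilon = 0` makes it rely only on what it has already learned, which is the trade-off the abstract discusses.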
